1. Load data and preliminary checks


In [1]:
# load libraries
import os, sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

Now import the data in various format for analysis. First check the folder for data. Commands starting with ! indicates they are system commmands.


In [2]:
!dir


 Volume in drive E is New Volume
 Volume Serial Number is A2BF-3D1C

 Directory of E:\Yichuan\Climate_vulnerability_wh\jupyter_workspace

11/07/2016  16:17    <DIR>          .
11/07/2016  16:17    <DIR>          ..
26/05/2016  10:00               854 .gitignore
26/05/2016  10:04    <DIR>          .ipynb_checkpoints
16/06/2016  15:06           286,727 agg_amp.csv
16/06/2016  15:06           456,560 agg_bird.csv
16/06/2016  15:06            68,377 agg_coral.csv
10/02/2016  11:06         8,536,618 amp.xlsx
10/02/2016  11:06        10,999,911 bird.xlsx
10/02/2016  11:06         4,465,794 coral.xlsx
26/05/2016  09:57            35,815 LICENSE
26/05/2016  10:03               290 README.md
25/05/2016  12:33            14,251 region.csv
11/07/2016  16:17             4,613 report.ipynb
10/05/2016  16:44        54,522,213 result_final.csv
03/03/2016  13:30        40,785,576 rl_attr.csv
15/12/2015  17:54       154,368,653 sis.csv
16/06/2016  15:06         2,036,150 wh_amp.csv
16/06/2016  15:06        22,931,649 wh_bird.csv
16/06/2016  15:06         3,343,584 wh_coral.csv
16/06/2016  17:51         2,431,705 workspace.ipynb
              18 File(s)    305,289,340 bytes
               3 Dir(s)  788,429,307,904 bytes free

In [3]:
# CCV analysis results
amp = pd.read_excel('amp.xlsx')
bird = pd.read_excel('bird.xlsx')
coral = pd.read_excel('coral.xlsx')

In [4]:
# check data for different structure
# the intersection result processed in arcmap, essentially: wdpaid, id_no, per_overlap, + sp attr and wh attr
# Essentially, POS invalid rows removed, duplicated dissolved and percentage overlap calculated
# for more information, consult Evernote notes
result = pd.read_csv('result_final.csv', encoding='latin1') 
sis = pd.read_csv('sis.csv', sep='|')
rl = pd.read_csv('rl_attr.csv')

In [5]:
# no duplicates for bird and amp
(amp.Fullname.size == amp.index.size) and (bird.Fullname.size == bird.index.size)


Out[5]:
True

In [6]:
# coral data has duplicates
print('unique coral names:', coral.Fullname.unique().size)
print('total rows:', coral.index.size)


unique coral names: 797
total rows: 1594

In [7]:
# get only the half
coral_unique = coral.groupby('Fullname').first().reset_index()
coral_unique.index.size


Out[7]:
797

In [8]:
result.per.hist(bins=50)


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x111137b8>

Note: areakm2_x is the actual overlap between pairs of PA and species, while areakm2_y refers to the size of WH site.


In [9]:
result.head(5)


Out[9]:
Unnamed: 0 wdpaid id_no areakm2_x areakm2_y per en_name fr_name status_yr rep_area ... phylum_name class_name order_name family_name genus_name species_name category biome_marine biome_freshwater biome_terrestrial
0 0 191 2057.0 106497.703640 146962.025607 0.724661 Galápagos Islands Îles Galápagos 1978 140665.14 ... CHORDATA MAMMALIA CARNIVORA OTARIIDAE Arctocephalus galapagoensis EN t f t
1 1 191 2474.0 138918.354698 146962.025607 0.945267 Galápagos Islands Îles Galápagos 1978 140665.14 ... CHORDATA MAMMALIA CETARTIODACTYLA BALAENOPTERIDAE Balaenoptera acutorostrata LC t f f
2 2 2012 2474.0 2129.930772 5853.472599 0.363875 Everglades National Park Parc national des Everglades\r\r\r\n 1979 5929.20 ... CHORDATA MAMMALIA CETARTIODACTYLA BALAENOPTERIDAE Balaenoptera acutorostrata LC t f f
3 3 2018 2474.0 2549.486762 97284.250341 0.026207 Kluane / Wrangell-St Elias / Glacier Bay / Tat... Kluane / Wrangell-St Elias / Glacier Bay / Tat... 1979 98391.21 ... CHORDATA MAMMALIA CETARTIODACTYLA BALAENOPTERIDAE Balaenoptera acutorostrata LC t f f
4 4 2554 2474.0 4.105942 5502.386948 0.000746 Darien National Park Parc national du Darien 1981 5970.00 ... CHORDATA MAMMALIA CETARTIODACTYLA BALAENOPTERIDAE Balaenoptera acutorostrata LC t f f

5 rows × 26 columns


In [10]:
result.dtypes


Out[10]:
Unnamed: 0             int64
wdpaid                 int64
id_no                float64
areakm2_x            float64
areakm2_y            float64
per                  float64
en_name               object
fr_name               object
status_yr              int64
rep_area             float64
gis_area             float64
country               object
crit                  object
areakm2              float64
binomial              object
kingdom_name          object
phylum_name           object
class_name            object
order_name            object
family_name           object
genus_name            object
species_name          object
category              object
biome_marine          object
biome_freshwater      object
biome_terrestrial     object
dtype: object

2. Data preprocessing and exploration

The climate vulnerability analysis was done back using an older version of Red List 2009 (methodology) and the current species range distribution data is the latest 2015-4 version. Therefore it is likely the taxanomies may change. In some cases, changes may be simple for example renaming genus name or other higher taxanomy, while others may be extremely convoluted, involving spliting of one previously recognised species into multiple ones or vice versa merging previously different species. It is very difficult to reconcile such differences.

Here, I explore such mismatches between birds, amphibians and warm water reef-building corals.


In [11]:
print('Total birds in SIS,', (sis['class'] == 'AVES').sum())
# birds SIS
sis_bird = sis[sis['class']=='AVES']


Total birds in SIS, 10424

In [12]:
print('unique birds in CCV analysis:', bird.Fullname.unique().size)
print('unique birds in SIS:', sis_bird.friendly_name.unique().size)


unique birds in CCV analysis: 9856
unique birds in SIS: 10424

Differences between bird numbers, and similarly other taxa groups


In [13]:
sis_amp = sis[sis['class']=='AMPHIBIA']

print('Birds in CCV but not in SIS:', np.setdiff1d(bird.Fullname.unique(), sis_bird.friendly_name.unique()).size)
print('Total birds in CCV:', bird.index.size)
print('Amphibians in CCV but not in SIS:', np.setdiff1d(amp.Fullname.unique(), sis_amp.friendly_name.unique()).size)
print('Total amphibians in CCV:', amp.index.size)
print('Corals in CCV but not in SIS:', np.setdiff1d(coral_unique.Fullname.unique(), sis.friendly_name.unique()).size)
print('Total corals in CCV:', coral_unique.index.size)


Birds in CCV but not in SIS: 673
Total birds in CCV: 9856
Amphibians in CCV but not in SIS: 502
Total amphibians in CCV: 6204
Corals in CCV but not in SIS: 19
Total corals in CCV: 797

Given the relatively low number and proportion of such missing species due to taxonomic changes, it is unlikely the result may change significantly. For the purpose of deriving a globally consistent picture, we simply exclude these species.

Note the datatype in the result table is somehow different (int vs float) and it will impact comparisons. This needs to be explicited addressed, by rounding.


In [14]:
result.ix[176646].id_no


Out[14]:
61623293.999999993

result.id_no.astype('int64') will not work as it simply gets rid of its fraction. Instead round(0) method should be used


In [15]:
result['id_no_int'] = result.id_no.round(0)

The effect can be easily illustrated. The difference has reduced from 1581 to 36.


In [16]:
(result.id_no.isin(sis.taxonid)).sum()


Out[16]:
164315

In [17]:
dif_spatial = np.setdiff1d(result.id_no, sis.taxonid)
dif_spatial.size


Out[17]:
1581

In [18]:
dif_spatial = np.setdiff1d(result.id_no_int, sis.taxonid)
dif_spatial.size


Out[18]:
36

Let's take a look at the differece, i.e., IDS in the intersection result but not in the SIS.


In [19]:
dif_sis = result[result.id_no.isin(dif_spatial)][['id_no', 'class_name', 'binomial']]

In [20]:
dif_sis['class_name'].unique()


Out[20]:
array(['ACTINOPTERYGII', 'MAGNOLIOPSIDA', 'LILIOPSIDA', 'GASTROPODA',
       'MAMMALIA', 'SARCOPTERYGII', 'INSECTA', 'BIVALVIA'], dtype=object)

Now it is clear that the odd missing species should not impact our analysis, which only looks at birds, amphibians and corals. In other words, the IDs in the result table having corresponding SIS entries. Given that the result table contains already all SIS information, there is no need to join the SIS table. What needs to be address now is to compare the result table directly with the climate vulnerability analysis.


In [21]:
dif_amp_result = np.setdiff1d(amp.Fullname, result.binomial)
dif_amp_sis = np.setdiff1d(amp.Fullname, sis.friendly_name)
dif_amp_result.size, dif_amp_sis.size


Out[21]:
(4174, 502)

In [22]:
# get amp_result using CCV, SIS and partialoverlap_result
amp_result = pd.merge(amp, sis_amp, left_on = 'Fullname', right_on='friendly_name')
amp_result = pd.merge(amp_result, result, left_on='taxonid', right_on='id_no_int')

In [23]:
amp_result.index


Out[23]:
Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            4746, 4747, 4748, 4749, 4750, 4751, 4752, 4753, 4754, 4755],
           dtype='int64', length=4756)

In [24]:
amp.index


Out[24]:
RangeIndex(start=0, stop=6204, step=1)

In [25]:
# what happens if a join is made directly without going through SIS and ID_NO
test_amp = pd.merge(amp, result, left_on='Fullname', right_on='binomial')

In [26]:
test_amp.index


Out[26]:
Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            4762, 4763, 4764, 4765, 4766, 4767, 4768, 4769, 4770, 4771],
           dtype='int64', length=4772)

In [27]:
# the difference
test_dif = np.setdiff1d(test_amp.id_no_int.unique(), amp_result.id_no_int.unique())
test_dif.size


Out[27]:
11

There might be something peculiar happening - as in theory the number should match. The fact there are 11 mismatching names suggest there is a possibility that the name in SIS still mismatches. See below the missing species using the CCV-SIS join.


In [28]:
test_amp[test_amp.id_no_int.isin(test_dif)]


Out[28]:
SVP_ID SIS_GAA_ID GAA Family Genus Species Fullname Threatened SUSC_A_Habitats SUSC_A_aquatic larvae SUSC_B_Temperature Range ... class_name order_name family_name genus_name species_name category biome_marine biome_freshwater biome_terrestrial id_no_int
398 3260 9380 MICROHYLIDAE Anodonthyla boulengerii Anodonthyla boulengerii NaN L L L ... AMPHIBIA ANURA MICROHYLIDAE Anodonthyla boulengeri LC f f t 57674.0
797 2771 32142 MICROHYLIDAE Chiasmocleis panamensis Chiasmocleis panamensis NaN L H H ... AMPHIBIA ANURA MICROHYLIDAE Elachistocleis panamensis LC f t t 57761.0
798 2771 32142 MICROHYLIDAE Chiasmocleis panamensis Chiasmocleis panamensis NaN L H H ... AMPHIBIA ANURA MICROHYLIDAE Elachistocleis panamensis LC f t t 57761.0
978 4258 51160 SALAMANDRIDAE Cynops cyanurus Cynops cyanurus NaN L L L ... AMPHIBIA CAUDATA SALAMANDRIDAE Hypselotriton cyanurus LC f t t 59440.0
979 4258 51160 SALAMANDRIDAE Cynops cyanurus Cynops cyanurus NaN L L L ... AMPHIBIA CAUDATA SALAMANDRIDAE Hypselotriton cyanurus LC f t t 59440.0
980 3529 51161 SALAMANDRIDAE Cynops orientalis Cynops orientalis NaN L L L ... AMPHIBIA CAUDATA SALAMANDRIDAE Hypselotriton orientalis LC f t t 59442.0
981 3529 51161 SALAMANDRIDAE Cynops orientalis Cynops orientalis NaN L L L ... AMPHIBIA CAUDATA SALAMANDRIDAE Hypselotriton orientalis LC f t t 59442.0
982 3529 51161 SALAMANDRIDAE Cynops orientalis Cynops orientalis NaN L L L ... AMPHIBIA CAUDATA SALAMANDRIDAE Hypselotriton orientalis LC f t t 59442.0
983 3529 51161 SALAMANDRIDAE Cynops orientalis Cynops orientalis NaN L L L ... AMPHIBIA CAUDATA SALAMANDRIDAE Hypselotriton orientalis LC f t t 59442.0
1306 2789 20162 MICROHYLIDAE Gastrophryne pictiventris Gastrophryne pictiventris NaN L L H ... AMPHIBIA ANURA MICROHYLIDAE Hypopachus pictiventris LC f t t 57816.0
1307 2790 19796 MICROHYLIDAE Gastrophryne usta Gastrophryne usta NaN L L L ... AMPHIBIA ANURA MICROHYLIDAE Hypopachus ustus LC f t t 57817.0
1393 3897 100217 HELEOPHRYNIDAE Heleophryne natalensis Heleophryne natalensis NaN L L L ... AMPHIBIA ANURA HELEOPHRYNIDAE Hadromophryne natalensis LC f t t 55273.0
1751 2792 29294 MICROHYLIDAE Hyophryne histrio Hyophryne histrio NaN L L H ... AMPHIBIA ANURA MICROHYLIDAE Stereocyclops histrio DD f t t 10634.0
2093 2137 29203 BRACHYCEPHALIDAE Ischnocnema paulodutrai Ischnocnema paulodutrai NaN H L L ... AMPHIBIA ANURA CRAUGASTORIDAE Pristimantis paulodutrai LC f f t 56835.0
4522 5278 9562 MICROHYLIDAE Stumpffia grandis Stumpffia grandis NaN L L L ... AMPHIBIA ANURA MICROHYLIDAE Rhombophryne grandis DD f f t 58009.0
4523 5282 9566 MICROHYLIDAE Stumpffia tridactyla Stumpffia tridactyla NaN L L L ... AMPHIBIA ANURA MICROHYLIDAE Rhombophryne tridactyla DD f f t 58015.0

16 rows × 58 columns

As a result, use direct join between RL intersection and CCV.


In [29]:
## remove test variables 
del amp_result, test_amp

3. Analysis

There are several tasks outstanding:

  1. first of all, a reasonable and defendable cut-off percentage overlap value needs to be calculated as to determine what constitutes species found within World Heritage sites.
  2. once the value is decided, the wdpaid, id_no look up table is then filtered, and it can be further refined by variables such as IUCN Red List category, CCV traits etc.
  3. because the first question needs answering before further steps, and it may change. It is advisable to create a function to facilitate such process. i.e. a function such as:
    def f(input_result_table, cut-off-value):
     new_table= input_result_table[input_result_table.per >= cut-off-value]
     # function to produces outputs based on the result table
     result = process_result(new_table)
     return result
    

3.1 The cut-off value

The EOO (extent of occurence) nature of the Red List species distribution polygon, an overlap between a species and a WH site does not necessarily suggest that a species is present. This is made worse if only part of the distribution polygon intersects with the WH site - this could be a digitisation error or simply inaccuracy due to a scale mismatch. Such 'species within WH sites' as a result must be removed. Quantiles of can be a useful way to look at.


In [30]:
# what happens if a join is made directly without going through SIS and ID_NO
result_amp = pd.merge(amp, result, left_on='Fullname', right_on='binomial')
result_bird = pd.merge(bird, result, left_on='Fullname', right_on='binomial')
result_coral = pd.merge(coral_unique, result, left_on='Fullname', right_on='binomial')

In [31]:
all_per_abc = pd.concat([result_amp.per, result_bird.per, result_coral.per])
all_per_abc.hist(bins=50)


Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0xc051dd8>

The large majority of overlaps are fairely big, suggesting sites WH are generally entirely covered by species. This is not surprising as most species distributions polygons are a few magnitudes bigger than WH sites. Yet, it is possible that range-restricted and/or threatened species may have much smaller range.


In [32]:
all_per_abc.quantile(0.5)


Out[32]:
0.9582745841568836

For illustration purposes, note the difference of choosing a cut-off value between 10% and 15% is not signicant (~2%, or ~2000 pa-species pair less). This should have minimal effects on the result.


In [33]:
(all_per_abc>0.15).sum(), (all_per_abc>0.1).sum(), (all_per_abc>0.05).sum()


Out[33]:
(56508, 58029, 60301)

In [34]:
((all_per_abc>0.1).sum() - (all_per_abc>0.15).sum())/all_per_abc.index.size


Out[34]:
0.022358440642088553

However, if we look at amphibians, birds and mammals separately, it is easily noticeable that corals have significant number of low percentage overlaps. This is because marine World Heritage sites tend to be rather extensive and huge compared to coral distributions which are coastal. Thus it is important to examine the absolute overlap as well for corals.


In [35]:
result_amp.per.hist(bins=50)


Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0xc051f28>

In [36]:
result_bird.per.hist(bins=50)


Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0xb7b1d68>

In [37]:
result_coral.per.hist(bins=50)


Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0xd7be4a8>

In [38]:
result_amp.areakm2_x.hist(bins =50, log=True)


Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ca4f60>

In [39]:
result_bird.areakm2_x.hist(bins =50, log=True)


Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d1343c8>

In [40]:
result_coral.areakm2_x.hist(bins =50, log=True)


Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d267f60>

The histograms show all three taxa have a bimodal distribution, with significant number of values near two poles (very large overlaps or very small ones)


In [41]:
result_coral.sort_values('per', ascending=True)[['en_name', 'binomial', 'areakm2_x', 'per']].head(20)


Out[41]:
en_name binomial areakm2_x per
5363 Darien National Park Millepora intricata 0.033701 0.000006
7471 Tropical Rainforest Heritage of Sumatra Podabacia motuporensis 0.874951 0.000034
2550 Tropical Rainforest Heritage of Sumatra Ctenactis albitentaculata 0.999503 0.000038
169 Tropical Rainforest Heritage of Sumatra Acropora abrolhosensis 0.999503 0.000038
4692 Henderson Island Leptastrea pruinosa 0.002691 0.000065
430 Henderson Island Acropora cytherea 0.002691 0.000065
777 Henderson Island Acropora hyacinthus 0.002691 0.000065
1577 Henderson Island Acropora subulata 0.002691 0.000065
3214 Henderson Island Favia matthaii 0.002691 0.000065
755 Henderson Island Acropora humilis 0.002691 0.000065
6285 Henderson Island Montipora venosa 0.002691 0.000065
7714 Henderson Island Porites lobata 0.002691 0.000065
281 Henderson Island Acropora austera 0.002691 0.000065
1347 Henderson Island Acropora retusa 0.002691 0.000065
2389 Henderson Island Caulastrea furcata 0.002691 0.000065
1362 Henderson Island Acropora robusta 0.002691 0.000065
5587 Henderson Island Montipora australiensis 0.002691 0.000065
4715 Henderson Island Leptastrea purpurea 0.002691 0.000065
5133 Henderson Island Lobophyllia hemprichii 0.002691 0.000065
7743 Henderson Island Porites lutea 0.002691 0.000065

In [42]:
result_coral[['en_name', 'binomial', 'areakm2_x', 'per']][result_coral.en_name.isin(['Henderson Island'])].head()


Out[42]:
en_name binomial areakm2_x per
195 Henderson Island Acropora aculeus 0.002691 0.000065
281 Henderson Island Acropora austera 0.002691 0.000065
350 Henderson Island Acropora cerealis 0.002691 0.000065
430 Henderson Island Acropora cytherea 0.002691 0.000065
480 Henderson Island Acropora digitifera 0.002691 0.000065

The threshold value can be reasonably justified by using both the perentage overlap and actual overlap. It is obvious if the entire WH site is covered by a species, then it should be counted. If the percentage overlap is too small, it could be either due to an inaccurate boundary, in which case, the species should not be counted; or a genunie overlap if the WH is considerably larger. Therefore, by adding an additional test to the absolute overlap value in $km^2$, ommissions due to small percentage overlap can be reduced.

Let's test the effect of using various emperical values of per and abskm2 to remove artifical overlaps caused by the inaccuracy of boundaries. This represents an optimistic estimate of the number of species in WH (reinforced by the nature of Red List EOO)


In [43]:
def test_params(per=None, abkm2=None):
    if per is None and abkm2 is None:
        result = None
    if per and abkm2: # both conditions are applied
        result = [(result_taxa[(result_taxa.per>per)|(result_taxa.areakm2_x>abkm2)].index.size, result_taxa.index.size) for \
 result_taxa in [result_amp, result_bird, result_coral]]
    elif per and abkm2 is None: # only per
        result = [(result_taxa[(result_taxa.per>per)].index.size, result_taxa.index.size) for \
 result_taxa in [result_amp, result_bird, result_coral]]
    elif per is None and abkm2:
        result = [(result_taxa[(result_taxa.areakm2_x>abkm2)].index.size, result_taxa.index.size) for \
 result_taxa in [result_amp, result_bird, result_coral]]
    else:
        return None
        
    print(per, abkm2, result)

# # check the number of rows after applying criteria
# for each in zip(['after', 'before'], *test_params(0.15, 1)):
#     print(each)

# per, abkm2, (after and before) for amp, bird and coral
print('---only per---')
test_params(0.05)
test_params(0.15)
test_params(0.25)
test_params(0.50)
print('---only abs km2---')
test_params(abkm2=1)
test_params(abkm2=5)
test_params(abkm2=10)
print('--- per or abs km2---')
test_params(0.15, 1)
test_params(0.15, 2)
test_params(0.25, 1)
test_params(0.25, 2)
test_params(0.25, 5)


---only per---
0.05 None [(4140, 4772), (50190, 54446), (5971, 8810)]
0.15 None [(3751, 4772), (47361, 54446), (5396, 8810)]
0.25 None [(3504, 4772), (45570, 54446), (5293, 8810)]
0.5 None [(3047, 4772), (41125, 54446), (3795, 8810)]
---only abs km2---
None 1 [(4689, 4772), (53712, 54446), (8445, 8810)]
None 5 [(4575, 4772), (52187, 54446), (7628, 8810)]
None 10 [(4523, 4772), (51560, 54446), (7326, 8810)]
--- per or abs km2---
0.15 1 [(4731, 4772), (54144, 54446), (8445, 8810)]
0.15 2 [(4711, 4772), (53852, 54446), (7694, 8810)]
0.25 1 [(4731, 4772), (54139, 54446), (8445, 8810)]
0.25 2 [(4701, 4772), (53704, 54446), (7694, 8810)]
0.25 5 [(4661, 4772), (53298, 54446), (7628, 8810)]

The result corroborates the distributions that percentage overlaps are sensitive and their change has a considerable impact on the number of filtered rows, while s small increase in absolute area in $km^2$ seems to have little effect. It is observed that an increase of per from 15% to 25% poses little change.

The threshold values chosen here may present a rather insigificant factor compared to the assumption made by the EOO intersection, i.e., even though a species overlaps 100% with a given WH, it may still be absent. However this approach remains a popular method in estimating global species conservation and offer a consistent view for comprehensively assessed species, in the absence of true area of occupancy (AOO) data.

I use 15% and 1km^2 as cut-off values

3.2 Climate vulnerability species inside WH sites

To be policy relavant at a site level, it's imperative that climate vulnerability information on individual World Heritage sites be obtained. This analysis illustrates how many amphibians, birds and coral species are affected by climate change in each World Heritage site, by using both the total number of climate vulnerable species and their proportion. Further aggretations can be used to reveal each component within sensitivity, low adaptability and exposure.


In [44]:
# get filtered result
result_amp_f, result_bird_f, result_coral_f = [result_taxa[(result_taxa.per>.15)|(result_taxa.areakm2_x>1)] for \
 result_taxa in [result_amp, result_bird, result_coral]]

# get species numbers
print('Filtered', 'Total-intersect', 'Total')
print(result_amp_f.Fullname.unique().size, result_amp.Fullname.unique().size, amp.Fullname.unique().size)
print(result_bird_f.Fullname.unique().size, result_bird.Fullname.unique().size, bird.Fullname.unique().size)
print(result_coral_f.Fullname.unique().size, result_coral.Fullname.unique().size, coral.Fullname.unique().size)


Filtered Total-intersect Total
2013 2030 6204
6914 6924 9856
724 727 797

Using a lambda function as an argument to get

  1. total number of high vulnerability species per WH site
  2. total number of species per WH site
  3. percentage of the high vulnerability species per WH site

In [45]:
## agg_dict passed to do multiple aggregation at the same time
## x is a group of FINAL_SCOREs
agg_dict = {'FINAL_SCORE':\
            {'H_vul_per': lambda x: (x=='H').sum()/x.size, \
             'total_H_vul': lambda x: (x=='H').sum(),\
             'total_vul':len}}

## amp
amp_v = result_amp_f.groupby(['wdpaid', 'en_name']).agg(agg_dict).reset_index()
amp_v.columns = amp_v.columns.droplevel()

# birds
bird_v = result_bird_f.groupby(['wdpaid', 'en_name']).agg(agg_dict).reset_index()
bird_v.columns = bird_v.columns.droplevel()

# corals
coral_v = result_coral_f.groupby(['wdpaid', 'en_name']).agg(agg_dict).reset_index()
coral_v.columns = coral_v.columns.droplevel()

In [46]:
bird_v.head()


Out[46]:
H_vul_per total_vul total_H_vul
0 191 Galápagos Islands 0.129412 85 11
1 197 Tikal National Park 0.179661 295 53
2 2004 Dinosaur Provincial Park 0.126866 134 17
3 2005 Nahanni National Park 0.141667 120 17
4 2006 Simien National Park 0.096886 289 28

It is necessary to add the missing labels for the two columns


In [47]:
# rename column names. One cannot do: amp_v.columns[1] = newcolumn_name
new_columns = amp_v.columns.values
new_columns[:2] = ['wdpaid', 'en_name']

amp_v.columns = new_columns
bird_v.columns = new_columns
coral_v.columns = new_columns

Graphs to show both the percentage (the number of species with FINAL SCORE =='H' against all species), and the total number of species that are highly vulnerable.


In [48]:
# percentage of H distribution
amp_v.H_vul_per.hist(bins=50)


Out[48]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d0b0be0>

In [49]:
# total number of H distribution
amp_v.total_H_vul.hist(bins=50)


Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x1e1715f8>

In [50]:
# percentage of H distribution
bird_v.H_vul_per.hist(bins=50)


Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d58b38>

In [51]:
# total number of H distribution
bird_v.total_H_vul.hist(bins=50)


Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ee95c50>

In [52]:
# percentage of H distribution
coral_v.H_vul_per.hist(bins=50)


Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x1eec0cf8>

In [53]:
# total number of H distribution
coral_v.total_H_vul.hist(bins=50)


Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x1dc93898>

In [54]:
print('Top WHs with high number of CV birds\n', bird_v.sort_values('total_H_vul', ascending=False).head(10))
print('==================')
print('Top WHs with high percentage of CV birds\n', bird_v.sort_values('H_vul_per', ascending=False).head(10))


Top WHs with high number of CV birds
      wdpaid                                            en_name  H_vul_per  \
92    61612                              Canaima National Park   0.366987   
72    17760                                 Manú National Park   0.266350   
149  220296                Central Amazon Conservation Complex   0.370307   
151  220298                    Central Suriname Nature Reserve   0.411157   
41     9614                               Sangay National Park   0.268730   
148  220295                  Noel Kempff Mercado National Park   0.227979   
52    10903  Talamanca Range-La Amistad Reserves / La Amist...   0.225750   
15     2554                               Darien National Park   0.240310   
90    61610                           Los Katíos National Park   0.241228   
82    26651                           Río Abiseo National Park   0.211240   

     total_vul  total_H_vul  
92         624          229  
72         841          224  
149        586          217  
151        484          199  
41         614          165  
148        579          132  
52         567          128  
15         516          124  
90         456          110  
82         516          109  
==================
Top WHs with high percentage of CV birds
         wdpaid                                            en_name  H_vul_per  \
120     145576                         Heard and McDonald Islands   0.512195   
177     902373                                 Ilulissat Icefjord   0.500000   
204  555512003                                   Putorana Plateau   0.472222   
198     903139                                            Surtsey   0.431373   
128     168239                  New Zealand Sub-Antarctic Islands   0.423077   
151     220298                    Central Suriname Nature Reserve   0.411157   
61       12207                Giant's Causeway and Causeway Coast   0.400000   
118     124388                                      Laponian Area   0.396947   
184     902489  West Norwegian Fjords – Geirangerfjord and Nær...   0.395522   
224  555577556                                       Stevns Klint   0.394161   

     total_vul  total_H_vul  
120         41           21  
177         36           18  
204         72           34  
198         51           22  
128         78           33  
151        484          199  
61         100           40  
118        131           52  
184        134           53  
224        137           54  

In [55]:
print('Top WHs with high number of CV amphibians\n', amp_v.sort_values('total_H_vul', ascending=False).head(10))
print('==================')
print('Top WHs with high percentage of CV amphibian\n', amp_v.sort_values('H_vul_per', ascending=False).head(10))


Top WHs with high number of CV amphibians
      wdpaid                                            en_name  H_vul_per  \
49    10903  Talamanca Range-La Amistad Reserves / La Amist...   0.325926   
133  220298                    Central Suriname Nature Reserve   0.445783   
67    17760                                 Manú National Park   0.342593   
83    61612                              Canaima National Park   0.360465   
131  220296                Central Amazon Conservation Complex   0.318681   
152  902347                 Cape Floral Region Protected Areas   0.425532   
162  903062                      Rainforests of the Atsinanana   0.120253   
127  220292                                      Kinabalu Park   0.260870   
128  220293                          Gunung Mulu National Park   0.293103   
120  198296                    Area de Conservación Guanacaste   0.326923   

     total_vul  total_H_vul  
49         135           44  
133         83           37  
67         108           37  
83          86           31  
131         91           29  
152         47           20  
162        158           19  
127         69           18  
128         58           17  
120         52           17  
==================
Top WHs with high percentage of CV amphibian
         wdpaid                             en_name  H_vul_per  total_vul  \
153     902367              Pitons Management Area   1.000000          1   
181  555547992        Rock Islands Southern Lagoon   1.000000          1   
173  555512003                    Putorana Plateau   1.000000          1   
106     124387              Volcanoes of Kamchatka   1.000000          1   
110     145583    Morne Trois Pitons National Park   0.666667          3   
180  555547991            Lena Pillars Nature Park   0.500000          2   
148     900880                      Uvs Nuur Basin   0.500000          2   
96       68918      Whale Sanctuary of El Vizcaino   0.500000          2   
133     220298     Central Suriname Nature Reserve   0.445783         83   
152     902347  Cape Floral Region Protected Areas   0.425532         47   

     total_H_vul  
153            1  
181            1  
173            1  
106            1  
110            2  
180            1  
148            1  
96             1  
133           37  
152           20  

In [56]:
amp_v.quantile(0.9)


Out[56]:
wdpaid         5.555120e+08
H_vul_per      3.320513e-01
total_vul      5.800000e+01
total_H_vul    4.800000e+00
dtype: float64

In [57]:
amp_v.quantile(0.1)


Out[57]:
wdpaid         2574.2
H_vul_per         0.0
total_vul         3.0
total_H_vul       0.0
dtype: float64

In [58]:
bird_v.quantile(0.9)


Out[58]:
wdpaid         5.555120e+08
H_vul_per      3.503704e-01
total_vul      4.516000e+02
total_H_vul    6.720000e+01
dtype: float64

In [59]:
bird_v.quantile(0.1)


Out[59]:
wdpaid         2577.800000
H_vul_per         0.058983
total_vul        56.600000
total_H_vul       9.000000
dtype: float64

For communications purposes, it may be convenient to have a single indicator/index to indicate whether a given WH site is considered climate vulnerable for some taxon (in this case amphibians). The questions then becomes how such a cut-off value can be reasonable calculated and can be meaningfully applied. One way is to, again, use quantile cut-off values for total number of highly vulnerable species and percentage overlap. However, any ranking created using quantile only compares within WH sites. Ideally, this should be derived using a global metric. Perhaps this is an important caveat. One needs to know about other taxa before making a decision as they should be consistent.

Questions:

  • make sense to rank? 10% sounds a good cut-off value?

Top10, top 50, top200

WH sites with high proportion seem to have a low total number of CV amphibians. Thus using proportion alone may not paint a full picture. It is conceivable that a combination of both number and propertion is needed to classify/rank WH sites.

3.4 Climate vulnerability species inside WH sites, new methods

While the above methods work well, similar stats for other columns need considerable manual coding. In this study, depending on the taxon of interest, there may be as high as 20 replications (a total of ~60 if all three taxa are conisidered). Therefore it is necessary to find a way through which such aggregations could be calculated simply and more scalable.


In [60]:
# get all fields and pick those we need for the stats
print(result_amp_f.columns)
print(result_bird_f.columns)
print(result_coral_f.columns)


Index(['SVP_ID', 'SIS_GAA_ID', 'GAA Family', 'Genus', 'Species', 'Fullname',
       'Threatened', 'SUSC_A_Habitats', 'SUSC_A_aquatic larvae',
       'SUSC_B_Temperature Range', 'SUSC_B_Precipitation Range',
       'SUSC_C_explosive breeder', 'SUSC_D_disease', 'SENSITIVITY',
       'ADAPT_A_barriers', 'ADAPT_A_dispersal_distance',
       'ADAPT_C_Slow_Gen_Turnpover', 'LOW_ADAPTABILITY', 'EXP_Sea Level',
       'EXP_MeanTemperature', 'EXP_MeanRainfall', 'EXP_AADTemperature',
       'EXP_AADRainfall', 'EXPOSURE', 'FINAL_SCORE', 'Final_Pessimistic',
       'Score_HorS', 'SensXLAdaptability', 'SensXExposure', 'LAdaptXExposure',
       'Threatened_and_Vulnerable', 'Unnamed: 0', 'wdpaid', 'id_no',
       'areakm2_x', 'areakm2_y', 'per', 'en_name', 'fr_name', 'status_yr',
       'rep_area', 'gis_area', 'country', 'crit', 'areakm2', 'binomial',
       'kingdom_name', 'phylum_name', 'class_name', 'order_name',
       'family_name', 'genus_name', 'species_name', 'category', 'biome_marine',
       'biome_freshwater', 'biome_terrestrial', 'id_no_int'],
      dtype='object')
Index(['SVP_ID', 'Fullname', 'Common name', 'New seq', 'SPcRecID',
       '2008 RDL Cat', 'Threatened', 'FINAL_SCORE', '__hab_specialisation',
       '__microhabitat', '__ForestDependence', '__TemperatureRange',
       '__PrecipRange', '__Species dependence', '__PopnSize',
       '__EffectivePopnSize', 'SENSITIVITY', '__Dispersal distance limited',
       '__Dispersal barriers', '__Low genetic diversity', '__clutch size',
       '__Gen_Length', 'LOW_ADAPTABILITY', '__EXP_Sea Inundation',
       '__EXP_MeanTemperature', '__EXP_AADTemperature', '__EXP_MeanPrecip',
       '__EXP_AADPrecip', 'EXPOSURE', 'Unnamed: 0', 'wdpaid', 'id_no',
       'areakm2_x', 'areakm2_y', 'per', 'en_name', 'fr_name', 'status_yr',
       'rep_area', 'gis_area', 'country', 'crit', 'areakm2', 'binomial',
       'kingdom_name', 'phylum_name', 'class_name', 'order_name',
       'family_name', 'genus_name', 'species_name', 'category', 'biome_marine',
       'biome_freshwater', 'biome_terrestrial', 'id_no_int'],
      dtype='object')
Index(['Fullname', 'SVProject_ID', 'genus', 'species', 'Threatened',
       'FINAL_SCORE', '_Few habitats', '_narrow depth rng',
       '_Larval vulnerabilityH', '_ShallowH', '_PastMortalityH', '_zooxH',
       '_Rare', 'SENSITIVITY', '_Short dispersal distance',
       '_dispersal barriers', '_ColonyMaxAge', '_SlowGrowthRate',
       'LOW_ADAPTABILITY', '_ProportionArag<3', '_MeanDHM', 'EXPOSURE',
       'Unnamed: 0', 'wdpaid', 'id_no', 'areakm2_x', 'areakm2_y', 'per',
       'en_name', 'fr_name', 'status_yr', 'rep_area', 'gis_area', 'country',
       'crit', 'areakm2', 'binomial', 'kingdom_name', 'phylum_name',
       'class_name', 'order_name', 'family_name', 'genus_name', 'species_name',
       'category', 'biome_marine', 'biome_freshwater', 'biome_terrestrial',
       'id_no_int'],
      dtype='object')

In [61]:
# Selected columns, based on the df.columns for each taxon
## amphibian
a_columns = ['SUSC_A_Habitats', 'SUSC_A_aquatic larvae',
 'SUSC_B_Temperature Range', 'SUSC_B_Precipitation Range',
 'SUSC_C_explosive breeder', 'SUSC_D_disease', 'SENSITIVITY',
 'ADAPT_A_barriers', 'ADAPT_A_dispersal_distance',
 'ADAPT_C_Slow_Gen_Turnpover', 'LOW_ADAPTABILITY', 
 'EXP_Sea Level',
 'EXP_MeanTemperature', 'EXP_MeanRainfall', 'EXP_AADTemperature',
'EXP_AADRainfall', 'EXPOSURE', 'FINAL_SCORE']

## birds
b_columns = ['FINAL_SCORE', '__hab_specialisation', 
'__microhabitat', '__ForestDependence', '__TemperatureRange',
'__PrecipRange', '__Species dependence', '__PopnSize',
'__EffectivePopnSize', 'SENSITIVITY', '__Dispersal distance limited',
'__Dispersal barriers', '__Low genetic diversity', '__clutch size',
'__Gen_Length', 'LOW_ADAPTABILITY', '__EXP_Sea Inundation',
'__EXP_MeanTemperature', '__EXP_AADTemperature', '__EXP_MeanPrecip',
'__EXP_AADPrecip', 'EXPOSURE']

## corals
c_columns = ['FINAL_SCORE', '_Few habitats', '_narrow depth rng',
       '_Larval vulnerabilityH', '_ShallowH', '_PastMortalityH', '_zooxH',
       '_Rare', 'SENSITIVITY', '_Short dispersal distance',
       '_dispersal barriers', '_ColonyMaxAge', '_SlowGrowthRate',
       'LOW_ADAPTABILITY', '_ProportionArag<3', '_MeanDHM', 'EXPOSURE']

In [62]:
# apply summation of HLU to each column, count how many occurences of each H, L, U and percentage
def get_hlu(x):
    return pd.Series({'H': (x=='H').sum(), 'L': (x=='L').sum(), 'U': (x=='U').sum(), 'per_H': (x=='H').sum()/x.size})

# apply get_hlu to each df
def apply_func(df, select_columns):
    # for each column of the df apply a function
    return df[select_columns].apply(get_hlu).T

# ======= GENERIC VERSION ======
# get unique value of the series and count how many occurences of each unique value
def generic_count(x):
    # x is a pd.Series object
    unique_values = x.unique()
    return pd.Series({value: (x==value).sum() for value in unique_values})

def apply_func_mk2(df):
    return df[summary_columnsy_columns].apply(generic_count).T

# ======= about 'apply' =======
## dfgroupby.apply: takes a df as argument(for each sub df defined by groupby criteria)
## dataframe.apply: takes a series as argument (for each of the columns in the df)
## series.apply: takes a value as argument (for each value of the series)

In [63]:
## split into df for each unique wdpaid-en_name combinations
## for each df apply the logic
amp_vv =  result_amp_f.groupby(['wdpaid', 'en_name']).apply(apply_func, a_columns).reset_index()
bird_vv = result_bird_f.groupby(['wdpaid', 'en_name']).apply(apply_func, b_columns).reset_index()
coral_vv = result_coral_f.groupby(['wdpaid', 'en_name']).apply(apply_func, c_columns).reset_index()

# amp_vvv = groups.apply(apply_func_mk2).reset_index()

In [64]:
## spot checks
bird_vv[bird_vv.wdpaid == 191]


Out[64]:
wdpaid en_name level_2 H L U per_H
0 191 Galápagos Islands FINAL_SCORE 11.0 74.0 0.0 0.129412
1 191 Galápagos Islands __hab_specialisation 4.0 81.0 0.0 0.047059
2 191 Galápagos Islands __microhabitat 1.0 84.0 0.0 0.011765
3 191 Galápagos Islands __ForestDependence 8.0 77.0 0.0 0.094118
4 191 Galápagos Islands __TemperatureRange 2.0 69.0 14.0 0.023529
5 191 Galápagos Islands __PrecipRange 20.0 51.0 14.0 0.235294
6 191 Galápagos Islands __Species dependence 0.0 85.0 0.0 0.000000
7 191 Galápagos Islands __PopnSize 12.0 56.0 17.0 0.141176
8 191 Galápagos Islands __EffectivePopnSize 15.0 53.0 17.0 0.176471
9 191 Galápagos Islands SENSITIVITY 32.0 39.0 14.0 0.376471
10 191 Galápagos Islands __Dispersal distance limited 24.0 61.0 0.0 0.282353
11 191 Galápagos Islands __Dispersal barriers 18.0 67.0 0.0 0.211765
12 191 Galápagos Islands __Low genetic diversity 6.0 79.0 0.0 0.070588
13 191 Galápagos Islands __clutch size 27.0 52.0 6.0 0.317647
14 191 Galápagos Islands __Gen_Length 47.0 38.0 0.0 0.552941
15 191 Galápagos Islands LOW_ADAPTABILITY 59.0 25.0 1.0 0.694118
16 191 Galápagos Islands __EXP_Sea Inundation 8.0 77.0 0.0 0.094118
17 191 Galápagos Islands __EXP_MeanTemperature 19.0 51.0 15.0 0.223529
18 191 Galápagos Islands __EXP_AADTemperature 5.0 65.0 15.0 0.058824
19 191 Galápagos Islands __EXP_MeanPrecip 12.0 58.0 15.0 0.141176
20 191 Galápagos Islands __EXP_AADPrecip 6.0 64.0 15.0 0.070588
21 191 Galápagos Islands EXPOSURE 30.0 40.0 15.0 0.352941

In [65]:
bird_vv[bird_vv.wdpaid == 2008]


Out[65]:
wdpaid en_name level_2 H L U per_H
132 2008 Bia?owie?a Forest FINAL_SCORE 60.0 119.0 0.0 0.335196
133 2008 Bia?owie?a Forest __hab_specialisation 0.0 179.0 0.0 0.000000
134 2008 Bia?owie?a Forest __microhabitat 12.0 167.0 0.0 0.067039
135 2008 Bia?owie?a Forest __ForestDependence 13.0 166.0 0.0 0.072626
136 2008 Bia?owie?a Forest __TemperatureRange 0.0 176.0 3.0 0.000000
137 2008 Bia?owie?a Forest __PrecipRange 152.0 24.0 3.0 0.849162
138 2008 Bia?owie?a Forest __Species dependence 0.0 179.0 0.0 0.000000
139 2008 Bia?owie?a Forest __PopnSize 0.0 179.0 0.0 0.000000
140 2008 Bia?owie?a Forest __EffectivePopnSize 0.0 179.0 0.0 0.000000
141 2008 Bia?owie?a Forest SENSITIVITY 161.0 18.0 0.0 0.899441
142 2008 Bia?owie?a Forest __Dispersal distance limited 5.0 174.0 0.0 0.027933
143 2008 Bia?owie?a Forest __Dispersal barriers 4.0 175.0 0.0 0.022346
144 2008 Bia?owie?a Forest __Low genetic diversity 1.0 178.0 0.0 0.005587
145 2008 Bia?owie?a Forest __clutch size 14.0 164.0 1.0 0.078212
146 2008 Bia?owie?a Forest __Gen_Length 66.0 113.0 0.0 0.368715
147 2008 Bia?owie?a Forest LOW_ADAPTABILITY 76.0 103.0 0.0 0.424581
148 2008 Bia?owie?a Forest __EXP_Sea Inundation 0.0 179.0 0.0 0.000000
149 2008 Bia?owie?a Forest __EXP_MeanTemperature 161.0 15.0 3.0 0.899441
150 2008 Bia?owie?a Forest __EXP_AADTemperature 0.0 176.0 3.0 0.000000
151 2008 Bia?owie?a Forest __EXP_MeanPrecip 0.0 176.0 3.0 0.000000
152 2008 Bia?owie?a Forest __EXP_AADPrecip 0.0 176.0 3.0 0.000000
153 2008 Bia?owie?a Forest EXPOSURE 161.0 15.0 3.0 0.899441

In [66]:
result_amp_f[result_amp_f.wdpaid == 191]


Out[66]:
SVP_ID SIS_GAA_ID GAA Family Genus Species Fullname Threatened SUSC_A_Habitats SUSC_A_aquatic larvae SUSC_B_Temperature Range ... class_name order_name family_name genus_name species_name category biome_marine biome_freshwater biome_terrestrial id_no_int

0 rows × 58 columns


In [67]:
amp_vv[amp_vv.wdpaid == 191]
coral_vv[coral_vv.wdpaid == 191]


Out[67]:
wdpaid en_name level_2 H L U per_H
0 191 Galápagos Islands FINAL_SCORE 2.0 19.0 0.0 0.095238
1 191 Galápagos Islands _Few habitats 4.0 17.0 0.0 0.190476
2 191 Galápagos Islands _narrow depth rng 4.0 16.0 1.0 0.190476
3 191 Galápagos Islands _Larval vulnerabilityH 2.0 19.0 0.0 0.095238
4 191 Galápagos Islands _ShallowH 4.0 16.0 1.0 0.190476
5 191 Galápagos Islands _PastMortalityH 11.0 10.0 0.0 0.523810
6 191 Galápagos Islands _zooxH 15.0 6.0 0.0 0.714286
7 191 Galápagos Islands _Rare 3.0 18.0 0.0 0.142857
8 191 Galápagos Islands SENSITIVITY 21.0 0.0 0.0 1.000000
9 191 Galápagos Islands _Short dispersal distance 0.0 18.0 3.0 0.000000
10 191 Galápagos Islands _dispersal barriers 3.0 18.0 0.0 0.142857
11 191 Galápagos Islands _ColonyMaxAge 0.0 21.0 0.0 0.000000
12 191 Galápagos Islands _SlowGrowthRate 7.0 14.0 0.0 0.333333
13 191 Galápagos Islands LOW_ADAPTABILITY 8.0 13.0 0.0 0.380952
14 191 Galápagos Islands _ProportionArag<3 3.0 18.0 0.0 0.142857
15 191 Galápagos Islands _MeanDHM 1.0 19.0 1.0 0.047619
16 191 Galápagos Islands EXPOSURE 3.0 18.0 0.0 0.142857

Having more CCV species, having more CCV species in proportion indicates WH sites, whose species are more susceptible to adverse impact due to climate change.

Questions and statements that can be made?

  • within WH sites, the top sites re CCV species number and proportion, by taxa. Narrative: more consideration to mitigate adverse effect of CCV to species

In [68]:
# dump for database
amp_vv.to_csv('agg_amp.csv')
bird_vv.to_csv('agg_bird.csv')
coral_vv.to_csv('agg_coral.csv')

result_amp_f.to_csv('wh_amp.csv')
result_bird_f.to_csv('wh_bird.csv')
result_coral_f.to_csv('wh_coral.csv')

3.5 Climate vulnerability inside the entire WH network

This section looks at individual taxon within the entire network. A comparison is then made to examine whether or not the WH network house species that are more vulnerable to climate change. Because the result contains many duplicate species rows (due to different sites having the same species), the first step is to get a unique list of species within all WH sites.


In [69]:
# combine species in WH result and all results, by taxa
## get a unique set of species within each taxon
amp_selected = result_amp_f.groupby('id_no_int').first().reset_index()
bird_selected = result_bird_f.groupby('id_no_int').first().reset_index()
coral_selected = result_coral_f.groupby('id_no_int').first().reset_index()

## function to aggregate result and apply hlu function to all columns
def get_agg_result(unique_taxon_result, taxon_columns):
    return unique_taxon_result[taxon_columns].apply(get_hlu).T


amp_total = pd.merge(get_agg_result(amp_selected, a_columns), \
                    get_agg_result(amp, a_columns),
                    left_index = True, right_index=True, suffixes=('_wh', '_all'))

bird_total = pd.merge(get_agg_result(bird_selected, b_columns), \
                    get_agg_result(bird, b_columns),
                    left_index = True, right_index=True, suffixes=('_wh', '_all'))

coral_total = pd.merge(get_agg_result(coral_selected, c_columns), \
                    get_agg_result(coral, c_columns),
                    left_index = True, right_index=True, suffixes=('_wh', '_all'))

In [70]:
amp_total


Out[70]:
H_wh L_wh U_wh per_H_wh H_all L_all U_all per_H_all
SUSC_A_Habitats 220.0 1783.0 10.0 0.109290 1509.0 4539.0 156.0 0.243230
SUSC_A_aquatic larvae 330.0 1673.0 10.0 0.163934 955.0 5085.0 164.0 0.153933
SUSC_B_Temperature Range 386.0 1626.0 1.0 0.191754 1519.0 4557.0 128.0 0.244842
SUSC_B_Precipitation Range 479.0 1533.0 1.0 0.237953 1519.0 4557.0 128.0 0.244842
SUSC_C_explosive breeder 162.0 1629.0 222.0 0.080477 316.0 4113.0 1775.0 0.050935
SUSC_D_disease 436.0 1577.0 0.0 0.216592 1307.0 4897.0 0.0 0.210671
SENSITIVITY 1340.0 606.0 67.0 0.665673 4453.0 1365.0 386.0 0.717763
ADAPT_A_barriers 183.0 1661.0 169.0 0.090909 745.0 3900.0 1559.0 0.120084
ADAPT_A_dispersal_distance 501.0 1493.0 19.0 0.248882 1569.0 4522.0 113.0 0.252901
ADAPT_C_Slow_Gen_Turnpover 609.0 586.0 818.0 0.302534 2073.0 899.0 3232.0 0.334139
LOW_ADAPTABILITY 997.0 1007.0 9.0 0.495281 3233.0 2898.0 73.0 0.521115
EXP_Sea Level 0.0 2003.0 10.0 0.000000 4.0 6044.0 156.0 0.000645
EXP_MeanTemperature 315.0 1697.0 1.0 0.156483 1515.0 4544.0 145.0 0.244197
EXP_MeanRainfall 503.0 1509.0 1.0 0.249876 1515.0 4544.0 145.0 0.244197
EXP_AADTemperature 212.0 1800.0 1.0 0.105315 1498.0 4561.0 145.0 0.241457
EXP_AADRainfall 371.0 1641.0 1.0 0.184302 1515.0 4544.0 145.0 0.244197
EXPOSURE 864.0 1145.0 4.0 0.429210 3356.0 2642.0 206.0 0.540941
FINAL_SCORE 301.0 1712.0 0.0 0.149528 1368.0 4836.0 0.0 0.220503

In [71]:
a = amp_total[['H_wh', 'L_wh', 'U_wh']].sum(1)/amp_total[['H_all', 'L_all', 'U_all']].sum(1)
b = pd.concat([amp_total.H_wh/amp_total.H_all, a], 1)
b.columns = ['H_per', 'total_per']
b
# (amp_total.H_wh + amp_total.L_wh + amp_total.U_wh)/(amp_total.H_all + amp_total.L_all + amp_total.U_all)


Out[71]:
H_per total_per
SUSC_A_Habitats 0.145792 0.324468
SUSC_A_aquatic larvae 0.345550 0.324468
SUSC_B_Temperature Range 0.254115 0.324468
SUSC_B_Precipitation Range 0.315339 0.324468
SUSC_C_explosive breeder 0.512658 0.324468
SUSC_D_disease 0.333588 0.324468
SENSITIVITY 0.300921 0.324468
ADAPT_A_barriers 0.245638 0.324468
ADAPT_A_dispersal_distance 0.319312 0.324468
ADAPT_C_Slow_Gen_Turnpover 0.293777 0.324468
LOW_ADAPTABILITY 0.308382 0.324468
EXP_Sea Level 0.000000 0.324468
EXP_MeanTemperature 0.207921 0.324468
EXP_MeanRainfall 0.332013 0.324468
EXP_AADTemperature 0.141522 0.324468
EXP_AADRainfall 0.244884 0.324468
EXPOSURE 0.257449 0.324468
FINAL_SCORE 0.220029 0.324468

It appears that also fewer proportion of amphibians in WH sites are climate vulnerable


In [72]:
bird_total


Out[72]:
H_wh L_wh U_wh per_H_wh H_all L_all U_all per_H_all
FINAL_SCORE 1336.0 5578.0 0.0 0.193231 2323.0 7533.0 0.0 0.235694
__hab_specialisation 849.0 6059.0 6.0 0.122794 1530.0 8306.0 20.0 0.155235
__microhabitat 703.0 6211.0 0.0 0.101678 1001.0 8855.0 0.0 0.101562
__ForestDependence 1527.0 5386.0 1.0 0.220856 2575.0 7277.0 4.0 0.261262
__TemperatureRange 1068.0 4401.0 1445.0 0.154469 1974.0 6118.0 1764.0 0.200284
__PrecipRange 1414.0 4055.0 1445.0 0.204513 2095.0 5997.0 1764.0 0.212561
__Species dependence 66.0 6848.0 0.0 0.009546 89.0 9767.0 0.0 0.009030
__PopnSize 393.0 1794.0 4727.0 0.056841 1084.0 2319.0 6453.0 0.109984
__EffectivePopnSize 558.0 1629.0 4727.0 0.080706 1410.0 1993.0 6453.0 0.143060
SENSITIVITY 4022.0 585.0 2307.0 0.581718 6290.0 719.0 2847.0 0.638190
__Dispersal distance limited 1180.0 5734.0 0.0 0.170668 1993.0 7863.0 0.0 0.202212
__Dispersal barriers 314.0 6600.0 0.0 0.045415 700.0 9156.0 0.0 0.071023
__Low genetic diversity 41.0 6873.0 0.0 0.005930 69.0 9787.0 0.0 0.007001
__clutch size 1760.0 3213.0 1941.0 0.254556 2414.0 3946.0 3496.0 0.244927
__Gen_Length 1730.0 5184.0 0.0 0.250217 2500.0 7356.0 0.0 0.253653
LOW_ADAPTABILITY 3576.0 2092.0 1246.0 0.517211 5337.0 2507.0 2012.0 0.541498
__EXP_Sea Inundation 100.0 6808.0 6.0 0.014463 163.0 9673.0 20.0 0.016538
__EXP_MeanTemperature 1227.0 4222.0 1465.0 0.177466 1921.0 6066.0 1869.0 0.194907
__EXP_AADTemperature 944.0 4505.0 1465.0 0.136535 1925.0 6062.0 1869.0 0.195312
__EXP_MeanPrecip 1180.0 4269.0 1465.0 0.170668 1998.0 5989.0 1869.0 0.202719
__EXP_AADPrecip 1070.0 4379.0 1465.0 0.154758 2152.0 5835.0 1869.0 0.218344
EXPOSURE 3053.0 2406.0 1455.0 0.441568 4920.0 3082.0 1854.0 0.499188

In [73]:
coral_total


Out[73]:
H_wh L_wh U_wh per_H_wh H_all L_all U_all per_H_all
FINAL_SCORE 111.0 582.0 31.0 0.153315 242.0 1198.0 154.0 0.151819
_Few habitats 160.0 564.0 0.0 0.220994 384.0 1210.0 0.0 0.240903
_narrow depth rng 169.0 529.0 26.0 0.233425 384.0 1140.0 70.0 0.240903
_Larval vulnerabilityH 133.0 589.0 2.0 0.183702 274.0 1316.0 4.0 0.171895
_ShallowH 172.0 527.0 25.0 0.237569 376.0 1156.0 62.0 0.235885
_PastMortalityH 312.0 412.0 0.0 0.430939 644.0 950.0 0.0 0.404015
_zooxH 669.0 54.0 1.0 0.924033 1478.0 114.0 2.0 0.927227
_Rare 144.0 575.0 5.0 0.198895 392.0 1190.0 12.0 0.245922
SENSITIVITY 723.0 1.0 0.0 0.998619 1592.0 2.0 0.0 0.998745
_Short dispersal distance 65.0 491.0 168.0 0.089779 144.0 1042.0 408.0 0.090339
_dispersal barriers 83.0 635.0 6.0 0.114641 234.0 1338.0 22.0 0.146801
_ColonyMaxAge 11.0 707.0 6.0 0.015193 26.0 1546.0 22.0 0.016311
_SlowGrowthRate 263.0 455.0 6.0 0.363260 586.0 990.0 18.0 0.367629
LOW_ADAPTABILITY 366.0 356.0 2.0 0.505525 840.0 746.0 8.0 0.526976
_ProportionArag<3 161.0 521.0 42.0 0.222376 354.0 1058.0 182.0 0.222083
_MeanDHM 170.0 513.0 41.0 0.234807 368.0 1036.0 190.0 0.230866
EXPOSURE 249.0 445.0 30.0 0.343923 542.0 894.0 158.0 0.340025

Amphibians and birds in WH sites have consistently lower number of CCV species in proportion, compared to all ampbibians (14.9% to 22%) and birds (19.3% and 23.5%). Lower proportions are also observed in sensitivity, low adaptability and exposure.

Question: what does it mean? what statements can be realistically made?

  1. is it because of WH sites, therefore these species are less vulnerable? Probably no, as these species are not restricted inside WH sites. Thus no statement can made about to justify the positive effect of WH sites.
  2. WH contributes a good deal, as it covers % of H CCV species globally?
  3. what others?

Corals show a consistent trend in proportion in terms of vulnerability, and SAE.

Test statistics for significance

per region (IUCN/UNESCO regions?) / biodiversity sites


In [74]:
amp_total


Out[74]:
H_wh L_wh U_wh per_H_wh H_all L_all U_all per_H_all
SUSC_A_Habitats 220.0 1783.0 10.0 0.109290 1509.0 4539.0 156.0 0.243230
SUSC_A_aquatic larvae 330.0 1673.0 10.0 0.163934 955.0 5085.0 164.0 0.153933
SUSC_B_Temperature Range 386.0 1626.0 1.0 0.191754 1519.0 4557.0 128.0 0.244842
SUSC_B_Precipitation Range 479.0 1533.0 1.0 0.237953 1519.0 4557.0 128.0 0.244842
SUSC_C_explosive breeder 162.0 1629.0 222.0 0.080477 316.0 4113.0 1775.0 0.050935
SUSC_D_disease 436.0 1577.0 0.0 0.216592 1307.0 4897.0 0.0 0.210671
SENSITIVITY 1340.0 606.0 67.0 0.665673 4453.0 1365.0 386.0 0.717763
ADAPT_A_barriers 183.0 1661.0 169.0 0.090909 745.0 3900.0 1559.0 0.120084
ADAPT_A_dispersal_distance 501.0 1493.0 19.0 0.248882 1569.0 4522.0 113.0 0.252901
ADAPT_C_Slow_Gen_Turnpover 609.0 586.0 818.0 0.302534 2073.0 899.0 3232.0 0.334139
LOW_ADAPTABILITY 997.0 1007.0 9.0 0.495281 3233.0 2898.0 73.0 0.521115
EXP_Sea Level 0.0 2003.0 10.0 0.000000 4.0 6044.0 156.0 0.000645
EXP_MeanTemperature 315.0 1697.0 1.0 0.156483 1515.0 4544.0 145.0 0.244197
EXP_MeanRainfall 503.0 1509.0 1.0 0.249876 1515.0 4544.0 145.0 0.244197
EXP_AADTemperature 212.0 1800.0 1.0 0.105315 1498.0 4561.0 145.0 0.241457
EXP_AADRainfall 371.0 1641.0 1.0 0.184302 1515.0 4544.0 145.0 0.244197
EXPOSURE 864.0 1145.0 4.0 0.429210 3356.0 2642.0 206.0 0.540941
FINAL_SCORE 301.0 1712.0 0.0 0.149528 1368.0 4836.0 0.0 0.220503

Statistical testing on the finding

Is it possible that the statistical differences observed simply reflect the fact that WH network happens to draw a sample from the pool of all species and that the difference is only due to random sampling and nothing else. In other words, the difference is not because of WH capturing species of specific traits but random sampling.


In [75]:
print('amp:', amp.index.size, 'in WH', amp_selected.index.size)
print('bird:', bird.index.size, 'in WH', bird_selected.index.size)
print('coral:', coral_unique.index.size, 'in WH', coral_selected.index.size)


amp: 6204 in WH 2013
bird: 9856 in WH 6914
coral: 797 in WH 724

In [76]:
amp_selected.head()


Out[76]:
id_no_int SVP_ID SIS_GAA_ID GAA Family Genus Species Fullname Threatened SUSC_A_Habitats SUSC_A_aquatic larvae ... phylum_name class_name order_name family_name genus_name species_name category biome_marine biome_freshwater biome_terrestrial
0 520.0 4556 2661 LIMNODYNASTIDAE Adelotus brevis Adelotus brevis NaN L L ... CHORDATA AMPHIBIA ANURA LIMNODYNASTIDAE Adelotus brevis NT f t t
1 596.0 4760 100011 HYPEROLIIDAE Afrixalus aureus Afrixalus aureus NaN L H ... CHORDATA AMPHIBIA ANURA HYPEROLIIDAE Afrixalus aureus LC f t t
2 1272.0 3629 51121 CRYPTOBRANCHIDAE Andrias davidianus Andrias davidianus Y L L ... CHORDATA AMPHIBIA CAUDATA CRYPTOBRANCHIDAE Andrias davidianus CR f t f
3 1282.0 3644 13093 PLETHODONTIDAE Aneides aeneus Aneides aeneus NaN H L ... CHORDATA AMPHIBIA CAUDATA PLETHODONTIDAE Aneides aeneus NT f f t
4 2865.0 3109 15108 BOMBINATORIDAE Bombina bombina Bombina bombina NaN L L ... CHORDATA AMPHIBIA ANURA BOMBINATORIDAE Bombina bombina LC f t t

5 rows × 58 columns

If I get repeated random samples from the pool, calculate the H_per value and then plot them...


In [151]:
# get a random sample of 2013 repeatedly and calculate H_per
# ===== explanation
# ## get a random sample without replacement
# a = amp[amp.SVP_ID.isin(np.random.choice(amp.SVP_ID, 2013, replace=False))]
# ## apply HLU and get the final score
# a[a_columns].apply(get_hlu).T.ix['FINAL_SCORE'].per_H

test_a = [amp[amp.SVP_ID.isin(np.random.choice(amp.SVP_ID, 2013, replace=True))][a_columns].apply(get_hlu).T.ix['FINAL_SCORE'].per_H\
          for i in range(1000)]

In [152]:
sns.distplot(test_a)


C:\Users\yichuans\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[152]:
<matplotlib.axes._subplots.AxesSubplot at 0x26eb8208>

In [153]:
amp_total.ix['FINAL_SCORE'].per_H_wh


Out[153]:
0.14952806756085443

It is quite clear the WH result is an outlier - a signficant difference from the empirical distribution.

Try birds and corals


In [154]:
test_b = [bird[bird.SVP_ID.isin(np.random.choice(bird.SVP_ID, 6914, replace=True))][b_columns].apply(get_hlu).T.ix['FINAL_SCORE'].per_H\
          for i in range(1000)]

In [155]:
sns.distplot(test_b)


C:\Users\yichuans\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[155]:
<matplotlib.axes._subplots.AxesSubplot at 0x1deba358>

In [156]:
bird_total.ix['FINAL_SCORE'].per_H_wh


Out[156]:
0.19323112525310962

Significant result for birds too


In [157]:
test_c = [coral_unique[coral_unique.Fullname.isin(np.random.choice(coral_unique.Fullname, 724, replace=True))][c_columns].apply(get_hlu).T.ix['FINAL_SCORE'].per_H\
          for i in range(1000)]

In [158]:
sns.distplot(test_c)


C:\Users\yichuans\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[158]:
<matplotlib.axes._subplots.AxesSubplot at 0x20ed1908>

In [159]:
coral_total.ix['FINAL_SCORE'].per_H_wh


Out[159]:
0.15331491712707182

In [160]:
sns.distplot(test_c)


C:\Users\yichuans\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[160]:
<matplotlib.axes._subplots.AxesSubplot at 0x26efcb00>

WH vs all species


In [235]:
# a function to plot all columns using stacked horizontal barchart
def plot_comparison(wh_species, all_species, columns):
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    col = len(columns)
    fig, axes = plt.subplots(col, figsize=(15, 2*col)) # len and heihgt
    
    fig.subplots_adjust(hspace = 0.5)
    
    # get agg count of HLUso
    a = get_agg_result(wh_species, columns)
    b = get_agg_result(all_species, columns)
        
    for i, column in enumerate(columns):
        data = pd.concat([a[['H', 'L', 'U']].loc[column], 
                      b[['H', 'L', 'U']].loc[column]], 1).T
        data.index = ['Species in WH sites', 'All species']
        c = data.plot.barh(ax=axes[i], stacked=True, color=my_colors)
        axes[i].set_xlim([0, all_species.index.size*1.2])
        axes[i].set_title(column)
    
        # annotation
        for bar, y_axis in zip([a,b], range(2)):
            c.annotate('{0:.0f}'.format(bar.loc[column].H) + ', ' + '{0:.0f}'.format(bar.loc[column].H + bar.loc[column].L +bar.loc[column].U) + \
                           ', ' + '{0:.1%}'.format(bar.loc[column].per_H),
                          (bar.loc[column].sum() + 10, y_axis))
    return fig

# a function to plot all columns using stacked horizontal barchart
def plot_comparison_sle(wh_species, all_species, columns = ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']):
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    col = len(columns)
    fig, axes = plt.subplots(col, figsize=(15, 2*col)) # len and heihgt
    
    fig.subplots_adjust(hspace = 0.5)
    
    # get agg count of HLUso
    a = get_agg_result(wh_species, columns)
    b = get_agg_result(all_species, columns)
        
    for i, column in enumerate(columns):
        data = pd.concat([a[['H', 'L', 'U']].loc[column], 
                      b[['H', 'L', 'U']].loc[column]], 1).T
        data.index = ['Species in WH sites', 'All species']
        c = data.plot.barh(ax=axes[i], stacked=True, color=my_colors)
        axes[i].set_xlim([0, all_species.index.size*1.2])
        axes[i].set_title(column)
        
        # annotation
        for bar, y_axis in zip([a,b], range(2)):
            c.annotate('{0:.0f}'.format(bar.loc[column].H) + ', ' + '{0:.0f}'.format(bar.loc[column].H + bar.loc[column].L +bar.loc[column].U) + \
                           ', ' + '{0:.1%}'.format(bar.loc[column].per_H),
                          (bar.loc[column].sum() + 10, y_axis))

        
    return fig

Detailed comparisons across traits and exposures for amphibians


In [260]:
bird[['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']]


Out[260]:
FINAL_SCORE SENSITIVITY LOW_ADAPTABILITY EXPOSURE
0 H H H H
1 L H L U
2 L H H U
3 L H L U
4 L H U L
5 L H H L
6 H H H H
7 L U L H
8 L H L L
9 L H L L
10 L H L H
11 L H H U
12 L U L U
13 L H H L
14 H H H H
15 H H H H
16 L U L U
17 L H H L
18 L U L U
19 L U L U
20 L U L U
21 H H H H
22 L H L L
23 L H L H
24 H H H H
25 L L H L
26 L L H L
27 H H H H
28 H H H H
29 H H H H
... ... ... ... ...
9826 L H U H
9827 H H H H
9828 H H H H
9829 H H H H
9830 L H H U
9831 H H H H
9832 L H U U
9833 L H H U
9834 L H H U
9835 H H H H
9836 L H L H
9837 L U L L
9838 L U L L
9839 H H H H
9840 H H H H
9841 L H H U
9842 H H H H
9843 L H H U
9844 L H U H
9845 L H U H
9846 L U L L
9847 H H H H
9848 L H H U
9849 L H U H
9850 L H H U
9851 L H U L
9852 H H H H
9853 H H H H
9854 L H U H
9855 H H H H

9856 rows × 4 columns


In [253]:
# amp
plot_comparison(amp_selected, amp, a_columns);



In [268]:
a = plot_comparison_sle(amp_selected, amp);
a.savefig('wh_network_amp.png', dpi=100, bbox_inches='tight')


Detailed comparisons across traits and exposures for birds


In [238]:
# bird
plot_comparison(bird_selected, bird, b_columns);



In [267]:
a = plot_comparison_sle(bird_selected, bird);
a.savefig('wh_network_bird.png', dpi=100, bbox_inches='tight')


[!Insignificant findings] Detailed comparisons across traits and exposures for amphibians


In [240]:
# coral
plot_comparison(coral_selected, coral_unique, c_columns);


WH vs bioWH

It is quite interesting to further examine subsets of WH sites, for example, by regions and by criteria, with a view to answering questions such as a) do biodiversity WH sites have more climate vulnerable species b) do WH sites in asian regions host more CV species?


In [91]:
type(amp_selected.crit)


Out[91]:
pandas.core.series.Series

In [168]:
# try testing regular expression and `pd.series.str.match`
pattern = r'.*x.*' # crit having character `x` indicates a biodiversity WH site
test_matching = pd.concat([amp_selected.crit, amp_selected.crit.str.match(pattern)], 1)

# get a filter for biodi
amp_wh_bio_filter = amp_selected.crit.str.match(pattern)
bird_wh_bio_filter = bird_selected.crit.str.match(pattern)
coral_wh_bio_filter = coral_selected.crit.str.match(pattern)

# reduction in the number of WH sites
print('all WH', 'only bWH', 'non bWH')
print(amp_selected.wdpaid.unique().size, amp_selected[amp_wh_bio_filter].wdpaid.unique().size, amp_selected[~amp_wh_bio_filter].wdpaid.unique().size,)
print(bird_selected.wdpaid.unique().size, bird_selected[bird_wh_bio_filter].wdpaid.unique().size, bird_selected[~bird_wh_bio_filter].wdpaid.unique().size)
print(coral_selected.wdpaid.unique().size, coral_selected[coral_wh_bio_filter].wdpaid.unique().size)


all WH only bWH non bWH
150 116 34
184 147 37
26 23

In [163]:
# do comparison between subset of 
amp_wh_bio = pd.merge(amp_selected[amp_wh_bio_filter][a_columns].apply(get_hlu).T, \
         amp_selected[~amp_wh_bio_filter][a_columns].apply(get_hlu).T, \
         left_index = True, right_index=True, suffixes=('_bio', '_wh'))

bird_wh_bio = pd.merge(bird_selected[bird_wh_bio_filter][b_columns].apply(get_hlu).T, \
         bird_selected[~bird_wh_bio_filter][b_columns].apply(get_hlu).T, \
         left_index = True, right_index=True, suffixes=('_bio', '_wh'))

coral_wh_bio = pd.merge(coral_selected[coral_wh_bio_filter][c_columns].apply(get_hlu).T, \
         coral_selected[c_columns].apply(get_hlu).T, \
         left_index = True, right_index=True, suffixes=('_bio', '_wh'))

In [241]:
def plot_comparison_sle_bio(wh_species, all_species, columns = ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']):
    """<non-bio>, <bio WH sites>"""
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    col = len(columns)
    fig, axes = plt.subplots(col, figsize=(15, 2*col)) # len and heihgt
    
    fig.subplots_adjust(hspace = 0.5)
    
    # get agg count of HLUso
    a = get_agg_result(wh_species, columns)
    b = get_agg_result(all_species, columns)
        
    for i, column in enumerate(columns):
        data = pd.concat([a[['H', 'L', 'U']].loc[column], 
                      b[['H', 'L', 'U']].loc[column]], 1).T
        data.index = ['Other natural WH sites', 'Biodiversity WH sites']
        c = data.plot.barh(ax=axes[i], stacked=True, color=my_colors)
        axes[i].set_xlim([0, all_species.index.size*1.2])
        axes[i].set_title(column)
        
        # annotation
        for bar, y_axis in zip([a,b], range(2)):
            c.annotate('{0:.0f}'.format(bar.loc[column].H) + ', ' + '{0:.0f}'.format(bar.loc[column].H + bar.loc[column].L +bar.loc[column].U) + \
                           ', ' + '{0:.1%}'.format(bar.loc[column].per_H),
                          (bar.loc[column].sum() + 10, y_axis))

        
    return fig

In [266]:
a = plot_comparison_sle_bio(amp_selected[~amp_wh_bio_filter], amp_selected[amp_wh_bio_filter]);
a.savefig('network_bio_amp.png', dpi=100, bbox_inches='tight')



In [243]:
amp_wh_bio


Out[243]:
H_bio L_bio U_bio per_H_bio H_wh L_wh U_wh per_H_wh
SUSC_A_Habitats 215.0 1664.0 9.0 0.113877 5.0 119.0 1.0 0.040
SUSC_A_aquatic larvae 293.0 1586.0 9.0 0.155191 37.0 87.0 1.0 0.296
SUSC_B_Temperature Range 385.0 1502.0 1.0 0.203919 1.0 124.0 0.0 0.008
SUSC_B_Precipitation Range 423.0 1464.0 1.0 0.224047 56.0 69.0 0.0 0.448
SUSC_C_explosive breeder 151.0 1526.0 211.0 0.079979 11.0 103.0 11.0 0.088
SUSC_D_disease 412.0 1476.0 0.0 0.218220 24.0 101.0 0.0 0.192
SENSITIVITY 1260.0 563.0 65.0 0.667373 80.0 43.0 2.0 0.640
ADAPT_A_barriers 174.0 1548.0 166.0 0.092161 9.0 113.0 3.0 0.072
ADAPT_A_dispersal_distance 484.0 1385.0 19.0 0.256356 17.0 108.0 0.0 0.136
ADAPT_C_Slow_Gen_Turnpover 584.0 532.0 772.0 0.309322 25.0 54.0 46.0 0.200
LOW_ADAPTABILITY 957.0 922.0 9.0 0.506886 40.0 85.0 0.0 0.320
EXP_Sea Level 0.0 1879.0 9.0 0.000000 0.0 124.0 1.0 0.000
EXP_MeanTemperature 273.0 1614.0 1.0 0.144597 42.0 83.0 0.0 0.336
EXP_MeanRainfall 500.0 1387.0 1.0 0.264831 3.0 122.0 0.0 0.024
EXP_AADTemperature 201.0 1686.0 1.0 0.106462 11.0 114.0 0.0 0.088
EXP_AADRainfall 364.0 1523.0 1.0 0.192797 7.0 118.0 0.0 0.056
EXPOSURE 814.0 1070.0 4.0 0.431144 50.0 75.0 0.0 0.400
FINAL_SCORE 291.0 1597.0 0.0 0.154131 10.0 115.0 0.0 0.080

In [264]:
a = plot_comparison_sle_bio(bird_selected[~bird_wh_bio_filter], bird_selected[bird_wh_bio_filter]);
a.savefig('network_bio_bird.png', dpi=100, bbox_inches='tight')



In [245]:
bird_wh_bio


Out[245]:
H_bio L_bio U_bio per_H_bio H_wh L_wh U_wh per_H_wh
FINAL_SCORE 1208.0 4834.0 0.0 0.199934 128.0 744.0 0.0 0.146789
__hab_specialisation 777.0 5259.0 6.0 0.128600 72.0 800.0 0.0 0.082569
__microhabitat 644.0 5398.0 0.0 0.106587 59.0 813.0 0.0 0.067661
__ForestDependence 1419.0 4622.0 1.0 0.234856 108.0 764.0 0.0 0.123853
__TemperatureRange 1062.0 3779.0 1201.0 0.175770 6.0 622.0 244.0 0.006881
__PrecipRange 1046.0 3795.0 1201.0 0.173121 368.0 260.0 244.0 0.422018
__Species dependence 66.0 5976.0 0.0 0.010924 0.0 872.0 0.0 0.000000
__PopnSize 348.0 1446.0 4248.0 0.057597 45.0 348.0 479.0 0.051606
__EffectivePopnSize 493.0 1301.0 4248.0 0.081595 65.0 328.0 479.0 0.074541
SENSITIVITY 3494.0 521.0 2027.0 0.578285 528.0 64.0 280.0 0.605505
__Dispersal distance limited 1084.0 4958.0 0.0 0.179411 96.0 776.0 0.0 0.110092
__Dispersal barriers 269.0 5773.0 0.0 0.044522 45.0 827.0 0.0 0.051606
__Low genetic diversity 31.0 6011.0 0.0 0.005131 10.0 862.0 0.0 0.011468
__clutch size 1657.0 2585.0 1800.0 0.274247 103.0 628.0 141.0 0.118119
__Gen_Length 1512.0 4530.0 0.0 0.250248 218.0 654.0 0.0 0.250000
LOW_ADAPTABILITY 3238.0 1669.0 1135.0 0.535915 338.0 423.0 111.0 0.387615
__EXP_Sea Inundation 91.0 5945.0 6.0 0.015061 9.0 863.0 0.0 0.010321
__EXP_MeanTemperature 967.0 3854.0 1221.0 0.160046 260.0 368.0 244.0 0.298165
__EXP_AADTemperature 892.0 3929.0 1221.0 0.147633 52.0 576.0 244.0 0.059633
__EXP_MeanPrecip 1092.0 3729.0 1221.0 0.180735 88.0 540.0 244.0 0.100917
__EXP_AADPrecip 986.0 3835.0 1221.0 0.163191 84.0 544.0 244.0 0.096330
EXPOSURE 2674.0 2156.0 1212.0 0.442569 379.0 250.0 243.0 0.434633

It appears that subdivision by separating biodiversity sites from all natural sites does not indicate a different picture. The percentage difference is minute.

WH by regions

Check differences according to regions


In [97]:
# load regions
region = pd.read_csv('region.csv')

In [98]:
region.unesco_reg.unique()


Out[98]:
array(['Africa', 'Arab States', 'Asia and the Pacific',
       'Europe and North America', 'Latin America and the Caribbean'], dtype=object)

In [99]:
# convenient variables
regions_list = ['_africa', '_asia', '_euna', '_lac', '_arab']
regions_name_list = ['Africa', 'Asia and the Pacific', 'Europe and North America', 'Latin America and the Caribbean', 'Arab States']

In [100]:
# africa_ids = region[region.unesco_reg == 'Africa'].wdpaid
# asia_ids = region[region.unesco_reg == 'Asia and the Pacific'].wdpaid
# euna_ids = region[region.unesco_reg == 'Europe and North America'].wdpaid
# lac_ids = region[region.unesco_reg == 'Latin America and the Caribbean'].wdpaid
# arab_ids = region[region.unesco_reg == 'Arab States'].wdpaid

# more concise version using list comprehension
africa_ids, asia_ids, euna_ids, lac_ids, arab_ids = [region[region.unesco_reg == region_name].wdpaid for region_name in \
                                                     regions_name_list]

In [101]:
print([len(x) for x in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]])


[41, 70, 71, 41, 6]

In [102]:
# get a function that returns the result of aggregates
def get_agg_result_by_wdpaidlist(taxon_result, taxon_columns, filtered_wdpaids):
    # get a unique list of species for sites, based on the filtered wdpaids
    unique_taxon_result = taxon_result[taxon_result.wdpaid.isin(filtered_wdpaids)].groupby('id_no_int').first().reset_index()
    # apply the stangdard aggregation methods
    return unique_taxon_result[taxon_columns].apply(get_hlu).T

In [103]:
# amp in wh sites by region
amp_africa, amp_asia, amp_euna, amp_lac, amp_arab = \
[get_agg_result_by_wdpaidlist(result_amp_f, a_columns, a_region) \
 for a_region in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]]
amp_regions = [amp_africa, amp_asia, amp_euna, amp_lac, amp_arab]

# bird in wh sites by region
bird_africa, bird_asia, bird_euna, bird_lac, bird_arab = \
[get_agg_result_by_wdpaidlist(result_bird_f, b_columns, a_region) \
 for a_region in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]]
bird_regions = [bird_africa, bird_asia, bird_euna, bird_lac, bird_arab]

# coral in wh sites by region
coral_africa, coral_asia, coral_euna, coral_lac, coral_arab = \
[get_agg_result_by_wdpaidlist(result_coral_f, c_columns, a_region) \
 for a_region in [africa_ids, asia_ids, euna_ids, lac_ids, arab_ids]]
coral_regions = [coral_africa, coral_asia, coral_euna, coral_lac, coral_arab]

In [104]:
# join all the regional tables together
from functools import reduce

# use reduce function to merge any given number of dfs; here i used a tuple in order to get suffix information
# the first element is the df, the second a dummy value used in the reduce process
amp_region_final = reduce(lambda left, right: (pd.merge(left[0], right[0], left_index=True, right_index=True, \
                                              suffixes = ('', right[1])), 0), \
                  zip(amp_regions, regions_list),
                 (amp_selected[a_columns].apply(get_hlu).T, 0))[0]

bird_region_final = reduce(lambda left, right: (pd.merge(left[0], right[0], left_index=True, right_index=True, \
                                              suffixes = ('', right[1])), 0), \
                  zip(bird_regions, regions_list),
                 (bird_selected[b_columns].apply(get_hlu).T, 0))[0]

coral_region_final = reduce(lambda left, right: (pd.merge(left[0], right[0], left_index=True, right_index=True, \
                                              suffixes = ('', right[1])), 0), \
                  zip(coral_regions, regions_list),
                 (coral_selected[c_columns].apply(get_hlu).T, 0))[0]

In [105]:
amp_region_final[['per_H' + each for each in ['', '_africa', '_asia', '_euna', '_lac', '_arab']]]


Out[105]:
per_H per_H_africa per_H_asia per_H_euna per_H_lac per_H_arab
SUSC_A_Habitats 0.109290 0.089524 0.067434 0.068966 0.165975 0.000000
SUSC_A_aquatic larvae 0.163934 0.278095 0.139803 0.183908 0.095436 0.666667
SUSC_B_Temperature Range 0.191754 0.241905 0.069079 0.000000 0.300138 0.000000
SUSC_B_Precipitation Range 0.237953 0.184762 0.231908 0.839080 0.147994 1.000000
SUSC_C_explosive breeder 0.080477 0.160000 0.059211 0.045977 0.052559 0.333333
SUSC_D_disease 0.216592 0.019048 0.185855 0.126437 0.410788 0.000000
SENSITIVITY 0.665673 0.636190 0.483553 0.885057 0.791148 1.000000
ADAPT_A_barriers 0.090909 0.093333 0.144737 0.045977 0.053942 0.166667
ADAPT_A_dispersal_distance 0.248882 0.264762 0.199013 0.143678 0.302905 0.000000
ADAPT_C_Slow_Gen_Turnpover 0.302534 0.346667 0.250000 0.327586 0.302905 0.000000
LOW_ADAPTABILITY 0.495281 0.533333 0.447368 0.442529 0.515906 0.166667
EXP_Sea Level 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
EXP_MeanTemperature 0.156483 0.034286 0.197368 0.189655 0.219917 0.166667
EXP_MeanRainfall 0.249876 0.268571 0.100329 0.028736 0.408022 0.333333
EXP_AADTemperature 0.105315 0.045714 0.088816 0.017241 0.181189 0.000000
EXP_AADRainfall 0.184302 0.100952 0.208882 0.074713 0.246196 0.333333
EXPOSURE 0.429210 0.306667 0.417763 0.270115 0.576763 0.500000
FINAL_SCORE 0.149528 0.091429 0.101974 0.097701 0.246196 0.166667

In [106]:
amp_region_final.loc['FINAL_SCORE']


Out[106]:
H                301.000000
L               1712.000000
U                  0.000000
per_H              0.149528
H_africa          48.000000
L_africa         477.000000
U_africa           0.000000
per_H_africa       0.091429
H_asia            62.000000
L_asia           546.000000
U_asia             0.000000
per_H_asia         0.101974
H_euna            17.000000
L_euna           157.000000
U_euna             0.000000
per_H_euna         0.097701
H_lac            178.000000
L_lac            545.000000
U_lac              0.000000
per_H_lac          0.246196
H_arab             1.000000
L_arab             5.000000
U_arab             0.000000
per_H_arab         0.166667
Name: FINAL_SCORE, dtype: float64

In [107]:
bird_region_final.loc['FINAL_SCORE']


Out[107]:
H               1336.000000
L               5578.000000
U                  0.000000
per_H              0.193231
H_africa         209.000000
L_africa        1410.000000
U_africa           0.000000
per_H_africa       0.129092
H_asia           457.000000
L_asia          2117.000000
U_asia             0.000000
per_H_asia         0.177545
H_euna           212.000000
L_euna           868.000000
U_euna             0.000000
per_H_euna         0.196296
H_lac            703.000000
L_lac           2120.000000
U_lac              0.000000
per_H_lac          0.249026
H_arab            71.000000
L_arab           226.000000
U_arab             0.000000
per_H_arab         0.239057
Name: FINAL_SCORE, dtype: float64

In [108]:
coral_region_final.loc['FINAL_SCORE']


Out[108]:
H               111.000000
L               582.000000
U                31.000000
per_H             0.153315
H_africa         20.000000
L_africa        281.000000
U_africa          4.000000
per_H_africa      0.065574
H_asia           60.000000
L_asia          543.000000
U_asia            5.000000
per_H_asia        0.098684
H_euna           43.000000
L_euna          395.000000
U_euna            4.000000
per_H_euna        0.097285
H_lac            30.000000
L_lac            54.000000
U_lac             1.000000
per_H_lac         0.352941
H_arab           12.000000
L_arab          197.000000
U_arab           22.000000
per_H_arab        0.051948
Name: FINAL_SCORE, dtype: float64

It is very difficult to find patterns in a table full of numbers like the above. Graphs may present a much better view in terms of the underlying differences.


In [109]:
# amp
# a = pd.concat([each[['H', 'L', 'U']][-1:] for each in amp_regions])
## a better way to slice columns and rows, loc is a label based indxing
## .loc returns a series, concatenate on axis 1, transpose and then replace index, so that they'll show up as labels

# get df for all regions, with HLU
a = pd.concat([each[['H', 'L', 'U']].loc['FINAL_SCORE'] for each in amp_regions], 1).T

a.index = regions_name_list
b = a.plot.barh(stacked=True, figsize=(15,5))

# for each bar, annotate num of 'H' and percentage of 'H'
for region_name, region_data, y_axis in zip(regions_name_list, amp_regions, range(5)):
    b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.0f}'.format(a.loc[region_name].sum()) + ', ' + '{0:.2%}'.format(region_data.loc['FINAL_SCORE'].per_H),
              (a.loc[region_name].sum() + 10, y_axis)) # add a buf-space for x



In [110]:
## find amphibians species in arab states WH sites
pd.set_option('display.max_columns', 60)
example = result_amp_f[result_amp_f.wdpaid.isin(arab_ids)]
example[example.FINAL_SCORE=='H']


Out[110]:
SVP_ID SIS_GAA_ID GAA Family Genus Species Fullname Threatened SUSC_A_Habitats SUSC_A_aquatic larvae SUSC_B_Temperature Range SUSC_B_Precipitation Range SUSC_C_explosive breeder SUSC_D_disease SENSITIVITY ADAPT_A_barriers ADAPT_A_dispersal_distance ADAPT_C_Slow_Gen_Turnpover LOW_ADAPTABILITY EXP_Sea Level EXP_MeanTemperature EXP_MeanRainfall EXP_AADTemperature EXP_AADRainfall EXPOSURE FINAL_SCORE Final_Pessimistic Score_HorS SensXLAdaptability SensXExposure LAdaptXExposure Threatened_and_Vulnerable Unnamed: 0 wdpaid id_no areakm2_x areakm2_y per en_name fr_name status_yr rep_area gis_area country crit areakm2 binomial kingdom_name phylum_name class_name order_name family_name genus_name species_name category biome_marine biome_freshwater biome_terrestrial id_no_int
1132 3152 6187 ALYTIDAE Discoglossus pictus Discoglossus pictus NaN L L L H L L H H L L H L L H L H H H H H H NaN NaN NaN 126160 4322 55270.0 124.424076 124.424076 1.0 Ichkeul National Park Parc national de l'Ichkeul 1980 126.0 124.424076 TUN (x) 124.424076 Discoglossus pictus ANIMALIA CHORDATA AMPHIBIA ANURA ALYTIDAE Discoglossus pictus LC f t t 55270.0

Latin America and the Caribbean region contains the highest number of amphibians that are climate vulnerable (178, out of a total of 723, 24.62%), compared to about 10% for WH sites in other regions. Notably on the other end of the spectrum, WH sites in the Arab States region host very few ampbibans and only one of them is climate vulnerable (Discoglossus pictus in Ichkeul National Park, based on the IUCN Red List data).


In [111]:
# bird
a = pd.concat([each[['H', 'L', 'U']].loc['FINAL_SCORE'] for each in bird_regions], 1).T
a.index = regions_name_list
b = a.plot.barh(stacked=True, figsize=(15,5))
# # testing annotation, note y axis is from bottom up. unit based on data

# for each bar, annotate num of 'H' and percentage of 'H'
for region_name, region_data, y_axis in zip(regions_name_list, bird_regions, range(5)):
    b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.2%}'.format(region_data.loc['FINAL_SCORE'].per_H),
              (a.loc[region_name].sum() + 10, y_axis)) # add a buf-space for x


Similarly from the point of view of climate vulnerable birds in global WH network, Latin America and the Caribbean region has the largest number of birds, and a quarter of them are climate vulnerable, compared to other regions.


In [112]:
# coral
a = pd.concat([each[['H', 'L', 'U']].loc['FINAL_SCORE'] for each in coral_regions], 1).T
a.index = regions_name_list
b = a.plot.barh(stacked=True, figsize=(15,5))

# for each bar, annotate num of 'H' and percentage of 'H'
for region_name, region_data, y_axis in zip(regions_name_list, coral_regions, range(5)):
    b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.2%}'.format(region_data.loc['FINAL_SCORE'].per_H),
              (a.loc[region_name].sum() + 10, y_axis)) # add a buf-space for x


The above regional comparisons across amphibians, birds and corals indicate a large variation in the number and proportion of cimate vulnerable species in the World Heritage network.

More visualisations on the proportions beyond just the final vulnerability score...

Need to apply the same methodology across all attributes, i.e. sensitivity, low-adaptability and each of the traits...


In [113]:
# enumerate to get index number, which is needed in the plot
for i, column in enumerate(a_columns):
    print(i, column)


0 SUSC_A_Habitats
1 SUSC_A_aquatic larvae
2 SUSC_B_Temperature Range
3 SUSC_B_Precipitation Range
4 SUSC_C_explosive breeder
5 SUSC_D_disease
6 SENSITIVITY
7 ADAPT_A_barriers
8 ADAPT_A_dispersal_distance
9 ADAPT_C_Slow_Gen_Turnpover
10 LOW_ADAPTABILITY
11 EXP_Sea Level
12 EXP_MeanTemperature
13 EXP_MeanRainfall
14 EXP_AADTemperature
15 EXP_AADRainfall
16 EXPOSURE
17 FINAL_SCORE

In [246]:
# a function to plot all columns using stacked horizontal barchart
def plot_all_attributes(taxon_regions, region_namelist, columns):
    my_colors = ['#b2182b', '#ffffff', '#e0e0e0']
    
    # create a figure with correct number of axes 
    col = len(columns)
    fig, axes = plt.subplots(nrows = col, figsize=(15, 5*col)) # width and heihgt
    fig.subplots_adjust(hspace = 0.3)
    
    # for each exes, plot a column result for all regions, for comparisons
    for i, column in enumerate(columns):
        # a df of all regions HLUs
        a = pd.concat([each[['H', 'L', 'U']].loc[column] for each in taxon_regions], 1).T
        a.index = region_namelist
        
        # so it also works for col=1. If col=1, axes is not an array and thus does not support indexing
        if col>1:
            b = a.plot.barh(ax=axes[i], stacked=True, color=my_colors)
            axes[i].set_title(column)
        else:
            b = a.plot.barh(ax=axes, stacked=True, color=my_colors)
            axes.set_title(column)            
    
        # add num of H, total num and percentage as annotation
        for region_name, region_data, y_axis in zip(region_namelist, taxon_regions, range(len(region_namelist))):
            b.annotate('{0:.0f}'.format(a.loc[region_name].H) + ', ' + '{0:.0f}'.format(a.loc[region_name].H + a.loc[region_name].L + a.loc[region_name].U) + \
                       ', ' + '{0:.2%}'.format(region_data.loc[column].per_H),
                      (a.loc[region_name].sum() + 10, y_axis))
    
    return fig

In [263]:
# need to add `;` to suppress the automatic behaviour that plots the same graph twice
a = plot_all_attributes(amp_regions, regions_name_list, ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']);
a.savefig('region_amp.png', dpi=100, bbox_inches='tight')


Detailed regional comparisons across traits and exposures for amphibians


In [248]:
# need to add `;` to suppress the automatic behaviour that plots the same graph twice
plot_all_attributes(amp_regions, regions_name_list, a_columns);



In [249]:
# test for single column and it also works now
plot_all_attributes(amp_regions, regions_name_list, ['FINAL_SCORE']);



In [269]:
a = plot_all_attributes(bird_regions, regions_name_list, ['FINAL_SCORE', 'SENSITIVITY', 'LOW_ADAPTABILITY', 'EXPOSURE']);
a.savefig('region_bird.png', dpi=100, bbox_inches='tight')


Detailed regional comparisons across traits and exposures for birds


In [251]:
plot_all_attributes(bird_regions, regions_name_list, b_columns);



In [252]:
plot_all_attributes(coral_regions, regions_name_list, c_columns);


Sensitivity seems rather weird - all species across all regions have sensitivity high?


In [119]:
(coral_unique.SENSITIVITY == 'H').sum(), coral_unique.index.size


Out[119]:
(796, 797)

ranking to determine whether a site should be High or Low in climate vulnerability

The top 25% of WH sites both in terms of percentage and total number of highly vulnerable species for a given taxon are labelled as 'highly vulnerable' for that taxon.


In [120]:
# theshold value according to the number of H species
thres_total = [np.percentile(each[each.level_2 == 'FINAL_SCORE'].H, 75) for each in [amp_vv, bird_vv, coral_vv]]

In [121]:
# treshold value according to the percentage of H in relation to total HLU
thres_per = [np.percentile(each[each.level_2 == 'FINAL_SCORE'].per_H, 75) for each in [amp_vv, bird_vv, coral_vv]]

In [122]:
thres_per, thres_total


Out[122]:
([0.14285714285714285, 0.28225806451612906, 0.14732142857142858],
 [1.0, 47.0, 27.0])

In [123]:
# the number of sites as 'highly vulnerable' for amp
((amp_vv[amp_vv.level_2=='FINAL_SCORE'].H > thres_total[0]) & (amp_vv[amp_vv.level_2=='FINAL_SCORE'].per_H > thres_per[0])).sum()


Out[123]:
30

In [124]:
(amp_vv[amp_vv.level_2=='FINAL_SCORE'].H > thres_total[0]).sum(), (amp_vv[amp_vv.level_2=='FINAL_SCORE'].per_H > thres_per[0]).sum()


Out[124]:
(42, 47)

In [125]:
# amp, bird, coral number of cv sites
[((each[each.level_2=='FINAL_SCORE'].H > thres_total[i]) & (each[each.level_2=='FINAL_SCORE'].per_H > thres_per[i])).sum() for i, each in enumerate([amp_vv, bird_vv, coral_vv])]


Out[125]:
[30, 28, 1]

In [ ]: