Wilderness World Heritage analysis for the marine environment


Based on the discussion with Bastian.

The spatial analysis was done outside of this notebook. In a nutshell, the spatial component dealt with the question of how much of cumulative marine pressure there is in each unit (see below for such a hypothetical biogeographic classification). The analysis was carried out in such a way that the aggregation happens in the later stage and if thresholds are to be changed (very likely due to the explorative nature of such exercise), it requires minimum efforts without having to re-run any spatial analysis, which are time-consuming and prone to error.


In [1]:
# load default libraries
import os, sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# make sure gdal is correctly installed
from osgeo import gdal
import gc

%matplotlib inline


C:\Users\yichuans\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\computation\__init__.py:19: UserWarning: The installed version of numexpr 2.4.4 is not supported in pandas and will be not be used

  UserWarning)

Get quantiles from the input raster data

It is necessary to load the original raster in order to calculateits quantiles.


In [2]:
def raster2array(rasterfn):
    raster = gdal.Open(rasterfn)
    band = raster.GetRasterBand(1)
    return band.ReadAsArray()

In [3]:
g_array = raster2array('global_cumul_impact_2013_all_layers.tif')

In [4]:
g_array_f = g_array.flatten()

In [5]:
print('The total number of non-zero values in the raw raster dataset:', g_array_f.size - (g_array_f==0).sum())

## in fact the following should be used for testing equality of float dtypes. Because the result remains\
## the same thus the simpler option is used.

## (np.isclose(g_array_f, 0.0)).sum()


The total number of non-zero values in the raw raster dataset: 414635567

The number of non-zero values is notably different from esri's calculation, which stands at 414,347,791, less than what's calculated here and is 300,000 fewer zeros. This suggests esri may be using a bigger tolerence value, i.e. what is considered small enough to be regarded as zero .

Now, get the quantiles... this threshold is subject to change. For the time being, arbitary values of 1%, 3%, 5% and 10% are used.


In [6]:
## the percentile function applied to the sliced array, i.e., those with values greater than 0
quantiles = [np.percentile(g_array_f[~(g_array_f == 0)], quantile) for quantile in [1,3,5,10]]

In [7]:
quantiles


Out[7]:
[0.67987793684005737,
 1.2613298869132996,
 1.5064566135406494,
 1.8049463033676147]

Analyse intersection result

The hypothetical biogeographical classification of the marine environmental within EEZ is described as a combination of MEOW (Marine Ecoregional of the World), its visual representation (called hereafter MEOW visual) up to 200 nautical miles and the World's pelagic provinces. The spatial data was prepared in a way such that from the coastline outwards disjoint polygons represents: MEOW (up to 200 meter depth, inner/red), MEOW visual overlaps with pelagic provinces (middle/green), pelagic provinces that do not overlap with MEOW visual (outer/blue). This is purely a spatial aggregation based on the above data and the World Vector Shoreline EEZ. See below for example.

Load the input_data table, which describes the intersection between the marine pressure layer and the marine ecoregion/pelagic provinces classification. The input_attr table contains information on the relationship between OBJECTID and each raster pixel value.

  • OBJECTID (one) - pixel value (many)
  • OBJECTID (many) - attr: Province, Ecoregion, and Realm, categories (one) Each pixel is of height and width: 934.478 meter, making each pixel in area 0.873 $km^2$

In [8]:
# calculate cell-size in sqkm2
cell_size = 934.478*934.478/1000000
print(cell_size)


0.8732491324839999

In [9]:
# the OBJECTID - ras_val table
input_data = pd.read_csv('result.csv')

In [10]:
# the attribute table containing information about province etc
input_attr = pd.read_csv('attr.csv')

In [11]:
print('\n'.join(['Threshold cut-off value: '+ str(threshold) for threshold in quantiles]))


Threshold cut-off value: 0.67987793684
Threshold cut-off value: 1.26132988691
Threshold cut-off value: 1.50645661354
Threshold cut-off value: 1.80494630337

In [12]:
# total count of pixels per OBJECTID
result_count = input_data.groupby('OBJECTID').count().reset_index()

Here I created four result tables containing only pixels that meet the criteria as specified by different thresholds


In [13]:
# filter result only in the top 1, 3, 5, 10 percentile (of least impacted marine areas)
result_1, result_3, result_5, result_10 = \
[input_data[input_data.ras_val <= threshold].groupby('OBJECTID').count().reset_index() for threshold in quantiles]

The next step will be to apply the categorisations in the input_attr table. Replace result10 result table if other threshold is used


In [14]:
# join base to the attribute
attr_merge = pd.merge(input_attr, result_count, on = 'OBJECTID')

In [15]:
# join result to the above table
attr_merge_10 = pd.merge(attr_merge, result_10, how = 'left', on ='OBJECTID', suffixes = ('_base', '_result'))

In [16]:
# fill ras_val_result's NaN with 0, province and realms with None. This should happen earlier
attr_merge_10['ras_val_result'].fillna(0, inplace=True)
attr_merge_10['PROVINCE'].fillna('None', inplace=True)
attr_merge_10['PROVINCE_P'].fillna('None', inplace=True)

In [17]:
# apply an aggregate function to each sub dataframe
def apply_func(group):
    overlap = group['ras_val_result'].sum()*cell_size # in sqkm
    base = group['ras_val_base'].sum()*cell_size
    per = overlap/base
    # can have multiple columns as a result, if returened as pd.series
    return pd.Series([overlap, per, base], index=['less_than_threshold', 'per_ltt', 'base'])

In [18]:
# final dataframe
result_agg_10 = attr_merge_10.groupby(['PROVINCE', 'PROVINCE_P', 'category']).apply(apply_func).reset_index()
print(result_agg_10.head(20))


             PROVINCE                  PROVINCE_P        category  \
0             Agulhas             Agulhas Current  pelagic_meow_v   
1             Agulhas            Benguela Current  pelagic_meow_v   
2             Agulhas                        None        meow200m   
3             Agulhas      South Central Atlantic  pelagic_meow_v   
4   Amsterdam-St Paul                        None        meow200m   
5   Amsterdam-St Paul       Southern Indian Ocean  pelagic_meow_v   
6   Amsterdam-St Paul  Southern Subtropical Front  pelagic_meow_v   
7             Andaman                        None        meow200m   
8             Andaman       Northern Indian Ocean  pelagic_meow_v   
9              Arctic                      Arctic  pelagic_meow_v   
10             Arctic                        None        meow200m   
11             Arctic          Subarctic Atlantic  pelagic_meow_v   
12             Arctic           Subarctic Pacific  pelagic_meow_v   
13      Bay of Bengal                        None        meow200m   
14      Bay of Bengal       Northern Indian Ocean  pelagic_meow_v   
15           Benguela            Benguela Current  pelagic_meow_v   
16           Benguela         Equatorial Atlantic  pelagic_meow_v   
17           Benguela                        None        meow200m   
18           Benguela      South Central Atlantic  pelagic_meow_v   
19          Black Sea                   Black Sea  pelagic_meow_v   

    less_than_threshold   per_ltt          base  
0          4.366246e+00  0.000008  5.390357e+05  
1          0.000000e+00  0.000000  2.824088e+04  
2          1.244380e+03  0.010160  1.224828e+05  
3          0.000000e+00  0.000000  6.505706e+02  
4          3.999481e+02  0.442512  9.038129e+02  
5          6.390088e+04  0.167804  3.808065e+05  
6          4.591544e+03  0.328358  1.398334e+04  
7          2.069600e+03  0.006710  3.084360e+05  
8          6.841034e+03  0.004711  1.452095e+06  
9          1.989650e+06  0.573653  3.468383e+06  
10         3.042845e+06  0.437750  6.951103e+06  
11         2.476709e+04  0.023379  1.059357e+06  
12         1.109026e+03  0.013927  7.963072e+04  
13         1.846922e+03  0.006514  2.835335e+05  
14         0.000000e+00  0.000000  6.244395e+05  
15         1.630356e+03  0.002477  6.583294e+05  
16         0.000000e+00  0.000000  1.454134e+04  
17         4.329569e+03  0.026932  1.607599e+05  
18         0.000000e+00  0.000000  8.828549e+02  
19         0.000000e+00  0.000000  2.921892e+05  

Further aggregation could be applied here.


Visualisation and exploring results


In [19]:
import seaborn as sns

In [20]:
g = sns.FacetGrid(result_agg_10, col="category")
g.map(plt.hist, 'per_ltt', bins=50, log=True)


Out[20]:
<seaborn.axisgrid.FacetGrid at 0xe09f400>

In [21]:
# MEOW province (200m and 200 nautical combined)
result_agg_10_province = attr_merge_10.groupby(['PROVINCE']).apply(apply_func).reset_index()
sns.distplot(result_agg_10_province.per_ltt)


Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0xeeb67b8>

In [22]:
result_agg_10_province.sort_values('per_ltt', ascending=False).head(20)


Out[22]:
PROVINCE less_than_threshold per_ltt base
12 Continental High Antarctic 2.803854e+06 0.570354 4.915992e+06
46 Subantarctic New Zealand 4.179728e+05 0.538576 7.760705e+05
45 Subantarctic Islands 9.810474e+05 0.478338 2.050950e+06
3 Arctic 5.058372e+06 0.437633 1.155847e+07
36 Scotia Sea 6.042116e+05 0.325349 1.857116e+06
25 Marquesas 2.148245e+05 0.289451 7.421788e+05
42 Southern New Zealand 3.443431e+05 0.238514 1.443703e+06
28 None 3.091177e+06 0.209077 1.478489e+07
1 Amsterdam-St Paul 6.889237e+04 0.174105 3.956936e+05
8 Central Polynesia 5.474678e+05 0.114987 4.761110e+06
35 Sahul Shelf 1.502678e+05 0.108221 1.388526e+06
30 Northeast Australian Shelf 4.158587e+04 0.105917 3.926277e+05
29 North Brazil Shelf 7.344986e+04 0.073539 9.987900e+05
24 Magellanic 1.149755e+05 0.060529 1.899525e+06
34 Red Sea and Gulf of Aden 5.466889e+04 0.054509 1.002934e+06
55 Warm Temperate Northwest Atlantic 4.769949e+04 0.050981 9.356349e+05
41 Southeast Polynesia 2.738265e+05 0.047491 5.765861e+06
16 Galapagos 3.363057e+04 0.040044 8.398404e+05
58 Warm Temperate Southwestern Atlantic 4.109510e+04 0.039821 1.031990e+06
49 Tropical East Pacific 1.269870e+05 0.036277 3.500438e+06

In [23]:
# pelagic province
result_agg_10_pelagic = attr_merge_10.groupby(['PROVINCE_P']).apply(apply_func).reset_index()
sns.distplot(result_agg_10_pelagic.per_ltt)


Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0xecbfa90>

In [24]:
result_agg_10_pelagic.sort_values('per_ltt', ascending=False).head(20)


Out[24]:
PROVINCE_P less_than_threshold per_ltt base
3 Arctic 2.733379e+06 0.549666 4.972797e+06
1 Antarctic 4.841607e+06 0.493214 9.816438e+06
2 Antarctic Polar Front 9.189585e+05 0.408765 2.248135e+06
35 Subantarctic 7.000664e+05 0.325244 2.152437e+06
20 None 4.515665e+06 0.159273 2.835179e+07
33 Southern Subtropical Front 4.747445e+05 0.131447 3.611677e+06
30 South Central Pacific 8.924580e+05 0.051248 1.741448e+07
10 Equatorial Pacific 2.230339e+05 0.043027 5.183571e+06
8 Eastern Tropical Pacific 1.470141e+05 0.036281 4.052131e+06
32 Southern Indian Ocean 6.851251e+04 0.028446 2.408515e+06
37 Subarctic Pacific 9.883259e+04 0.021963 4.499939e+06
18 Malvinas Current 5.910150e+03 0.018438 3.205383e+05
36 Subarctic Atlantic 2.476709e+04 0.009721 2.547889e+06
13 Humboldt Current 1.832251e+04 0.008549 2.143314e+06
4 Benguela Current 1.630356e+03 0.002149 7.586754e+05
23 North Central Pacific 2.097020e+04 0.001788 1.172530e+07
6 California Current 2.119376e+03 0.001672 1.267364e+06
14 Indonesian Through-Flow 3.537532e+03 0.001118 3.165450e+06
26 Red Sea 2.314110e+02 0.001006 2.299606e+05
25 Northern Indian Ocean 8.208542e+03 0.000937 8.761430e+06

In [25]:
# MEOW province (200m only)
result_agg_10_m200 = attr_merge_10[attr_merge_10.category == 'meow200m'].groupby(['PROVINCE']).apply(apply_func).reset_index()
sns.distplot(result_agg_10_m200.per_ltt)


Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0xedeaf28>

In [26]:
result_agg_10_m200.sort_values('per_ltt', ascending = False).head(20)


Out[26]:
PROVINCE less_than_threshold per_ltt base
12 Continental High Antarctic 3.343252e+05 0.829469 4.030595e+05
24 Marquesas 2.958568e+03 0.651538 4.540895e+03
43 Subantarctic Islands 4.628220e+04 0.503764 9.187279e+04
1 Amsterdam-St Paul 3.999481e+02 0.442512 9.038129e+02
3 Arctic 3.042845e+06 0.437750 6.951103e+06
34 Scotia Sea 7.053670e+04 0.437160 1.613520e+05
46 Tristan Gough 4.104271e+02 0.223172 1.839063e+03
44 Subantarctic New Zealand 7.815580e+03 0.215398 3.628437e+04
40 Southern New Zealand 5.081087e+04 0.212072 2.395925e+05
32 Red Sea and Gulf of Aden 5.443748e+04 0.192227 2.831938e+05
8 Central Polynesia 2.491380e+03 0.153958 1.618218e+04
27 North Brazil Shelf 7.344986e+04 0.148297 4.952877e+05
28 Northeast Australian Shelf 3.956692e+04 0.135789 2.913849e+05
39 Southeast Polynesia 6.000968e+03 0.130460 4.599840e+04
53 Warm Temperate Northwest Atlantic 4.769949e+04 0.129744 3.676423e+05
9 Cold Temperate Northeast Pacific 6.809335e+04 0.124519 5.468513e+05
48 Tropical Northwestern Atlantic 1.243996e+05 0.123876 1.004228e+06
7 Central Indian Ocean Islands 9.148158e+03 0.116027 7.884479e+04
33 Sahul Shelf 1.498216e+05 0.113811 1.316408e+06
35 Somali/Arabian 3.821164e+04 0.098220 3.890430e+05

New threshold for within EEZ


In [52]:
input_data.OBJECTID.unique().size


Out[52]:
627

In [42]:
# no zeros in the result data
input_data.ras_val.size


Out[42]:
170445372

In [43]:
input_data.ras_val.min()


Out[43]:
2.9654e-06

In [49]:
# percentage of EEZ water in relation to the entire ocean
input_data.ras_val.size/g_array_f[~(g_array_f==0)].size


Out[49]:
0.41107272401453204

In [38]:
# all input_data are non-zero (zero indicates land and no_data)
input_data[~(input_data.ras_val == 0)].ras_val.count() == input_data.ras_val.count()


Out[38]:
True

In [32]:
# get 10% threshold value
np.percentile(input_data.ras_val, 10)


Out[32]:
1.7644400000000002

In [50]:
sns.distplot(g_array_f[~(g_array_f==0)])


Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0xd4a5438>

In [51]:
sns.distplot(input_data.ras_val)


Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0xd4a1a90>

It appears both the high seas and EEZ waters share similar characteristics in terms of marine cumulative pressure.