0. Standard setup for logging and plotting inside a notebook


In [1]:
import logging
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
root = logging.getLogger()
root.addHandler(logging.StreamHandler())
%matplotlib inline
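Loggers that do not set their own level inherit the root logger's level, which defaults to WARNING, so their INFO messages are dropped. The log lines in this notebook suggest iSDM configures its loggers itself. If those messages do not appear in your environment, a minimal sketch that raises the root level explicitly and avoids stacking duplicate handlers when the cell is re-run:

```python
import logging

root = logging.getLogger()
# Attach a console handler only once, so re-running the cell does not print every record twice
if not any(isinstance(h, logging.StreamHandler) for h in root.handlers):
    root.addHandler(logging.StreamHandler())
# Let INFO records from loggers without an explicit level through
root.setLevel(logging.INFO)
```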

1. Choose a representative species for a case study


In [2]:
# download from Google Drive: https://drive.google.com/open?id=0B9cazFzBtPuCOFNiUHYwcVFVODQ
# Representative example: the shapefile has multiple polygons, and there are many point records (some outside the range maps)
from iSDM.species import IUCNSpecies
salmo_trutta = IUCNSpecies(name_species='Salmo trutta')
salmo_trutta.load_shapefile("../data/fish/selection/salmo_trutta")


Enabled Shapely speedups for performance.
Loading data from: ../data/fish/selection/salmo_trutta
The shapefile contains data on 3 species areas.

2. Rasterize the species range to get a matrix of pixels


In [3]:
rasterized = salmo_trutta.rasterize(raster_file="./salmo_trutta_full.tif", pixel_size=0.5, all_touched=True)


RASTERIO: Data rasterized into file ./salmo_trutta_full.tif 
RASTERIO: Resolution: x_res=720 y_res=360
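The reported resolution follows from the global extent and `pixel_size=0.5`: 360 degrees of longitude in 0.5-degree cells gives 720 columns, and 180 degrees of latitude gives 360 rows. A small sketch of that arithmetic (`grid_shape` is a hypothetical helper, not part of iSDM); it returns `(rows, columns)`, the order NumPy uses for the raster's shape:

```python
def grid_shape(left, bottom, right, top, pixel_size):
    """Return (height, width) of a raster covering the bounds at the given cell size."""
    width = round((right - left) / pixel_size)
    height = round((top - bottom) / pixel_size)
    return height, width

print(grid_shape(-180.0, -90.0, 180.0, 90.0, 0.5))  # (360, 720)
```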

2.1 Plot to get an idea


In [4]:
plt.figure(figsize=(25,20))
plt.imshow(rasterized, cmap="hot", interpolation="none")


Out[4]:
[Figure: rasterized range of Salmo trutta]

3. Load the biogeographical regions raster layer


In [5]:
from iSDM.environment import RasterEnvironmentalLayer
biomes_adf = RasterEnvironmentalLayer(file_path="../data/rebioms/w001001.adf", name_layer="Biomes")
biomes_adf.load_data()


Loaded raster data from ../data/rebioms/w001001.adf 
Driver name: AIG 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'AIG',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 255.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Out[5]:
<open RasterReader name='../data/rebioms/w001001.adf' mode='r'>

3.1 Plot to get an idea


In [6]:
biomes_adf.plot()


3.2 Load the continents vector layer (used later to clip the pseudo-absence area) and rasterize it


In [7]:
from iSDM.environment import ContinentsLayer
from iSDM.environment import Source
continents = ContinentsLayer(file_path="../data/continents/continent.shp", source=Source.ARCGIS)
continents.load_data()
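fig, ax = plt.subplots(1, 1, figsize=(30, 20))
# Pass the axes explicitly so the map is drawn on the 30 x 20 figure created above.
# (Older geopandas releases named the cmap argument "colormap".)
continents.data_full.plot(column="continent", cmap="hsv", ax=ax)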


Loading data from ../data/continents/continent.shp 
The shapefile contains data on 8 environmental regions.
Out[7]:
[Figure: continents layer, colored by continent]

In [8]:
continents_rasters = continents.rasterize(raster_file="../data/continents/continents_raster.tif", pixel_size=0.5, all_touched=True)


Will rasterize continent-by-continent.
Rasterizing continent Asia 
Rasterizing continent North America 
Rasterizing continent Europe 
Rasterizing continent Africa 
Rasterizing continent South America 
Rasterizing continent Oceania 
Rasterizing continent Australia 
Rasterizing continent Antarctica 
RASTERIO: Data rasterized into file ../data/continents/continents_raster.tif 
RASTERIO: Resolution: x_res=720 y_res=360

In [9]:
continents_rasters.shape # stacked raster with 8 bands, one for each continent.


Out[9]:
(8, 360, 720)
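Each band is a per-continent mask. If a single 2-D layer is more convenient, the bands can be collapsed with NumPy. A sketch assuming non-zero means "inside the continent", using a small synthetic stack in place of `continents_rasters`; `collapse_bands` is a hypothetical helper. With `all_touched=True`, a pixel on a shared border may be set in more than one band, which is why the label records only the first matching band:

```python
import numpy as np

def collapse_bands(stack):
    """Collapse a (bands, rows, cols) mask stack.

    Returns a boolean land mask (True where any band is non-zero) and a label
    array holding the 1-based index of the first non-zero band, or 0 if none.
    """
    nonzero = stack != 0
    land = nonzero.any(axis=0)
    labels = np.where(land, nonzero.argmax(axis=0) + 1, 0)
    return land, labels

# Synthetic 3-band, 2 x 3 stack standing in for the (8, 360, 720) continents raster
stack = np.zeros((3, 2, 3), dtype=np.uint8)
stack[0, 0, 0] = 1
stack[2, 1, 2] = 1
land, labels = collapse_bands(stack)
print(land.sum())  # 2
```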

4. Sample pseudo-absence pixels, taking into account all the distinct biomes that fall in the species region.


In [10]:
selected_layers, pseudo_absences = biomes_adf.sample_pseudo_absences(species_raster_data=rasterized, continents_raster_data=continents_rasters, number_of_pseudopoints=1000)


Succesfully loaded existing raster data from ../data/rebioms/w001001.adf.
Will use the continents/biogeographic raster data for further clipping of the pseudo-absence regions. 
Sampling 1000 pseudo-absence points from environmental layer.
The following unique (pixel) values will be taken into account for sampling pseudo-absences
[ 8  9 10 11 12 13 14 15 17 21]
There are 17842 pixels to sample from...
Filling 1000 random pixel positions...
Sampled 975 unique pixels as pseudo-absences.
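The log outlines the procedure: collect the biome values under the species' presence pixels, keep only pixels with those values (clipped to the continents), then draw random positions. That 1000 draws produced 975 unique pixels suggests positions are drawn with replacement and duplicates collapse. The sketch below re-implements that reading with NumPy. It is an assumption about the approach, not iSDM's code; the function name, the `nodata=255` default (taken from the biomes metadata above) and the exclusion of presence pixels are hypothetical, although the counts in Out[60] to Out[63] are consistent with the sampled pixels not overlapping the presences:

```python
import numpy as np

def sample_pseudo_absences(species, biomes, land, n_points, nodata=255, seed=None):
    """Sketch: sample pseudo-absence pixels from the biomes the species occupies.

    species: 2-D array, non-zero where the species is present
    biomes:  2-D array of biome codes (same shape), `nodata` where undefined
    land:    2-D boolean mask limiting where pseudo-absences may fall
    Returns a 2-D uint8 array with 1 at every sampled pixel.
    """
    present = species != 0
    # Biome codes found under at least one presence pixel, minus the nodata code
    selected = np.setdiff1d(np.unique(biomes[present]), [nodata])
    # Candidates: inside a selected biome, on land, and not already a presence
    candidates = np.isin(biomes, selected) & land & ~present
    rows, cols = np.nonzero(candidates)
    result = np.zeros(species.shape, dtype=np.uint8)
    if rows.size == 0:
        return result
    rng = np.random.default_rng(seed)
    # Draw with replacement: repeated picks collapse, so fewer than n_points may remain
    picks = rng.integers(0, rows.size, size=n_points)
    result[rows[picks], cols[picks]] = 1
    return result

species = np.zeros((4, 5), dtype=np.uint8)
species[0, 0] = 1  # one presence pixel, in biome 1
biomes = np.array([[1, 1, 2, 2, 255],
                   [1, 1, 2, 2, 255],
                   [3, 3, 3, 3, 3],
                   [1, 3, 3, 3, 3]], dtype=np.uint8)
land = np.ones(biomes.shape, dtype=bool)
pseudo = sample_pseudo_absences(species, biomes, land, n_points=10, seed=0)
print(pseudo.sum())  # at most 4: only four pixels are candidates
```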

4.1 Plot the biomes taken into account for sampling pseudo-absences, to get an idea


In [11]:
plt.figure(figsize=(25,20))
plt.imshow(selected_layers, cmap="hot", interpolation="none")


Out[11]:
[Figure: biomes taken into account for sampling pseudo-absences]

4.2 Plot the sampled pseudo-absences, to get an idea


In [12]:
plt.figure(figsize=(25,20))
plt.imshow(pseudo_absences, cmap="hot", interpolation="none")


Out[12]:
[Figure: sampled pseudo-absence pixels]

5. Construct a convenient dataframe for testing with different SDM models

The Example 2 datasheet needs every cell of the global raster map, one pixel per row.

5.1 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a "base" (zeroes) raster map.


In [13]:
all_coordinates = biomes_adf.pixel_to_world_coordinates(raster_data=np.zeros_like(rasterized), filter_no_data_value=False)


Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.
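T1 is T0 shifted by half a pixel, so the returned coordinates are cell centers: latitude = 89.75 - 0.5 * row and longitude = -179.75 + 0.5 * col. A NumPy sketch of the same mapping for the 0.5-degree grid (row 0 at the top), which should match the arrays printed in Out[14]; `pixel_centers` is a hypothetical helper, not the iSDM method:

```python
import numpy as np

def pixel_centers(height, width, top=90.0, left=-180.0, pixel_size=0.5):
    """Return flattened (latitudes, longitudes) of cell centers in row-major order."""
    rows, cols = np.indices((height, width))
    lat = top - pixel_size * (rows + 0.5)   # row 0 -> 89.75
    lon = left + pixel_size * (cols + 0.5)  # col 0 -> -179.75
    return lat.ravel(), lon.ravel()

lat, lon = pixel_centers(360, 720)
print(lat[:3], lon[:3])  # [89.75 89.75 89.75] [-179.75 -179.25 -178.75]
```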

In [14]:
all_coordinates


Out[14]:
(array([ 89.75,  89.75,  89.75, ..., -89.75, -89.75, -89.75]),
 array([-179.75, -179.25, -178.75, ...,  178.75,  179.25,  179.75]))

In [19]:
base_dataframe = pd.DataFrame([all_coordinates[0], all_coordinates[1]]).T
base_dataframe.columns=['decimallatitude', 'decimallongitude']

In [20]:
base_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
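The transpose, rename and `set_index` steps above can be folded into one by building the MultiIndex directly. A sketch with small stand-in arrays (the variable names are illustrative); the result, like `base_dataframe`, has no columns and one row per cell:

```python
import numpy as np
import pandas as pd

# Small stand-ins for all_coordinates[0] (latitudes) and all_coordinates[1] (longitudes)
lat = np.array([89.75, 89.75, 89.25, 89.25])
lon = np.array([-179.75, -179.25, -179.75, -179.25])

# A frame with no columns, indexed by (latitude, longitude)
index = pd.MultiIndex.from_arrays([lat, lon], names=["decimallatitude", "decimallongitude"])
base = pd.DataFrame(index=index)
print(base.shape)  # (4, 0)
```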

In [22]:
base_dataframe.head()


Out[22]:
decimallatitude decimallongitude
89.75 -179.75
-179.25
-178.75
-178.25
-177.75

In [23]:
base_dataframe.tail()


Out[23]:
decimallatitude decimallongitude
-89.75 177.75
178.25
178.75
179.25
179.75

5.2 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a presences pixel map


In [24]:
presence_coordinates = salmo_trutta.pixel_to_world_coordinates()


No raster data provided, attempting to load default...
Loaded raster data from ./salmo_trutta_full.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 0.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ./salmo_trutta_full.tif.
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.

In [25]:
presence_coordinates


Out[25]:
(array([ 71.25,  71.25,  71.25, ...,  30.75,  30.25,  30.25]),
 array([ 23.75,  24.25,  24.75, ...,  79.25,  78.25,  78.75]))

In [26]:
presences_dataframe = pd.DataFrame([presence_coordinates[0], presence_coordinates[1]]).T
presences_dataframe.columns=['decimallatitude', 'decimallongitude']
presences_dataframe[salmo_trutta.name_species] = 1 # fill presences with 1's
presences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
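Sections 5.2 and 5.3 follow the same pattern: take the non-zero pixels of a raster, convert them to cell-center coordinates, and attach a constant label (1 for presences, 0 for pseudo-absences). A sketch of that pattern as one hypothetical helper, assuming the same top-left origin and 0.5-degree cells as the rasters above:

```python
import numpy as np
import pandas as pd

def labeled_pixels(raster, label, column, top=90.0, left=-180.0, pixel_size=0.5):
    """Return a frame indexed by cell-center (lat, lon) for every non-zero pixel."""
    rows, cols = np.nonzero(raster)
    index = pd.MultiIndex.from_arrays(
        [top - pixel_size * (rows + 0.5), left + pixel_size * (cols + 0.5)],
        names=["decimallatitude", "decimallongitude"],
    )
    return pd.DataFrame({column: label}, index=index)

raster = np.zeros((360, 720), dtype=np.uint8)
raster[37, 407] = 1  # a single presence pixel
frame = labeled_pixels(raster, 1, "Salmo trutta")
print(frame.index[0])  # (71.25, 23.75)
```

Row 37 and column 407 map to (71.25, 23.75), the first cell listed in Out[27].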

In [27]:
presences_dataframe.head()


Out[27]:
Salmo trutta
decimallatitude decimallongitude
71.25 23.75 1
24.25 1
24.75 1
25.25 1
25.75 1

In [28]:
presences_dataframe.tail()


Out[28]:
Salmo trutta
decimallatitude decimallongitude
30.75 78.25 1
78.75 1
79.25 1
30.25 78.25 1
78.75 1

5.3 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a pseudo_absences pixel map


In [29]:
pseudo_absence_coordinates = biomes_adf.pixel_to_world_coordinates(raster_data=pseudo_absences)


Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.

In [30]:
pseudo_absences_dataframe = pd.DataFrame([pseudo_absence_coordinates[0], pseudo_absence_coordinates[1]]).T
pseudo_absences_dataframe.columns=['decimallatitude', 'decimallongitude']
pseudo_absences_dataframe[salmo_trutta.name_species] = 0
pseudo_absences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)

In [31]:
pseudo_absences_dataframe.head()


Out[31]:
Salmo trutta
decimallatitude decimallongitude
79.25 95.75 0
77.75 91.75 0
76.75 105.25 0
76.25 66.75 0
93.25 0

In [32]:
pseudo_absences_dataframe.tail()


Out[32]:
Salmo trutta
decimallatitude decimallongitude
15.25 44.75 0
14.75 76.25 0
79.25 0
14.25 44.25 0
45.75 0

5.4 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a minimum temperature pixel map


In [33]:
from iSDM.environment import ClimateLayer
water_min_layer = ClimateLayer(file_path="../data/watertemp/min_wt_2000.tif")
water_min_reader = water_min_layer.load_data()
# Open question: should cells with no-data temperature values be dropped? They hold the
# nodata sentinel (-3.402823e+38). For now we keep them; they could be set to NaN instead.
water_min_coordinates = water_min_layer.pixel_to_world_coordinates(filter_no_data_value=False)


Loaded raster data from ../data/watertemp/min_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/min_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/min_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.

In [34]:
water_min_coordinates


Out[34]:
(array([ 89.75,  89.75,  89.75, ..., -89.75, -89.75, -89.75]),
 array([-179.75, -179.25, -178.75, ...,  178.75,  179.25,  179.75]))

In [35]:
mintemp_dataframe = pd.DataFrame([water_min_coordinates[0], water_min_coordinates[1]]).T
mintemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_min_matrix = water_min_reader.read(1)
mintemp_dataframe['MinT'] = water_min_matrix.ravel()  # flatten row by row, matching the coordinate order
mintemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
mintemp_dataframe.head()


Out[35]:
MinT
decimallatitude decimallongitude
89.75 -179.75 -3.402823e+38
-179.25 -3.402823e+38
-178.75 -3.402823e+38
-178.25 -3.402823e+38
-177.75 -3.402823e+38

In [36]:
mintemp_dataframe.tail()


Out[36]:
MinT
decimallatitude decimallongitude
-89.75 177.75 -3.402823e+38
178.25 -3.402823e+38
178.75 -3.402823e+38
179.25 -3.402823e+38
179.75 -3.402823e+38
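The -3.402823e+38 values are the layer's nodata sentinel (see `'nodata'` in the metadata above). If the open question in the code comment is settled in favor of NaN, the band can be masked before it is flattened. A sketch with a hypothetical helper; it compares with a relative tolerance because the metadata prints the float32 sentinel to only seven digits. With rasterio, `read(1, masked=True)` is an alternative that returns a masked array based on the dataset's `nodata` value:

```python
import numpy as np

def nodata_to_nan(band, nodata):
    """Return a float64 copy of `band` with nodata cells replaced by NaN."""
    band = np.asarray(band, dtype=np.float64)
    # Relative tolerance absorbs float32 rounding of the huge sentinel value
    mask = np.isclose(band, nodata, rtol=1e-6, atol=0.0)
    return np.where(mask, np.nan, band)

sentinel = np.float32(-3.402823e+38)
band = np.array([[sentinel, 280.5], [273.25, sentinel]], dtype=np.float32)
clean = nodata_to_nan(band, -3.402823e+38)
print(np.isnan(clean).sum())  # 2
```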

5.5 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a maximum temperature pixel map


In [37]:
water_max_layer = ClimateLayer(file_path="../data/watertemp/max_wt_2000.tif")
water_max_reader = water_max_layer.load_data()
# Open question: should cells with no-data temperature values be dropped? They hold the
# nodata sentinel (-3.402823e+38). For now we keep them; they could be set to NaN instead.
water_max_coordinates = water_max_layer.pixel_to_world_coordinates(filter_no_data_value=False)


Loaded raster data from ../data/watertemp/max_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/max_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/max_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.

In [38]:
maxtemp_dataframe = pd.DataFrame([water_max_coordinates[0], water_max_coordinates[1]]).T
maxtemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_max_matrix = water_max_reader.read(1)
maxtemp_dataframe['MaxT'] = water_max_matrix.ravel()  # flatten row by row, matching the coordinate order
maxtemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
maxtemp_dataframe.head()


Out[38]:
MaxT
decimallatitude decimallongitude
89.75 -179.75 -3.402823e+38
-179.25 -3.402823e+38
-178.75 -3.402823e+38
-178.25 -3.402823e+38
-177.75 -3.402823e+38

In [39]:
maxtemp_dataframe.tail()


Out[39]:
MaxT
decimallatitude decimallongitude
-89.75 177.75 -3.402823e+38
178.25 -3.402823e+38
178.75 -3.402823e+38
179.25 -3.402823e+38
179.75 -3.402823e+38

5.6 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a mean temperature pixel map


In [40]:
water_mean_layer = ClimateLayer(file_path="../data/watertemp/mean_wt_2000.tif")
water_mean_reader = water_mean_layer.load_data()
# Open question: should cells with no-data temperature values be dropped? They hold the
# nodata sentinel (-3.402823e+38). For now we keep them; they could be set to NaN instead.
water_mean_coordinates = water_mean_layer.pixel_to_world_coordinates(filter_no_data_value=False)


Loaded raster data from ../data/watertemp/mean_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/mean_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/mean_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.

In [41]:
meantemp_dataframe = pd.DataFrame([water_mean_coordinates[0], water_mean_coordinates[1]]).T
meantemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_mean_matrix = water_mean_reader.read(1)
meantemp_dataframe['MeanT'] = water_mean_matrix.ravel()  # flatten row by row, matching the coordinate order
meantemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
meantemp_dataframe.head()


Out[41]:
MeanT
decimallatitude decimallongitude
89.75 -179.75 -3.402823e+38
-179.25 -3.402823e+38
-178.75 -3.402823e+38
-178.25 -3.402823e+38
-177.75 -3.402823e+38

In [42]:
meantemp_dataframe.tail()


Out[42]:
MeanT
decimallatitude decimallongitude
-89.75 177.75 -3.402823e+38
178.25 -3.402823e+38
178.75 -3.402823e+38
179.25 -3.402823e+38
179.75 -3.402823e+38
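Sections 5.4 to 5.6 differ only in the file name and the column name. Every layer shares the same 720 x 360 grid (the logs show the same affine transform each time), so the bands can be flattened against one coordinate index and placed in a single frame. A sketch with synthetic 2 x 3 bands standing in for the `read(1)` results; `bands_to_frame` and the dict of bands are hypothetical:

```python
import numpy as np
import pandas as pd

def bands_to_frame(bands, lat, lon):
    """Flatten equally shaped 2-D bands (row-major) into the columns of one frame."""
    index = pd.MultiIndex.from_arrays([lat, lon], names=["decimallatitude", "decimallongitude"])
    return pd.DataFrame({name: band.ravel() for name, band in bands.items()}, index=index)

# Cell-center coordinates of a 2 x 3 grid anchored at the top-left corner (90, -180)
rows, cols = np.indices((2, 3))
lat = (90.0 - 0.5 * (rows + 0.5)).ravel()
lon = (-180.0 + 0.5 * (cols + 0.5)).ravel()
bands = {
    "MinT": np.full((2, 3), 273.25, dtype=np.float32),
    "MaxT": np.full((2, 3), 290.0, dtype=np.float32),
    "MeanT": np.full((2, 3), 280.0, dtype=np.float32),
}
temps = bands_to_frame(bands, lat, lon)
print(temps.shape)  # (6, 3)
```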

In [45]:
# merge base with presences
merged = base_dataframe.combine_first(presences_dataframe)

In [46]:
merged.head()


Out[46]:
Salmo trutta
decimallatitude decimallongitude
-89.75 -179.75 NaN
-179.25 NaN
-178.75 NaN
-178.25 NaN
-177.75 NaN

In [47]:
merged.tail()


Out[47]:
Salmo trutta
decimallatitude decimallongitude
89.75 177.75 NaN
178.25 NaN
178.75 NaN
179.25 NaN
179.75 NaN

In [48]:
# merge base+presences with pseudo-absences
# merged2 = pd.merge(merged1, pseudo_absences_dataframe, on=["decimallatitude", "decimallongitude", salmo_trutta.name_species], how="outer")

merged = merged.combine_first(pseudo_absences_dataframe)

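Reference: http://pandas.pydata.org/pandas-docs/stable/merging.html

For this, we use the combine_first method.

Note that this method only takes values from the right DataFrame where they are missing in the left DataFrame. A related method, update, alters non-NA values in place. Because the presences were merged first, a cell that appears in both the presences and the pseudo-absences keeps its presence label. The result covers the union of both indexes and, as the output shows, comes back sorted, which is why merged.head() now starts at latitude -89.75.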
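A toy example of that precedence rule. The frames are synthetic stand-ins for `presences_dataframe` and `pseudo_absences_dataframe`; reindexing onto the full grid at the end plays the role of `base_dataframe`:

```python
import pandas as pd

names = ["decimallatitude", "decimallongitude"]
grid = pd.MultiIndex.from_tuples(
    [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)], names=names)
presences = pd.DataFrame({"species": [1.0]}, index=grid[[0]])
absences = pd.DataFrame({"species": [0.0, 0.0]}, index=grid[[0, 1]])

# Values from `presences` win; `absences` only fills cells that are still missing
merged = presences.combine_first(absences).reindex(grid)
print(merged["species"].tolist())  # [1.0, 0.0, nan, nan]
```

The first cell is in both inputs and keeps the presence label; the last two cells are in neither and stay NaN.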


In [49]:
merged.head()


Out[49]:
Salmo trutta
decimallatitude decimallongitude
-89.75 -179.75 NaN
-179.25 NaN
-178.75 NaN
-178.25 NaN
-177.75 NaN

In [50]:
merged.tail()


Out[50]:
Salmo trutta
decimallatitude decimallongitude
89.75 177.75 NaN
178.25 NaN
178.75 NaN
179.25 NaN
179.75 NaN

In [51]:
# merge base+presences+pseudo-absences with min temperature
#merged3 = pd.merge(merged2, mintemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")

merged = merged.combine_first(mintemp_dataframe)

In [52]:
merged.head()


Out[52]:
MinT Salmo trutta
decimallatitude decimallongitude
-89.75 -179.75 -3.402823e+38 NaN
-179.25 -3.402823e+38 NaN
-178.75 -3.402823e+38 NaN
-178.25 -3.402823e+38 NaN
-177.75 -3.402823e+38 NaN

In [53]:
merged.tail()


Out[53]:
MinT Salmo trutta
decimallatitude decimallongitude
89.75 177.75 -3.402823e+38 NaN
178.25 -3.402823e+38 NaN
178.75 -3.402823e+38 NaN
179.25 -3.402823e+38 NaN
179.75 -3.402823e+38 NaN

In [54]:
# merged4 = pd.merge(merged3, maxtemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")
merged = merged.combine_first(maxtemp_dataframe)

In [55]:
merged.head()


Out[55]:
MaxT MinT Salmo trutta
decimallatitude decimallongitude
-89.75 -179.75 -3.402823e+38 -3.402823e+38 NaN
-179.25 -3.402823e+38 -3.402823e+38 NaN
-178.75 -3.402823e+38 -3.402823e+38 NaN
-178.25 -3.402823e+38 -3.402823e+38 NaN
-177.75 -3.402823e+38 -3.402823e+38 NaN

In [56]:
merged.tail()


Out[56]:
MaxT MinT Salmo trutta
decimallatitude decimallongitude
89.75 177.75 -3.402823e+38 -3.402823e+38 NaN
178.25 -3.402823e+38 -3.402823e+38 NaN
178.75 -3.402823e+38 -3.402823e+38 NaN
179.25 -3.402823e+38 -3.402823e+38 NaN
179.75 -3.402823e+38 -3.402823e+38 NaN

In [57]:
# merged5 = pd.merge(merged4, meantemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")
merged = merged.combine_first(meantemp_dataframe)

In [58]:
merged.tail()


Out[58]:
MaxT MeanT MinT Salmo trutta
decimallatitude decimallongitude
89.75 177.75 -3.402823e+38 -3.402823e+38 -3.402823e+38 NaN
178.25 -3.402823e+38 -3.402823e+38 -3.402823e+38 NaN
178.75 -3.402823e+38 -3.402823e+38 -3.402823e+38 NaN
179.25 -3.402823e+38 -3.402823e+38 -3.402823e+38 NaN
179.75 -3.402823e+38 -3.402823e+38 -3.402823e+38 NaN

In [59]:
merged.to_csv("../data/fish/selection/salmo_trutta_again.csv")

In [60]:
merged[merged['Salmo trutta']==0].shape[0] # should be equal to number of pseudo absences below


Out[60]:
975

In [61]:
pseudo_absence_coordinates[0].shape[0]


Out[61]:
975

In [62]:
merged[merged['Salmo trutta']==1].shape[0]  # should be equal to number of presences below


Out[62]:
6089

In [63]:
presence_coordinates[0].shape[0]


Out[63]:
6089

In [64]:
merged[merged['Salmo trutta'].isnull()].shape[0] # all that's left


Out[64]:
252136

In [65]:
360 * 720 == merged[merged['Salmo trutta']==0].shape[0] + merged[merged['Salmo trutta']==1].shape[0] + merged[merged['Salmo trutta'].isnull()].shape[0]


Out[65]:
True

In [66]:
# == all pixels in 360 x 720 matrix
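The three counts can also be read off in one call: `value_counts(dropna=False)` tallies the 1s, 0s and NaNs together, and their sum must equal the number of grid cells (975 + 6089 + 252136 = 259200 = 360 x 720 here). A sketch on a synthetic label column:

```python
import numpy as np
import pandas as pd

labels = pd.Series([1.0, 0.0, np.nan, np.nan, 1.0, np.nan], name="Salmo trutta")
counts = labels.value_counts(dropna=False)
print(counts.sum() == len(labels))  # True
```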

In [69]:
merged[merged['Salmo trutta']==0.0]


Out[69]:
MaxT MeanT MinT Salmo trutta
decimallatitude decimallongitude
14.25 44.25 294.348785 290.497498 286.225769 0.0
45.75 295.938843 291.549133 287.073181 0.0
14.75 76.25 305.077118 300.666901 297.389282 0.0
79.25 306.705841 302.743439 299.312866 0.0
15.25 44.75 295.634644 291.429962 286.902283 0.0
17.75 74.75 302.301208 299.151672 296.922211 0.0
77.25 304.959717 301.225250 298.016754 0.0
77.75 305.229645 300.634857 297.521271 0.0
20.25 73.25 302.265137 300.006073 297.970367 0.0
20.75 77.75 303.288269 300.978668 299.323792 0.0
21.25 73.25 306.433197 300.999573 296.525177 0.0
21.75 73.75 307.357910 300.695160 294.550110 0.0
101.75 297.350006 294.245880 290.456543 0.0
22.25 69.25 304.384857 300.280365 295.579895 0.0
93.75 296.323669 294.439453 292.090576 0.0
106.25 299.653900 294.457214 287.353485 0.0
108.75 300.579224 295.935547 289.976166 0.0
112.75 301.917999 296.965271 290.311584 0.0
114.25 300.020142 296.552429 291.980164 0.0
22.75 72.75 307.711853 301.609802 294.764954 0.0
97.25 297.301697 294.515808 290.998840 0.0
102.25 296.893372 293.027710 288.674133 0.0
115.75 299.413208 296.564117 293.529358 0.0
23.25 108.25 299.629974 294.765747 287.241119 0.0
116.75 301.512939 295.446381 288.745514 0.0
23.75 68.25 308.485565 301.814178 292.111267 0.0
78.75 307.470917 300.047699 294.140656 0.0
109.75 299.573059 294.064758 285.925385 0.0
110.25 301.049469 294.014618 284.467163 0.0
113.75 299.573303 294.927246 289.687592 0.0
... ... ... ... ... ...
73.75 115.75 286.288361 275.725464 273.250031 0.0
116.75 283.255646 275.114563 273.250031 0.0
117.75 283.487091 275.273346 273.250031 0.0
124.25 288.070862 275.615814 273.250031 0.0
74.25 55.25 280.299927 276.084808 273.272705 0.0
87.25 277.674500 274.462982 273.358612 0.0
94.75 280.017822 274.560333 273.250031 0.0
102.75 274.750580 273.510254 273.250031 0.0
106.25 285.034943 275.336487 273.250031 0.0
112.25 286.482025 275.288757 273.250031 0.0
74.75 56.75 277.572540 275.066864 273.254852 0.0
87.25 279.660889 276.205383 273.394745 0.0
99.25 280.808075 274.522644 273.250031 0.0
112.75 281.576904 274.822754 273.250031 0.0
75.25 60.75 280.011536 274.293488 273.250031 0.0
99.25 282.532745 274.964752 273.250031 0.0
101.25 280.831268 274.709930 273.250031 0.0
103.25 279.939087 274.600677 273.250031 0.0
140.25 284.235931 274.903748 273.250031 0.0
75.75 102.75 280.237244 274.754822 273.250031 0.0
104.75 279.629669 274.309540 273.250061 0.0
113.75 284.064880 275.061584 273.250031 0.0
146.75 282.761536 275.033936 273.250031 0.0
76.25 66.75 276.715729 274.978271 273.259796 0.0
93.25 279.426910 274.270721 273.250031 0.0
100.25 281.042206 274.694336 273.250031 0.0
112.75 282.890137 275.888458 273.272400 0.0
76.75 105.25 279.569733 274.468933 273.250031 0.0
77.75 91.75 276.046143 273.623474 273.250031 0.0
79.25 95.75 278.719238 273.940613 273.250031 0.0

975 rows × 4 columns


In [70]:
pseudo_absences_dataframe


Out[70]:
Salmo trutta
decimallatitude decimallongitude
79.25 95.75 0
77.75 91.75 0
76.75 105.25 0
76.25 66.75 0
93.25 0
100.25 0
112.75 0
75.75 102.75 0
104.75 0
113.75 0
146.75 0
75.25 60.75 0
99.25 0
101.25 0
103.25 0
140.25 0
74.75 56.75 0
87.25 0
99.25 0
112.75 0
74.25 55.25 0
87.25 0
94.75 0
102.75 0
106.25 0
112.25 0
73.75 80.75 0
84.75 0
85.75 0
94.25 0
... ... ...
23.75 68.25 0
78.75 0
109.75 0
110.25 0
113.75 0
23.25 108.25 0
116.75 0
22.75 72.75 0
97.25 0
102.25 0
115.75 0
22.25 69.25 0
93.75 0
106.25 0
108.75 0
112.75 0
114.25 0
21.75 73.75 0
101.75 0
21.25 73.25 0
20.75 77.75 0
20.25 73.25 0
17.75 74.75 0
77.25 0
77.75 0
15.25 44.75 0
14.75 76.25 0
79.25 0
14.25 44.25 0
45.75 0

975 rows × 1 columns
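To fit an SDM on this table, one would typically keep only the labeled rows (presences and pseudo-absences) that have valid temperatures. A sketch of that filtering on a synthetic frame with the same column names; `training_rows` is hypothetical, and treating values below -1e38 as nodata is an assumption based on the sentinel printed above:

```python
import numpy as np
import pandas as pd

def training_rows(merged, label="Salmo trutta", features=("MaxT", "MeanT", "MinT")):
    """Keep rows that have a label and no nodata sentinel in any feature column."""
    features = list(features)
    labeled = merged.dropna(subset=[label])
    valid = (labeled[features] > -1e38).all(axis=1)
    return labeled[valid]

frame = pd.DataFrame({
    "MaxT": [294.3, -3.402823e+38, 300.0],
    "MeanT": [290.5, -3.402823e+38, 295.0],
    "MinT": [286.2, -3.402823e+38, 290.0],
    "Salmo trutta": [0.0, 1.0, np.nan],
})
print(training_rows(frame).shape)  # (1, 4)
```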

