0. Standard setup for logging and plotting inside a notebook



In [1]:

    
import logging
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
root = logging.getLogger()
root.addHandler(logging.StreamHandler())
%matplotlib inline

1. Choose a representative species for a case study



In [2]:

    
# download from Google Drive: https://drive.google.com/open?id=0B9cazFzBtPuCOFNiUHYwcVFVODQ
# Representative example with multiple polygons in the shapefile, and a lot of point-records (also outside rangemaps)
from iSDM.species import IUCNSpecies
salmo_trutta = IUCNSpecies(name_species='Salmo trutta')
salmo_trutta.load_shapefile("../data/fish/selection/salmo_trutta")

Enabled Shapely speedups for performance.
Loading data from: ../data/fish/selection/salmo_trutta
The shapefile contains data on 3 species areas.

2. Rasterize the species, to get a matrix of pixels



In [3]:

    
rasterized = salmo_trutta.rasterize(raster_file="./salmo_trutta_full.tif", pixel_size=0.5, all_touched=True)

RASTERIO: Data rasterized into file ./salmo_trutta_full.tif 
RASTERIO: Resolution: x_res=720 y_res=360
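The reported resolution follows from the pixel size: a 0.5 degree grid over 360 degrees of longitude and 180 degrees of latitude gives 720 columns and 360 rows. The minimal sketch below shows that arithmetic and how a (longitude, latitude) point lands in a pixel. It assumes a north-up grid whose top-left corner is (-180, 90), as in the raster metadata printed later in this notebook. lonlat_to_pixel is a hypothetical helper for illustration, not part of iSDM.

```python
import math

PIXEL_SIZE = 0.5            # degrees, as passed to rasterize() above
WEST, NORTH = -180.0, 90.0  # assumed top-left corner of the global grid

n_cols = int(360 / PIXEL_SIZE)  # longitude span / pixel size
n_rows = int(180 / PIXEL_SIZE)  # latitude span / pixel size

def lonlat_to_pixel(lon, lat, pixel_size=PIXEL_SIZE):
    """Return the (row, col) of the grid cell containing (lon, lat)."""
    col = math.floor((lon - WEST) / pixel_size)
    row = math.floor((NORTH - lat) / pixel_size)  # rows grow southwards
    # Points exactly on the east/south edge belong to the last cell
    return min(row, n_rows - 1), min(col, n_cols - 1)

print(n_cols, n_rows)                 # 720 360
print(lonlat_to_pixel(23.75, 71.25))  # (37, 407)
```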

2.1 Plot to get an idea



In [4]:

    
plt.figure(figsize=(25,20))
plt.imshow(rasterized, cmap="hot", interpolation="none")

    Out[4]:

<matplotlib.image.AxesImage at 0x7f886ede7f28>

3. Load the biogeographical regions raster layer



In [5]:

    
from iSDM.environment import RasterEnvironmentalLayer
biomes_adf = RasterEnvironmentalLayer(file_path="../data/rebioms/w001001.adf", name_layer="Biomes")
biomes_adf.load_data()

Loaded raster data from ../data/rebioms/w001001.adf 
Driver name: AIG 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'AIG',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 255.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.

    Out[5]:

<open RasterReader name='../data/rebioms/w001001.adf' mode='r'>
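The metadata above stores the grid geometry as an affine transform Affine(a, b, c, d, e, f), which maps a pixel position (col, row) to world coordinates as x = a*col + b*row + c and y = d*col + e*row + f. The "transform" entry lists the same six numbers in GDAL order (c, a, b, f, d, e). The sketch below applies the printed coefficients by hand (plain Python, no rasterio needed) and recovers the printed bounding box from the outer pixel edges.

```python
# Coefficients copied from the printed metadata: Affine(0.5, 0.0, -180.0, 0.0, -0.5, 90.0)
A, B, C, D, E, F = 0.5, 0.0, -180.0, 0.0, -0.5, 90.0
WIDTH, HEIGHT = 720, 360

def pixel_to_world(col, row):
    """Map a (possibly fractional) pixel position to (lon, lat)."""
    x = A * col + B * row + C
    y = D * col + E * row + F
    return x, y

# Integer positions address pixel corners; (0, 0) is the top-left corner of the grid.
left, top = pixel_to_world(0, 0)
right, bottom = pixel_to_world(WIDTH, HEIGHT)
print(left, bottom, right, top)  # -180.0 -90.0 180.0 90.0
```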

3.1 Plot to get an idea



In [6]:

    
biomes_adf.plot()

3.2 Load the continents vector layer (used for further clipping of the pseudo-absence area) and rasterize it



In [7]:

    
from iSDM.environment import ContinentsLayer
from iSDM.environment import Source
continents = ContinentsLayer(file_path="../data/continents/continent.shp", source=Source.ARCGIS)
continents.load_data()
fig, ax = plt.subplots(1,1, figsize=(30,20))
continents.data_full.plot(column="continent", colormap="hsv")

Loading data from ../data/continents/continent.shp 
The shapefile contains data on 8 environmental regions.

    Out[7]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f886d13cda0>



In [8]:

    
continents_rasters = continents.rasterize(raster_file="../data/continents/continents_raster.tif", pixel_size=0.5, all_touched=True)

Will rasterize continent-by-continent.
Rasterizing continent Asia 
Rasterizing continent North America 
Rasterizing continent Europe 
Rasterizing continent Africa 
Rasterizing continent South America 
Rasterizing continent Oceania 
Rasterizing continent Australia 
Rasterizing continent Antarctica 
RASTERIO: Data rasterized into file ../data/continents/continents_raster.tif 
RASTERIO: Resolution: x_res=720 y_res=360



In [9]:

    
continents_rasters.shape # stacked raster with 8 bands, one for each continent.

    Out[9]:

(8, 360, 720)
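With one band per continent, you can see which continents the species range touches by intersecting each band with the species raster. The sketch below uses small synthetic arrays. It assumes that nonzero pixels mark "inside" in both rasters, which is typical for these rasterized layers but not confirmed here. In the notebook, the same expression would apply to continents_rasters (8, 360, 720) and rasterized (360, 720) through NumPy broadcasting.

```python
import numpy as np

# Synthetic stand-ins: 3 "continent" bands on a 4x6 grid, plus one species band.
continents = np.zeros((3, 4, 6), dtype=np.uint8)
continents[0, :2, :3] = 1   # band 0 covers the north-west block
continents[1, :2, 3:] = 1   # band 1 covers the north-east block
continents[2, 2:, :] = 1    # band 2 covers the southern half

species = np.zeros((4, 6), dtype=np.uint8)
species[1, 2:4] = 1         # a small range straddling bands 0 and 1

# Broadcasting (3, 4, 6) against (4, 6) compares every band with the species mask.
overlap = (continents > 0) & (species > 0)
touched = np.flatnonzero(overlap.any(axis=(1, 2)))
print(touched)  # [0 1]
```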

4. Sample pseudo-absence pixels, taking into account all the distinct biomes that fall in the species region.



In [10]:

    
selected_layers, pseudo_absences = biomes_adf.sample_pseudo_absences(species_raster_data=rasterized, continents_raster_data=continents_rasters, number_of_pseudopoints=1000)

Succesfully loaded existing raster data from ../data/rebioms/w001001.adf.
Will use the continents/biogeographic raster data for further clipping of the pseudo-absence regions. 
Sampling 1000 pseudo-absence points from environmental layer.
The following unique (pixel) values will be taken into account for sampling pseudo-absences
[ 8  9 10 11 12 13 14 15 17 21]
There are 17842 pixels to sample from...
Filling 1000 random pixel positions...
Sampled 975 unique pixels as pseudo-absences.
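The log requests 1000 positions but ends with 975 unique pixels. That gap is what you would expect if positions are drawn uniformly with replacement from the 17842 candidates. Whether iSDM samples exactly this way is an assumption here. The expected number of distinct values in k draws from n candidates is n(1 - (1 - 1/n)^k). For n = 17842 and k = 1000 this comes to roughly 972, close to the 975 observed. The sketch below computes that expectation and simulates the draw with NumPy's Generator API, using an arbitrary seed.

```python
import numpy as np

n_candidates = 17842   # "There are 17842 pixels to sample from..."
n_requested = 1000     # number_of_pseudopoints

# Analytical expectation for the number of distinct values in k draws with replacement
expected_unique = n_candidates * (1 - (1 - 1 / n_candidates) ** n_requested)
print(round(expected_unique, 1))

# Simulate: draw candidate indices with replacement, then keep the distinct ones
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for reproducibility
draws = rng.integers(0, n_candidates, size=n_requested)
unique_draws = np.unique(draws)
print(len(unique_draws))  # typically a few dozen short of 1000
```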

4.1 Plot the biomes taken into account for sampling pseudo-absences, to get an idea



In [11]:

    
plt.figure(figsize=(25,20))
plt.imshow(selected_layers, cmap="hot", interpolation="none")

    Out[11]:

<matplotlib.image.AxesImage at 0x7f8868562978>

4.2 Plot the sampled pseudo-absences, to get an idea



In [12]:

    
plt.figure(figsize=(25,20))
plt.imshow(pseudo_absences, cmap="hot", interpolation="none")

    Out[12]:

<matplotlib.image.AxesImage at 0x7f886838aa58>

5. Construct a convenient dataframe for testing with different SDM models

For the Example 2 datasheet, all cells of a global raster map are needed, one pixel per row.

5.1 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a "base" (zeroes) raster map.



In [13]:

    
all_coordinates = biomes_adf.pixel_to_world_coordinates(raster_data=np.zeros_like(rasterized), filter_no_data_value=False)

Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.
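The only difference between T0 and T1 is a half-pixel shift. T1 points at the centre of the top-left cell (89.75, -179.75) rather than its corner (90, -180). The sketch below generates the same centre coordinates directly with NumPy, assuming the 0.5 degree global grid used throughout. The result is in row-major order: latitude is constant along each row and longitude varies fastest, which is the order that reshape(-1) of a (360, 720) raster produces.

```python
import numpy as np

PIXEL_SIZE = 0.5
N_ROWS, N_COLS = 360, 720

# Cell centres: corner origin plus half a pixel, stepping by the pixel size
lats = 90.0 - PIXEL_SIZE * (np.arange(N_ROWS) + 0.5)    # 89.75 ... -89.75 (north to south)
lons = -180.0 + PIXEL_SIZE * (np.arange(N_COLS) + 0.5)  # -179.75 ... 179.75 (west to east)

# indexing="ij" gives arrays of shape (N_ROWS, N_COLS) matching the raster layout
lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
all_lat, all_lon = lat_grid.ravel(), lon_grid.ravel()

print(all_lat[:3], all_lon[:3])  # [89.75 89.75 89.75] [-179.75 -179.25 -178.75]
print(all_lat[-1], all_lon[-1])  # -89.75 179.75
```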



In [14]:

    
all_coordinates

    Out[14]:

(array([ 89.75,  89.75,  89.75, ..., -89.75, -89.75, -89.75]),
 array([-179.75, -179.25, -178.75, ...,  178.75,  179.25,  179.75]))



In [19]:

    
base_dataframe = pd.DataFrame([all_coordinates[0], all_coordinates[1]]).T
base_dataframe.columns=['decimallatitude', 'decimallongitude']



In [20]:

    
base_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)



In [22]:

    
base_dataframe.head()

    Out[22]:

decimallatitude  decimallongitude
89.75            -179.75
                 -179.25
                 -178.75
                 -178.25
                 -177.75



In [23]:

    
base_dataframe.tail()

    Out[23]:

decimallatitude  decimallongitude
-89.75           177.75
                 178.25
                 178.75
                 179.25
                 179.75

5.2 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a presences pixel map



In [24]:

    
presence_coordinates = salmo_trutta.pixel_to_world_coordinates()

No raster data provided, attempting to load default...
Loaded raster data from ./salmo_trutta_full.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 0.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ./salmo_trutta_full.tif.
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.



In [25]:

    
presence_coordinates

    Out[25]:

(array([ 71.25,  71.25,  71.25, ...,  30.75,  30.25,  30.25]),
 array([ 23.75,  24.25,  24.75, ...,  79.25,  78.25,  78.75]))



In [26]:

    
presences_dataframe = pd.DataFrame([presence_coordinates[0], presence_coordinates[1]]).T
presences_dataframe.columns=['decimallatitude', 'decimallongitude']
presences_dataframe[salmo_trutta.name_species] = 1 # fill presences with 1's
presences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
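Cells 26 and 30, and the temperature cells that follow, all repeat the same steps: turn a pair of coordinate arrays into two columns, optionally add a value column, and index by (latitude, longitude). The sketch below wraps those steps in a hypothetical helper, coords_to_frame, written in plain pandas. It assumes the coordinate tuple is (latitudes, longitudes), as returned by pixel_to_world_coordinates above.

```python
import numpy as np
import pandas as pd

def coords_to_frame(coords, column=None, values=None):
    """Build a DataFrame indexed by (decimallatitude, decimallongitude).

    coords -- (latitudes, longitudes) pair of equal-length 1-D arrays
    column -- optional name of a value column to add
    values -- scalar or array-like for that column (e.g. 1 for presences)
    """
    lats, lons = coords
    index = pd.MultiIndex.from_arrays(
        [np.asarray(lats), np.asarray(lons)],
        names=["decimallatitude", "decimallongitude"],
    )
    frame = pd.DataFrame(index=index)
    if column is not None:
        frame[column] = values  # a scalar is broadcast to every row
    return frame

# Usage mirroring cells 26 and 30 (names taken from the notebook):
# presences_dataframe = coords_to_frame(presence_coordinates, salmo_trutta.name_species, 1)
# pseudo_absences_dataframe = coords_to_frame(pseudo_absence_coordinates, salmo_trutta.name_species, 0)
demo = coords_to_frame((np.array([71.25, 71.25]), np.array([23.75, 24.25])), "Salmo trutta", 1)
print(demo)
```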



In [27]:

    
presences_dataframe.head()

    Out[27]:

                                   Salmo trutta
decimallatitude  decimallongitude
71.25            23.75             1
                 24.25             1
                 24.75             1
                 25.25             1
                 25.75             1



In [28]:

    
presences_dataframe.tail()

    Out[28]:

                                   Salmo trutta
decimallatitude  decimallongitude
30.75            78.25             1
                 78.75             1
                 79.25             1
30.25            78.25             1
                 78.75             1

5.3 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a pseudo_absences pixel map



In [29]:

    
pseudo_absence_coordinates = biomes_adf.pixel_to_world_coordinates(raster_data=pseudo_absences)

Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.



In [30]:

    
pseudo_absences_dataframe = pd.DataFrame([pseudo_absence_coordinates[0], pseudo_absence_coordinates[1]]).T
pseudo_absences_dataframe.columns=['decimallatitude', 'decimallongitude']
pseudo_absences_dataframe[salmo_trutta.name_species] = 0
pseudo_absences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)



In [31]:

    
pseudo_absences_dataframe.head()

    Out[31]:

                                   Salmo trutta
decimallatitude  decimallongitude
79.25            95.75             0
77.75            91.75             0
76.75            105.25            0
76.25            66.75             0
                 93.25             0
      0



In [32]:

    
pseudo_absences_dataframe.tail()

    Out[32]:

                                   Salmo trutta
decimallatitude  decimallongitude
15.25            44.75             0
14.75            76.25             0
                 79.25             0
14.25            44.25             0
                 45.75             0

5.4 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a minimum temperature pixel map



In [33]:

    
from iSDM.environment import ClimateLayer
water_min_layer =  ClimateLayer(file_path="../data/watertemp/min_wt_2000.tif") 
water_min_reader = water_min_layer.load_data()
# TODO: should we ignore cells with no-data values for temperature? They hold a very large negative sentinel (see 'nodata' in the metadata).
# For now we keep them; alternatively they could be converted to NaN.
water_min_coordinates = water_min_layer.pixel_to_world_coordinates(filter_no_data_value=False)

Loaded raster data from ../data/watertemp/min_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/min_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/min_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.
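You can defer the open question in the comment above (keep or drop no-data cells) by converting the sentinel to NaN. NaN then behaves as "missing" in combine_first and is skipped by most pandas reductions. The sketch below uses a synthetic float32 array with an assumed sentinel. In the notebook, take the exact sentinel from the raster's nodata metadata rather than retyping the rounded value shown in the log, and compare in the array's own dtype.

```python
import numpy as np
import pandas as pd

# Assumed sentinel for this sketch; in practice read it from the raster's 'nodata' metadata
NODATA = np.float32(-3.402823e+38)

# Synthetic 2x3 float32 "temperature" raster with two no-data cells
matrix = np.array([[NODATA, 271.5, 280.25],
                   [290.0, NODATA, 285.5]], dtype=np.float32)

# Replace the sentinel with NaN; the comparison is done in float32, the array's own dtype
cleaned = np.where(matrix == NODATA, np.nan, matrix)

# Flatten row-major, as the notebook does for the MinT/MaxT/MeanT columns
min_t = pd.Series(cleaned.reshape(-1), name="MinT")
print(min_t.isna().sum(), min_t.mean())  # 2 281.8125
```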



In [34]:

    
water_min_coordinates

    Out[34]:

(array([ 89.75,  89.75,  89.75, ..., -89.75, -89.75, -89.75]),
 array([-179.75, -179.25, -178.75, ...,  178.75,  179.25,  179.75]))



In [35]:

    
mintemp_dataframe = pd.DataFrame([water_min_coordinates[0], water_min_coordinates[1]]).T
mintemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_min_matrix = water_min_reader.read(1)
mintemp_dataframe['MinT'] = water_min_matrix.reshape(-1)  # flatten row-major, matching the coordinate order (np.product is removed in NumPy 2.0)
mintemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
mintemp_dataframe.head()

    Out[35]:

                                   MinT
decimallatitude  decimallongitude
89.75            -179.75           -3.402823e+38
                 -179.25           -3.402823e+38
                 -178.75           -3.402823e+38
                 -178.25           -3.402823e+38
                 -177.75           -3.402823e+38



In [36]:

    
mintemp_dataframe.tail()

    Out[36]:

                                   MinT
decimallatitude  decimallongitude
-89.75           177.75            -3.402823e+38
                 178.25            -3.402823e+38
                 178.75            -3.402823e+38
                 179.25            -3.402823e+38
                 179.75            -3.402823e+38

5.5 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a maximum temperature pixel map



In [37]:

    
water_max_layer =  ClimateLayer(file_path="../data/watertemp/max_wt_2000.tif") 
water_max_reader = water_max_layer.load_data()
# TODO: should we ignore cells with no-data values for temperature? They hold a very large negative sentinel (see 'nodata' in the metadata).
# For now we keep them; alternatively they could be converted to NaN.
water_max_coordinates = water_max_layer.pixel_to_world_coordinates(filter_no_data_value=False)

Loaded raster data from ../data/watertemp/max_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/max_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/max_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.



In [38]:

    
maxtemp_dataframe = pd.DataFrame([water_max_coordinates[0], water_max_coordinates[1]]).T
maxtemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_max_matrix = water_max_reader.read(1)
maxtemp_dataframe['MaxT'] = water_max_matrix.reshape(-1)  # flatten row-major, matching the coordinate order
maxtemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
maxtemp_dataframe.head()

    Out[38]:

                                   MaxT
decimallatitude  decimallongitude
89.75            -179.75           -3.402823e+38
                 -179.25           -3.402823e+38
                 -178.75           -3.402823e+38
                 -178.25           -3.402823e+38
                 -177.75           -3.402823e+38



In [39]:

    
maxtemp_dataframe.tail()

    Out[39]:

                                   MaxT
decimallatitude  decimallongitude
-89.75           177.75            -3.402823e+38
                 178.25            -3.402823e+38
                 178.75            -3.402823e+38
                 179.25            -3.402823e+38
                 179.75            -3.402823e+38

5.6 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a mean temperature pixel map



In [40]:

    
water_mean_layer =  ClimateLayer(file_path="../data/watertemp/mean_wt_2000.tif") 
water_mean_reader = water_mean_layer.load_data()
# TODO: should we ignore cells with no-data values for temperature? They hold a very large negative sentinel (see 'nodata' in the metadata).
# For now we keep them; alternatively they could be converted to NaN.
water_mean_coordinates = water_mean_layer.pixel_to_world_coordinates(filter_no_data_value=False)

Loaded raster data from ../data/watertemp/mean_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/mean_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/mean_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.



In [41]:

    
meantemp_dataframe = pd.DataFrame([water_mean_coordinates[0], water_mean_coordinates[1]]).T
meantemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_mean_matrix = water_mean_reader.read(1)
meantemp_dataframe['MeanT'] = water_mean_matrix.reshape(-1)  # flatten row-major, matching the coordinate order
meantemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
meantemp_dataframe.head()

    Out[41]:

                                   MeanT
decimallatitude  decimallongitude
89.75            -179.75           -3.402823e+38
                 -179.25           -3.402823e+38
                 -178.75           -3.402823e+38
                 -178.25           -3.402823e+38
                 -177.75           -3.402823e+38



In [42]:

    
meantemp_dataframe.tail()

    Out[42]:

                                   MeanT
decimallatitude  decimallongitude
-89.75           177.75            -3.402823e+38
                 178.25            -3.402823e+38
                 178.75            -3.402823e+38
                 179.25            -3.402823e+38
                 179.75            -3.402823e+38



In [45]:

    
# merge base with presences
merged = base_dataframe.combine_first(presences_dataframe)



In [46]:

    
merged.head()

    Out[46]:

                                   Salmo trutta
decimallatitude  decimallongitude
-89.75           -179.75           NaN
                 -179.25           NaN
                 -178.75           NaN
                 -178.25           NaN
                 -177.75           NaN



In [47]:

    
merged.tail()

    Out[47]:

                                   Salmo trutta
decimallatitude  decimallongitude
89.75            177.75            NaN
                 178.25            NaN
                 178.75            NaN
                 179.25            NaN
                 179.75            NaN



In [48]:

    
# merge base+presences with pseudo-absences
# merged2 = pd.merge(merged1, pseudo_absences_dataframe, on=["decimallatitude", "decimallongitude", salmo_trutta.name_species], how="outer")

merged = merged.combine_first(pseudo_absences_dataframe)

The merges follow the pandas merging guide (http://pandas.pydata.org/pandas-docs/stable/merging.html) and use the combine_first method.

Note that combine_first only takes a value from the right DataFrame where the left DataFrame is missing it (NaN), so values already present on the left are never overwritten. A related method, update, instead alters non-NA values in place.
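The toy example below, which uses only pandas, makes the precedence concrete. The left frame always wins where it already has a value. Presences, merged first, are therefore never overwritten by a pseudo-absence at the same cell, and cells in neither set stay NaN.

```python
import pandas as pd

names = ["decimallatitude", "decimallongitude"]
cells = pd.MultiIndex.from_tuples([(0.25, 0.25), (0.25, 0.75), (-0.25, 0.25)], names=names)

base = pd.DataFrame(index=cells)  # every grid cell, no columns yet
presences = pd.DataFrame({"Salmo trutta": [1]},
                         index=pd.MultiIndex.from_tuples([(0.25, 0.25)], names=names))
# A pseudo-absence that collides with the presence cell, plus one elsewhere
absences = pd.DataFrame({"Salmo trutta": [0, 0]},
                        index=pd.MultiIndex.from_tuples([(0.25, 0.25), (-0.25, 0.25)], names=names))

merged = base.combine_first(presences).combine_first(absences)
print(merged)
```

The colliding cell keeps its presence value 1, the other pseudo-absence becomes 0, and the untouched cell stays NaN.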



In [49]:

    
merged.head()

    Out[49]:

                                   Salmo trutta
decimallatitude  decimallongitude
-89.75           -179.75           NaN
                 -179.25           NaN
                 -178.75           NaN
                 -178.25           NaN
                 -177.75           NaN



In [50]:

    
merged.tail()

    Out[50]:

                                   Salmo trutta
decimallatitude  decimallongitude
89.75            177.75            NaN
                 178.25            NaN
                 178.75            NaN
                 179.25            NaN
                 179.75            NaN



In [51]:

    
# merge base+presences+pseudo-absences with min temperature
#merged3 = pd.merge(merged2, mintemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")

merged = merged.combine_first(mintemp_dataframe)



In [52]:

    
merged.head()

    Out[52]:

                                   MinT           Salmo trutta
decimallatitude  decimallongitude
-89.75           -179.75           -3.402823e+38  NaN
                 -179.25           -3.402823e+38  NaN
                 -178.75           -3.402823e+38  NaN
                 -178.25           -3.402823e+38  NaN
                 -177.75           -3.402823e+38  NaN



In [53]:

    
merged.tail()

    Out[53]:

                                   MinT           Salmo trutta
decimallatitude  decimallongitude
89.75            177.75            -3.402823e+38  NaN
                 178.25            -3.402823e+38  NaN
                 178.75            -3.402823e+38  NaN
                 179.25            -3.402823e+38  NaN
                 179.75            -3.402823e+38  NaN



In [54]:

    
# merged4 = pd.merge(merged3, maxtemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")
merged = merged.combine_first(maxtemp_dataframe)



In [55]:

    
merged.head()

    Out[55]:

                                   MaxT           MinT           Salmo trutta
decimallatitude  decimallongitude
-89.75           -179.75           -3.402823e+38  -3.402823e+38  NaN
                 -179.25           -3.402823e+38  -3.402823e+38  NaN
                 -178.75           -3.402823e+38  -3.402823e+38  NaN
                 -178.25           -3.402823e+38  -3.402823e+38  NaN
                 -177.75           -3.402823e+38  -3.402823e+38  NaN



In [56]:

    
merged.tail()

    Out[56]:

                                   MaxT           MinT           Salmo trutta
decimallatitude  decimallongitude
89.75            177.75            -3.402823e+38  -3.402823e+38  NaN
                 178.25            -3.402823e+38  -3.402823e+38  NaN
                 178.75            -3.402823e+38  -3.402823e+38  NaN
                 179.25            -3.402823e+38  -3.402823e+38  NaN
                 179.75            -3.402823e+38  -3.402823e+38  NaN



In [57]:

    
# merged5 = pd.merge(merged4, meantemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")
merged = merged.combine_first(meantemp_dataframe)
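The three temperature frames all carry the same full-grid (latitude, longitude) index, so they could also be joined side by side in one step with pd.concat(..., axis=1). That call aligns rows on the index (an outer join by default) and keeps the columns in the order the frames are listed. The sketch below uses three synthetic cells with placeholder values. combine_first is still the right tool for the species column, because presences and pseudo-absences fill the same column and presences must take precedence.

```python
import pandas as pd

names = ["decimallatitude", "decimallongitude"]
cells = pd.MultiIndex.from_arrays([[89.75, 89.75, 89.25], [-179.75, -179.25, -179.75]], names=names)

# Synthetic temperature layers on the same cells (placeholder values)
min_t = pd.DataFrame({"MinT": [271.5, 272.0, 273.25]}, index=cells)
max_t = pd.DataFrame({"MaxT": [280.0, 281.5, 282.0]}, index=cells)
mean_t = pd.DataFrame({"MeanT": [275.0, 276.25, 277.5]}, index=cells)

# Column-wise concatenation aligns rows on the (lat, lon) index
climate = pd.concat([min_t, max_t, mean_t], axis=1)
print(climate.shape)  # (3, 3)
```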



In [58]:

    
merged.tail()

    Out[58]:

                                   MaxT           MeanT          MinT           Salmo trutta
decimallatitude  decimallongitude
89.75            177.75            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.25            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.75            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.25            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.75            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN



In [59]:

    
merged.to_csv("../data/fish/selection/salmo_trutta_again.csv")
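to_csv writes the two index levels as the first two columns of the file. To get the same (latitude, longitude) MultiIndex back when reloading, pass those column names to read_csv as index_col. The self-contained round trip below goes through an in-memory buffer instead of the file.

```python
import io

import numpy as np
import pandas as pd

names = ["decimallatitude", "decimallongitude"]
index = pd.MultiIndex.from_arrays([[89.75, 89.75], [-179.75, -179.25]], names=names)
frame = pd.DataFrame({"MinT": [np.nan, 271.5], "Salmo trutta": [np.nan, 1.0]}, index=index)

buffer = io.StringIO()
frame.to_csv(buffer)  # index levels become the first two CSV columns; NaN is written as empty
buffer.seek(0)

# Naming the index columns restores the (latitude, longitude) MultiIndex on load
restored = pd.read_csv(buffer, index_col=names)
print(restored.index.names)

# For the file written above:
# merged = pd.read_csv("../data/fish/selection/salmo_trutta_again.csv", index_col=names)
```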



In [60]:

    
merged[merged['Salmo trutta']==0].shape[0] # should equal the number of pseudo-absences computed below

    Out[60]:

975



In [61]:

    
pseudo_absence_coordinates[0].shape[0]

    Out[61]:

975



In [62]:

    
merged[merged['Salmo trutta']==1].shape[0]  # should be equal to number of presences below

    Out[62]:

6089



In [63]:

    
presence_coordinates[0].shape[0]

    Out[63]:

6089



In [64]:

    
merged[merged['Salmo trutta'].isnull()].shape[0] # all that's left

    Out[64]:

252136



In [65]:

    
360 * 720 == merged[merged['Salmo trutta']==0].shape[0] + merged[merged['Salmo trutta']==1].shape[0] + merged[merged['Salmo trutta'].isnull()].shape[0]

    Out[65]:

True
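With the numbers above, 975 + 6089 + 252136 = 259200 = 360 x 720. A single call to value_counts gives the labelled counts, and isna().sum() gives the rest. The sketch below wraps the check in a hypothetical helper, label_counts. In the notebook it would be called as label_counts(merged['Salmo trutta']).

```python
import numpy as np
import pandas as pd

def label_counts(labels, n_rows=360, n_cols=720):
    """Count pseudo-absences (0), presences (1) and unlabelled cells (NaN),
    and check that together they cover the whole n_rows x n_cols grid."""
    counts = labels.value_counts()  # NaN is dropped here and counted separately below
    n_absent = int(counts.get(0.0, 0))
    n_present = int(counts.get(1.0, 0))
    n_missing = int(labels.isna().sum())
    assert n_absent + n_present + n_missing == n_rows * n_cols, "labels do not cover the grid"
    return n_absent, n_present, n_missing

# Toy 2x3 grid: one pseudo-absence, two presences, three unlabelled cells
toy = pd.Series([0.0, 1.0, 1.0, np.nan, np.nan, np.nan], name="Salmo trutta")
print(label_counts(toy, n_rows=2, n_cols=3))  # (1, 2, 3)
```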



In [66]:

    
# == all pixels in 360 x 720 matrix



In [69]:

    
merged[merged['Salmo trutta']==0.0]

    Out[69]:

                                   MaxT        MeanT       MinT        Salmo trutta
decimallatitude  decimallongitude
14.25            44.25             294.348785  290.497498  286.225769  0.0
                 45.75             295.938843  291.549133  287.073181  0.0
14.75            76.25             305.077118  300.666901  297.389282  0.0
                 79.25             306.705841  302.743439  299.312866  0.0
15.25            44.75             295.634644  291.429962  286.902283  0.0
17.75            74.75             302.301208  299.151672  296.922211  0.0
                 77.25             304.959717  301.225250  298.016754  0.0
                 77.75             305.229645  300.634857  297.521271  0.0
20.25            73.25             302.265137  300.006073  297.970367  0.0
20.75            77.75             303.288269  300.978668  299.323792  0.0
21.25            73.25             306.433197  300.999573  296.525177  0.0
21.75            73.75             307.357910  300.695160  294.550110  0.0
                 101.75            297.350006  294.245880  290.456543  0.0
22.25            69.25             304.384857  300.280365  295.579895  0.0
                 93.75             296.323669  294.439453  292.090576  0.0
                 106.25            299.653900  294.457214  287.353485  0.0
                 108.75            300.579224  295.935547  289.976166  0.0
                 112.75            301.917999  296.965271  290.311584  0.0
                 114.25            300.020142  296.552429  291.980164  0.0
22.75            72.75             307.711853  301.609802  294.764954  0.0
                 97.25             297.301697  294.515808  290.998840  0.0
                 102.25            296.893372  293.027710  288.674133  0.0
                 115.75            299.413208  296.564117  293.529358  0.0
23.25            108.25            299.629974  294.765747  287.241119  0.0
                 116.75            301.512939  295.446381  288.745514  0.0
23.75            68.25             308.485565  301.814178  292.111267  0.0
                 78.75             307.470917  300.047699  294.140656  0.0
                 109.75            299.573059  294.064758  285.925385  0.0
                 110.25            301.049469  294.014618  284.467163  0.0
                 113.75            299.573303  294.927246  289.687592  0.0
...              ...               ...         ...         ...         ...
73.75            115.75            286.288361  275.725464  273.250031  0.0
                 116.75            283.255646  275.114563  273.250031  0.0
                 117.75            283.487091  275.273346  273.250031  0.0
                 124.25            288.070862  275.615814  273.250031  0.0
74.25            55.25             280.299927  276.084808  273.272705  0.0
                 87.25             277.674500  274.462982  273.358612  0.0
                 94.75             280.017822  274.560333  273.250031  0.0
                 102.75            274.750580  273.510254  273.250031  0.0
                 106.25            285.034943  275.336487  273.250031  0.0
                 112.25            286.482025  275.288757  273.250031  0.0
74.75            56.75             277.572540  275.066864  273.254852  0.0
                 87.25             279.660889  276.205383  273.394745  0.0
                 99.25             280.808075  274.522644  273.250031  0.0
                 112.75            281.576904  274.822754  273.250031  0.0
75.25            60.75             280.011536  274.293488  273.250031  0.0
                 99.25             282.532745  274.964752  273.250031  0.0
                 101.25            280.831268  274.709930  273.250031  0.0
                 103.25            279.939087  274.600677  273.250031  0.0
                 140.25            284.235931  274.903748  273.250031  0.0
75.75            102.75            280.237244  274.754822  273.250031  0.0
                 104.75            279.629669  274.309540  273.250061  0.0
                 113.75            284.064880  275.061584  273.250031  0.0
                 146.75            282.761536  275.033936  273.250031  0.0
76.25            66.75             276.715729  274.978271  273.259796  0.0
                 93.25             279.426910  274.270721  273.250031  0.0
                 100.25            281.042206  274.694336  273.250031  0.0
                 112.75            282.890137  275.888458  273.272400  0.0
76.75            105.25            279.569733  274.468933  273.250031  0.0
77.75            91.75             276.046143  273.623474  273.250031  0.0
79.25            95.75             278.719238  273.940613  273.250031  0.0

975 rows × 4 columns



In [70]:

    
pseudo_absences_dataframe









    Out[70]:






  
    
      
      
		Salmo trutta
decimallatitude	decimallongitude
79.25	95.75	0
77.75	91.75	0
76.75	105.25	0
76.25	66.75	0
	93.25	0
	100.25	0
	112.75	0
75.75	102.75	0
	104.75	0
	113.75	0
	146.75	0
75.25	60.75	0
	99.25	0
	101.25	0
	103.25	0
	140.25	0
74.75	56.75	0
	87.25	0
	99.25	0
	112.75	0
74.25	55.25	0
	87.25	0
	94.75	0
	102.75	0
	106.25	0
	112.25	0
73.75	80.75	0
	84.75	0
	85.75	0
	94.25	0
...	...	...
23.75	68.25	0
	78.75	0
	109.75	0
	110.25	0
	113.75	0
23.25	108.25	0
	116.75	0
22.75	72.75	0
	97.25	0
	102.25	0
	115.75	0
22.25	69.25	0
	93.75	0
	106.25	0
	108.75	0
	112.75	0
	114.25	0
21.75	73.75	0
	101.75	0
21.25	73.25	0
20.75	77.75	0
20.25	73.25	0
17.75	74.75	0
	77.25	0
	77.75	0
15.25	44.75	0
14.75	76.25	0
	79.25	0
14.25	44.25	0
	45.75	0
    
  

975 rows × 1 columns
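The output above is a one-column frame whose index is a (decimallatitude, decimallongitude) MultiIndex and whose "Salmo trutta" column holds 0 in every row; a latitude is printed only on the first row it applies to. Below is a minimal pandas sketch that builds a frame with the same layout from a few of the listed cell centres and then pulls out all longitudes on one latitude row. `make_label_frame` is a hypothetical helper for illustration, not part of iSDM, and this is not necessarily how `pseudo_absences_dataframe` was produced.

```python
import pandas as pd


def make_label_frame(coords, species_name, label):
    """Hypothetical helper: one-column frame indexed by (decimallatitude, decimallongitude)."""
    index = pd.MultiIndex.from_tuples(
        coords, names=["decimallatitude", "decimallongitude"]
    )
    # A scalar label is broadcast to every row of the index
    return pd.DataFrame({species_name: label}, index=index)


# A handful of the cell centres listed in Out[70]
coords = [
    (79.25, 95.75),
    (77.75, 91.75),
    (76.75, 105.25),
    (76.25, 66.75),
    (76.25, 93.25),
    (76.25, 100.25),
    (76.25, 112.75),
]
pseudo_absences = make_label_frame(coords, "Salmo trutta", 0)
print(pseudo_absences)  # repeated latitudes are printed blank, as in Out[70]

# All pseudo-absence longitudes on the 76.25 latitude row
lat = pseudo_absences.index.get_level_values("decimallatitude")
lons = pseudo_absences[lat == 76.25].index.get_level_values("decimallongitude")
print(lons.tolist())
```

Selecting with `get_level_values` and a boolean mask works whether or not the MultiIndex is sorted, which matters here because the rows run from north to south.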



In [ ]:


decimallatitude	decimallongitude
89.75	-179.75
	-179.25
	-178.75
	-178.25
	-177.75


decimallatitude	decimallongitude
-89.75	177.75
	178.25
	178.75
	179.25
	179.75

		Salmo trutta
decimallatitude	decimallongitude
71.25	23.75	1
	24.25	1
	24.75	1
	25.25	1
	25.75	1

		Salmo trutta
decimallatitude	decimallongitude
30.75	78.25	1
	78.75	1
	79.25	1
30.25	78.25	1
30.25	78.75	1

		Salmo trutta
decimallatitude	decimallongitude
15.25	44.75	0
14.75	76.25	0
14.75	79.25	0
14.25	44.25	0
14.25	45.75	0
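The first two snippets look like the head and tail of a frame indexed by every cell centre of a global 0.5° grid, from (89.75, -179.75) to (-89.75, 179.75). This is consistent with the 720 × 360 Biomes raster loaded earlier, whose affine origin is (-180, 90). The other three snippets hold presences (1) and pseudo-absences (0). Below is a minimal pandas sketch that generates such a grid, stacks a few of the labelled cells, and aligns them onto it. It assumes that cell centres sit half a pixel in from the raster edges. `global_grid_index` and `label_frame` are hypothetical helpers, not iSDM API, and the sketch is one possible way to combine the frames, not the notebook's own method.

```python
import numpy as np
import pandas as pd

NAMES = ["decimallatitude", "decimallongitude"]


def global_grid_index(pixel_size=0.5):
    """Hypothetical helper: cell centres of a global lat/lon grid, north-west corner first."""
    n_rows = int(round(180 / pixel_size))
    n_cols = int(round(360 / pixel_size))
    # Centre of row r is 90 - pixel_size * (r + 0.5); centre of column c is -180 + pixel_size * (c + 0.5)
    latitudes = 90 - pixel_size * (np.arange(n_rows) + 0.5)
    longitudes = -180 + pixel_size * (np.arange(n_cols) + 0.5)
    return pd.MultiIndex.from_product([latitudes, longitudes], names=NAMES)


def label_frame(coords, label, species_name="Salmo trutta"):
    """Hypothetical helper: one-column frame of a constant label, indexed by cell centre."""
    index = pd.MultiIndex.from_tuples(coords, names=NAMES)
    return pd.DataFrame({species_name: label}, index=index)


grid = global_grid_index(0.5)  # 360 * 720 = 259200 cells

# A few presence (1) and pseudo-absence (0) cells copied from the snippets above
presences = label_frame([(71.25, 23.75), (71.25, 24.25), (30.75, 78.25)], 1)
pseudo_absences = label_frame([(15.25, 44.75), (14.75, 76.25), (14.25, 44.25)], 0)

labels = pd.concat([presences, pseudo_absences])
# A cell must not be labelled both present and absent
assert labels.index.is_unique

# Align onto the full grid; cells without a label become NaN
on_grid = labels.reindex(grid)
print(on_grid["Salmo trutta"].value_counts(dropna=False))
```

`reindex` leaves NaN wherever a cell has no label, which also turns the label column into floats. Drop or fill those rows before fitting a model.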