0. Standard setup for logging and plotting inside a notebook



In [1]:

    
import logging
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
root = logging.getLogger()
root.addHandler(logging.StreamHandler())
%matplotlib inline

1. Choose a representative species for a case study



In [2]:

    
# download from Google Drive: https://drive.google.com/open?id=0B9cazFzBtPuCOFNiUHYwcVFVODQ
# Representative example with multiple polygons in the shapefile, and a lot of point-records (also outside rangemaps)
from iSDM.species import IUCNSpecies
salmo_trutta = IUCNSpecies(name_species='Salmo trutta')
salmo_trutta.load_shapefile("../data/fish/selection/salmo_trutta")









    



Enabled Shapely speedups for performance.
Loading data from: ../data/fish/selection/salmo_trutta
The shapefile contains data on 3 species areas.

2. Rasterize the species, to get a matrix of pixels



In [3]:

    
rasterized = salmo_trutta.rasterize(raster_file="./salmo_trutta_full.tif", pixel_size=0.5)









    



RASTERIO: Data rasterized into file ./salmo_trutta_full.tif 
RASTERIO: Resolution: x_res=720 y_res=360
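The x_res and y_res values reported above are pixel counts rather than cell sizes. As a quick check, assuming the raster spans the whole globe (360 by 180 degrees), a pixel_size of 0.5 degrees should give exactly this grid:

```python
# Assumption: the rasterized map covers longitudes -180..180 and latitudes -90..90.
pixel_size = 0.5  # degrees per pixel, as passed to rasterize()

width = int(round(360 / pixel_size))   # number of columns (x direction)
height = int(round(180 / pixel_size))  # number of rows (y direction)

print(width, height)  # 720 360
```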

2.1 Plot to get an idea



In [4]:

    
plt.figure(figsize=(25,20))
plt.imshow(rasterized, cmap="hot", interpolation="none")









    Out[4]:





<matplotlib.image.AxesImage at 0x7fd0e4e33e48>

3. Load the biogeographical regions raster layer



In [5]:

    
from iSDM.environment import RasterEnvironmentalLayer
biomes_adf = RasterEnvironmentalLayer(file_path="../data/rebioms/w001001.adf", name_layer="Biomes")
biomes_adf.load_data()









    



Loaded raster data from ../data/rebioms/w001001.adf 
Driver name: AIG 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'AIG',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 255.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.






    Out[5]:





<open RasterReader name='../data/rebioms/w001001.adf' mode='r'>

3.1 Plot to get an idea



In [6]:

    
biomes_adf.plot()

3.2 Load the continents vector layer (for further clipping of pseudo-absence area), rasterize



In [7]:

    
from iSDM.environment import ContinentsLayer
from iSDM.environment import Source
continents = ContinentsLayer(file_path="../data/continents/continent.shp", source=Source.ARCGIS)
continents.load_data()
fig, ax = plt.subplots(1,1, figsize=(30,20))
continents.data_full.plot(column="continent", colormap="hsv")









    



Loading data from ../data/continents/continent.shp 
The shapefile contains data on 8 environmental regions.






    Out[7]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fd0e41d9d68>



In [8]:

    
continents_rasters = continents.rasterize(raster_file="../data/continents/continents_raster.tif", pixel_size=0.5)









    



Will rasterize continent-by-continent.
Rasterizing continent Asia 
Rasterizing continent North America 
Rasterizing continent Europe 
Rasterizing continent Africa 
Rasterizing continent South America 
Rasterizing continent Oceania 
Rasterizing continent Australia 
Rasterizing continent Antarctica 
RASTERIO: Data rasterized into file ../data/continents/continents_raster.tif 
RASTERIO: Resolution: x_res=720 y_res=360



In [9]:

    
continents_rasters.shape # stacked raster with 8 bands, one for each continent.









    Out[9]:





(8, 360, 720)

As agreed, we merge Europe and Asia so that the continents layer is a bit closer to the biogeographical regions. This is a manual, case-specific patch for now, until a better solution is found.



In [10]:

    
continents_rasters[0] = continents_rasters[0] + continents_rasters[2] # combine Europe and Asia



In [11]:

    
continents_rasters[0].max() # where the continents touch, the merged band can contain overlapping pixels, so the max may be 2 instead of 1









    Out[11]:





1

Set all values > 1 to 1 (we only care about presence/absence).



In [12]:

    
continents_rasters[0][continents_rasters[0] > 1] = 1

Delete band 2 (Europe, already merged into band 0, Asia).



In [13]:

    
continents_rasters = np.delete(continents_rasters, 2, 0)

This is what band 0 looks like now.



In [14]:

    
plt.figure(figsize=(25,20))
plt.imshow(continents_rasters[0], cmap="hot", interpolation="none")









    Out[14]:





<matplotlib.image.AxesImage at 0x7fd0da8e2320>



In [15]:

    
continents_rasters.shape # now a total of 7 bands rather than 8









    Out[15]:





(7, 360, 720)
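A possible alternative to adding the bands and then clipping values above 1 is a logical OR, which never produces a 2 in the first place. The sketch below uses a tiny made-up stack (three bands of 2 x 3 pixels, not the real continents raster) to show the same merge-and-delete pattern:

```python
import numpy as np

# Toy stack: 3 bands of 2 x 3 pixels. The band contents are invented.
bands = np.zeros((3, 2, 3), dtype=np.uint8)
bands[0] = [[1, 1, 0], [0, 0, 0]]  # stand-in for "Asia"
bands[1] = [[0, 0, 0], [1, 1, 1]]  # stand-in for some other continent
bands[2] = [[0, 1, 1], [0, 0, 0]]  # stand-in for "Europe", overlapping band 0 in one pixel

# Merge band 2 into band 0 as presence/absence; overlapping pixels stay at 1.
bands[0] = np.logical_or(bands[0], bands[2]).astype(bands.dtype)

# Drop the band that has been merged away.
bands = np.delete(bands, 2, axis=0)

print(bands.shape)     # (2, 2, 3)
print(bands[0].max())  # 1
```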

4. Sample pseudo-absence pixels, taking into account all the distinct biomes that fall in the species region.



In [16]:

    
selected_layers, pseudo_absences = biomes_adf.sample_pseudo_absences(species_raster_data=rasterized, continents_raster_data=continents_rasters, number_of_pseudopoints=1000)









    



Succesfully loaded existing raster data from ../data/rebioms/w001001.adf.
Will use the continents/biogeographic raster data for further clipping of the pseudo-absence regions. 
Sampling 1000 pseudo-absence points from environmental layer.
The following unique (pixel) values will be taken into account for sampling pseudo-absences
[ 8  9 10 11 12 13 14 15 17 21]
There are 17034 pixels to sample from...
Filling 1000 random pixel positions...
Sampled 977 unique pixels as pseudo-absences.
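To make the log above easier to follow, here is a rough sketch of biome-based pseudo-absence sampling on toy data. It is not iSDM's implementation: the arrays, the random seed and the variable names are invented for illustration, and the continent clipping step is left out. Candidate pixels are those whose biome also occurs inside the species range but which are not presence pixels themselves. If positions are then drawn with replacement, which the "977 unique" line suggests, duplicates collapse and the number of unique pixels can be smaller than the number requested.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only to make the toy example repeatable

# Toy inputs (invented): a biome map with ids 1..4 and a small presence patch.
biomes = rng.integers(1, 5, size=(20, 40))
species = np.zeros_like(biomes)
species[5:8, 10:15] = 1

# Biome ids that occur under the species range.
biomes_in_range = np.unique(biomes[species == 1])

# Candidate pixels: same biomes, but outside the presence pixels.
candidates = np.isin(biomes, biomes_in_range) & (species == 0)
candidate_rows, candidate_cols = np.nonzero(candidates)

# Draw positions with replacement; repeated draws land on the same pixel.
n_points = 100
picks = rng.integers(0, candidate_rows.size, size=n_points)
pseudo_absences = np.zeros_like(biomes)
pseudo_absences[candidate_rows[picks], candidate_cols[picks]] = 1

n_unique = int(pseudo_absences.sum())
print(n_unique, "unique pixels out of", n_points, "draws")
```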

4.1 Plot the biomes taken into account for sampling pseudo-absences, to get an idea



In [17]:

    
plt.figure(figsize=(25,20))
plt.imshow(selected_layers, cmap="hot", interpolation="none")









    Out[17]:





<matplotlib.image.AxesImage at 0x7fd0da6dce10>

4.2 Plot the sampled pseudo-absences, to get an idea



In [18]:

    
plt.figure(figsize=(25,20))
plt.imshow(pseudo_absences, cmap="hot", interpolation="none")









    Out[18]:





<matplotlib.image.AxesImage at 0x7fd0da857780>

5. Construct a convenient dataframe for testing with different SDM models

For the Example 2 datasheet, all cells of a global raster map are needed, one pixel per row.

5.1 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a "base" (zeroes) raster map.



In [19]:

    
all_coordinates = biomes_adf.pixel_to_world_coordinates(raster_data=np.zeros_like(rasterized), filter_no_data_value=False)









    



Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.



In [20]:

    
all_coordinates









    Out[20]:





(array([ 89.75,  89.75,  89.75, ..., -89.75, -89.75, -89.75]),
 array([-179.75, -179.25, -178.75, ...,  178.75,  179.25,  179.75]))
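The coordinates above are pixel centers. Under the affine transform printed in the log (origin at -180, 90 and a 0.5 degree step), they can be reproduced with plain NumPy. This is a sketch of the arithmetic only, not the library's own code:

```python
import numpy as np

pixel_size = 0.5          # degrees per pixel
height, width = 360, 720  # raster shape from the log

# Row and column index of every pixel in the grid.
rows, cols = np.indices((height, width))

# Shift by half a pixel to move from the top-left corner to the pixel center.
latitudes = 90.0 - (rows + 0.5) * pixel_size
longitudes = -180.0 + (cols + 0.5) * pixel_size

# Flatten row by row, so latitude changes slowly and longitude quickly.
lat_flat = latitudes.ravel()
lon_flat = longitudes.ravel()

print(lat_flat[0], lat_flat[-1])  # 89.75 -89.75
print(lon_flat[0], lon_flat[-1])  # -179.75 179.75
```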



In [21]:

    
base_dataframe = pd.DataFrame([all_coordinates[0], all_coordinates[1]]).T
base_dataframe.columns=['decimallatitude', 'decimallongitude']
base_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
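The same three steps (build a frame from the coordinate arrays, name the columns, move them into the index) recur for every layer below. A small helper could wrap them; coordinates_to_frame is a hypothetical name and not part of iSDM:

```python
import numpy as np
import pandas as pd


def coordinates_to_frame(coordinates, column=None, values=None):
    """Build a frame indexed by (latitude, longitude) from a pair of arrays.

    Hypothetical helper: `coordinates` is a (latitudes, longitudes) tuple,
    and `column`/`values` optionally add one data column.
    """
    latitudes, longitudes = coordinates
    index = pd.MultiIndex.from_arrays(
        [latitudes, longitudes],
        names=["decimallatitude", "decimallongitude"],
    )
    frame = pd.DataFrame(index=index)
    if column is not None:
        frame[column] = values  # a scalar is broadcast to every row
    return frame


# Toy usage with two made-up pixel centers, marked as presences.
toy_coordinates = (np.array([89.75, 89.75]), np.array([-179.75, -179.25]))
toy_frame = coordinates_to_frame(toy_coordinates, "Salmo trutta", 1)
print(toy_frame)
```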



In [22]:

    
base_dataframe.head()









    Out[22]:






  
    
      
      
    
    
decimallatitude  decimallongitude
89.75            -179.75
89.75            -179.25
89.75            -178.75
89.75            -178.25
89.75            -177.75



In [23]:

    
base_dataframe.tail()









    Out[23]:






  
    
      
      
    
    
decimallatitude  decimallongitude
-89.75           177.75
-89.75           178.25
-89.75           178.75
-89.75           179.25
-89.75           179.75

5.2 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a presences pixel map



In [24]:

    
presence_coordinates = salmo_trutta.pixel_to_world_coordinates()









    



No raster data provided, attempting to load default...
Loaded raster data from ./salmo_trutta_full.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 0.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ./salmo_trutta_full.tif.
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.



In [25]:

    
presence_coordinates









    Out[25]:





(array([ 70.75,  70.75,  70.75, ...,  30.75,  30.75,  30.75]),
 array([ 22.75,  23.25,  24.75, ...,  78.25,  78.75,  79.25]))



In [26]:

    
presences_dataframe = pd.DataFrame([presence_coordinates[0], presence_coordinates[1]]).T
presences_dataframe.columns=['decimallatitude', 'decimallongitude']
presences_dataframe[salmo_trutta.name_species] = 1 # fill presences with 1's
presences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
presences_dataframe.head()









    Out[26]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
70.75            22.75             1
70.75            23.25             1
70.75            24.75             1
70.75            25.25             1
70.75            26.25             1



In [27]:

    
presences_dataframe.tail()









    Out[27]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
35.25            25.25             1
35.25            26.25             1
30.75            78.25             1
30.75            78.75             1
30.75            79.25             1

5.3 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a pseudo_absences pixel map



In [28]:

    
pseudo_absence_coordinates = biomes_adf.pixel_to_world_coordinates(raster_data=pseudo_absences)









    



Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.



In [29]:

    
pseudo_absences_dataframe = pd.DataFrame([pseudo_absence_coordinates[0], pseudo_absence_coordinates[1]]).T
pseudo_absences_dataframe.columns=['decimallatitude', 'decimallongitude']
pseudo_absences_dataframe[salmo_trutta.name_species] = 0 # fill pseudo-absences with 0
pseudo_absences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)



In [30]:

    
pseudo_absences_dataframe.head()









    Out[30]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
78.75            102.25            0
76.75            103.25            0
76.75            149.25            0
76.25            62.25             0
76.25            102.75            0



In [31]:

    
pseudo_absences_dataframe.tail()









    Out[31]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
17.75            74.75             0
17.25            43.25             0
17.25            74.75             0
16.75            43.75             0
16.75            75.25             0

5.4 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a minimum temperature pixel map



In [32]:

    
from iSDM.environment import ClimateLayer
water_min_layer =  ClimateLayer(file_path="../data/watertemp/min_wt_2000.tif") 
water_min_reader = water_min_layer.load_data()
# HERE: should we ignore cells with no-data values for temperature? They are set to a really big negative number
# for now we keep them, otherwise could be NaN
water_min_coordinates = water_min_layer.pixel_to_world_coordinates(filter_no_data_value=False)









    



Loaded raster data from ../data/watertemp/min_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/min_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/min_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.



In [33]:

    
water_min_coordinates









    Out[33]:





(array([ 89.75,  89.75,  89.75, ..., -89.75, -89.75, -89.75]),
 array([-179.75, -179.25, -178.75, ...,  178.75,  179.25,  179.75]))



In [34]:

    
mintemp_dataframe = pd.DataFrame([water_min_coordinates[0], water_min_coordinates[1]]).T
mintemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_min_matrix = water_min_reader.read(1)
mintemp_dataframe['MinT'] = water_min_matrix.ravel()  # flatten row by row, matching the coordinate order
mintemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
mintemp_dataframe.head()









    Out[34]:






  
    
      
      
decimallatitude  decimallongitude  MinT
89.75            -179.75           -3.402823e+38
89.75            -179.25           -3.402823e+38
89.75            -178.75           -3.402823e+38
89.75            -178.25           -3.402823e+38
89.75            -177.75           -3.402823e+38



In [35]:

    
mintemp_dataframe.tail()









    Out[35]:






  
    
      
      
decimallatitude  decimallongitude  MinT
-89.75           177.75            -3.402823e+38
-89.75           178.25            -3.402823e+38
-89.75           178.75            -3.402823e+38
-89.75           179.25            -3.402823e+38
-89.75           179.75            -3.402823e+38
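One way to settle the open question in the cell comments would be to turn the no-data marker into NaN before merging. The sketch below assumes the marker is the smallest float32 value, which is what the printed -3.402823e+38 looks like; in practice the value should be taken from the raster metadata rather than typed in by hand. The array is a made-up stand-in for the temperature matrix:

```python
import numpy as np

# Assumption: the no-data marker equals the smallest representable float32.
nodata = np.finfo(np.float32).min

# Toy 2 x 2 "temperature" matrix with two no-data cells (values invented).
temperatures = np.array([[nodata, 4.5], [12.0, nodata]], dtype=np.float32)

# Replace the marker with NaN so that means, minima and plots ignore it cleanly.
masked = np.where(temperatures == nodata, np.nan, temperatures)

print(int(np.isnan(masked).sum()))  # 2
print(np.nanmin(masked))            # 4.5
```

Keeping the raw marker, as the notebook does for now, leaves every cell in the frame but makes summary statistics meaningless until the marker is filtered out.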

5.5 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a maximum temperature pixel map



In [36]:

    
water_max_layer =  ClimateLayer(file_path="../data/watertemp/max_wt_2000.tif") 
water_max_reader = water_max_layer.load_data()
# HERE: should we ignore cells with no-data values for temperature? They are set to a really big negative number
# for now we keep them, otherwise could be NaN
water_max_coordinates = water_max_layer.pixel_to_world_coordinates(filter_no_data_value=False)









    



Loaded raster data from ../data/watertemp/max_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/max_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/max_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.



In [37]:

    
maxtemp_dataframe = pd.DataFrame([water_max_coordinates[0], water_max_coordinates[1]]).T
maxtemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_max_matrix = water_max_reader.read(1)
maxtemp_dataframe['MaxT'] = water_max_matrix.ravel()  # flatten row by row, matching the coordinate order
maxtemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
maxtemp_dataframe.head()









    Out[37]:






  
    
      
      
decimallatitude  decimallongitude  MaxT
89.75            -179.75           -3.402823e+38
89.75            -179.25           -3.402823e+38
89.75            -178.75           -3.402823e+38
89.75            -178.25           -3.402823e+38
89.75            -177.75           -3.402823e+38



In [38]:

    
maxtemp_dataframe.tail()









    Out[38]:






  
    
      
      
decimallatitude  decimallongitude  MaxT
-89.75           177.75            -3.402823e+38
-89.75           178.25            -3.402823e+38
-89.75           178.75            -3.402823e+38
-89.75           179.25            -3.402823e+38
-89.75           179.75            -3.402823e+38

5.6 Get arrays of coordinates (latitude/longitude) for each cell (middle point) in a mean temperature pixel map



In [39]:

    
water_mean_layer =  ClimateLayer(file_path="../data/watertemp/mean_wt_2000.tif") 
water_mean_reader = water_mean_layer.load_data()
# HERE: should we ignore cells with no-data values for temperature? They are set to a really big negative number
# for now we keep them, otherwise could be NaN
water_mean_coordinates = water_mean_layer.pixel_to_world_coordinates(filter_no_data_value=False)









    



Loaded raster data from ../data/watertemp/mean_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
No raster data provided, attempting to load default...
Loaded raster data from ../data/watertemp/mean_wt_2000.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'float32',
 'height': 360,
 'nodata': -3.402823e+38,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ../data/watertemp/mean_wt_2000.tif.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Not filtering any no_data pixels.
Transformation to world coordinates completed.



In [40]:

    
meantemp_dataframe = pd.DataFrame([water_mean_coordinates[0], water_mean_coordinates[1]]).T
meantemp_dataframe.columns=['decimallatitude', 'decimallongitude']
water_mean_matrix = water_mean_reader.read(1)
meantemp_dataframe['MeanT'] = water_mean_matrix.ravel()  # flatten row by row, matching the coordinate order
meantemp_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
meantemp_dataframe.head()









    Out[40]:






  
    
      
      
decimallatitude  decimallongitude  MeanT
89.75            -179.75           -3.402823e+38
89.75            -179.25           -3.402823e+38
89.75            -178.75           -3.402823e+38
89.75            -178.25           -3.402823e+38
89.75            -177.75           -3.402823e+38



In [41]:

    
meantemp_dataframe.tail()









    Out[41]:






  
    
      
      
decimallatitude  decimallongitude  MeanT
-89.75           177.75            -3.402823e+38
-89.75           178.25            -3.402823e+38
-89.75           178.75            -3.402823e+38
-89.75           179.25            -3.402823e+38
-89.75           179.75            -3.402823e+38



In [42]:

    
# merge base with presences
merged = base_dataframe.combine_first(presences_dataframe)



In [43]:

    
merged.head()









    Out[43]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
-89.75           -179.75           NaN
-89.75           -179.25           NaN
-89.75           -178.75           NaN
-89.75           -178.25           NaN
-89.75           -177.75           NaN



In [44]:

    
merged.tail()









    Out[44]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
89.75            177.75            NaN
89.75            178.25            NaN
89.75            178.75            NaN
89.75            179.25            NaN
89.75            179.75            NaN



In [45]:

    
# merge base+presences with pseudo-absences
# merged2 = pd.merge(merged1, pseudo_absences_dataframe, on=["decimallatitude", "decimallongitude", salmo_trutta.name_species], how="outer")

merged = merged.combine_first(pseudo_absences_dataframe)

The merge uses the combine_first method; see http://pandas.pydata.org/pandas-docs/stable/merging.html

Note that combine_first only takes values from the right DataFrame where they are missing in the left DataFrame. A related method, update, alters non-NA values in place.
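A small toy example (made-up coordinates, one species column) illustrates the difference between the two methods, and why presences survive when pseudo-absences are merged in afterwards:

```python
import numpy as np
import pandas as pd

# Three made-up pixel centers indexed by (latitude, longitude).
index = pd.MultiIndex.from_tuples(
    [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25)],
    names=["decimallatitude", "decimallongitude"],
)

# Left frame: one presence (1.0), the rest still unknown (NaN).
left = pd.DataFrame({"species": [1.0, np.nan, np.nan]}, index=index)

# Right frame: two pseudo-absences (0.0); the first overlaps the presence.
right = pd.DataFrame({"species": [0.0, 0.0]}, index=index[:2])

# combine_first fills only the gaps in `left`, so the presence is kept.
combined = left.combine_first(right)

# update overwrites in place wherever `right` has a non-NA value.
updated = left.copy()
updated.update(right)

print(combined["species"].tolist())  # [1.0, 0.0, nan]
print(updated["species"].tolist())   # [0.0, 0.0, nan]
```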



In [46]:

    
merged.head()









    Out[46]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
-89.75           -179.75           NaN
-89.75           -179.25           NaN
-89.75           -178.75           NaN
-89.75           -178.25           NaN
-89.75           -177.75           NaN



In [47]:

    
merged.tail()









    Out[47]:






  
    
      
      
decimallatitude  decimallongitude  Salmo trutta
89.75            177.75            NaN
89.75            178.25            NaN
89.75            178.75            NaN
89.75            179.25            NaN
89.75            179.75            NaN



In [48]:

    
# merge base+presences+pseudo-absences with min temperature
#merged3 = pd.merge(merged2, mintemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")

merged = merged.combine_first(mintemp_dataframe)



In [49]:

    
merged.head()









    Out[49]:






  
    
      
      
decimallatitude  decimallongitude  MinT           Salmo trutta
-89.75           -179.75           -3.402823e+38  NaN
-89.75           -179.25           -3.402823e+38  NaN
-89.75           -178.75           -3.402823e+38  NaN
-89.75           -178.25           -3.402823e+38  NaN
-89.75           -177.75           -3.402823e+38  NaN



In [50]:

    
merged.tail()









    Out[50]:






  
    
      
      
decimallatitude  decimallongitude  MinT           Salmo trutta
89.75            177.75            -3.402823e+38  NaN
89.75            178.25            -3.402823e+38  NaN
89.75            178.75            -3.402823e+38  NaN
89.75            179.25            -3.402823e+38  NaN
89.75            179.75            -3.402823e+38  NaN



In [51]:

    
# merged4 = pd.merge(merged3, maxtemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")
merged = merged.combine_first(maxtemp_dataframe)



In [52]:

    
merged.head()









    Out[52]:






  
    
      
      
decimallatitude  decimallongitude  MaxT           MinT           Salmo trutta
-89.75           -179.75           -3.402823e+38  -3.402823e+38  NaN
-89.75           -179.25           -3.402823e+38  -3.402823e+38  NaN
-89.75           -178.75           -3.402823e+38  -3.402823e+38  NaN
-89.75           -178.25           -3.402823e+38  -3.402823e+38  NaN
-89.75           -177.75           -3.402823e+38  -3.402823e+38  NaN



In [53]:

    
merged.tail()









    Out[53]:






  
    
      
      
decimallatitude  decimallongitude  MaxT           MinT           Salmo trutta
89.75            177.75            -3.402823e+38  -3.402823e+38  NaN
89.75            178.25            -3.402823e+38  -3.402823e+38  NaN
89.75            178.75            -3.402823e+38  -3.402823e+38  NaN
89.75            179.25            -3.402823e+38  -3.402823e+38  NaN
89.75            179.75            -3.402823e+38  -3.402823e+38  NaN



In [54]:

    
# merged5 = pd.merge(merged4, meantemp_dataframe, on=["decimallatitude", "decimallongitude"], how="outer")
merged = merged.combine_first(meantemp_dataframe)



In [55]:

    
merged.tail()









    Out[55]:






  
    
      
      
decimallatitude  decimallongitude  MaxT           MeanT          MinT           Salmo trutta
89.75            177.75            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
89.75            178.25            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
89.75            178.75            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
89.75            179.25            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
89.75            179.75            -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN



In [58]:

    
merged.to_csv("../data/fish/selection/salmo_trutta.csv")



In [56]:

    
merged[merged['Salmo trutta']==0].shape[0] # should be equal to number of pseudo absences below









    Out[56]:





977



In [57]:

    
pseudo_absence_coordinates[0].shape[0]









    Out[57]:





977



In [58]:

    
merged[merged['Salmo trutta']==1].shape[0]  # should be equal to number of presences below









    Out[58]:





5197



In [59]:

    
presence_coordinates[0].shape[0]









    Out[59]:





5197



In [60]:

    
merged[merged['Salmo trutta'].isnull()].shape[0] # all that's left









    Out[60]:





253026



In [61]:

    
360 * 720 == merged[merged['Salmo trutta']==0].shape[0] + merged[merged['Salmo trutta']==1].shape[0] + merged[merged['Salmo trutta'].isnull()].shape[0]









    Out[61]:





True



In [62]:

    
# == all pixels in 360 x 720 matrix

6. Repeat with other species



In [63]:

    
# Download from Google Drive: https://drive.google.com/open?id=0B9cazFzBtPuCaW0wRkk2N0g5d1k
lepidomeda_mollispinis = IUCNSpecies(name_species='Lepidomeda mollispinis')
lepidomeda_mollispinis.load_shapefile("../data/fish/selection/lepidomeda_mollispinis")









    



Enabled Shapely speedups for performance.
Loading data from: ../data/fish/selection/lepidomeda_mollispinis
The shapefile contains data on 1 species areas.



In [64]:

    
rasterized_lm = lepidomeda_mollispinis.rasterize(raster_file="./lepidomeda_mollispinis_full.tif", pixel_size=0.5)









    



RASTERIO: Data rasterized into file ./lepidomeda_mollispinis_full.tif 
RASTERIO: Resolution: x_res=720 y_res=360



In [65]:

    
plt.figure(figsize=(25,20))
plt.imshow(rasterized_lm, cmap="hot", interpolation="none")









    Out[65]:





<matplotlib.image.AxesImage at 0x7fd0c8c41780>



In [66]:

    
selected_layers_lm, pseudo_absences_lm = biomes_adf.sample_pseudo_absences(species_raster_data=rasterized_lm, continents_raster_data=continents_rasters, number_of_pseudopoints=1000)









    



Succesfully loaded existing raster data from ../data/rebioms/w001001.adf.
Will use the continents/biogeographic raster data for further clipping of the pseudo-absence regions. 
Sampling 1000 pseudo-absence points from environmental layer.
The following unique (pixel) values will be taken into account for sampling pseudo-absences
[15 16]
There are 1897 pixels to sample from...
Filling 1000 random pixel positions...
Sampled 774 unique pixels as pseudo-absences.
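Both runs asked for 1000 points but returned fewer unique pixels (977 of 17034 candidates earlier, 774 of 1897 here). That pattern is what one would expect if positions are drawn uniformly with replacement, although the log alone does not prove this is how the sampler works. Under that assumption, the expected number of distinct pixels after k draws from N candidates is N(1 - (1 - 1/N)^k):

```python
def expected_unique(n_candidates, n_draws):
    """Expected number of distinct items after uniform draws with replacement."""
    return n_candidates * (1.0 - (1.0 - 1.0 / n_candidates) ** n_draws)


# Candidate counts and observed unique counts are taken from the two logs.
for n_candidates, observed in [(17034, 977), (1897, 774)]:
    expected = expected_unique(n_candidates, 1000)
    print(n_candidates, round(expected), observed)
# expected values come out at about 971 and 777, close to the observed 977 and 774
```

The smaller the candidate region, the more draws collide, which is why the second species ends up with noticeably fewer pseudo-absences for the same request.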



In [67]:

    
plt.figure(figsize=(25,20))
plt.imshow(selected_layers_lm, cmap="hot", interpolation="none")









    Out[67]:





<matplotlib.image.AxesImage at 0x7fd0c8c2b4a8>



In [68]:

    
plt.figure(figsize=(25,20))
plt.imshow(pseudo_absences_lm, cmap="hot", interpolation="none")









    Out[68]:





<matplotlib.image.AxesImage at 0x7fd0c8b96198>



In [69]:

    
presence_coordinates_lm = lepidomeda_mollispinis.pixel_to_world_coordinates()









    



No raster data provided, attempting to load default...
Loaded raster data from ./lepidomeda_mollispinis_full.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 0.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ./lepidomeda_mollispinis_full.tif.
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
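
The T1 matrix in the log maps a (row, column) pixel position to the latitude/longitude of the pixel centre: T0 maps to the pixel's upper-left corner, and T1 shifts that by half a pixel (0.25 degrees). A minimal sketch of the same arithmetic in plain NumPy, using a hypothetical function `pixel_centers_to_world` with the bounds and pixel size taken from the metadata above:

```python
import numpy as np


def pixel_centers_to_world(rows, cols, top=90.0, left=-180.0, pixel_size=0.5):
    """Return (latitude, longitude) of pixel centres for a north-up raster."""
    rows = np.asarray(rows, dtype=float)
    cols = np.asarray(cols, dtype=float)
    latitude = top - (rows + 0.5) * pixel_size    # rows count downwards from the top
    longitude = left + (cols + 0.5) * pixel_size  # columns count rightwards from the left
    return latitude, longitude


# Toy raster with one hypothetical presence pixel.
raster = np.zeros((360, 720), dtype=np.uint8)
raster[103, 131] = 1

rows, cols = np.nonzero(raster)  # positions of non-zero (presence) pixels
latitude, longitude = pixel_centers_to_world(rows, cols)
print(latitude, longitude)
```

Pixel (0, 0) maps to (89.75, -179.75), which matches the translation terms in T1.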



In [70]:

    
presences_dataframe = pd.DataFrame([presence_coordinates_lm[0], presence_coordinates_lm[1]]).T
presences_dataframe.columns=['decimallatitude', 'decimallongitude']
presences_dataframe[lepidomeda_mollispinis.name_species] = 1 # fill presences with 1's
presences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
presences_dataframe.head()









    Out[70]:






  
    
      
      
decimallatitude  decimallongitude  Lepidomeda mollispinis
38.25            -114.25           1
37.75            -114.25           1
37.25            -114.25           1
                 -113.75           1
                 -113.25           1



In [71]:

    
presences_dataframe.tail()









    Out[71]:






  
    
      
      
decimallatitude  decimallongitude  Lepidomeda mollispinis
37.25            -114.25           1
                 -113.75           1
                 -113.25           1
                 -112.75           1
36.75            -114.25           1



In [72]:

    
pseudo_absence_coordinates_lm = biomes_adf.pixel_to_world_coordinates(raster_data=pseudo_absences_lm)









    



Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.



In [73]:

    
pseudo_absences_dataframe = pd.DataFrame([pseudo_absence_coordinates_lm[0], pseudo_absence_coordinates_lm[1]]).T
pseudo_absences_dataframe.columns=['decimallatitude', 'decimallongitude']
pseudo_absences_dataframe[lepidomeda_mollispinis.name_species] = 0
pseudo_absences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
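
The presence and pseudo-absence frames are built the same way for every species: two coordinate arrays, a constant label column, and a (latitude, longitude) index. A small helper could remove the repetition. The function name `labelled_frame` and the sample coordinates are hypothetical and only serve as an illustration:

```python
import pandas as pd

INDEX_COLUMNS = ["decimallatitude", "decimallongitude"]


def labelled_frame(latitudes, longitudes, column_name, value):
    """Build a frame indexed by (latitude, longitude) with one constant label column."""
    frame = pd.DataFrame({
        INDEX_COLUMNS[0]: latitudes,
        INDEX_COLUMNS[1]: longitudes,
    })
    frame[column_name] = value  # 1 for presences, 0 for pseudo-absences
    return frame.set_index(INDEX_COLUMNS)


presences = labelled_frame([38.25, 37.75], [-114.25, -114.25], "Lepidomeda mollispinis", 1)
absences = labelled_frame([67.25, 66.75], [-145.75, -147.75], "Lepidomeda mollispinis", 0)
print(presences)
print(absences)
```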



In [74]:

    
pseudo_absences_dataframe.head()









    Out[74]:






  
    
      
      
decimallatitude  decimallongitude  Lepidomeda mollispinis
67.25            -145.75           0
66.75            -147.75           0
                 -146.25           0
                 -145.75           0
                 -144.25           0



In [75]:

    
pseudo_absences_dataframe.tail()









    Out[75]:






  
    
      
      
decimallatitude  decimallongitude  Lepidomeda mollispinis
24.25            -102.75           0
                 -102.25           0
23.75            -109.75           0
                 -102.25           0
23.25            -102.75           0



In [76]:

    
merged1 = merged.combine_first(presences_dataframe)
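
`combine_first` keeps every value already present in the calling frame, fills its missing cells from the other frame, and returns the union of both indexes and both sets of columns. That is why the full grid keeps its size and the species column is NaN everywhere except at the presence and pseudo-absence positions. A minimal sketch on a toy three-pixel grid (column names and coordinates are made up for illustration):

```python
import numpy as np
import pandas as pd

names = ["decimallatitude", "decimallongitude"]

# Toy "environment" grid with three pixels.
grid = pd.DataFrame(
    {"MeanT": [10.0, 11.0, 12.0]},
    index=pd.MultiIndex.from_tuples([(0.25, 0.25), (0.25, 0.75), (0.75, 0.25)], names=names),
)
# One presence and one pseudo-absence, each at an existing grid position.
presences = pd.DataFrame(
    {"Species X": [1]},
    index=pd.MultiIndex.from_tuples([(0.25, 0.75)], names=names),
)
absences = pd.DataFrame(
    {"Species X": [0]},
    index=pd.MultiIndex.from_tuples([(0.75, 0.25)], names=names),
)

combined = grid.combine_first(presences).combine_first(absences)
print(combined)
```

The pixel at (0.25, 0.25) has neither label, so its "Species X" value stays NaN.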



In [77]:

    
merged1.tail()









    Out[77]:






  
    
      
      
decimallatitude  decimallongitude  Lepidomeda mollispinis  MaxT           MeanT          MinT           Salmo trutta
89.75            177.75            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.25            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.75            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.25            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.75            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
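
The temperature columns show -3.402823e+38, which matches the most negative float32 value. Rasters often use that value as a no-data marker, and the sketch below assumes that is what it means here. Before modelling, such cells could be turned into NaN; the threshold of -1e30 is an arbitrary illustrative choice, not something taken from the library:

```python
import numpy as np
import pandas as pd

NODATA = np.finfo(np.float32).min  # about -3.4028235e+38

# Toy temperature columns with one no-data cell each.
temps = pd.DataFrame({
    "MinT": np.array([NODATA, 3.5], dtype=np.float32),
    "MaxT": np.array([NODATA, 21.0], dtype=np.float32),
})

# Keep realistic values, replace the sentinel with NaN.
cleaned = temps.where(temps > -1e30)
print(cleaned)
```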



In [78]:

    
merged1 = merged1.combine_first(pseudo_absences_dataframe)



In [79]:

    
merged1.tail()









    Out[79]:






  
    
      
      
decimallatitude  decimallongitude  Lepidomeda mollispinis  MaxT           MeanT          MinT           Salmo trutta
89.75            177.75            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.25            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.75            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.25            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.75            NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN



In [80]:

    
merged1['Lepidomeda mollispinis'].unique()









    Out[80]:





array([ nan,   0.,   1.])



In [81]:

    
merged1[merged1['Lepidomeda mollispinis']==0].shape # pseudo-absences









    Out[81]:





(774, 5)



In [82]:

    
merged1[merged1['Lepidomeda mollispinis']==1].shape # presences









    Out[82]:





(7, 5)



In [83]:

    
merged1[merged1['Lepidomeda mollispinis'].isnull()].shape









    Out[83]:





(258419, 5)



In [84]:

    
merged1[merged1['Lepidomeda mollispinis'].isnull()].shape[0] + merged1[merged1['Lepidomeda mollispinis']==1].shape[0] + merged1[merged1['Lepidomeda mollispinis']==0].shape[0]









    Out[84]:





259200
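
The total of 259200 equals 360 x 720, the number of pixels in the grid, so every pixel is either a presence, a pseudo-absence, or unlabelled. The same three counts can be read off in one call with `value_counts(dropna=False)`. A minimal sketch on a toy label column:

```python
import numpy as np
import pandas as pd

# Toy label column: NaN = unlabelled, 0 = pseudo-absence, 1 = presence.
labels = pd.Series([np.nan, 0.0, 1.0, np.nan, 0.0, np.nan], name="Species X")

counts = labels.value_counts(dropna=False)  # includes the NaN group
n_missing = int(labels.isna().sum())
total = int(counts.sum())

print(counts)
print("total:", total)
```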



In [85]:

    
salmo_trutta.get_data().shape_area.sum()









    Out[85]:





1300.4651633274541



In [86]:

    
lepidomeda_mollispinis.get_data().shape_area.sum()









    Out[86]:





1.8861804794299999

Third species: Esox lucius (largest IUCN area, and plenty of occurrences in GBIF)



In [87]:

    
# Download from Google drive: https://drive.google.com/open?id=0B9cazFzBtPuCamEwWlZxV3lBZmc
esox_lucius = IUCNSpecies(name_species='Esox lucius')
esox_lucius.load_shapefile("../data/fish/selection/esox_lucius/")









    



Enabled Shapely speedups for performance.
Loading data from: ../data/fish/selection/esox_lucius/
The shapefile contains data on 6 species areas.



In [88]:

    
rasterized_el = esox_lucius.rasterize(raster_file="./esox_lucius_full.tif", pixel_size=0.5)









    



RASTERIO: Data rasterized into file ./esox_lucius_full.tif 
RASTERIO: Resolution: x_res=720 y_res=360



In [89]:

    
plt.figure(figsize=(25,20))
plt.imshow(rasterized_el, cmap="hot", interpolation="none")









    Out[89]:





<matplotlib.image.AxesImage at 0x7fd0c69446d8>



In [90]:

    
selected_layers_el, pseudo_absences_el = biomes_adf.sample_pseudo_absences(species_raster_data=rasterized_el, continents_raster_data=continents_rasters, number_of_pseudopoints=1000)









    



Succesfully loaded existing raster data from ../data/rebioms/w001001.adf.
Will use the continents/biogeographic raster data for further clipping of the pseudo-absence regions. 
Sampling 1000 pseudo-absence points from environmental layer.
The following unique (pixel) values will be taken into account for sampling pseudo-absences
[ 7  8  9 10 11 12 13 14 15 16 17 21]
There are 17842 pixels to sample from...
Filling 1000 random pixel positions...
Sampled 972 unique pixels as pseudo-absences.



In [91]:

    
plt.figure(figsize=(25,20))
plt.imshow(selected_layers_el, cmap="hot", interpolation="none")









    Out[91]:





<matplotlib.image.AxesImage at 0x7fd0c3461940>

Hmm, why does it take South Africa into account?



In [92]:

    
np.where((continents_rasters[2]+rasterized_el)>1)









    Out[92]:





(array([], dtype=int64), array([], dtype=int64))

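The cell above adds two bands of ones and zeroes: the raster for the species and the raster for the South Africa continent (id=2). np.where(...) then returns the pixel positions where the summed band has a value greater than 1, that is, where the two rasters overlap.

The result lists the row/column positions of those pixels. This note was written for a run in which the two rasters overlap at one pixel position, where the value is 2. In the output shown here the result is empty, so there is no overlap, and the check below at position [108, 348] returns 1.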
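
A minimal sketch of this overlap check on two toy 0/1 bands (the array contents are made up for illustration). The `np.logical_and` variant is an alternative that does not rely on the pixel values being exactly 0 and 1:

```python
import numpy as np

continent = np.zeros((4, 5), dtype=np.uint8)
species = np.zeros((4, 5), dtype=np.uint8)
continent[1:3, 0:3] = 1  # rows 1-2, columns 0-2
species[2:4, 2:5] = 1    # rows 2-3, columns 2-4

# Summing two 0/1 bands gives 2 exactly where both are set.
rows, cols = np.where((continent + species) > 1)
print(rows, cols)

# Equivalent boolean mask of overlapping pixels.
overlap = np.logical_and(continent > 0, species > 0)
print(int(overlap.sum()), "overlapping pixel(s)")
```

An empty pair of arrays from `np.where` means the two bands share no pixel.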



In [93]:

    
(continents_rasters[2]+rasterized_el)[108,348]









    Out[93]:





1



In [94]:

    
plt.figure(figsize=(25,20))
plt.imshow(pseudo_absences_el, cmap="hot", interpolation="none")









    Out[94]:





<matplotlib.image.AxesImage at 0x7fd0c33627f0>



In [95]:

    
presence_coordinates_el = esox_lucius.pixel_to_world_coordinates()









    



No raster data provided, attempting to load default...
Loaded raster data from ./esox_lucius_full.tif 
Driver name: GTiff 
Metadata: {'affine': Affine(0.5, 0.0, -180.0,
       0.0, -0.5, 90.0),
 'count': 1,
 'crs': {'init': 'epsg:4326'},
 'driver': 'GTiff',
 'dtype': 'uint8',
 'height': 360,
 'nodata': 0.0,
 'transform': (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5),
 'width': 720} 
Resolution: x_res=720 y_res=360.
Bounds: BoundingBox(left=-180.0, bottom=-90.0, right=180.0, top=90.0) 
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (-180.0, 0.5, 0.0, 90.0, 0.0, -0.5) 
Number of layers: 1 
Dataset loaded. Use .read() or .read_masks() to access the layers.
Succesfully loaded existing raster data from ./esox_lucius_full.tif.
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.



In [96]:

    
presences_dataframe = pd.DataFrame([presence_coordinates_el[0], presence_coordinates_el[1]]).T
presences_dataframe.columns=['decimallatitude', 'decimallongitude']
presences_dataframe[esox_lucius.name_species] = 1 # fill presences with 1's
presences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
presences_dataframe.head()









    Out[96]:






  
    
      
      
decimallatitude  decimallongitude  Esox lucius
74.25            91.75             1
                 92.25             1
                 92.75             1
                 93.25             1
                 93.75             1



In [97]:

    
pseudo_absence_coordinates_el = biomes_adf.pixel_to_world_coordinates(raster_data=pseudo_absences_el)









    



Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.



In [98]:

    
pseudo_absences_dataframe = pd.DataFrame([pseudo_absence_coordinates_el[0], pseudo_absence_coordinates_el[1]]).T
pseudo_absences_dataframe.columns=['decimallatitude', 'decimallongitude']
pseudo_absences_dataframe[esox_lucius.name_species] = 0 # fill pseudo-absences with 0
pseudo_absences_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
pseudo_absences_dataframe.head()









    Out[98]:






  
    
      
      
decimallatitude  decimallongitude  Esox lucius
82.75            -75.75            0
                 -35.75            0
                 -25.75            0
82.25            -83.75            0
                 -71.25            0



In [99]:

    
merged2 = merged1.combine_first(presences_dataframe)



In [100]:

    
merged2.head()









    Out[100]:






  
    
      
      
decimallatitude  decimallongitude  Esox lucius  Lepidomeda mollispinis  MaxT           MeanT          MinT           Salmo trutta
-89.75           -179.75           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -179.25           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -178.75           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -178.25           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -177.75           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN



In [101]:

    
merged2 = merged2.combine_first(pseudo_absences_dataframe)



In [102]:

    
merged2.head()









    Out[102]:






  
    
      
      
decimallatitude  decimallongitude  Esox lucius  Lepidomeda mollispinis  MaxT           MeanT          MinT           Salmo trutta
-89.75           -179.75           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -179.25           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -178.75           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -178.25           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 -177.75           NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN



In [103]:

    
merged2['Esox lucius'].unique()









    Out[103]:





array([ nan,   0.,   1.])



In [104]:

    
merged2[merged2['Esox lucius']==0].shape # pseudo-absences









    Out[104]:





(972, 6)



In [105]:

    
merged2[merged2['Esox lucius']==1].shape # presences









    Out[105]:





(21815, 6)



In [106]:

    
merged2[merged2['Esox lucius'].isnull()].shape









    Out[106]:





(236413, 6)



In [107]:

    
merged2[merged2['Esox lucius'].isnull()].shape[0] + merged2[merged2['Esox lucius']==1].shape[0] + merged2[merged2['Esox lucius']==0].shape[0]









    Out[107]:





259200



In [108]:

    
merged2.tail()









    Out[108]:






  
    
      
      
decimallatitude  decimallongitude  Esox lucius  Lepidomeda mollispinis  MaxT           MeanT          MinT           Salmo trutta
89.75            177.75            NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.25            NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 178.75            NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.25            NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN
                 179.75            NaN          NaN                     -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN



In [109]:

    
# rearrange columns (nothing critical)



In [110]:

    
cols = merged2.columns.values
cols1 = [cols[4], cols[2], cols[3], cols[0], cols[1], cols[5]]



In [111]:

    
merged2 = merged2[cols1]



In [112]:

    
merged2.to_csv("../data/fish/selection/dataframe_merged_all_touching_false.csv")



In [113]:

    
merged2.columns.values









    Out[113]:





array(['MinT', 'MaxT', 'MeanT', 'Esox lucius', 'Lepidomeda mollispinis',
       'Salmo trutta'], dtype=object)
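
Reordering by position (cols[4], cols[2], ...) has to be redone by hand whenever a column is added, because the merged frame comes back with a different column order each time. Selecting by name is a more robust alternative. A minimal sketch on a toy one-row frame that uses the column names shown above:

```python
import pandas as pd

# Toy frame with the column order seen after merging.
frame = pd.DataFrame(
    [[1.0, 0.0, 30.0, 20.0, 10.0, 1.0]],
    columns=["Esox lucius", "Lepidomeda mollispinis", "MaxT", "MeanT", "MinT", "Salmo trutta"],
)

wanted = ["MinT", "MaxT", "MeanT", "Esox lucius", "Lepidomeda mollispinis", "Salmo trutta"]
frame = frame[wanted]  # select columns by name, in the desired order
print(frame.columns.values)
```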

Add continents column



In [114]:

    
for idx, band in enumerate(continents_rasters):
    continents_coordinates = biomes_adf.pixel_to_world_coordinates(raster_data=band)
    continent_dataframe = pd.DataFrame([continents_coordinates[0], continents_coordinates[1]]).T
    continent_dataframe.columns=['decimallatitude', 'decimallongitude']
    continent_dataframe['Continent'] = idx
    continent_dataframe.set_index(['decimallatitude', 'decimallongitude'], inplace=True, drop=True)
    continent_dataframe.head()
    merged2 = merged2.combine_first(continent_dataframe)









    



Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.
Transforming to world coordinates...
Affine transformation T0:
 |-0.50, 0.00, 90.00|
| 0.00, 0.50,-180.00|
| 0.00, 0.00, 1.00| 
Raster data shape: (360, 720) 
Affine transformation T1:
 |-0.50, 0.00, 89.75|
| 0.00, 0.50,-179.75|
| 0.00, 0.00, 1.00| 
Filtering out no_data pixels.
Transformation to world coordinates completed.
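
The loop above merges one continent band at a time, and because `combine_first` only fills missing cells, the first band that covers a pixel determines its Continent value. As an alternative sketch (not the notebook's method), the same "first band wins" labelling can be computed in one pass on the stacked bands. It assumes the bands can be stacked into a single 3-D array of 0/1 values, and uses -1 as a hypothetical marker for pixels that no band covers:

```python
import numpy as np

# Toy stack of three 0/1 bands on a 2 x 4 grid.
bands = np.zeros((3, 2, 4), dtype=np.uint8)
bands[0, 0, :2] = 1   # band 0 covers row 0, columns 0-1
bands[1, 0, 2:] = 1   # band 1 covers row 0, columns 2-3
bands[2, 1, :3] = 1   # band 2 covers row 1, columns 0-2

covered = bands.any(axis=0)  # True where at least one band is set
# argmax returns the index of the first band with value 1 at each pixel.
continent_id = np.where(covered, bands.argmax(axis=0), -1)
print(continent_id)
```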



In [115]:

    
merged2[merged2.Continent==0].shape









    Out[115]:





(27498, 7)



In [116]:

    
np.count_nonzero(continents_rasters[0]) # good!









    Out[116]:





27498



In [117]:

    
merged2[merged2.Continent==2].shape









    Out[117]:





(10236, 7)



In [118]:

    
np.count_nonzero(continents_rasters[2]) # good!









    Out[118]:





10236



In [119]:

    
merged2.columns.values









    Out[119]:





array(['Continent', 'Esox lucius', 'Lepidomeda mollispinis', 'MaxT',
       'MeanT', 'MinT', 'Salmo trutta'], dtype=object)



In [120]:

    
# rearrange columns again



In [121]:

    
cols = merged2.columns.values
cols1 = [cols[5], cols[3], cols[4], cols[1], cols[2], cols[6], cols[0]]



In [122]:

    
merged2 = merged2[cols1]



In [123]:

    
merged2.head()









    Out[123]:






  
    
      
      
decimallatitude  decimallongitude  MinT           MaxT           MeanT          Esox lucius  Lepidomeda mollispinis  Salmo trutta  Continent
-89.75           -179.75           -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN          NaN                     NaN           6.0
                 -179.25           -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN          NaN                     NaN           6.0
                 -178.75           -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN          NaN                     NaN           6.0
                 -178.25           -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN          NaN                     NaN           6.0
                 -177.75           -3.402823e+38  -3.402823e+38  -3.402823e+38  NaN          NaN                     NaN           6.0



In [124]:

    
merged2.to_csv("../data/fish/selection/dataframe_merged_with_continents_rasterized_2.csv")



In [125]:

    
np.where((continents_rasters[2]+rasterized_el)>1)









    Out[125]:





(array([], dtype=int64), array([], dtype=int64))


