Working with IUCN data in shapefiles

just some logging/plotting magic to output in this notebook, nothing to care about.


In [1]:
import logging
root = logging.getLogger()
root.addHandler(logging.StreamHandler())
%matplotlib inline

1. Load a shapefile with all turtles data. At this point no data cleaning is done yet.


In [2]:
# download http://bit.ly/1R8pt20 (zipped Turtles shapefiles), and unzip them
from iSDM.species import IUCNSpecies
turtles = IUCNSpecies(name_species='Acanthochelys pallidipectoris')
turtles.load_shapefile('../data/FW_TURTLES/FW_TURTLES.shp')


Loading data from: ../data/FW_TURTLES/FW_TURTLES.shp
The shapefile contains data on 181 species.

Show only first 5 species (meta)data, to get an idea of the data structure.


In [3]:
turtles.get_data().head(10)


Out[3]:
binomial category citation class_name compiler dist_comm family_nam genus_name geometry id_no ... presence seasonal shape_area shape_leng source species_na subpop subspecies tax_comm year
0 Batagur baska CR CRF REPTILIA Rhodin None GEOEMYDIDAE Batagur (POLYGON ((88.17095947265625 21.91680908203125... 2614.0 ... 1.0 1.0 11.002731 126.804751 CBFTT baska None None None 2013.0
1 Cuora galbinifrons CR CRF REPTILIA Rhodin None GEOEMYDIDAE Cuora (POLYGON ((105.4661254882812 21.1263427734375,... 5955.0 ... 1.0 1.0 16.619326 52.207722 CBFTT galbinifrons None None None 2013.0
2 Graptemys pseudogeographica LC CRF REPTILIA Rhodin None EMYDIDAE Graptemys (POLYGON ((-95.635009765625 35.36834716796875,... 165600.0 ... 1.0 1.0 97.575806 256.031805 CBFTT pseudogeographica None None None 2013.0
3 Malaclemys terrapin LR/nt CRF REPTILIA Rhodin None EMYDIDAE Malaclemys (POLYGON ((-76.6082763671875 39.25, -76.615722... 12695.0 ... 1.0 1.0 11.345627 347.909301 CBFTT terrapin None None None 2013.0
4 Melanochelys trijuga LR/nt CRF REPTILIA Rhodin None GEOEMYDIDAE Melanochelys (POLYGON ((84.14166259765625 26.06658935546875... 13039.0 ... 1.0 1.0 172.597842 326.610136 CBFTT trijuga None None None 2013.0
5 Emys orbicularis LR/nt CRF REPTILIA Rhodin None EMYDIDAE Emys (POLYGON ((13.3084716796875 45.74169921875, 13... 7717.0 ... 1.0 1.0 590.040604 1332.204343 CBFTT orbicularis None None None 2013.0
6 Graptemys oculifera VU CRF REPTILIA Rhodin None EMYDIDAE Graptemys POLYGON ((-89.12554931640625 33.19378662109375... 9499.0 ... 1.0 1.0 2.017690 10.636744 CBFTT oculifera None None None 2013.0
7 Graptemys ouachitensis LC CRF REPTILIA Rhodin None EMYDIDAE Graptemys (POLYGON ((-95.731201171875 35.03363037109375,... 165599.0 ... 1.0 1.0 75.220760 250.600669 CBFTT ouachitensis None None None 2013.0
8 Notochelys platynota VU CRF REPTILIA Rhodin None GEOEMYDIDAE Notochelys (POLYGON ((108.9833984375 1.2708740234375, 108... 14856.0 ... 1.0 1.0 88.398220 251.904956 CBFTT platynota None None None 2013.0
9 Pseudemys rubriventris NT CRF REPTILIA Rhodin None EMYDIDAE Pseudemys (POLYGON ((-77.2666015625 37.3125, -77.2666015... 18460.0 ... 1.0 1.0 11.561191 92.024785 CBFTT rubriventris None None None 2013.0

10 rows × 26 columns


In [4]:
turtles.get_data().columns # all the columns available per species geometry


Out[4]:
Index(['binomial', 'category', 'citation', 'class_name', 'compiler',
       'dist_comm', 'family_nam', 'genus_name', 'geometry', 'id_no', 'island',
       'kingdom_na', 'legend', 'order_name', 'origin', 'phylum_nam',
       'presence', 'seasonal', 'shape_area', 'shape_leng', 'source',
       'species_na', 'subpop', 'subspecies', 'tax_comm', 'year'],
      dtype='object')

In [5]:
turtles.source.name


Out[5]:
'IUCN'

In [10]:
# WARNING! many shapefiles to plot for 181 species! You may want to skip this line if your PC is not strong! :P
turtles.plot_species_occurrence()


2. Filter species by the name given above


In [6]:
turtles.find_species_occurrences()


Loaded species: ['Acanthochelys pallidipectoris'] 
Out[6]:
binomial category citation class_name compiler dist_comm family_nam genus_name geometry id_no ... presence seasonal shape_area shape_leng source species_na subpop subspecies tax_comm year
138 Acanthochelys pallidipectoris VU CRF REPTILIA Rhodin None CHELIDAE Acanthochelys POLYGON ((-59.20001220703125 -28.0374755859375... 75.0 ... 1.0 1.0 36.632836 54.770042 CBFTT pallidipectoris None None None 2013.0

1 rows × 26 columns


In [7]:
turtles.get_data() # datatype: geopandas.geodataframe.GeoDataFrame


Out[7]:
binomial category citation class_name compiler dist_comm family_nam genus_name geometry id_no ... presence seasonal shape_area shape_leng source species_na subpop subspecies tax_comm year
138 Acanthochelys pallidipectoris VU CRF REPTILIA Rhodin None CHELIDAE Acanthochelys POLYGON ((-59.20001220703125 -28.0374755859375... 75.0 ... 1.0 1.0 36.632836 54.770042 CBFTT pallidipectoris None None None 2013.0

1 rows × 26 columns


In [8]:
# Now plot the filtered selection
turtles.plot_species_occurrence()



In [9]:
turtles.save_data() # serialize all the current data to a pickle file, so it can be loaded later on


Saved data: /Users/daniela/git/iSDM/notebooks/Acanthochelys pallidipectoris75.pkl 
Type of data: <class 'geopandas.geodataframe.GeoDataFrame'> 

In [10]:
turtles.load_data()


Loading data from: /Users/daniela/git/iSDM/notebooks/Acanthochelys pallidipectoris75.pkl
Succesfully loaded previously saved data.
Out[10]:
binomial category citation class_name compiler dist_comm family_nam genus_name geometry id_no ... presence seasonal shape_area shape_leng source species_na subpop subspecies tax_comm year
138 Acanthochelys pallidipectoris VU CRF REPTILIA Rhodin None CHELIDAE Acanthochelys POLYGON ((-59.20001220703125 -28.0374755859375... 75.0 ... 1.0 1.0 36.632836 54.770042 CBFTT pallidipectoris None None None 2013.0

1 rows × 26 columns


In [11]:
turtles.ID # derived from "id_no" column. It's a sort of unique ID per species


Out[11]:
75

3. Plot geometry

Plot the shapefile data, and a convex hull. GeoPandas objects also know how to plot themselves directly.


In [12]:
ax = turtles.get_data().plot(figsize=(10,10))
turtles.data_full.geometry.convex_hull.plot(ax=ax)


Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f354400>

Let's put a buffer around the data, and plot that


In [13]:
with_buffer = turtles.get_data().geometry.buffer(0.5)
with_buffer.plot() # hmm, now the buffer behaves differently


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ea4e550>

The currently filtered shape data can be saved. If overwrite=True, the shapefile it was loaded from, will be overwritten. Otherwise you can provide a new shape_file as an argument.


In [14]:
turtles.save_shapefile(full_name="./TURTLES_FILTERED.shp")


Saved data: ./TURTLES_FILTERED.shp 

4. Sample pseudo-absence points around the geometry


In [15]:
pseudo_absence = turtles.random_pseudo_absence_points(buffer_distance=1.5, count=200)


Buffered geometry valid. Now creating a buffer difference from which to draw random points
Creating random points ... 

In [16]:
#import matplotlib.pyplot as plt
#plt.figure(figsize=(15,15))
ax = pseudo_absence.plot(figsize=(10,10))
turtles.get_data().plot(ax=ax)


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e86bfd0>

This is what is actually happening behind the scenes to create a geometry for drawing points from


In [17]:
# 1. Make simplified buffer around the geometry
simplified_buffer = turtles.get_data().geometry.buffer(0).simplify(1,True).buffer(1.5)
# 2. Make a difference of the above area, with the original one
buffer_difference = simplified_buffer.geometry.difference(turtles.get_data().geometry.buffer(0))
# 3. Randomly generate points in simplified_buffer's envelope until you reach <count> number of points belonging to this area
buffer_difference.plot(figsize=(10,10))


Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f195fd0>

There are more complicated geometries (possibly with multiple polygons and "invalid"). Operations like intersection or union are problematic for such geometries, as they can throw errors. For example:


In [18]:
turtles.load_shapefile("../data/filtered_turtles_again.shp")
turtles.plot_species_occurrence()


Loading data from: ../data/filtered_turtles_again.shp
The shapefile contains data on 1 species.

For them the process of creating a buffer region is a bit more involved, as simplifications must be made. Note that this is done automatically behind the scenes. We show the inner-process just to get an idea of the area from which random points are drawn. In this case, the area to draw points from, would look like this:


In [19]:
simplified_buffer = turtles.get_data().geometry.buffer(0).simplify(1,True).buffer(1.5)
# this would throw an exception, so we must simplify it further
# buffer_difference = simplified_buffer.geometry.difference(turtles.get_data().geometry.buffer(0))
simplified_buffer = simplified_buffer.simplify(4, True) 
buffer_difference = simplified_buffer.geometry.difference(turtles.get_data().geometry.buffer(0))
ax = buffer_difference.plot(figsize=(10,10))
turtles.get_data().plot(ax=ax)


Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x1126de7f0>

In practise, all you need to do is, again, only the following:


In [20]:
pseudo_absence = turtles.random_pseudo_absence_points(buffer_distance=1.5, count=200)


Self-intersection at or near point 33.767489061418452 -15.693063020979068
Self-intersection at or near point 33.767489061418452 -15.693063020979068
Buffered geometry is invalid, trying to simplify it using tolerance = 1
Self-intersection at or near point 33.767489061418452 -15.693063020979068
Self-intersection at or near point 33.767489061418452 -15.693063020979068
Buffered geometry is invalid, trying to simplify it using tolerance = 2
Self-intersection at or near point 33.767489061418452 -15.693063020979068
Self-intersection at or near point 33.767489061418452 -15.693063020979068
Buffered geometry is invalid, trying to simplify it using tolerance = 3
Self-intersection at or near point 33.767489061418452 -15.693063020979068
Self-intersection at or near point 33.767489061418452 -15.693063020979068
Buffered geometry is invalid, trying to simplify it using tolerance = 4
Buffered geometry valid. Now creating a buffer difference from which to draw random points
Creating random points ... 

Let's plot them together, the original area and the random points


In [21]:
ax = pseudo_absence.plot(figsize=(10,10))
turtles.get_data().plot(ax=ax)


Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e869be0>

5. Rasterize

Rasterize the data: we need a target raster_file to save it to, and a resolution.


In [22]:
turtles.rasterize(raster_file='./turtles.tif', pixel_size=0.5, all_touched=True)


RASTERIO: Data rasterized into file ./turtles.tif 
RASTERIO: Resolution: x_res=47 y_res=65
Out[22]:
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 1, 1, 1],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

Or at some point later, if you want to load the raster file


In [23]:
turtles_raster_data = turtles.load_raster_data(raster_file='./turtles.tif')


Loaded raster data from ./turtles.tif 
Driver name: GTiff 
Resolution: x_res=47 y_res=65.
Coordinate reference system: {'init': 'epsg:4326'} 
Affine transformation: (11.92498779296875, 0.5, 0.0, 2.779296875, 0.0, -0.5) 
Number of layers: 1 

In [24]:
turtles_raster_data


Out[24]:
array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 1],
        [0, 0, 0, ..., 1, 1, 1],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)

In [25]:
turtles_raster_data.shape # 1 layer(band), resolution


Out[25]:
(1, 65, 47)

A simple plot of the raster data


In [26]:
import matplotlib.pyplot as plt
plt.figure(figsize=turtles_raster_data.shape[1:]) # careful with big images!
plt.imshow(turtles_raster_data[0], cmap="hot", interpolation="none") # the first band has index 0


Out[26]:
<matplotlib.image.AxesImage at 0x118a17160>

Rasterize with more options


In [27]:
result = turtles.rasterize(raster_file='./turtles.tif', pixel_size=0.5, all_touched=False, no_data_value=254, default_value=5) # all_touched=False now


RASTERIO: Data rasterized into file ./turtles.tif 
RASTERIO: Resolution: x_res=47 y_res=65

In [28]:
result


Out[28]:
array([[254, 254, 254, ..., 254, 254, 254],
       [254, 254, 254, ..., 254, 254, 254],
       [254, 254, 254, ...,   5,   5,   5],
       ..., 
       [254, 254, 254, ..., 254, 254, 254],
       [254, 254, 254, ..., 254, 254, 254],
       [254, 254, 254, ..., 254, 254, 254]], dtype=uint8)

In [29]:
plt.figure(figsize=result.shape) # careful with big images!
plt.imshow(result, cmap="hot", interpolation="none")


Out[29]:
<matplotlib.image.AxesImage at 0x10f9269e8>

In [ ]: