Working with GBIF presence-only point records


In [1]:
from iSDM.species import GBIFSpecies

In [2]:
my_species = GBIFSpecies(name_species="Etheostoma_blennioides")

In [3]:
my_species.name_species


Out[3]:
'Etheostoma_blennioides'

Some logging and plotting setup so that output shows up in this notebook; it can safely be skipped.


In [4]:
%matplotlib inline
import logging
root = logging.getLogger()
root.addHandler(logging.StreamHandler())

1. Find and download all matching species data from GBIF. At this point no data cleaning is done yet.

Show only first 5 observation rows (head()).


In [5]:
my_species.find_species_occurrences().head()


Loading species ... 
Number of occurrences: 7229 
True
Loaded species: ['Etheostoma blennioides'] 
Out[5]:
accessrights associatedoccurrences associatedreferences associatedsequences basisofrecord bibliographiccitation catalognumber class classkey collectioncode ... type typestatus verbatimcoordinatesystem verbatimdepth verbatimelevation verbatimeventdate verbatimlocality vernacularname waterbody year
0 Open Access, http://creativecommons.org/public... NaN NaN NaN PRESERVED_SPECIMEN Etheostoma blennioides (YPM ICH 028684) YPM ICH 028684 Actinopterygii 204 VZ ... PhysicalObject NaN NaN NaN NaN NaN NaN perches; perch-like fishes; ray-finned fishes;... NaN 2016.0
1 Open Access, http://creativecommons.org/public... NaN NaN NaN PRESERVED_SPECIMEN Etheostoma blennioides (YPM ICH 028456) YPM ICH 028456 Actinopterygii 204 VZ ... PhysicalObject NaN NaN NaN NaN NaN NaN perches; perch-like fishes; ray-finned fishes;... NaN 2015.0
2 NaN NaN NaN NaN HUMAN_OBSERVATION NaN 1937841 Actinopterygii 204 Observations ... NaN NaN NaN NaN NaN Thu Sep 10 2015 14:51:49 GMT-0400 (EDT) 3827–4235 Fobes Rd, Rock Creek, OH, US NaN NaN 2015.0
3 NaN NaN NaN NaN HUMAN_OBSERVATION NaN 623289 Actinopterygii 204 Observations ... NaN NaN NaN NaN NaN 2014-04-13 Beaver Creek NaN NaN 2014.0
4 Open Access, http://creativecommons.org/public... NaN Det. by: Thomas J. Near NaN PRESERVED_SPECIMEN Etheostoma blennioides (YPM ICH 026964) YPM ICH 026964 Actinopterygii 204 VZ ... PhysicalObject NaN NaN NaN NaN NaN NaN perches; perch-like fishes; ray-finned fishes;... NaN 2014.0

5 rows × 137 columns

The taxon key derived from the GBIF data serves as a unique ID for the species.


In [6]:
my_species.ID # taxonkey derived from GBIF. It's a sort of unique ID per species


Out[6]:
2382397

Data is serialized and saved in a file.

Default location: current working directory. Default filename: the species name followed by its GBIF ID (see the saved path below).


In [7]:
my_species.save_data()


Saved data: /home/daniela/git/iSDM/notebooks/Etheostoma_blennioides2382397.pkl 
Type of data: <class 'pandas.core.frame.DataFrame'> 
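
The log above reports that a pandas DataFrame was written to a .pkl file. How save_data() does this internally is not shown in this notebook, so the following is only a sketch of the same round trip with plain pandas (to_pickle / read_pickle). The toy records and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the occurrence table (illustrative values only).
records = pd.DataFrame({
    "decimallatitude": [41.79664, None],
    "decimallongitude": [-80.97289, None],
    "year": [2015.0, 2014.0],
})

# Hypothetical target path that mimics the "<species name><taxon key>.pkl" pattern.
file_path = os.path.join(tempfile.mkdtemp(), "Etheostoma_blennioides2382397.pkl")

records.to_pickle(file_path)          # serialize the DataFrame to disk
restored = pd.read_pickle(file_path)  # load it back

# equals() treats NaNs in the same positions as equal.
print(restored.equals(records))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.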

In [8]:
my_species.source.name


Out[8]:
'GBIF'

Let's get a general idea of where the species is distributed on the map


In [9]:
my_species.plot_species_occurrence()


Data geometrized: converted into GeoPandas dataframe.
Points with NaN coordinnates ignored. 

In [10]:
polygonized_species = my_species.polygonize()


Data polygonized without envelope.
Cascaded union of polygons created.

In [11]:
my_species.overlay(polygonized_species.geometry)
my_species.data_full.shape


Overlayed species occurrence data with the given range map.
Out[11]:
(5232, 138)

In [12]:
polygonized_species.geometry = polygonized_species.geometry[7:]

In [13]:
polygonized_species.dropna()


Out[13]:
geometry
7 POLYGON ((-92.26972000000001 33.30278, -92.562...

In [14]:
polygonized_species.dropna(inplace=True)

In [15]:
my_species.overlay(polygonized_species.geometry)
my_species.data_full.shape


Overlayed species occurrence data with the given range map.
Out[15]:
(5218, 138)

In [16]:
my_species.plot_species_occurrence()


The map is always zoomed to the extent of the species occurrences. Note that there is also a single point in the lower right corner.

2. Or just load existing data into a Species object. Let's use the file we saved before.


In [17]:
data = my_species.load_data("./Etheostoma_blennioides2382397.pkl") # or just load existing data into Species object


Loading data from: ./Etheostoma_blennioides2382397.pkl
Succesfully loaded previously saved data.

In [18]:
data.columns # all the columns available per observation


Out[18]:
Index(['accessrights', 'associatedoccurrences', 'associatedreferences',
       'associatedsequences', 'basisofrecord', 'bibliographiccitation',
       'catalognumber', 'class', 'classkey', 'collectioncode',
       ...
       'type', 'typestatus', 'verbatimcoordinatesystem', 'verbatimdepth',
       'verbatimelevation', 'verbatimeventdate', 'verbatimlocality',
       'vernacularname', 'waterbody', 'year'],
      dtype='object', length=137)

In [19]:
data.head()


Out[19]:
accessrights associatedoccurrences associatedreferences associatedsequences basisofrecord bibliographiccitation catalognumber class classkey collectioncode ... type typestatus verbatimcoordinatesystem verbatimdepth verbatimelevation verbatimeventdate verbatimlocality vernacularname waterbody year
0 Open Access, http://creativecommons.org/public... NaN NaN NaN PRESERVED_SPECIMEN Etheostoma blennioides (YPM ICH 028684) YPM ICH 028684 Actinopterygii 204 VZ ... PhysicalObject NaN NaN NaN NaN NaN NaN perches; perch-like fishes; ray-finned fishes;... NaN 2016.0
1 Open Access, http://creativecommons.org/public... NaN NaN NaN PRESERVED_SPECIMEN Etheostoma blennioides (YPM ICH 028456) YPM ICH 028456 Actinopterygii 204 VZ ... PhysicalObject NaN NaN NaN NaN NaN NaN perches; perch-like fishes; ray-finned fishes;... NaN 2015.0
2 NaN NaN NaN NaN HUMAN_OBSERVATION NaN 1937841 Actinopterygii 204 Observations ... NaN NaN NaN NaN NaN Thu Sep 10 2015 14:51:49 GMT-0400 (EDT) 3827–4235 Fobes Rd, Rock Creek, OH, US NaN NaN 2015.0
3 NaN NaN NaN NaN HUMAN_OBSERVATION NaN 623289 Actinopterygii 204 Observations ... NaN NaN NaN NaN NaN 2014-04-13 Beaver Creek NaN NaN 2014.0
4 Open Access, http://creativecommons.org/public... NaN Det. by: Thomas J. Near NaN PRESERVED_SPECIMEN Etheostoma blennioides (YPM ICH 026964) YPM ICH 026964 Actinopterygii 204 VZ ... PhysicalObject NaN NaN NaN NaN NaN NaN perches; perch-like fishes; ray-finned fishes;... NaN 2014.0

5 rows × 137 columns

3. Examples of simple (meta-)data exploration

Show all unique values of the 'country' column


In [20]:
data['country'].unique().tolist()


Out[20]:
['United States', nan, 'Canada', 'India']
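
unique() lists the distinct values but not how often each occurs. A minimal sketch on toy data (the values below are made up, not the GBIF counts) of how value_counts(dropna=False) would also show the number of missing entries:

```python
import pandas as pd

# Toy country column with one missing value (illustrative only).
country = pd.Series(["United States", None, "Canada", "United States", "India"])

# dropna=False keeps the missing values as their own category.
counts = country.value_counts(dropna=False)
print(counts)

# Number of records without a country.
missing = int(country.isna().sum())
print(missing)
```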

In [21]:
data.shape # (rows, columns): 7229 observations, 137 fields per observation


Out[21]:
(7229, 137)

In [22]:
data['vernacularname'].unique().tolist() # self-explanatory


Out[22]:
['perches; perch-like fishes; ray-finned fishes; vertebrates; chordates; animals',
 nan,
 'Greenside Darter',
 'GREENSIDE DARTER',
 'greenside darter']
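
Three of the values above differ only in capitalization. As a small sketch (on a toy Series, not the real column), lower-casing before calling unique() collapses such variants:

```python
import pandas as pd

# Toy vernacular names that differ only in case, plus one missing value.
names = pd.Series(["Greenside Darter", "GREENSIDE DARTER", "greenside darter", None])

# The .str accessor skips missing values, so they stay missing.
normalized = names.str.strip().str.lower()

unique_names = normalized.dropna().unique().tolist()
print(unique_names)  # ['greenside darter']
```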

How about latitude/longitude? Does the data need cleaning?

head() or tail() is only used to limit the tabular output in this notebook. The "data" structure contains it all.


In [23]:
data['decimallatitude'].tail(10)


Out[23]:
7219    39.81718
7220         NaN
7221         NaN
7222    37.05000
7223    41.84000
7224    35.89813
7225    43.38421
7226         NaN
7227         NaN
7228         NaN
Name: decimallatitude, dtype: float64

Hmm, so some values are 'NaN', which means not available.

We can fill them with something (default?), or drop those records where latitude/longitude are not available. Let's drop records where the latitude/longitude data is not available


In [24]:
import numpy as np
data_cleaned = data.dropna(subset = ['decimallatitude', 'decimallongitude']) # drop records where data not available

In [25]:
data_cleaned.shape # fewer occurrence records now: 5232


Out[25]:
(5232, 137)
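
A note on what dropna(subset=...) does: with the default how='any', a row is dropped if any of the listed columns is missing, and missing values in other columns are ignored. The toy frame below is an assumption for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    "decimallatitude":  [39.81718, None, 37.05, None],
    "decimallongitude": [-84.1,    None, None,  -80.0],
    "locality":         ["A",      "B",  None,  "D"],
})

# Drops every row where latitude OR longitude is missing.
cleaned = toy.dropna(subset=["decimallatitude", "decimallongitude"])
print(cleaned.shape)  # (1, 3)

# Rows where BOTH coordinates are missing (a stricter condition).
both_missing = toy["decimallatitude"].isnull() & toy["decimallongitude"].isnull()
print(int(both_missing.sum()))  # 1
```

In the toy data the two counts differ (three rows dropped, one row with both coordinates missing), so it is worth checking which condition is intended.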

In [26]:
data_cleaned['basisofrecord'].unique()


Out[26]:
array(['PRESERVED_SPECIMEN', 'HUMAN_OBSERVATION', 'UNKNOWN'], dtype=object)

In [27]:
# records with neither decimalLatitude nor decimalLongitude
# note: .size counts cells (rows x columns), so 273589 / 137 columns = 1997 records
data[data['decimallatitude'].isnull() & data['decimallongitude'].isnull()].size


Out[27]:
273589
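
DataFrame.size returns the number of cells (rows times columns), not the number of records. A small sketch on toy data showing the difference, and two ways to count rows instead:

```python
import pandas as pd

toy = pd.DataFrame({
    "decimallatitude":  [39.81718, None, None],
    "decimallongitude": [-84.1,    None, None],
    "year":             [2015.0,   1964.0, 1950.0],
})

no_coordinates = toy["decimallatitude"].isnull() & toy["decimallongitude"].isnull()

n_cells = toy[no_coordinates].size         # 2 rows x 3 columns = 6
n_rows = toy[no_coordinates].shape[0]      # 2
n_rows_from_mask = int(no_coordinates.sum())  # 2, without building the subset

print(n_cells, n_rows, n_rows_from_mask)
```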

How many of those also have neither 'locality' nor 'verbatimlocality'? 26, apparently.


In [28]:
data[data['decimallatitude'].isnull() & 
     data['decimallongitude'].isnull() & 
     data['locality'].isnull() & 
     data['verbatimlocality'].isnull()]


Out[28]:
accessrights associatedoccurrences associatedreferences associatedsequences basisofrecord bibliographiccitation catalognumber class classkey collectioncode ... type typestatus verbatimcoordinatesystem verbatimdepth verbatimelevation verbatimeventdate verbatimlocality vernacularname waterbody year
1269 http://fieldmuseum.org/about/copyright-informa... NaN NaN NaN PRESERVED_SPECIMEN NaN 112647 Actinopterygii 204 Fishes ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN NaN 1999.0
4391 NaN NaN NaN NaN UNKNOWN NaN 56-5116 Actinopterygii 204 ON-CDC ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 1972.0
4756 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17588 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1968.0
5224 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17585 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5231 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17580 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5237 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17578 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5253 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17583 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5254 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17598 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5258 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17592 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5286 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17582 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5287 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17591 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
5288 NaN NULL NaN NaN PRESERVED_SPECIMEN NaN GCRL 17584 Actinopterygii 204 Occurrence ... NaN NaN NULL NULL NULL NULL NaN NaN NaN 1964.0
6048 not-for-profit use only NaN NaN NaN PRESERVED_SPECIMEN NaN 21736 Actinopterygii 204 Fishes ... PhysicalObject NaN NaN NaN NaN 19500000 NaN NaN NaN 1950.0
6067 not-for-profit use only NaN NaN NaN PRESERVED_SPECIMEN NaN 22581 Actinopterygii 204 Fishes ... PhysicalObject NaN NaN NaN NaN 19500700 NaN NaN NaN 1950.0
6654 not-for-profit use only NaN NaN NaN PRESERVED_SPECIMEN NaN 7092 Actinopterygii 204 Fishes ... PhysicalObject NaN NaN NaN NaN 19380625 NaN NaN NaN 1938.0
7043 NaN NaN NaN NaN PRESERVED_SPECIMEN NaN 692 Actinopterygii 204 SU (ICH) ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN Roaring river NaN
7052 http://fieldmuseum.org/about/copyright-informa... NaN NaN NaN PRESERVED_SPECIMEN NaN 112640 Actinopterygii 204 Fishes ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN NaN NaN
7056 NaN NaN NaN NaN PRESERVED_SPECIMEN Etheostoma blennioides UAMZ F2503 F2503 Actinopterygii 204 UAMZ ... PhysicalObject NaN NaN NaN NaN unknown NaN NaN NaN NaN
7061 NaN NaN NaN NaN PRESERVED_SPECIMEN NaN SU 5326 Actinopterygii 204 Occurrence ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7066 NaN NaN NaN NaN PRESERVED_SPECIMEN NaN 5326 Actinopterygii 204 SU (ICH) ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN NaN NaN
7076 NaN NaN NaN NaN PRESERVED_SPECIMEN NaN 715 Actinopterygii 204 SU (ICH) ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN Eel river NaN
7142 http://fieldmuseum.org/about/copyright-informa... NaN NaN NaN PRESERVED_SPECIMEN NaN 112677 Actinopterygii 204 Fishes ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN NaN NaN
7144 NaN NaN NaN NaN PRESERVED_SPECIMEN NaN SU 692 Actinopterygii 204 Occurrence ... NaN NaN NaN NaN NaN NaN NaN NaN Roaring river NaN
7152 NaN NaN NaN NaN PRESERVED_SPECIMEN NaN SU 715 Actinopterygii 204 Occurrence ... NaN NaN NaN NaN NaN NaN NaN NaN Eel river NaN
7217 not-for-profit use only NaN NaN NaN PRESERVED_SPECIMEN NaN 9050 Actinopterygii 204 Vertebrate Paleontology ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN NaN NaN
7221 not-for-profit use only NaN NaN NaN PRESERVED_SPECIMEN NaN 8498 Actinopterygii 204 Fishes ... PhysicalObject NaN NaN NaN NaN NaN NaN NaN NaN NaN

26 rows × 137 columns


In [29]:
data_cleaned[['dateidentified', 'day', 'month', 'year']].head()


Out[29]:
dateidentified day month year
1 NaN 23.0 5.0 2015.0
2 2015-09-11T23:37:54.000+0000 10.0 9.0 2015.0
3 2014-04-14T00:24:18.000+0000 13.0 4.0 2014.0
4 NaN 13.0 5.0 2014.0
5 NaN 14.0 5.0 2014.0

It seems that not all records have a 'dateidentified' value, but the 'day', 'month', and 'year' fields are there for many (all?) records. TODO: check what 'verbatimeventdate' contains.

Select only observation records from after 2010 (year > 2010).

Say that only the latitude, longitude, rightsHolder, and datasetName columns are interesting for our selection.


In [30]:
data_selected = data_cleaned[data_cleaned['year']>2010][['decimallatitude','decimallongitude', 'rightsholder', 'datasetname']]

More filtering: select only those with a non-null datasetName.


In [31]:
data_selected[~data_selected.datasetname.isnull()].head(10)


Out[31]:
decimallatitude decimallongitude rightsholder datasetname
2 41.79664 -80.97289 Robert L Curtis iNaturalist research-grade observations
3 37.97240 -83.56716 Brian Wulker iNaturalist research-grade observations
11 35.17770 -83.88780 North Carolina Museum of Natural Sciences NCSM Fishes Collection
14 35.07780 -83.97430 North Carolina Museum of Natural Sciences NCSM Fishes Collection
18 35.16030 -83.92020 North Carolina Museum of Natural Sciences NCSM Fishes Collection
30 36.41300 -81.40710 North Carolina Museum of Natural Sciences NCSM Fishes Collection
48 36.40790 -81.40160 North Carolina Museum of Natural Sciences NCSM Fishes Collection
52 36.54960 -81.00230 North Carolina Museum of Natural Sciences NCSM Fishes Collection
53 36.55790 -81.21670 North Carolina Museum of Natural Sciences NCSM Fishes Collection
55 36.06540 -91.61040 NaN Auburn University Museum Fish Collection
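
The two filtering steps above (year and non-null dataset name) can also be written as one boolean mask with .loc. This is just an alternative sketch on toy data, not how the notebook cells do it:

```python
import pandas as pd

toy = pd.DataFrame({
    "year":             [2015.0, 2009.0, 2012.0, None],
    "datasetname":      ["iNaturalist research-grade observations",
                         "NCSM Fishes Collection", None, "NCSM Fishes Collection"],
    "decimallatitude":  [41.79664, 35.1777, 36.413, 36.0654],
    "decimallongitude": [-80.97289, -83.8878, -81.4071, -91.6104],
})

# A comparison against NaN is False, so records without a year are excluded too.
mask = (toy["year"] > 2010) & toy["datasetname"].notnull()

selected = toy.loc[mask, ["decimallatitude", "decimallongitude", "datasetname"]]
print(selected)
```

Only the first toy row passes both conditions: the second is too old, the third has no dataset name, and the fourth has no year.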

If you are happy with this filtering and want to save the species data:


In [32]:
my_species.set_data(data_selected) # update the object "my_species" to contain the filtered data

In [33]:
my_species.save_data(file_name="updated_dataset.pkl")


Saved data: /home/daniela/git/iSDM/notebooks/updated_dataset.pkl 
Type of data: <class 'pandas.core.frame.DataFrame'> 

Plot our filtered selection


In [34]:
my_species.plot_species_occurrence()


Data geometrized: converted into GeoPandas dataframe.
Points with NaN coordinnates ignored. 

In [35]:
my_species.get_data().shape # there are 119 records now


Out[35]:
(119, 5)

4. Load data from downloaded csv file (from GBIF website, not API; differs a bit)


In [36]:
csv_data = my_species.load_csv('../data/GBIF.csv')


Loading data from: ../data/GBIF.csv
Succesfully loaded previously saved CSV data.
Updated species ID: 2382397 

In [37]:
csv_data.head() # let's peek into the data


Out[37]:
gbifid datasetkey occurrenceid kingdom phylum class order family genus species ... recordnumber identifiedby rights rightsholder recordedby typestatus establishmentmeans lastinterpreted mediatype issue
0 1224542608 71e6db8e-f762-11e1-a439-00145eb45e9a urn:catalog:OMNH:FISH:85718 Animalia Chordata Actinopterygii Perciformes Percidae Etheostoma Etheostoma blennioides ... NaN Dr. Aaron Geheber NaN Sam Noble Oklahoma Museum of Natural History Aaron Geheber NaN NaN 2015-12-23T21:01Z NaN GEODETIC_DATUM_ASSUMED_WGS84
1 17598896 83a8c0da-f762-11e1-a439-00145eb45e9a NaN Animalia Chordata Actinopterygii Perciformes Percidae Etheostoma Etheostoma blennioides ... NaN Baldwin, M.E. NaN NaN Baldwin, M.E.; Bowlby, J.N. NaN NaN 2014-06-04T23:44Z NaN GEODETIC_DATUM_ASSUMED_WGS84
2 17598905 83a8c0da-f762-11e1-a439-00145eb45e9a NaN Animalia Chordata Actinopterygii Perciformes Percidae Etheostoma Etheostoma blennioides ... NaN Baldwin, Mary Elizabeth NaN NaN Baldwin, Mary Elizabeth; Casbourn, Hugh R. NaN NaN 2014-06-04T23:44Z NaN GEODETIC_DATUM_ASSUMED_WGS84
3 198193430 961f602a-f762-11e1-a439-00145eb45e9a NaN Animalia Chordata Actinopterygii Perciformes Percidae Etheostoma Etheostoma blennioides ... NaN NaN NaN NaN R.D. Suttkus, Eaton & Donahue NaN NaN 2014-06-05T03:09Z NaN GEODETIC_DATUM_ASSUMED_WGS84
4 198193618 961f602a-f762-11e1-a439-00145eb45e9a NaN Animalia Chordata Actinopterygii Perciformes Percidae Etheostoma Etheostoma blennioides ... NaN NaN NaN NaN R.D. Suttkus, J.S. Ramsey & M.D. Dahlberg NaN NaN 2014-06-05T03:09Z NaN TAXON_MATCH_HIGHERRANK;GEODETIC_DATUM_ASSUMED_...

5 rows × 42 columns


In [38]:
csv_data['specieskey'].unique()


Out[38]:
array([2382397])

In [39]:
my_species.save_data() # by default this 'speciesKey' is used. Alternative name can be provided


Saved data: /home/daniela/git/iSDM/notebooks/Etheostoma_blennioides2382397.pkl 
Type of data: <class 'pandas.core.frame.DataFrame'> 

In [40]:
csv_data.columns.size # the CSV data has far fewer columns, for some reason


Out[40]:
42

In [41]:
data.columns.size # data from using GBIF API directly


Out[41]:
137

Which columns are in 'data', but not in 'csv_data'?


In [42]:
list(set(data.columns.tolist()) - set(csv_data.columns.tolist())) # hmm, 'decimalLatitude' vs 'decimallatitude'


Out[42]:
['references',
 'country',
 'identificationverificationstatus',
 'islandgroup',
 'identifier',
 'genuskey',
 'higherclassification',
 'media',
 'enddayofyear',
 'relations',
 'stateprovince',
 'fieldnotes',
 'individualcount',
 'license',
 'datasetid',
 'verbatimdepth',
 'island',
 'eventremarks',
 'lastparsed',
 'accessrights',
 'locationremarks',
 'familykey',
 'geodeticdatum',
 'samplingprotocol',
 'genericname',
 'language',
 'identificationremarks',
 'lastcrawled',
 'issues',
 'associatedoccurrences',
 'protocol',
 'ownerinstitutioncode',
 'eventid',
 'waterbody',
 'georeferencesources',
 'associatedreferences',
 'locationid',
 'continent',
 'dynamicproperties',
 'habitat',
 'highergeography',
 'collectionid',
 'locationaccordingto',
 'datasetname',
 'startdayofyear',
 'organismid',
 'disposition',
 'footprintwkt',
 'phylumkey',
 'taxonid',
 'kingdomkey',
 'informationwithheld',
 'lifestage',
 'previousidentifications',
 'georeferenceprotocol',
 'coordinateprecision',
 'http://unknown.org/occurrencedetails',
 'orderkey',
 'scientificnameid',
 'othercatalognumbers',
 'created',
 'associatedsequences',
 'fieldnumber',
 'modified',
 'facts',
 'occurrencestatus',
 'specificepithet',
 'taxonremarks',
 'source',
 'county',
 'municipality',
 'verbatimelevation',
 'parentnameusage',
 'georeferenceremarks',
 'georeferenceddate',
 'dateidentified',
 'type',
 'key',
 'bibliographiccitation',
 'vernacularname',
 'verbatimlocality',
 'nomenclaturalcode',
 'occurrenceremarks',
 'classkey',
 'verbatimeventdate',
 'preparations',
 'institutionid',
 'publishingcountry',
 'http://unknown.org/organismid',
 'extensions',
 'georeferencedby',
 'verbatimcoordinatesystem',
 'identificationqualifier',
 'eventtime',
 'identificationid',
 'identifiers',
 'coordinateuncertaintyinmeters',
 'georeferenceverificationstatus']

Which columns are in 'csv_data' but not in 'data'?


In [43]:
list(set(csv_data.columns.tolist()) - set(data.columns.tolist())) # hmm, not many


Out[43]:
['mediatype', 'infraspecificepithet', 'issue']
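
The same comparison can be done directly on the column indexes, which returns sorted results. Note that the lists above contain near-identical names ('issues' in the API data, 'issue' in the CSV data) that a plain set difference treats as unrelated. A small sketch with a handful of toy column names:

```python
import pandas as pd

# Toy subsets of the two column sets (not the full lists).
api_columns = pd.Index(["country", "decimallatitude", "issues", "year"])
csv_columns = pd.Index(["decimallatitude", "issue", "mediatype", "year"])

only_in_api = api_columns.difference(csv_columns).tolist()
only_in_csv = csv_columns.difference(api_columns).tolist()
shared = sorted(api_columns.intersection(csv_columns).tolist())

print(only_in_api)  # ['country', 'issues']
print(only_in_csv)  # ['issue', 'mediatype']
print(shared)       # ['decimallatitude', 'year']
```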

5. Geometrizing the GBIF presence-only point records.

One way of converting point-records (lat/lon) to geometric shapes is by expanding each sample point into a buffer (or "polygon of influence"), and simplifying + merging the overlapping buffers into a cascaded union.
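
To make the idea concrete, here is a minimal sketch with plain Shapely, independent of the iSDM implementation (which is not shown in this notebook). The three points, the buffer distance, and the simplify tolerance are illustrative assumptions. unary_union is used here because newer Shapely releases recommend it in place of cascaded_union.

```python
from shapely.geometry import Point
from shapely.ops import unary_union

# Two nearby toy points and one far away (x = longitude, y = latitude).
points = [Point(0.0, 0.0), Point(0.5, 0.0), Point(10.0, 10.0)]

buffer_distance = 1.0     # illustrative value, in coordinate units (degrees)
simplify_tolerance = 0.1  # illustrative value

# Expand each point into a buffer and reduce the number of vertices.
buffers = [p.buffer(buffer_distance).simplify(simplify_tolerance) for p in points]

# Merge overlapping buffers; disjoint ones stay separate parts.
merged = unary_union(buffers)
parts = list(merged.geoms) if merged.geom_type == "MultiPolygon" else [merged]

print(merged.geom_type, len(parts))
```

The two nearby points end up in one polygon and the distant point in another, which is why the output of polygonize() has fewer rows than there are occurrence points.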


In [44]:
geometrized_species =  my_species.polygonize()  # returns a geopandas dataframe with a geometry column.


Data geometrized: converted into GeoPandas dataframe.
Points with NaN coordinnates ignored. 
Data polygonized without envelope.
Cascaded union of polygons created.

In [45]:
geometrized_species


Out[45]:
geometry
0 POLYGON ((-82.50020000000001 32.7504, -82.7930...
1 POLYGON ((-77.16722 35.54528, -77.460113218813...
2 POLYGON ((-124 44.65, -124.2928932188134 43.94...
3 POLYGON ((-85.23066999999999 44.35975999999999...
4 POLYGON ((-81.2 45.858, -81.49289321881345 45....
5 POLYGON ((-92.26971999999999 33.30278, -92.562...

In [46]:
geometrized_species.plot()  # each isolated polygon is a separate record (do we want that or?)


Out[46]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47ecac51d0>

In [47]:
# we can tweak the parameters for the polygonize function
geometrized_species = my_species.polygonize(buffer_distance=0.2, simplify_tolerance=0.02)
geometrized_species.plot()


Data polygonized without envelope.
Cascaded union of polygons created.
Out[47]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47ecfaa240>

In [48]:
my_species.get_data().shape


Out[48]:
(5215, 43)

In [49]:
# with_envelope means "pixelized" (envelope around each buffer region)
geometrized_species = my_species.polygonize(buffer_distance=0.3, simplify_tolerance=0.03, with_envelope=True)
geometrized_species.plot()


Data polygonized with envelope.
Cascaded union of polygons created.
Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e93a7320>
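
What an "envelope" means geometrically: the smallest axis-aligned rectangle around a shape. The sketch below uses Shapely's envelope property on a single toy buffer; whether with_envelope=True applies exactly this per buffer inside iSDM is an assumption based on the log message.

```python
from shapely.geometry import Point

# A toy buffer region around one point (illustrative coordinates).
region = Point(-82.5, 33.75).buffer(0.3)

# The envelope is the axis-aligned bounding rectangle of the region.
box = region.envelope

print(box.bounds)
print(box.area > region.area)  # the rectangle is larger than the round buffer
```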

6. Cut the outliers, i.e., overlay with a polygon

Define a "zoom-in" polygon that we use for selecting a subset of the data.


In [50]:
from shapely.geometry import Point, Polygon

In [51]:
# say we want to crop to this polygon area only
overlay_polygon = Polygon(((-100,30), (-100, 50), (-70, 50),(-70, 30)))

In [52]:
# Beware, this overwrites the original my_species data ("data_full" field)
my_species.data_full = my_species.data_full[my_species.data_full.geometry.within(overlay_polygon)]
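
A small sketch of how within() decides which points survive the crop, using the same rectangle and three toy points (the labels and the edge point are made up for illustration). Points lying exactly on the polygon boundary are not "within" it.

```python
from shapely.geometry import Point, Polygon

overlay_polygon = Polygon(((-100, 30), (-100, 50), (-70, 50), (-70, 30)))

# Toy occurrences as Point(longitude, latitude).
occurrences = {
    "inside": Point(-80.97289, 41.79664),
    "outside": Point(-124.0, 44.65),
    "on the edge": Point(-100.0, 40.0),
}

kept = [name for name, point in occurrences.items() if point.within(overlay_polygon)]
print(kept)  # ['inside']

# intersects() would also keep the boundary point.
kept_with_edge = [name for name, point in occurrences.items()
                  if point.intersects(overlay_polygon)]
print(kept_with_edge)  # ['inside', 'on the edge']
```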

In [53]:
my_species.polygonize().plot()


Data polygonized without envelope.
Cascaded union of polygons created.
Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e91db860>

In [54]:
my_species.polygonize(buffer_distance=0.5, simplify_tolerance=0.05).plot() # more fine-grained


Data polygonized without envelope.
Cascaded union of polygons created.
Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e9292828>

In [58]:
my_species.polygonize(buffer_distance=0.3, simplify_tolerance=0.03).plot()  # etc


Data polygonized without envelope.
Cascaded union of polygons created.
Out[58]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e9188828>

In [59]:
my_species.polygonize(buffer_distance=0.3, simplify_tolerance=0.03, with_envelope=True).plot() # with_envelope means pixelized


Data polygonized with envelope.
Cascaded union of polygons created.
Out[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e92a2eb8>

In [60]:
# we can further simplify with a "convex hull" around each polygon
my_species.polygonize().geometry.convex_hull.plot()


Data polygonized without envelope.
Cascaded union of polygons created.
Out[60]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e8ea65c0>
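
For intuition, the convex hull fills in any concave parts of a polygon, so it is never smaller than the original shape. A toy example with an L-shaped polygon (coordinates are made up):

```python
from shapely.geometry import Polygon

# L-shaped polygon: a 2x2 square with the upper right 1x1 corner removed.
l_shape = Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])

hull = l_shape.convex_hull

print(l_shape.area)  # 3.0
print(hull.area)     # 3.5: the notch is bridged by a straight edge
```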

In [61]:
polygonized_species = my_species.polygonize()


Data polygonized without envelope.
Cascaded union of polygons created.

In [62]:
polygonized_species


Out[62]:
geometry
0 POLYGON ((-82.50020000000001 32.7504, -82.7930...
1 POLYGON ((-77.16722 35.54528, -77.460113218813...
2 POLYGON ((-85.23066999999999 44.35975999999999...
3 POLYGON ((-81.2 45.858, -81.49289321881345 45....
4 POLYGON ((-92.26971999999999 33.30278, -92.562...

In [63]:
polygonized_species.plot()


Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e8e255f8>

7. Low-level manipulations


In [64]:
# We can make a union of all polygons into one "multipolygon" (Do we need this? I can make a wrapper if needed)
import shapely.ops
my_multipolygon = shapely.ops.cascaded_union(polygonized_species.geometry.tolist())
my_multipolygon


Out[64]:

In [65]:
from geopandas import GeoDataFrame, GeoSeries
new_series = GeoSeries(shapely.ops.cascaded_union(polygonized_species.geometry.tolist()))
new_series


Out[65]:
0    (POLYGON ((-77.16722 35.54528, -77.46011321881...
dtype: object

In [66]:
new_series.plot()


Out[66]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e8e71a58>

In [67]:
new_series.convex_hull.plot()


Out[67]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f47e8ee6668>

In [68]:
my_species.data_full.geometry.total_bounds


Out[68]:
(-97.0792, 32.704440000000005, -74.257649999999998, 45.858000000000004)

In [69]:
my_species.data_full.geometry.bounds.minx.min()


Out[69]:
-97.0792

In [70]:
my_species.data_full.geometry.bounds.miny.min()


Out[70]:
32.704440000000005

In [71]:
my_species.data_full.geometry.bounds.maxx.max()


Out[71]:
-74.257649999999998

In [72]:
my_species.data_full.geometry.bounds.maxy.max()


Out[72]:
45.858000000000004
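
The four calls above reproduce total_bounds one value at a time: it is (min of minx, min of miny, max of maxx, max of maxy). A sketch with plain pandas, using a toy table laid out like the per-geometry bounds (for points, minx equals maxx and miny equals maxy; the middle values are made up):

```python
import pandas as pd

# Toy per-point bounds with the columns minx, miny, maxx, maxy.
bounds = pd.DataFrame({
    "minx": [-97.0792, -80.0, -74.25765],
    "miny": [40.0, 32.70444, 45.858],
    "maxx": [-97.0792, -80.0, -74.25765],
    "maxy": [40.0, 32.70444, 45.858],
})

total_bounds = (
    bounds["minx"].min(),
    bounds["miny"].min(),
    bounds["maxx"].max(),
    bounds["maxy"].max(),
)
print(total_bounds)
```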

In [ ]: