In [1]:
    
from iSDM.species import GBIFSpecies
    
In [2]:
    
my_species = GBIFSpecies(name_species="Etheostoma_blennioides")
    
In [3]:
    
my_species.name_species
    
    Out[3]:
In [4]:
    
%matplotlib inline
import logging
root = logging.getLogger()
root.addHandler(logging.StreamHandler())
    
In [5]:
    
my_species.find_species_occurrences().head()
    
    
    Out[5]:
In [6]:
    
my_species.ID # taxon key derived from GBIF; effectively a unique ID per species
    
    Out[6]:
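For reference, this key corresponds to the species' entry in the GBIF backbone taxonomy. A minimal sketch of looking it up directly, assuming the pygbif package is installed (illustration only, not necessarily what iSDM does internally):
In [ ]:
    
# Sketch: query the GBIF backbone directly for the taxon key (assumes pygbif is installed).
from pygbif import species as gbif_species
gbif_species.name_backbone(name="Etheostoma blennioides")["usageKey"]  # should match my_species.ID
    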
In [7]:
    
my_species.save_data()
    
    
In [8]:
    
my_species.source.name
    
    Out[8]:
In [9]:
    
my_species.plot_species_occurrence()
    
    
    
In [10]:
    
polygonized_species = my_species.polygonize()
    
    
In [11]:
    
my_species.overlay(polygonized_species.geometry)
my_species.data_full.shape
    
    
    Out[11]:
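The overlay step presumably keeps only the occurrence records whose points fall inside the supplied geometry. A rough sketch of that idea with plain shapely/geopandas, assuming data_full is the underlying GeoDataFrame (it is used directly later in this notebook); this is not the actual iSDM implementation:
In [ ]:
    
# Sketch: roughly what overlay() appears to do, expressed with plain shapely operations.
import shapely.ops
region = shapely.ops.cascaded_union(polygonized_species.geometry.tolist())   # merge the polygons into one shape
inside = my_species.data_full[my_species.data_full.geometry.within(region)]  # keep points inside that shape
inside.shape
    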
In [12]:
    
polygonized_species.geometry = polygonized_species.geometry[7:] # keep only polygons from index 7 onward; the earlier rows become NaN geometries
    
In [13]:
    
polygonized_species.dropna()
    
    Out[13]:
In [14]:
    
polygonized_species.dropna(inplace=True)
    
In [15]:
    
my_species.overlay(polygonized_species.geometry)
my_species.data_full.shape
    
    
    Out[15]:
In [16]:
    
my_species.plot_species_occurrence()
    
    
In [17]:
    
data = my_species.load_data("./Etheostoma_blennioides2382397.pkl") # or just load existing data into Species object
    
    
In [18]:
    
data.columns # all the columns available per observation
    
    Out[18]:
In [19]:
    
data.head()
    
    Out[19]:
In [20]:
    
data['country'].unique().tolist()
    
    Out[20]:
In [21]:
    
data.shape # there are 7226 occurrence records, with 138 columns per record
    
    Out[21]:
In [22]:
    
data['vernacularname'].unique().tolist() # self-explanatory
    
    Out[22]:
In [23]:
    
data['decimallatitude'].tail(10)
    
    Out[23]:
In [24]:
    
import numpy as np
data_cleaned = data.dropna(subset = ['decimallatitude', 'decimallongitude']) # drop records with missing coordinates
    
In [25]:
    
data_cleaned.shape # fewer occurrence records now: 5226
    
    Out[25]:
In [26]:
    
data_cleaned['basisofrecord'].unique()
    
    Out[26]:
In [27]:
    
# number of records with neither decimalLatitude nor decimalLongitude
data[data['decimallatitude'].isnull() & data['decimallongitude'].isnull()].shape[0]
    
    Out[27]:
In [28]:
    
data[data['decimallatitude'].isnull() & 
     data['decimallongitude'].isnull() & 
     data['locality'].isnull() & 
     data['verbatimlocality'].isnull()]
    
    Out[28]:
In [29]:
    
data_cleaned[['dateidentified', 'day', 'month', 'year']].head()
    
    Out[29]:
It seems that not all records have a 'dateidentified', but the 'day', 'month', and 'year' fields are present for many (all?) records; a quick check follows below. TODO: what about verbatimdate?
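A quick way to check how complete these date fields actually are (using the lowercase column names of the API data):
In [ ]:
    
# How many cleaned records are missing 'dateidentified' versus 'year'?
print(data_cleaned['dateidentified'].isnull().sum(), "records without 'dateidentified'")
print(data_cleaned['year'].isnull().sum(), "records without 'year'")
    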
Say that only the latitude, longitude, rightsHolder, and datasetName columns are of interest for our selection.
In [30]:
    
data_selected = data_cleaned[data_cleaned['year']>2010][['decimallatitude','decimallongitude', 'rightsholder', 'datasetname']]
    
In [31]:
    
data_selected[~data_selected.datasetname.isnull()].head(10)
    
    Out[31]:
In [32]:
    
my_species.set_data(data_selected) # update the object "my_species" to contain the filtered data
    
In [33]:
    
my_species.save_data(file_name="updated_dataset.pkl")
    
    
In [34]:
    
my_species.plot_species_occurrence()
    
    
    
In [35]:
    
my_species.get_data().shape # there are 119 records now
    
    Out[35]:
In [36]:
    
csv_data = my_species.load_csv('../data/GBIF.csv')
    
    
In [37]:
    
csv_data.head() # let's peek into the data
    
    Out[37]:
In [38]:
    
csv_data['specieskey'].unique()
    
    Out[38]:
In [39]:
    
my_species.save_data() # by default the 'speciesKey' is used for the file name; an alternative name can be provided
    
    
In [40]:
    
csv_data.columns.size # the CSV data has, for some reason, far fewer columns
    
    Out[40]:
In [41]:
    
data.columns.size # data from using GBIF API directly
    
    Out[41]:
In [42]:
    
list(set(data.columns.tolist()) - set(csv_data.columns.tolist())) # hmm, 'decimalLatitude' vs 'decimallatitude'
    
    Out[42]:
In [43]:
    
list(set(csv_data.columns.tolist()) - set(data.columns.tolist())) # hmm, not many
    
    Out[43]:
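Much of the apparent difference seems to be naming convention (camelCase in the CSV export vs. all-lowercase from the API). One way to reconcile the two before comparing or merging, as a sketch (not something iSDM requires):
In [ ]:
    
# Normalize the CSV column names to lowercase so they line up with the API column names.
csv_data_normalized = csv_data.rename(columns=str.lower)
list(set(data.columns) - set(csv_data_normalized.columns))  # the remaining, genuinely different columns
    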
One way of converting point records (lat/lon) into geometric shapes is to expand each sample point into a buffer (a "polygon of influence"), then simplify and merge the overlapping buffers into a cascaded union.
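A rough sketch of that idea with plain shapely/geopandas (toy points and arbitrary buffer/tolerance values; not the actual polygonize() implementation):
In [ ]:
    
# Sketch: buffer each point, simplify the buffers, and merge the overlapping ones.
from shapely.geometry import Point
from geopandas import GeoSeries
import shapely.ops

points = [Point(-85.0, 38.0), Point(-85.1, 38.05), Point(-80.0, 36.0)]    # toy sample points
buffers = [p.buffer(0.2).simplify(0.02) for p in points]                  # "polygons of influence"
merged = shapely.ops.cascaded_union(buffers)                              # dissolve overlapping buffers
GeoSeries(list(merged.geoms) if hasattr(merged, "geoms") else [merged])   # one row per resulting polygon
    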
In [44]:
    
geometrized_species = my_species.polygonize()  # returns a GeoDataFrame with a geometry column
    
    
In [45]:
    
geometrized_species
    
    Out[45]:
In [46]:
    
geometrized_species.plot()  # each isolated polygon is a separate record (do we want that, or should they be merged?)
    
    Out[46]:
    
In [47]:
    
# we can tweak the parameters for the polygonize function
geometrized_species = my_species.polygonize(buffer_distance=0.2, simplify_tolerance=0.02)
geometrized_species.plot()
    
    
    Out[47]:
    
In [48]:
    
my_species.get_data().shape
    
    Out[48]:
In [49]:
    
# with_envelope means "pixelized" (envelope around each buffer region)
geometrized_species = my_species.polygonize(buffer_distance=0.3, simplify_tolerance=0.03, with_envelope=True)
geometrized_species.plot()
    
    
    Out[49]:
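To see what the envelope does: shapely's .envelope is the axis-aligned bounding rectangle of a geometry, so each simplified buffer is replaced by a box, which produces the "pixelized" look:
In [ ]:
    
# A single point's buffer vs. its envelope (the axis-aligned bounding box around it).
from shapely.geometry import Point
circle = Point(-85.0, 38.0).buffer(0.3)
circle.envelope.bounds  # (minx, miny, maxx, maxy) of the rectangle that replaces the round buffer
    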
    
Define a "zoom-in" polygon that we use for selecting a subset of the data.
In [50]:
    
from shapely.geometry import Point, Polygon
    
In [51]:
    
# say we want to crop to this polygon area only
overlay_polygon = Polygon(((-100,30), (-100, 50), (-70, 50),(-70, 30)))
    
In [52]:
    
# Beware, this overwrites the original my_species data ("data_full" field)
my_species.data_full = my_species.data_full[my_species.data_full.geometry.within(overlay_polygon)]
    
In [53]:
    
my_species.polygonize().plot()
    
    
    Out[53]:
    
In [54]:
    
my_species.polygonize(buffer_distance=0.5, simplify_tolerance=0.05).plot() # more fine-grained
    
    
    Out[54]:
    
In [58]:
    
my_species.polygonize(buffer_distance=0.3, simplify_tolerance=0.03).plot()  # etc
    
    
    Out[58]:
    
In [59]:
    
my_species.polygonize(buffer_distance=0.3, simplify_tolerance=0.03, with_envelope=True).plot() # with_envelope means pixelized
    
    
    Out[59]:
    
In [60]:
    
# we can further simplify with a "convex hull" around each polygon
my_species.polygonize().geometry.convex_hull.plot()
    
    
    Out[60]:
    
In [61]:
    
polygonized_species = my_species.polygonize()
    
    
In [62]:
    
polygonized_species
    
    Out[62]:
In [63]:
    
polygonized_species.plot()
    
    Out[63]:
    
In [64]:
    
# We can make a union of all polygons into one "multipolygon" (Do we need this? I can make a wrapper if needed)
import shapely.ops
my_multipolygon = shapely.ops.cascaded_union(polygonized_species.geometry.tolist())
my_multipolygon
    
    Out[64]:
In [65]:
    
from geopandas import GeoDataFrame, GeoSeries
new_series = GeoSeries(shapely.ops.cascaded_union(polygonized_species.geometry.tolist()))
new_series
    
    Out[65]:
In [66]:
    
new_series.plot()
    
    Out[66]:
    
In [67]:
    
new_series.convex_hull.plot()
    
    Out[67]:
    
In [68]:
    
my_species.data_full.geometry.total_bounds
    
    Out[68]:
In [69]:
    
my_species.data_full.geometry.bounds.minx.min()
    
    Out[69]:
In [70]:
    
my_species.data_full.geometry.bounds.miny.min()
    
    Out[70]:
In [71]:
    
my_species.data_full.geometry.bounds.maxx.max()
    
    Out[71]:
In [72]:
    
my_species.data_full.geometry.bounds.maxy.max()
    
    Out[72]:
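Note that total_bounds above already returns (minx, miny, maxx, maxy) in a single call, so the four separate min/max cells can be collapsed into one unpacking:
In [ ]:
    
# Equivalent to the four cells above.
minx, miny, maxx, maxy = my_species.data_full.geometry.total_bounds
minx, miny, maxx, maxy
    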
In [ ]: