Calling R libraries from Python

In this example we will explore the Coral Reef Evaluation and Monitoring Project (CREMP) data available in the Gulf of Mexico Coastal Ocean Observing System (GCOOS) ERDDAP server.

To access the server we will use the rerddap library and export the data to Python for easier plotting.

The first step is to load the rpy2 extension that will allow us to use the R libraries.



In [1]:

    
%load_ext rpy2.ipython

The first line below has a %%R to make it an R cell. The code below specify the GCOOS server and fetches the data information for the cremp_fk_v2_1996 dataset.

For more information on rerddap please see https://rmendels.github.io/Using_rerddap.nb.html.



In [2]:

    
%%R

library('rerddap')

url <- 'http://gcoos4.tamu.edu:8080/erddap/'
data_info <- rerddap::info('cremp_fk_1996_v20', url=url)

data_info









    





<ERDDAP info> cremp_fk_1996_v20 
 Dimensions (range):  
 Variables:  
     AverageNumberOfPoints: 
         Range: 1045, 2625 
     basisOfRecord: 
     bottomType: 
     class: 
     country: 
     datasetID: 
     datasetName: 
     depth: 
         Range: 3.0, 54.0 
         Units: m 
     dynamicProperties: 
     eventDate: 
     eventDateRemarks: 
     eventDateTimeZone: 
     family: 
     firstYear: 
         Range: 1996, 1996 
     genus: 
     geodeticDatum: 
     habitat: 
     higherInstitutionCode: 
     institutionCode: 
     kingdom: 
     lastYear: 
         Range: 2000, 2015 
     latitude: 
         Range: 24.4517, 25.2953 
         Units: degrees_north 
     locality: 
     longitude: 
         Range: -81.9195, -80.2087 
         Units: degrees_east 
     materialSampleID: 
     maximumDepthInMeters: 
         Range: 3.0, 54.0 
         Units: m 
     minimumDepthInMeters: 
         Range: 3.0, 54.0 
         Units: m 
     order: 
     ownerInstitutionCode: 
     phylum: 
     protection: 
     quantificationMethod: 
     quantificationName: 
     quantificationStatus: 
     quantificationUnit: 
     quantificationValue: 
         Range: 0.0, 0.919800885 
     recordedBy: 
     region: 
     sampleAreaInSquareMeters: 
         Range: 19.0, 26.0 
         Units: m^2 
     Samples: 
         Range: 0, 11679 
     scientificName: 
     scientificNameAuthorship: 
     scientificNameIDITIS: 
         Range: -999, 914166 
     scientificNameIDWoRMS: 
         Range: -999, 828326 
     specificEpithet: 
     stateProvince: 
     stationNumber: 
         Range: 101, 814 
     subSampleID: 
     taxonRank: 
     time: 
         Range: 8.204544E8, 8.204544E8 
         Units: seconds since 1970-01-01T00:00:00Z 
     timeUncertainty: 
         Units: sec 
     underwaterVisibilityInMeters: 
     vernacularName: 
     verticalDatum: 
     waterBody: 
     waterTemperatureInCelsius: 
         Units: C

By inspecting the information above we can find the variables available in the dateset and use the tabledap function to download them.

Note that the %%R -o rdf will export the rdf variable back to the Python workspace.



In [3]:

    
%%R -o rdf

fields <- c(
    'Samples',
    'depth',
    'time',
    'longitude',
    'latitude',
    'scientificName',
    'habitat',
    'genus',
    'quantificationValue'
)

rdf <- tabledap(
    data_info,
    fields=fields,
    url=url
)

Now we need to export the R DataFrame to a pandas objects and ensure that all numeric types are numbers and not strings.



In [4]:

    
import pandas as pd

from rpy2.robjects import pandas2ri

pandas2ri.activate()
df = pandas2ri.ri2py_dataframe(rdf)

cols = ["longitude", "latitude", "depth", "quantificationValue"]
df[cols] = df[cols].apply(pd.to_numeric)

df.head()









    Out[4]:







  
    
      
      Samples
      depth
      time
      longitude
      latitude
      scientificName
      habitat
      genus
      quantificationValue
    
  
  
    
      2
      0
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      3
      1
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      4
      2
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      5
      3
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      6
      4
      9.0
      1996-01-01T00:00:00Z
      -80.3782
      25.1201
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0

We can navigate to ERDDAP's info page to find the variables description. Let's check what is quantificationValue:

The is value of the derived information product, such as the numerical value for biomass. This term does not include units. Mean number of observed fish per species for 5 Minutes

We can see that quantificationValue has a lot of zero values, let's remove that first to plot the data positions only where something was found.



In [5]:

    
# Filter invalid values (-999).

cremp_1996 = df.loc[df["quantificationValue"] >= 0]
cremp_1996.head()









    Out[5]:







  
    
      
      Samples
      depth
      time
      longitude
      latitude
      scientificName
      habitat
      genus
      quantificationValue
    
  
  
    
      2
      0
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      3
      1
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      4
      2
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      5
      3
      6.0
      1996-01-01T00:00:00Z
      -80.3475
      25.1736
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0
    
    
      6
      4
      9.0
      1996-01-01T00:00:00Z
      -80.3782
      25.1201
      Acropora cervicornis
      Hard Bottom
      Acropora
      0.0

What is the most common genus of Coral observed?



In [6]:

    
avg = cremp_1996.groupby("genus").mean()

avg









    Out[6]:







  
    
      
      Samples
      depth
      longitude
      latitude
      quantificationValue
    
    
      genus
      
      
      
      
      
    
  
  
    
      
      8010.928571
      23.2375
      -81.046667
      24.768957
      0.128788
    
    
      Acropora
      159.500000
      23.2375
      -81.046667
      24.768957
      0.005494
    
    
      Agaricia
      479.500000
      23.2375
      -81.046667
      24.768957
      0.000003
    
    
      Cladocora
      719.500000
      23.2375
      -81.046667
      24.768958
      0.000000
    
    
      Colpophyllia
      879.500000
      23.2375
      -81.046667
      24.768958
      0.004462
    
    
      Dendrogyra
      1039.500000
      23.2375
      -81.046667
      24.768958
      0.000977
    
    
      Diadema
      10159.500000
      23.2375
      -81.046667
      24.768958
      0.000007
    
    
      Dichocoenia
      1199.500000
      23.2375
      -81.046667
      24.768958
      0.000509
    
    
      Diploria
      1359.500000
      23.2375
      -81.046667
      24.768958
      0.000999
    
    
      Eusmilia
      1519.500000
      23.2375
      -81.046667
      24.768958
      0.000091
    
    
      Favia
      1679.500000
      23.2375
      -81.046667
      24.768958
      0.000045
    
    
      Helioseris
      1839.500000
      23.2375
      -81.046667
      24.768958
      0.000004
    
    
      Isophyllia
      2079.500000
      23.2375
      -81.046667
      24.768957
      0.000006
    
    
      Madracis
      2399.500000
      23.2375
      -81.046667
      24.768957
      0.000040
    
    
      Manicina
      2639.500000
      23.2375
      -81.046667
      24.768958
      0.000007
    
    
      Meandrina
      2799.500000
      23.2375
      -81.046667
      24.768958
      0.000363
    
    
      Millepora
      3039.500000
      23.2375
      -81.046667
      24.768957
      0.006159
    
    
      Montastraea
      3279.500000
      23.2375
      -81.046667
      24.768958
      0.013597
    
    
      Mussa
      3439.500000
      23.2375
      -81.046667
      24.768958
      0.000061
    
    
      Mycetophyllia
      3759.500000
      23.2375
      -81.046667
      24.768957
      0.000144
    
    
      Oculina
      4159.500000
      23.2375
      -81.046667
      24.768957
      0.000046
    
    
      Orbicella
      4399.500000
      23.2375
      -81.046667
      24.768958
      0.033860
    
    
      Other
      10319.500000
      23.2375
      -81.046667
      24.768958
      0.000161
    
    
      Porites
      4639.500000
      23.2375
      -81.046667
      24.768957
      0.003372
    
    
      Pseudodiploria
      4959.500000
      23.2375
      -81.046667
      24.768957
      0.001024
    
    
      Scolymia
      5359.500000
      23.2375
      -81.046667
      24.768958
      0.000009
    
    
      Siderastrea
      5599.500000
      23.2375
      -81.046667
      24.768957
      0.004603
    
    
      Solenastrea
      5919.500000
      23.2375
      -81.046667
      24.768957
      0.000048
    
    
      Stephanocoenia
      6159.500000
      23.2375
      -81.046667
      24.768958
      0.000263
    
    
      Undaria
      6319.500000
      23.2375
      -81.046667
      24.768958
      0.001044

rerddap's info request does not have enough metadata about the variables to explain the blank, and most abundant, genus. Checking the sever did not help figure that out. We'll remove that for now to deal with only those that are identified.



In [7]:

    
cremp_1996 = cremp_1996.loc[cremp_1996["genus"] != ""]

There are also many genus with zero biomass count. In this example we'll choose to do a biased analysis of occurrence and eliminate those where nothing was observed.



In [8]:

    
# Filter zero values (nothing was observed).

cremp_1996 = cremp_1996.loc[cremp_1996["quantificationValue"] > 0]

Now we can check the quantificationValue average by genus.



In [9]:

    
%matplotlib inline

avg = cremp_1996.groupby("genus").mean()  # re-compute the "biased" average.
ax = avg["quantificationValue"].plot(kind="bar")

and habitat.



In [10]:

    
ax = cremp_1996.groupby("habitat").mean()["quantificationValue"].plot(kind="bar")

It seems the most of the biomass was found around the Dendrogyra genus in Patch Reef habitats. But where are those Coral Reefs? How is the distribution of top three species with more biomass around them? With a pandas DataFrame it is easy to group the data by location and count the genus occurrence based on it.



In [11]:

    
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
from cartopy.mpl.ticker import LatitudeFormatter, LongitudeFormatter


def make_plot():
    bbox = [-82, -80, 24, 26]
    projection = ccrs.PlateCarree()

    fig, ax = plt.subplots(figsize=(9, 9), subplot_kw=dict(projection=projection))

    ax.set_extent(bbox)

    land = cfeature.NaturalEarthFeature(
        "physical", "land", "10m", edgecolor="face", facecolor=[0.85] * 3
    )

    ax.add_feature(land, zorder=0)
    ax.coastlines("10m", zorder=1)

    ax.set_xticks(np.linspace(bbox[0], bbox[1], 3), crs=projection)
    ax.set_yticks(np.linspace(bbox[2], bbox[3], 3), crs=projection)
    lon_formatter = LongitudeFormatter(zero_direction_label=True)
    lat_formatter = LatitudeFormatter()
    ax.xaxis.set_major_formatter(lon_formatter)
    ax.yaxis.set_major_formatter(lat_formatter)
    return fig, ax



In [12]:

    
count = (
    cremp_1996.loc[cremp_1996["genus"] == "Acropora"]
    .groupby(["longitude", "latitude"])
    .count()
    .reset_index()
)


fig, ax = make_plot()
c = ax.scatter(
    count["longitude"],
    count["latitude"],
    s=200,
    c=count["genus"],
    alpha=0.5,
    cmap=plt.cm.get_cmap("viridis_r", 6),
    zorder=3,
)
cbar = fig.colorbar(c, shrink=0.75, extend="both")
cbar.ax.set_ylabel("Genus occurence count")
ax.set_title("Acropora")



In [13]:

    
count = (
    cremp_1996.loc[cremp_1996["genus"] == "Dendrogyra"]
    .groupby(["longitude", "latitude"])
    .count()
    .reset_index()
)


fig, ax = make_plot()
c = ax.scatter(
    count["longitude"],
    count["latitude"],
    s=200,
    c=count["genus"],
    alpha=0.5,
    cmap=plt.cm.get_cmap("viridis_r", 6),
    zorder=3,
)
cbar = fig.colorbar(c, shrink=0.75, extend="both")
cbar.ax.set_ylabel("Genus occurence count")
ax.set_title("Dendrogyra")



In [14]:

    
count = (
    cremp_1996.loc[cremp_1996["genus"] == "Orbicella"]
    .groupby(["longitude", "latitude"])
    .count()
    .reset_index()
)


fig, ax = make_plot()
c = ax.scatter(
    count["longitude"],
    count["latitude"],
    s=200,
    c=count["genus"],
    alpha=0.5,
    cmap=plt.cm.get_cmap("viridis_r", 6),
    zorder=3,
)
cbar = fig.colorbar(c, shrink=0.75, extend="both")
cbar.ax.set_ylabel("Genus occurence count")
ax.set_title("Orbicella")

This demonstration showed the power of mixing Python and R to reduce developer time and allow the research to focus on the data and not the programming language.

	Samples	depth	time	longitude	latitude	scientificName	habitat	genus
2	0	6.0	1996-01-01T00:00:00Z	-80.3475	25.1736	Acropora cervicornis	Hard Bottom	Acropora
3	1	6.0	1996-01-01T00:00:00Z	-80.3475	25.1736	Acropora cervicornis	Hard Bottom	Acropora
4	2	6.0	1996-01-01T00:00:00Z	-80.3475	25.1736	Acropora cervicornis	Hard Bottom	Acropora
5	3	6.0	1996-01-01T00:00:00Z	-80.3475	25.1736	Acropora cervicornis	Hard Bottom	Acropora
6	4	9.0	1996-01-01T00:00:00Z	-80.3782	25.1201	Acropora cervicornis	Hard Bottom	Acropora

	Samples	depth	longitude	latitude	quantificationValue
genus
	8010.928571	23.2375	-81.046667	24.768957	0.128788
Acropora	159.500000	23.2375	-81.046667	24.768957	0.005494
Agaricia	479.500000	23.2375	-81.046667	24.768957	0.000003
Cladocora	719.500000	23.2375	-81.046667	24.768958	0.000000
Colpophyllia	879.500000	23.2375	-81.046667	24.768958	0.004462
Dendrogyra	1039.500000	23.2375	-81.046667	24.768958	0.000977
Diadema	10159.500000	23.2375	-81.046667	24.768958	0.000007
Dichocoenia	1199.500000	23.2375	-81.046667	24.768958	0.000509
Diploria	1359.500000	23.2375	-81.046667	24.768958	0.000999
Eusmilia	1519.500000	23.2375	-81.046667	24.768958	0.000091
Favia	1679.500000	23.2375	-81.046667	24.768958	0.000045
Helioseris	1839.500000	23.2375	-81.046667	24.768958	0.000004
Isophyllia	2079.500000	23.2375	-81.046667	24.768957	0.000006
Madracis	2399.500000	23.2375	-81.046667	24.768957	0.000040
Manicina	2639.500000	23.2375	-81.046667	24.768958	0.000007
Meandrina	2799.500000	23.2375	-81.046667	24.768958	0.000363
Millepora	3039.500000	23.2375	-81.046667	24.768957	0.006159
Montastraea	3279.500000	23.2375	-81.046667	24.768958	0.013597
Mussa	3439.500000	23.2375	-81.046667	24.768958	0.000061
Mycetophyllia	3759.500000	23.2375	-81.046667	24.768957	0.000144
Oculina	4159.500000	23.2375	-81.046667	24.768957	0.000046
Orbicella	4399.500000	23.2375	-81.046667	24.768958	0.033860
Other	10319.500000	23.2375	-81.046667	24.768958	0.000161
Porites	4639.500000	23.2375	-81.046667	24.768957	0.003372
Pseudodiploria	4959.500000	23.2375	-81.046667	24.768957	0.001024
Scolymia	5359.500000	23.2375	-81.046667	24.768958	0.000009
Siderastrea	5599.500000	23.2375	-81.046667	24.768957	0.004603
Solenastrea	5919.500000	23.2375	-81.046667	24.768957	0.000048
Stephanocoenia	6159.500000	23.2375	-81.046667	24.768958	0.000263
Undaria	6319.500000	23.2375	-81.046667	24.768958	0.001044