This example shows how to use the ODM2 Python API (odm2api) to connect to an ODM2 database, retrieve data, and analyze and visualize the data. The database (iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite) contains "measurement"-type results.
This example uses SQLite for the database because it doesn't require a server. However, the ODM2 Python API demonstrated here can alse be used with ODM2 databases implemented in MySQL, PostgreSQL or Microsoft SQL Server.
More details on the ODM2 Python API and its source code and latest development can be found at https://github.com/ODM2/ODM2PythonAPI
Emilio Mayorga. Last updated 2019-3-27.
In [1]:
%matplotlib inline
import os
import matplotlib.pyplot as plt
from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
import folium
from folium.plugins import MarkerCluster
import odm2api
from odm2api.ODMconnection import dbconnection
import odm2api.services.readService as odm2rs
In [2]:
pd.__version__, gpd.__version__, folium.__version__
Out[2]:
odm2api version used to run this notebook:
In [3]:
odm2api.__version__
Out[3]:
This example uses an ODM2 SQLite database file loaded with water quality sample data from multiple monitoring sites in the iUTAH Gradients Along Mountain to Urban Transitions (GAMUT) water quality monitoring network. Water quality samples have been collected and analyzed for nitrogen, phosphorus, total coliform, E-coli, and some water isotopes. The database (iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite) contains "measurement"-type results.
The example database is located in the data sub-directory.
In [4]:
# Assign directory paths and SQLite file name
dbname_sqlite = "MariaWaterQualityData.sqlite"
# sqlite_pth = os.path.join("data", dbname_sqlite)
sqlite_pth = dbname_sqlite
In [5]:
try:
session_factory = dbconnection.createConnection('sqlite', sqlite_pth)
read = odm2rs.ReadODM2(session_factory)
print("Database connection successful!")
except Exception as e:
print("Unable to establish connection to the database: ", e)
This section shows some examples of how to use the API to run both simple and more advanced queries on the ODM2 database, as well as how to examine the query output in convenient ways thanks to Python tools.
Simple query functions like getVariables( ) return objects similar to the entities in ODM2, and individual attributes can then be retrieved from the objects returned.
In [6]:
# Get all of the Variables from the ODM2 database then read the records
# into a Pandas DataFrame to make it easy to view and manipulate
allVars = read.getVariables()
variables_df = pd.DataFrame.from_records([vars(variable) for variable in allVars],
index='VariableID')
variables_df.head(10)
Out[6]:
In [7]:
allPeople = read.getPeople()
pd.DataFrame.from_records([vars(person) for person in allPeople]).head()
Out[7]:
Some of the API functions accept arguments that let you subset what is returned. For example, I can query the database using the getSamplingFeatures( ) function and pass it a SamplingFeatureType of "Site" to return a list of those SamplingFeatures that are Sites.
In [8]:
# Get all of the SamplingFeatures from the ODM2 database that are Sites
siteFeatures = read.getSamplingFeatures(sftype='Site')
# Read Sites records into a Pandas DataFrame
# "if sf.Latitude" is used only to instantiate/read Site attributes)
df = pd.DataFrame.from_records([vars(sf) for sf in siteFeatures if sf.Latitude])
Since we know this is a geospatial dataset (Sites, which have latitude and longitude), we can use more specialized Python tools like GeoPandas (geospatially enabled Pandas) and Folium interactive maps.
In [9]:
# Create a GeoPandas GeoDataFrame from Sites DataFrame
ptgeom = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=ptgeom, crs={'init': 'epsg:4326'})
gdf.head(5)
Out[9]:
In [10]:
# Number of records (features) in GeoDataFrame
len(gdf)
Out[10]:
In [11]:
# A trivial but easy-to-generate GeoPandas plot
gdf.plot();
A site has a SiteTypeCV. Let's examine the site type distribution, and use that information to create a new GeoDataFrame column to specify a map marker color by SiteTypeCV.
In [12]:
gdf['SiteTypeCV'].value_counts()
Out[12]:
Now we'll create an interactive and helpful Folium map of the sites. This map features:
SiteTypeCV
In [13]:
gdf["color"] = gdf.apply(lambda feat: 'green' if feat['SiteTypeCV'] == 'Stream' else 'red', axis=1)
In [14]:
m = folium.Map(tiles='CartoDB positron')
marker_cluster = MarkerCluster().add_to(m)
for idx, feature in gdf.iterrows():
folium.Marker(location=[feature.geometry.y, feature.geometry.x],
icon=folium.Icon(color=feature['color']),
popup="{0} ({1}): {2}".format(
feature['SamplingFeatureCode'], feature['SiteTypeCV'],
feature['SamplingFeatureName'])
).add_to(marker_cluster)
# Set the map extent (bounds) to the extent of the ODM2 sites
m.fit_bounds(m.get_bounds())
# Done with setup. Now render the map
m
Out[14]:
In [15]:
# Get the SamplingFeature object for a particular SamplingFeature by passing its SamplingFeatureCode
site_sf_code = 'SystemD1_PV'
sf = read.getSamplingFeatures(codes=[site_sf_code])[0]
type(sf)
Out[15]:
In [16]:
# Simple way to examine the content (properties) of a Python object, as if it were a dictionary
vars(sf)
Out[16]:
You can also drill down and get objects linked by foreign keys. The API returns related objects in a nested hierarchy so they can be interrogated in an object oriented way. So, if I use the getResults( ) function to return a Result from the database (e.g., a "Measurement" Result), I also get the associated Action that created that Result (e.g., a "Specimen analysis" Action).
In [17]:
try:
# Call getResults, but return only the first Result
firstResult = read.getResults()[0]
frfa = firstResult.FeatureActionObj
frfaa = firstResult.FeatureActionObj.ActionObj
print("The FeatureAction object for the Result is: ", frfa)
print("The Action object for the Result is: ", frfaa)
# Print some Action attributes in a more human readable form
print("\nThe following are some of the attributes for the Action that created the Result: ")
print("ActionTypeCV: {}".format(frfaa.ActionTypeCV))
print("ActionDescription: {}".format(frfaa.ActionDescription))
print("BeginDateTime: {}".format(frfaa.BeginDateTime))
print("EndDateTime: {}".format(frfaa.EndDateTime))
print("MethodName: {}".format(frfaa.MethodObj.MethodName))
print("MethodDescription: {}".format(frfaa.MethodObj.MethodDescription))
except Exception as e:
print("Unable to demo Foreign Key Example: {}".format(e))
In [18]:
vars(frfaa)
Out[18]:
In [19]:
vars(frfaa.MethodObj)
Out[19]:
Because all of the objects are returned in a nested form, if you retrieve a result, you can interrogate it to get all of its related attributes. When a Result object is returned, it includes objects that contain information about Variable, Units, ProcessingLevel, and the related Action that created that Result.
In [20]:
print("------- Example of Retrieving Attributes of a Result -------")
try:
firstResult = read.getResults()[0]
frfa = firstResult.FeatureActionObj
print("The following are some of the attributes for the Result retrieved: ")
print("ResultID: {}".format(firstResult.ResultID))
print("ResultTypeCV: {}".format(firstResult.ResultTypeCV))
print("ValueCount: {}".format(firstResult.ValueCount))
print("ProcessingLevel: {}".format(firstResult.ProcessingLevelObj.Definition))
print("SampledMedium: {}".format(firstResult.SampledMediumCV))
print("Variable: {}: {}".format(firstResult.VariableObj.VariableCode,
firstResult.VariableObj.VariableNameCV))
print("Units: {}".format(firstResult.UnitsObj.UnitsName))
print("SamplingFeatureID: {}".format(frfa.SamplingFeatureObj.SamplingFeatureID))
print("SamplingFeatureCode: {}".format(frfa.SamplingFeatureObj.SamplingFeatureCode))
except Exception as e:
print("Unable to demo example of retrieving Attributes of a Result: {}".format(e))
In [21]:
vars(firstResult)
Out[21]:
In [22]:
vars(frfa)
Out[22]:
The last block of code returns a particular Measurement Result. From that I can get the SamplingFeaureID (in this case 26) for the Specimen from which the Result was generated. But, if I want to figure out which Site the Specimen was collected at, I need to query the database to get the related Site SamplingFeature. I can use getRelatedSamplingFeatures( ) for this. Once I've got the SamplingFeature for the Site, I could get the rest of the SamplingFeature attributes.
In [23]:
specimen_sf_id = frfa.SamplingFeatureObj.SamplingFeatureID
In [24]:
# specimen-entity attributes only show up after first printing one out explicitly
frfa.SamplingFeatureObj.SpecimenTypeCV
Out[24]:
In [25]:
vars(frfa.SamplingFeatureObj)
Out[25]:
In [26]:
# Pass the Sampling Feature ID of the specimen, and the relationship type
relatedSite = read.getRelatedSamplingFeatures(sfid=specimen_sf_id,
relationshiptype='Was Collected at')[0]
In [27]:
vars(relatedSite)
Out[27]:
From the list of Variables returned above and the information about the SamplingFeature I queried above, I know that VariableID = 2 for Total Phosphorus and SiteID = 1 for the Red Butte Creek site at 1300E. I can use the getResults( ) function to get all of the Total Phosphorus results for this site by passing in the VariableID and the SiteID.
In [28]:
siteID = relatedSite.SamplingFeatureID
In [29]:
results_all_at_site = read.getResults(siteid=siteID, restype="Measurement")
In [30]:
len(results_all_at_site)
Out[30]:
In [31]:
vars(results_all_at_site[0])
Out[31]:
In [32]:
res_vars = []
for r in results_all_at_site:
res_vars.append([r.ResultID, r.ResultDateTime, r.VariableID,
r.VariableObj.VariableCode, r.VariableObj.VariableNameCV, r.VariableObj.VariableDefinition])
In [33]:
# print out a count of number of results for each variable, plus the date range
# Do this by ingesting into a data frame first
In [34]:
res_vars_df = pd.DataFrame(res_vars, columns=['ResultID', 'ResultDateTime', 'VariableID', 'VariableCode', 'VariableNameCV', 'VariableDefinition'])
In [35]:
res_vars_df.head()
Out[35]:
In [36]:
res_vars_df.VariableCode.value_counts()
Out[36]:
Now I can retrieve all of the data values associated with the list of Results I just retrieved. In ODM2, water chemistry measurements are stored as "Measurement" results. Each "Measurement" Result has a single data value associated with it. So, for convenience, the getResultValues( ) function allows you to pass in a list of ResultIDs so you can get the data values for all of them back in a Pandas data frame object, which is easier to work with. Once I've got the data in a Pandas data frame object, I can use the plot( ) function directly on the data frame to create a quick visualization.
10/5/2018. NOTE CURRENT ISSUE REGARDING ValueDateTime RETURNED BY read.getResultValues. There seems to be an unexpected behavior with the data type returned for ValueDateTime for SQLite databases. It should be a datetime, but it's currently a string. This is being investigated. For now, we are converting to a datetime manually in cells 25 and 27 via the statement:
resultValues['ValueDateTime'] = pd.to_datetime(resultValues['ValueDateTime'])
This problem is present in odm2api version 0.7.1, but was not present in Nov. 2017
In [37]:
# function that encapsulates the `VariableID`, `getResults` and `getResultValues` queries
def get_results_and_values(siteid, variablecode):
v = variables_df[variables_df['VariableCode'] == variablecode]
variableID = v.index[0]
results = read.getResults(siteid=siteid, variableid=variableID, restype="Measurement")
resultIDList = [x.ResultID for x in results]
# Get all of the data values for the Results in the list created above
# Call getResultValues, which returns a Pandas Data Frame with the data
resultValues = read.getResultValues(resultids=resultIDList, lowercols=False)
resultValues['ValueDateTime'] = pd.to_datetime(resultValues['ValueDateTime'])
return resultValues, results
In [43]:
resultValues, results = get_results_and_values(siteID, 'pH')
In [44]:
resultValues.head()
Out[44]:
In [45]:
result_select = results[0]
In [46]:
# Plot the time sequence of Measurement Result Values
ax = resultValues.plot(x='ValueDateTime', y='DataValue', title=relatedSite.SamplingFeatureName,
kind='line', use_index=True, style='o')
ax.set_ylabel("{0} ({1})".format(result_select.VariableObj.VariableNameCV,
result_select.UnitsObj.UnitsAbbreviation))
ax.set_xlabel('Date/Time')
ax.legend().set_visible(False)
In [47]:
# results_faam = lambda results, i: results[i].FeatureActionObj.ActionObj.MethodObj
method = result_select.FeatureActionObj.ActionObj.MethodObj
print("METHOD: {0} ({1})".format(method.MethodName, method.MethodDescription))
In [ ]: