This example shows how to connect to an ODM2 database using the ODM2 Python API, retrieve data, and use it for analysis or visualization. The database (iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite) contains "measurement"-type results.
This example uses SQLite for the database because it doesn't require a server. However, the ODM2 Python API demonstrated here can also be used with ODM2 databases implemented in MySQL, PostgreSQL or Microsoft SQL Server.
More details on the ODM2 Python API and its source code and latest development can be found at: https://github.com/ODM2/ODM2PythonAPI
Emilio Mayorga. Last updated 9/23/2018.
In [1]:
%matplotlib inline
import sys
import os
import sqlite3
import matplotlib.pyplot as plt
from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
import folium
from folium.plugins import MarkerCluster
import odm2api
from odm2api.ODMconnection import dbconnection
import odm2api.services.readService as odm2rs
from odm2api.models import SamplingFeatures
In [2]:
pd.__version__, gpd.__version__, folium.__version__
Out[2]:
odm2api version used to run this notebook:
In [3]:
odm2api.__version__
Out[3]:
This example uses an ODM2 SQLite database loaded with water quality sample data from multiple monitoring sites in the iUTAH Gradients Along Mountain to Urban Transitions (GAMUT) water quality monitoring network. Water quality samples have been collected and analyzed for nitrogen, phosphorus, total coliform, E. coli, and some water isotopes.
The example database is located in the data sub-directory.
In [4]:
# Assign directory paths and SQLite file name
dbname_sqlite = "iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite"
sqlite_pth = os.path.join("data", dbname_sqlite)
In [5]:
try:
    session_factory = dbconnection.createConnection('sqlite', sqlite_pth, 2.0)
    read = odm2rs.ReadODM2(session_factory)
    print("Database connection successful!")
except Exception as e:
    print("Unable to establish connection to the database: ", e)
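Before handing the file to the API, it can help to confirm that it is a readable SQLite database with the tables you expect. The following is a minimal sketch using only the standard-library sqlite3 module. The helper name list_tables is hypothetical, and an in-memory database with a single stand-in table replaces the real file so that the snippet runs anywhere.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in for the real file: an in-memory database with one ODM2-like table.
# With the example database, sqlite3.connect(sqlite_pth) would be used instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SamplingFeatures (SamplingFeatureID INTEGER PRIMARY KEY)")
tables = list_tables(conn)
conn.close()
print(tables)
```

Note that sqlite3.connect creates an empty file when the path does not exist, so checking os.path.exists(sqlite_pth) first avoids silently creating a new, empty database.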
This section shows some examples of how to use the API to run both simple and more advanced queries on the ODM2 database, as well as how to examine the query output in convenient ways thanks to Python tools.
Simple query functions like getVariables( ) return objects similar to the entities in ODM2, and individual attributes can then be retrieved from the objects returned.
In [6]:
# Get all of the Variables from the ODM2 database then read the records
# into a Pandas DataFrame to make it easy to view and manipulate
allVars = read.getVariables()
variables_df = pd.DataFrame.from_records([vars(variable) for variable in allVars], index='VariableID')
variables_df.head(10)
Out[6]:
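The vars( ) pattern used above works on any plain Python object, not just API objects. Here is a small sketch that runs without the database; the Variable class and its records are hypothetical stand-ins, not the API's own classes.

```python
import pandas as pd


class Variable:
    """Hypothetical stand-in for an object returned by the API."""

    def __init__(self, VariableID, VariableCode, VariableNameCV):
        self.VariableID = VariableID
        self.VariableCode = VariableCode
        self.VariableNameCV = VariableNameCV


# Made-up records, for illustration only.
demo_vars = [
    Variable(1, "TN", "Nitrogen, total"),
    Variable(2, "TP", "Phosphorus, total"),
]

# vars(obj) returns the instance's attribute dictionary, one dict per record.
demo_df = pd.DataFrame.from_records(
    [vars(v) for v in demo_vars], index="VariableID"
)
print(demo_df)
```

With SQLAlchemy-mapped objects, vars( ) may also include an internal _sa_instance_state entry, which then appears as an extra DataFrame column that can be dropped.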
In [7]:
allPeople = read.getPeople()
pd.DataFrame.from_records([vars(person) for person in allPeople]).head()
Out[7]:
Some of the API functions accept arguments that let you subset what is returned. For example, I can query the database using the getSamplingFeatures( ) function and pass it a SamplingFeatureType of "Site" to return a list of those SamplingFeatures that are Sites.
In [8]:
# Get all of the SamplingFeatures from the ODM2 database that are Sites
siteFeatures = read.getSamplingFeatures(sftype='Site')
# Read Sites records into a Pandas DataFrame
# ("if sf.Latitude" is used only to instantiate/read Site attributes)
df = pd.DataFrame.from_records([vars(sf) for sf in siteFeatures if sf.Latitude])
Since we know this is a geospatial dataset (Sites, which have latitude and longitude), we can use more specialized (spatialized?) Python tools like GeoPandas (geospatially enabled Pandas) and Folium interactive maps.
In [9]:
# Create a GeoPandas GeoDataFrame from Sites DataFrame
ptgeom = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=ptgeom, crs={'init': 'epsg:4326'})
gdf.head(5)
Out[9]:
In [10]:
# Number of records (features) in GeoDataFrame
len(gdf)
Out[10]:
In [11]:
# A trivial but easy-to-generate GeoPandas plot
gdf.plot();
A site has a SiteTypeCV. Let's examine the site type distribution, and use that information to create a new GeoDataFrame column to specify a map marker color by SiteTypeCV.
In [12]:
gdf['SiteTypeCV'].value_counts()
Out[12]:
In [13]:
gdf["color"] = gdf.apply(lambda feat: 'green' if feat['SiteTypeCV'] == 'Stream' else 'red', axis=1)
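The lambda above handles two cases. If more site types need their own colors, a lookup dictionary may scale better. This is a sketch on a made-up DataFrame, not the notebook's gdf. The 'Other' type and the gray fallback are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up site types, for illustration only.
demo_sites = pd.DataFrame({"SiteTypeCV": ["Stream", "Spring", "Stream", "Other"]})

# Lookup table from site type to marker color; unmapped types fall back to gray.
color_by_type = {"Stream": "green", "Spring": "red"}
demo_sites["color"] = demo_sites["SiteTypeCV"].map(color_by_type).fillna("gray")
print(demo_sites)
```

Unlike the two-way lambda, this version keeps unexpected site types visually distinct instead of folding them into the 'red' group.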
Note: While the database holds a copy of the ODM2 Controlled Vocabularies, the complete description of each CV term is available from a web request to the CV API at http://vocabulary.odm2.org. Want to know more about how a "spring" is defined? Here's one simple way, using Pandas to access and parse the CSV web service response.
In [14]:
sitetype = 'spring'
pd.read_csv("http://vocabulary.odm2.org/api/v1/sitetype/{}/?format=csv".format(sitetype))
Out[14]:
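The request URL above follows a simple pattern, so it can be wrapped in a small helper. This is a sketch only: cv_term_url is a hypothetical name, and whether other vocabulary names or formats are accepted by the service is an assumption to check against the CV API itself. No network request is made here.

```python
def cv_term_url(vocabulary, term, fmt="csv"):
    """Build a CV API request URL following the pattern used above."""
    return "http://vocabulary.odm2.org/api/v1/{0}/{1}/?format={2}".format(
        vocabulary, term, fmt
    )


# Reproduces the URL used in the cell above.
url = cv_term_url("sitetype", "spring")
print(url)
```

The resulting string can be passed to pd.read_csv exactly as in the cell above.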
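Now we'll create an interactive and helpful Folium map of the sites. This map features: a view centered on the centroid of all sites; marker clustering; marker colors set by SiteTypeCV; and marker popups showing each site's code, type, and name.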
In [15]:
c = gdf.unary_union.centroid
m = folium.Map(location=[c.y, c.x], tiles='CartoDB positron', zoom_start=11)
marker_cluster = MarkerCluster().add_to(m)
for idx, feature in gdf.iterrows():
    folium.Marker(location=[feature.geometry.y, feature.geometry.x],
                  icon=folium.Icon(color=feature['color']),
                  popup="{0} ({1}): {2}".format(
                      feature['SamplingFeatureCode'], feature['SiteTypeCV'],
                      feature['SamplingFeatureName'])
                  ).add_to(marker_cluster)
# Done with setup. Now render the map
m
Out[15]:
In [16]:
sitesf0 = siteFeatures[0]
try:
    newsf = SamplingFeatures()
    session = session_factory.getSession()
    newsf.FeatureGeometryWKT = "POINT(-111.946 41.718)"
    newsf.Elevation_m = 100
    newsf.ElevationDatumCV = sitesf0.ElevationDatumCV
    newsf.SamplingFeatureCode = "TestSF"
    newsf.SamplingFeatureDescription = "this is a test to add a sampling feature"
    newsf.SamplingFeatureGeotypeCV = "Point"
    newsf.SamplingFeatureTypeCV = sitesf0.SamplingFeatureTypeCV
    newsf.SamplingFeatureUUID = sitesf0.SamplingFeatureUUID+"2"
    session.add(newsf)
    # To save the new sampling feature, do session.commit()
    print("New sampling feature created, but not saved to database.\n")
    print(newsf)
except Exception as e:
    print("error adding a sampling feature: {}".format(e))
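The cell above adds a record to the session but does not commit it. The difference between a pending and a saved record can be illustrated with the standard-library sqlite3 module. This is an analogy only, not the API's session object; the one-column table is a simplified stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SamplingFeatures (SamplingFeatureCode TEXT)")
conn.commit()

# Pending insert: visible on this connection, but not yet saved.
conn.execute("INSERT INTO SamplingFeatures VALUES ('TestSF')")
conn.rollback()  # discard the pending insert
count_after_rollback = conn.execute(
    "SELECT COUNT(*) FROM SamplingFeatures"
).fetchone()[0]

# The same insert followed by commit() is saved.
conn.execute("INSERT INTO SamplingFeatures VALUES ('TestSF')")
conn.commit()
count_after_commit = conn.execute(
    "SELECT COUNT(*) FROM SamplingFeatures"
).fetchone()[0]
conn.close()
print(count_after_rollback, count_after_commit)
```

Leaving out the commit, as the notebook does, keeps the example database unchanged between runs.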
This code shows some examples of how objects and related objects can be retrieved using the API. In the following, we use the getSamplingFeatures( ) function to return a particular sampling feature by passing in its SamplingFeatureCode. This function returns a list of SamplingFeature objects, so just get the first one in the returned list.
In [17]:
# Get the SamplingFeature object for a particular SamplingFeature by passing its SamplingFeatureCode
sf = read.getSamplingFeatures(codes=['RB_1300E'])[0]
type(sf)
Out[17]:
In [18]:
# Simple way to examine the content (properties) of a Python object, as if it were a dictionary
vars(sf)
Out[18]:
You can also drill down and get objects linked by foreign keys. The API returns related objects in a nested hierarchy so they can be interrogated in an object oriented way. So, if I use the getResults( ) function to return a Result from the database (e.g., a "Measurement" Result), I also get the associated Action that created that Result (e.g., a "Specimen analysis" Action).
In [19]:
print("------------ Foreign Key Example --------- \n")
try:
    # Call getResults, but return only the first Result
    firstResult = read.getResults()[0]
    frfa = firstResult.FeatureActionObj
    print("The FeatureAction object for the Result is: ", frfa)
    print("The Action object for the Result is: ", frfa.ActionObj)
    # Print some of those attributes in a more human readable form
    frfaa = firstResult.FeatureActionObj.ActionObj
    print("\nThe following are some of the attributes for the Action that created the Result: ")
    print("ActionTypeCV: {}".format(frfaa.ActionTypeCV))
    print("ActionDescription: {}".format(frfaa.ActionDescription))
    print("BeginDateTime: {}".format(frfaa.BeginDateTime))
    print("EndDateTime: {}".format(frfaa.EndDateTime))
    print("MethodName: {}".format(frfaa.MethodObj.MethodName))
    print("MethodDescription: {}".format(frfaa.MethodObj.MethodDescription))
except Exception as e:
    print("Unable to demo Foreign Key Example: ", e)
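Long attribute chains such as FeatureActionObj.ActionObj.MethodObj fail with an AttributeError if any link is missing. One possible way to guard against that is a small helper that follows a dotted path. This is a sketch: drill is a hypothetical helper, and the SimpleNamespace objects are stand-ins that only mimic the nesting of the API's objects.

```python
from types import SimpleNamespace


def drill(obj, path, default=None):
    """Follow a dotted attribute path, returning default if any link is missing."""
    current = obj
    for name in path.split("."):
        current = getattr(current, name, None)
        if current is None:
            return default
    return current


# Stand-in objects with the same nesting as Result -> FeatureAction -> Action -> Method.
method = SimpleNamespace(MethodName="Demo method")
action = SimpleNamespace(ActionTypeCV="Specimen analysis", MethodObj=method)
result = SimpleNamespace(FeatureActionObj=SimpleNamespace(ActionObj=action))

found = drill(result, "FeatureActionObj.ActionObj.MethodObj.MethodName")
missing = drill(result, "FeatureActionObj.ActionObj.NoSuchObj.Name", default="n/a")
print(found, missing)
```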
Because all of the objects are returned in a nested form, if you retrieve a result, you can interrogate it to get all of its related attributes. When a Result object is returned, it includes objects that contain information about Variable, Units, ProcessingLevel, and the related Action that created that Result.
In [20]:
print("------- Example of Retrieving Attributes of a Result -------")
try:
    firstResult = read.getResults()[0]
    frfa = firstResult.FeatureActionObj
    print("The following are some of the attributes for the Result retrieved: ")
    print("ResultID: {}".format(firstResult.ResultID))
    print("ResultTypeCV: {}".format(firstResult.ResultTypeCV))
    print("ValueCount: {}".format(firstResult.ValueCount))
    print("ProcessingLevel: {}".format(firstResult.ProcessingLevelObj.Definition))
    print("SampledMedium: {}".format(firstResult.SampledMediumCV))
    print("Variable: {}: {}".format(firstResult.VariableObj.VariableCode,
                                    firstResult.VariableObj.VariableNameCV))
    print("Units: {}".format(firstResult.UnitsObj.UnitsName))
    print("SamplingFeatureID: {}".format(frfa.SamplingFeatureObj.SamplingFeatureID))
    print("SamplingFeatureCode: {}".format(frfa.SamplingFeatureObj.SamplingFeatureCode))
except Exception as e:
    print("Unable to demo example of retrieving Attributes of a Result: ", e)
The last block of code returns a particular Measurement Result. From that I can get the SamplingFeatureID (in this case 26) for the Specimen from which the Result was generated. But, if I want to figure out which Site the Specimen was collected at, I need to query the database to get the related Site SamplingFeature. I can use getRelatedSamplingFeatures( ) for this. Once I've got the SamplingFeature for the Site, I could get the rest of the SamplingFeature attributes.
In [21]:
# Pass the Sampling Feature ID of the specimen, and the relationship type
relatedSite = read.getRelatedSamplingFeatures(sfid=26, relationshiptype='Was Collected at')[0]
In [22]:
vars(relatedSite)
Out[22]:
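Conceptually, this lookup goes from a Specimen to its Site through a relationship record. The sketch below illustrates the idea with the standard-library sqlite3 module. The two tables are simplified stand-ins loosely modeled on ODM2, the specimen code is made up, and this is not a description of how the API implements getRelatedSamplingFeatures( ).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SamplingFeatures (
        SamplingFeatureID INTEGER PRIMARY KEY,
        SamplingFeatureTypeCV TEXT,
        SamplingFeatureCode TEXT
    );
    CREATE TABLE RelatedFeatures (
        SamplingFeatureID INTEGER,
        RelationshipTypeCV TEXT,
        RelatedFeatureID INTEGER
    );
    INSERT INTO SamplingFeatures VALUES (1, 'Site', 'RB_1300E');
    INSERT INTO SamplingFeatures VALUES (26, 'Specimen', 'DemoSpecimen');
    INSERT INTO RelatedFeatures VALUES (26, 'Was Collected at', 1);
""")

# Find the Site that specimen 26 was collected at.
row = conn.execute(
    """
    SELECT site.SamplingFeatureID, site.SamplingFeatureCode
    FROM RelatedFeatures AS rel
    JOIN SamplingFeatures AS site
      ON site.SamplingFeatureID = rel.RelatedFeatureID
    WHERE rel.SamplingFeatureID = ? AND rel.RelationshipTypeCV = ?
    """,
    (26, "Was Collected at"),
).fetchone()
conn.close()
print(row)
```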
From the list of Variables returned above and the information about the SamplingFeature I queried above, I know that VariableID = 2 for Total Phosphorus and SiteID = 1 for the Red Butte Creek site at 1300E. I can use the getResults( ) function to get all of the Total Phosphorus results for this site by passing in the VariableID and the SiteID.
In [23]:
siteID = 1 # Red Butte Creek at 1300 E (obtained from the getRelatedSamplingFeatures query)
In [24]:
v = variables_df[variables_df['VariableCode'] == 'TP']
variableID = v.index[0]
results = read.getResults(siteid=siteID, variableid=variableID, restype="Measurement")
# Get the list of ResultIDs so I can retrieve the data values associated with all of the results
resultIDList = [x.ResultID for x in results]
len(resultIDList)
Out[24]:
Now I can retrieve all of the data values associated with the list of Results I just retrieved. In ODM2, water chemistry measurements are stored as "Measurement" results. Each "Measurement" Result has a single data value associated with it. So, for convenience, the getResultValues( ) function allows you to pass in a list of ResultIDs so you can get the data values for all of them back in a Pandas data frame object, which is easier to work with. Once I've got the data in a Pandas data frame object, I can use the plot( ) function directly on the data frame to create a quick visualization.
In [25]:
# Get all of the data values for the Results in the list created above
# Call getResultValues, which returns a Pandas Data Frame with the data
resultValues = read.getResultValues(resultids=resultIDList, lowercols=False)
resultValues.head()
Out[25]:
In [26]:
# Plot the time sequence of Measurement Result Values
ax = resultValues.plot(x='ValueDateTime', y='DataValue', title=relatedSite.SamplingFeatureName,
                       kind='line', use_index=True, linestyle='solid', style='o')
ax.set_ylabel("{0} ({1})".format(results[0].VariableObj.VariableNameCV,
                                 results[0].UnitsObj.UnitsAbbreviation))
ax.set_xlabel('Date/Time')
ax.grid(True)
ax.legend().set_visible(False)
Note (9/23/2018): There is a current issue with the ValueDateTime column returned by read.getResultValues for SQLite databases. It should be a datetime, but it is currently returned as a string. This is being investigated. For now, the x-axes in this plot and the one below do not show the expected datetime values. Refer to the original notebook to see what the axis should look like. This problem was not present in Nov. 2017.
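Until the issue is resolved, one possible workaround is to convert the string column to datetimes with Pandas before plotting. The sketch below uses a made-up DataFrame with the same two column names. The values and the timestamp format are assumptions, not data from the example database.

```python
import pandas as pd

# Made-up values, for illustration only; ValueDateTime arrives as strings.
demo_values = pd.DataFrame({
    "ValueDateTime": ["2015-03-02 10:30:00", "2015-01-05 09:00:00"],
    "DataValue": [0.05, 0.03],
})

# Parse the strings into datetimes, then order the rows by time.
demo_values["ValueDateTime"] = pd.to_datetime(demo_values["ValueDateTime"])
demo_values = demo_values.sort_values("ValueDateTime").reset_index(drop=True)
print(demo_values.dtypes)
```

Applied to resultValues, the same conversion should let the plot( ) call draw a proper time axis, provided the strings are in a format that pd.to_datetime can parse.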
If I'm going to reuse a series of steps, it's always helpful to write little generic functions that can be called to quickly and consistently get what we need. To conclude this demo, here's one such function that encapsulates the VariableID, getResults and getResultValues queries we showed above. Then we leverage it to create a nice 2-variable (2-axis) plot of TP and TN vs time, and conclude with a reminder that we have ready access to related metadata about analytical lab methods and such.
In [27]:
def get_results_and_values(siteid, variablecode):
    v = variables_df[variables_df['VariableCode'] == variablecode]
    variableID = v.index[0]
    results = read.getResults(siteid=siteid, variableid=variableID, restype="Measurement")
    resultIDList = [x.ResultID for x in results]
    resultValues = read.getResultValues(resultids=resultIDList, lowercols=False)
    return resultValues, results
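In the function above, v.index[0] raises an IndexError when the variable code is not found. A more explicit lookup might look like the sketch below. The helper name lookup_variable_id is hypothetical, and the small DataFrame stands in for variables_df.

```python
import pandas as pd


def lookup_variable_id(variables_df, variablecode):
    """Return the VariableID (index value) for a code, or raise a clear error."""
    matches = variables_df.index[variables_df["VariableCode"] == variablecode]
    if len(matches) == 0:
        raise ValueError("No variable with code {!r}".format(variablecode))
    return matches[0]


# Stand-in for variables_df, indexed by VariableID as in the notebook.
demo_variables = pd.DataFrame(
    {"VariableCode": ["TN", "TP"]},
    index=pd.Index([1, 2], name="VariableID"),
)

tp_id = lookup_variable_id(demo_variables, "TP")
try:
    lookup_variable_id(demo_variables, "XYZ")
    error_message = None
except ValueError as err:
    error_message = str(err)
print(tp_id, error_message)
```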
Fancy plotting, leveraging the Pandas plot method and matplotlib.
In [28]:
# Plot figure and axis set up (just *one* subplot, actually)
f, ax = plt.subplots(1, figsize=(13, 6))
# First plot (left axis)
VariableCode = 'TP'
resultValues_TP, results_TP = get_results_and_values(siteID, VariableCode)
resultValues_TP.plot(x='ValueDateTime', y='DataValue', label=VariableCode,
                     style='o-', kind='line', ax=ax)
ax.set_ylabel("{0}: {1} ({2})".format(VariableCode, results_TP[0].VariableObj.VariableNameCV,
                                      results_TP[0].UnitsObj.UnitsAbbreviation))
# Second plot (right axis)
VariableCode = 'TN'
resultValues_TN, results_TN = get_results_and_values(siteID, VariableCode)
resultValues_TN.plot(x='ValueDateTime', y='DataValue', label=VariableCode,
                     style='^-', kind='line', ax=ax,
                     secondary_y=True)
ax.right_ax.set_ylabel("{0}: {1} ({2})".format(VariableCode, results_TN[0].VariableObj.VariableNameCV,
                                               results_TN[0].UnitsObj.UnitsAbbreviation))
# Tweak the figure
ax.legend(loc='upper left')
ax.right_ax.legend(loc='upper right')
ax.grid(True)
ax.set_xlabel('')
ax.set_title(relatedSite.SamplingFeatureName);
Finally, let's show some useful metadata. Use the Results records and their relationship to Actions (via FeatureActions) to extract and print out the Specimen Analysis methods used for TN and TP. Or at least for the first result for each of the two variables; methods may have varied over time, but the specific method associated with each result is stored in ODM2 and available.
In [29]:
results_faam = lambda results, i: results[i].FeatureActionObj.ActionObj.MethodObj
print("TP METHOD: {0} ({1})".format(results_faam(results_TP, 0).MethodName,
results_faam(results_TP, 0).MethodDescription))
print("TN METHOD: {0} ({1})".format(results_faam(results_TN, 0).MethodName,
results_faam(results_TN, 0).MethodDescription))
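Since methods may have varied over time, it can be useful to list every distinct method across a set of results rather than only the first. The sketch below shows one way to do that. Both function names are hypothetical, and the SimpleNamespace objects only mimic the Result, FeatureAction, Action and Method nesting used above.

```python
from types import SimpleNamespace


def distinct_method_names(results):
    """Collect the unique MethodName values across results, in first-seen order."""
    seen = []
    for r in results:
        name = r.FeatureActionObj.ActionObj.MethodObj.MethodName
        if name not in seen:
            seen.append(name)
    return seen


def fake_result(method_name):
    """Build a hypothetical stand-in with the same nesting as an API Result."""
    method = SimpleNamespace(MethodName=method_name)
    action = SimpleNamespace(MethodObj=method)
    return SimpleNamespace(FeatureActionObj=SimpleNamespace(ActionObj=action))


# Made-up method names, for illustration only.
demo_results = [fake_result("Method A"), fake_result("Method B"), fake_result("Method A")]
names = distinct_method_names(demo_results)
print(names)
```

With the notebook's data, the same function could be called as distinct_method_names(results_TP) or distinct_method_names(results_TN).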