This example shows how to use the ODM2 Python API (odm2api
) to connect to an ODM2 database, retrieve data, and analyze and visualize the data. The database (USU_LittleBearRiver_timeseriesresults_ODM2.sqlite) contains "timeSeriesCoverage"-type results.
This example uses SQLite for the database because it doesn't require a server. However, the ODM2 Python API demonstrated here can alse be used with ODM2 databases implemented in MySQL, PostgreSQL or Microsoft SQL Server.
More details on the ODM2 Python API and its source code and latest development can be found at https://github.com/ODM2/ODM2PythonAPI
Adapted from notebook https://big-cz.github.io/notebook_data_demo/notebooks/2017-06-24-odm2api_sample_fromsqlite/, based on earlier code and an ODM2 database from Jeff Horsburgh's group at Utah State University.
In [1]:
import os
import datetime
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import odm2api
from odm2api.ODMconnection import dbconnection
import odm2api.services.readService as odm2rs
In [2]:
"{} UTC".format(datetime.datetime.utcnow())
Out[2]:
In [3]:
pd.__version__
Out[3]:
odm2api version used to run this notebook:
In [4]:
odm2api.__version__
Out[4]:
This example uses an ODM2 SQLite database file loaded with a sensor-based, high-frequency temperature time series from a site in the Little Bear River, in Logan, Utah, from Utah State University. The database (USU_LittleBearRiver_timeseriesresults_ODM2.sqlite) contains "timeSeriesCoverage"-type results.
The example database is located in the data
sub-directory.
In [5]:
# Assign directory paths and SQLite file name
dbname_sqlite = "USU_LittleBearRiver_timeseriesresults_ODM2.sqlite"
sqlite_pth = os.path.join("data", dbname_sqlite)
In [6]:
try:
session_factory = dbconnection.createConnection('sqlite', sqlite_pth)
read = odm2rs.ReadODM2(session_factory)
print("Database connection successful!")
except Exception as e:
print("Unable to establish connection to the database: ", e)
This section shows some examples of how to use the API to run both simple and more advanced queries on the ODM2 database, as well as how to examine the query output in convenient ways thanks to Python tools. The notebook WaterQualityMeasurements_RetrieveVisualize.ipynb includes more extensive examples of odm2api
-based querying and examinations of the information that is returned.
Simple query functions like getVariables( ) return objects similar to the entities in ODM2, and individual attributes can then be retrieved from the objects returned.
In [7]:
allVars = read.getVariables()
for x in allVars:
print('{}: {}'.format(x.VariableCode, x.VariableNameCV))
In [8]:
sf_lst = read.getSamplingFeatures()
len(sf_lst)
Out[8]:
In [9]:
vars(sf_lst[0])
Out[9]:
In [10]:
print('{}: {}'.format(sf_lst[0].SamplingFeatureCode, sf_lst[0].SamplingFeatureName))
You can also drill down and get objects linked by foreign keys. The API returns related objects in a nested hierarchy so they can be interrogated in an object oriented way. So, if I use the getResults( ) function to return a Result from the database (e.g., a "Time Series" Result), I also get the associated Action that created that Result (e.g., an "Observation" Action).
In [11]:
# What's the total number of results in the database?
len(read.getResults())
Out[11]:
In [12]:
try:
# Call getResults, but return only the first Result
firstResult = read.getResults()[0]
frfa = firstResult.FeatureActionObj
frfaa = firstResult.FeatureActionObj.ActionObj
print("The ResultID for the Result is: {}".format(firstResult.ResultID))
print("The FeatureAction object for the Result is: ", frfa)
print("The Action object for the Result is: ", frfaa)
# Print some Action attributes in a more human readable form
print("\nThe following are some of the attributes for the Action that created the Result: ")
print("ActionTypeCV: {}".format(frfaa.ActionTypeCV))
print("ActionDescription: {}".format(frfaa.ActionDescription))
print("BeginDateTime: {}".format(frfaa.BeginDateTime))
print("EndDateTime: {}".format(frfaa.EndDateTime))
print("MethodName: {}".format(frfaa.MethodObj.MethodName))
print("MethodDescription: {}".format(frfaa.MethodObj.MethodDescription))
except Exception as e:
print("Unable to demo Foreign Key Example: {}".format(e))
In [13]:
# Filering on a single ResultID will invariably return a single result;
# so, get the single element in the returned list
tsResult = read.getResults(ids=[1])[0]
# Examine the object type and content
type(tsResult), vars(tsResult)
Out[13]:
Because all of the objects are returned in a nested form, if you retrieve a result, you can interrogate it to get all of its related attributes. When a Result object is returned, it includes objects that contain information about Variable, Units, ProcessingLevel, and the related Action that created that Result.
In [14]:
print("------- Example of Retrieving Attributes of a Result -------")
try:
firstResult = read.getResults()[0]
frfa = firstResult.FeatureActionObj
print("The following are some of the attributes for the Result retrieved: ")
print("ResultID: {}".format(firstResult.ResultID))
print("ResultTypeCV: {}".format(firstResult.ResultTypeCV))
print("ValueCount: {}".format(firstResult.ValueCount))
print("ProcessingLevel: {}".format(firstResult.ProcessingLevelObj.Definition))
print("SampledMedium: {}".format(firstResult.SampledMediumCV))
print("Variable: {}: {}".format(firstResult.VariableObj.VariableCode,
firstResult.VariableObj.VariableNameCV))
print("AggregationStatisticCV: {}".format(firstResult.AggregationStatisticCV))
print("Units: {}".format(firstResult.UnitsObj.UnitsName))
print("SamplingFeatureID: {}".format(frfa.SamplingFeatureObj.SamplingFeatureID))
print("SamplingFeatureCode: {}".format(frfa.SamplingFeatureObj.SamplingFeatureCode))
except Exception as e:
print("Unable to demo example of retrieving Attributes of a Result: {}".format(e))
The database contains a single time series result (a time series of water temperature sensor data at a single site). Let's use the getResultValues() function to retrieve the time series values for this result by passing in the ResultID. We set the index to ValueDateTime
for convenience.
In [15]:
# Get the values for a particular TimeSeriesResult; a Pandas Dataframe is returned
tsValues = read.getResultValues(resultids=[1], lowercols=False)
tsValues.set_index('ValueDateTime', inplace=True)
tsValues.sort_index(inplace=True)
tsValues.head()
Out[15]:
In [16]:
tsValues['DataValue'].plot()
Out[16]:
In [17]:
fig, ax = plt.subplots(figsize=(12, 4))
tsValues['DataValue'].plot(ax=ax)
ax.set_ylabel('{} ({})'.format(
tsResult.VariableObj.VariableNameCV,
tsResult.UnitsObj.UnitsAbbreviation))
ax.set_xlabel('')
ax.xaxis.set_minor_locator(mpl.dates.MonthLocator())
ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%b'))
ax.xaxis.set_major_locator(mpl.dates.YearLocator())
ax.xaxis.set_major_formatter(mpl.dates.DateFormatter('\n%Y'))
In [ ]: