Puerto Rico water quality measurements-results data

Adapted from ODM2 API: Retrieve, manipulate and visualize ODM2 water quality measurement-type data

This example shows how to use the ODM2 Python API (odm2api) to connect to an ODM2 database, retrieve data, and analyze and visualize the data. The database (iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite) contains "measurement"-type results.

This example uses SQLite for the database because it doesn't require a server. However, the ODM2 Python API demonstrated here can alse be used with ODM2 databases implemented in MySQL, PostgreSQL or Microsoft SQL Server.

More details on the ODM2 Python API and its source code and latest development can be found at https://github.com/ODM2/ODM2PythonAPI

Emilio Mayorga. Last updated 2019-3-27.



In [1]:

    
%matplotlib inline

import os

import matplotlib.pyplot as plt
from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
import folium
from folium.plugins import MarkerCluster

import odm2api
from odm2api.ODMconnection import dbconnection
import odm2api.services.readService as odm2rs









    



/home/mayorga/miniconda/envs/odm2client/lib/python2.7/site-packages/folium/__init__.py:59: UserWarning: This version of folium is the last to support Python 2. Transition to Python 3 to be able to receive updates and fixes. Check out https://python3statement.org/ for more info.
  UserWarning



In [2]:

    
pd.__version__, gpd.__version__, folium.__version__









    Out[2]:





(u'0.24.2', u'0.4.1', u'0.8.3')

odm2api version used to run this notebook:



In [3]:

    
odm2api.__version__









    Out[3]:





u'0.7.1'

Connect to the ODM2 SQLite Database

This example uses an ODM2 SQLite database file loaded with water quality sample data from multiple monitoring sites in the iUTAH Gradients Along Mountain to Urban Transitions (GAMUT) water quality monitoring network. Water quality samples have been collected and analyzed for nitrogen, phosphorus, total coliform, E-coli, and some water isotopes. The database (iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite) contains "measurement"-type results.

The example database is located in the data sub-directory.



In [4]:

    
# Assign directory paths and SQLite file name
dbname_sqlite = "MariaWaterQualityData.sqlite"

# sqlite_pth = os.path.join("data", dbname_sqlite)
sqlite_pth = dbname_sqlite



In [5]:

    
try:
    session_factory = dbconnection.createConnection('sqlite', sqlite_pth)
    read = odm2rs.ReadODM2(session_factory)
    print("Database connection successful!")
except Exception as e:
    print("Unable to establish connection to the database: ", e)









    



Database connection successful!

Run Some Basic Queries on the ODM2 Database

This section shows some examples of how to use the API to run both simple and more advanced queries on the ODM2 database, as well as how to examine the query output in convenient ways thanks to Python tools.

Simple query functions like getVariables( ) return objects similar to the entities in ODM2, and individual attributes can then be retrieved from the objects returned.

Get all Variables

A simple query with simple output.



In [6]:

    
# Get all of the Variables from the ODM2 database then read the records
# into a Pandas DataFrame to make it easy to view and manipulate
allVars = read.getVariables()

variables_df = pd.DataFrame.from_records([vars(variable) for variable in allVars], 
                                         index='VariableID')
variables_df.head(10)









    Out[6]:







  
    
      
      NoDataValue
      SpeciationCV
      VariableCode
      VariableDefinition
      VariableNameCV
      VariableTypeCV
      _sa_instance_state
    
    
      VariableID
      
      
      
      
      
      
      
    
  
  
    
      1
      -9999.0000000000
      Not Applicable
      pH
      None
      pH
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      2
      -9999.0000000000
      Not Applicable
      DO
      None
      Oxygen, dissolved
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      3
      -9999.0000000000
      Not Applicable
      FirstDrawTemp
      None
      Temperature
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      4
      -9999.0000000000
      Not Applicable
      Free Cl2
      None
      Chlorine
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      5
      -9999.0000000000
      Not Applicable
      Total Cl2
      None
      Chlorine
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      6
      -9999.0000000000
      Not Applicable
      Turbidity
      None
      Turbidity
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      7
      -9999.0000000000
      Not Applicable
      InitialTemp
      None
      Temperature
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      8
      -9999.0000000000
      Not Applicable
      FinalTemp
      None
      Temperature
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      9
      -9999.0000000000
      Not Applicable
      Na_filt
      None
      Sodium, dissolved
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      10
      -9999.0000000000
      Not Applicable
      Mg_filt
      None
      Magnesium, dissolved
      Water quality
      <sqlalchemy.orm.state.InstanceState object at ...

Get all People

Another simple query.



In [7]:

    
allPeople = read.getPeople()
pd.DataFrame.from_records([vars(person) for person in allPeople]).head()









    Out[7]:







  
    
      
      PersonFirstName
      PersonID
      PersonLastName
      PersonMiddleName
      _sa_instance_state
    
  
  
    
      0
      William
      1
      Rhoads
      
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      1
      Kelsey
      2
      Pieper
      
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      2
      Graciela
      3
      Ramírez Toro
      
      <sqlalchemy.orm.state.InstanceState object at ...
    
    
      3
      Jeffrey
      4
      Parks
      
      <sqlalchemy.orm.state.InstanceState object at ...

Site Sampling Features: pass arguments to the API query

Some of the API functions accept arguments that let you subset what is returned. For example, I can query the database using the getSamplingFeatures( ) function and pass it a SamplingFeatureType of "Site" to return a list of those SamplingFeatures that are Sites.



In [8]:

    
# Get all of the SamplingFeatures from the ODM2 database that are Sites
siteFeatures = read.getSamplingFeatures(sftype='Site')

# Read Sites records into a Pandas DataFrame
# "if sf.Latitude" is used only to instantiate/read Site attributes)
df = pd.DataFrame.from_records([vars(sf) for sf in siteFeatures if sf.Latitude])

Since we know this is a geospatial dataset (Sites, which have latitude and longitude), we can use more specialized Python tools like GeoPandas (geospatially enabled Pandas) and Folium interactive maps.



In [9]:

    
# Create a GeoPandas GeoDataFrame from Sites DataFrame
ptgeom = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=ptgeom, crs={'init': 'epsg:4326'})
gdf.head(5)









    Out[9]:







  
    
      
      ElevationDatumCV
      Elevation_m
      FeatureGeometryWKT
      Latitude
      Longitude
      SamplingFeatureCode
      SamplingFeatureDescription
      SamplingFeatureGeotypeCV
      SamplingFeatureID
      SamplingFeatureName
      SamplingFeatureTypeCV
      SamplingFeatureUUID
      SiteTypeCV
      SpatialReferenceID
      _sa_instance_state
      geometry
    
  
  
    
      0
      EGM96
      749.0
      None
      18.200178
      -66.664513
      SystemD1_SGW
      System D1 Surface water, as far upstream as po...
      None
      1
      System D1 Surface Water
      Site
      5BE27C7B-0A37-49F9-AF8C-075F745FA23E
      Water-distribution system
      4
      <sqlalchemy.orm.state.InstanceState object at ...
      POINT (-66.664513 18.200178)
    
    
      1
      EGM96
      749.0
      None
      18.200178
      -66.664513
      SystemD1_Res
      System D1 Surface water, at the point it is co...
      None
      2
      System D1 Reservoir
      Site
      C279AE16-3EBE-419A-9ACF-ADCCA0E22205
      Water-distribution system
      4
      <sqlalchemy.orm.state.InstanceState object at ...
      POINT (-66.664513 18.200178)
    
    
      2
      EGM96
      749.0
      None
      18.200178
      -66.664513
      SystemD1_HT
      System D1 Surface water, from a holding/settli...
      None
      3
      System D1 Holding Tank
      Site
      1C5BD9E3-10AA-4207-A71A-190030CEDCCE
      Water-distribution system
      4
      <sqlalchemy.orm.state.InstanceState object at ...
      POINT (-66.664513 18.200178)
    
    
      3
      EGM96
      749.0
      None
      18.200178
      -66.664513
      SystemD1_PV
      System D1 Surface water, from a pipe vent valv...
      None
      4
      System D1 Pipe Vent
      Site
      B972B4CB-FF82-44A2-BC55-E53CD0B13B1B
      Water-distribution system
      4
      <sqlalchemy.orm.state.InstanceState object at ...
      POINT (-66.664513 18.200178)
    
    
      4
      EGM96
      749.0
      None
      18.200178
      -66.664513
      SystemD1_ST
      System D1 Surface water, from setting tank (on...
      None
      5
      System D1 Settling Tank
      Site
      33FA9DB6-D749-40A8-977F-A696F26E78A1
      Water-distribution system
      4
      <sqlalchemy.orm.state.InstanceState object at ...
      POINT (-66.664513 18.200178)



In [10]:

    
# Number of records (features) in GeoDataFrame
len(gdf)









    Out[10]:





89



In [11]:

    
# A trivial but easy-to-generate GeoPandas plot
gdf.plot();

A site has a SiteTypeCV. Let's examine the site type distribution, and use that information to create a new GeoDataFrame column to specify a map marker color by SiteTypeCV.



In [12]:

    
gdf['SiteTypeCV'].value_counts()









    Out[12]:





Water-distribution system    67
Stream                       22
Name: SiteTypeCV, dtype: int64

Now we'll create an interactive and helpful Folium map of the sites. This map features:

Automatic panning to the location of the sites (no hard wiring, except for the zoom scale), based on GeoPandas functionality and information from the ODM2 Site Sampling Features
Color coding by SiteTypeCV
Marker clustering
Simple marker pop ups with content from the ODM2 Site Sampling Features



In [13]:

    
gdf["color"] = gdf.apply(lambda feat: 'green' if feat['SiteTypeCV'] == 'Stream' else 'red', axis=1)



In [14]:

    
m = folium.Map(tiles='CartoDB positron')

marker_cluster = MarkerCluster().add_to(m)
for idx, feature in gdf.iterrows():
    folium.Marker(location=[feature.geometry.y, feature.geometry.x], 
                  icon=folium.Icon(color=feature['color']),
                  popup="{0} ({1}): {2}".format(
                      feature['SamplingFeatureCode'], feature['SiteTypeCV'], 
                      feature['SamplingFeatureName'])
                 ).add_to(marker_cluster)


# Set the map extent (bounds) to the extent of the ODM2 sites
m.fit_bounds(m.get_bounds())
    
# Done with setup. Now render the map
m









    Out[14]:



In [15]:

    
# Get the SamplingFeature object for a particular SamplingFeature by passing its SamplingFeatureCode
site_sf_code = 'SystemD1_PV'

sf = read.getSamplingFeatures(codes=[site_sf_code])[0]
type(sf)









    Out[15]:





odm2api.models.Sites



In [16]:

    
# Simple way to examine the content (properties) of a Python object, as if it were a dictionary
vars(sf)









    Out[16]:





{'ElevationDatumCV': u'EGM96',
 'Elevation_m': 749.0,
 'FeatureGeometryWKT': None,
 'Latitude': 18.200178,
 'Longitude': -66.664513,
 'SamplingFeatureCode': u'SystemD1_PV',
 'SamplingFeatureDescription': u'System D1 Surface water, from a pipe vent valve to keep water flowing (open to atmosphere)',
 'SamplingFeatureGeotypeCV': None,
 'SamplingFeatureID': 4,
 'SamplingFeatureName': u'System D1 Pipe Vent',
 'SamplingFeatureTypeCV': u'Site',
 'SamplingFeatureUUID': u'B972B4CB-FF82-44A2-BC55-E53CD0B13B1B',
 'SiteTypeCV': u'Water-distribution system',
 'SpatialReferenceID': 4,
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f10459e0210>}

You can also drill down and get objects linked by foreign keys. The API returns related objects in a nested hierarchy so they can be interrogated in an object oriented way. So, if I use the getResults( ) function to return a Result from the database (e.g., a "Measurement" Result), I also get the associated Action that created that Result (e.g., a "Specimen analysis" Action).



In [17]:

    
try:
    # Call getResults, but return only the first Result
    firstResult = read.getResults()[0]
    frfa = firstResult.FeatureActionObj
    frfaa = firstResult.FeatureActionObj.ActionObj
    print("The FeatureAction object for the Result is: ", frfa)
    print("The Action object for the Result is: ", frfaa)
    
    # Print some Action attributes in a more human readable form
    print("\nThe following are some of the attributes for the Action that created the Result: ")
    print("ActionTypeCV: {}".format(frfaa.ActionTypeCV))
    print("ActionDescription: {}".format(frfaa.ActionDescription))
    print("BeginDateTime: {}".format(frfaa.BeginDateTime))
    print("EndDateTime: {}".format(frfaa.EndDateTime))
    print("MethodName: {}".format(frfaa.MethodObj.MethodName))
    print("MethodDescription: {}".format(frfaa.MethodObj.MethodDescription))
except Exception as e:
    print("Unable to demo Foreign Key Example: {}".format(e))









    



('The FeatureAction object for the Result is: ', <FeatureActions({'FeatureActionID': 84, 'SamplingFeatureID': 90, 'ActionID': 84})>)
('The Action object for the Result is: ', <Actions({'MethodID': 11, 'ActionDescription': None, 'ActionFileLink': None, 'EndDateTime': None, 'BeginDateTime': datetime.datetime(2018, 4, 26, 0, 0), 'BeginDateTimeUTCOffset': -4, 'ActionTypeCV': u'Specimen analysis', 'ActionID': 84, 'EndDateTimeUTCOffset': None})>)

The following are some of the attributes for the Action that created the Result: 
ActionTypeCV: Specimen analysis
ActionDescription: None
BeginDateTime: 2018-04-26 00:00:00
EndDateTime: None
MethodName: Inorganics Measurements
Unable to demo Foreign Key Example: 'ascii' codec can't encode character u'\u03bc' in position 457: ordinal not in range(128)



In [18]:

    
vars(frfaa)









    Out[18]:





{'ActionDescription': None,
 'ActionFileLink': None,
 'ActionID': 84,
 'ActionTypeCV': u'Specimen analysis',
 'BeginDateTime': datetime.datetime(2018, 4, 26, 0, 0),
 'BeginDateTimeUTCOffset': -4,
 'EndDateTime': None,
 'EndDateTimeUTCOffset': None,
 'MethodID': 11,
 'MethodObj': <Methods({'MethodCode': u'Inorganics', 'MethodID': 11, 'MethodDescription': u'Inorganics were measured by inductively coupled plasma/mass spectrometry (ICP/MS) following Standard Method 3125B. For total inorganic concentrations, samples were aliquoitted in the field directly from 1 L HDPE bottles to 10 mL tubes and were measured after acidification with 2% nitric acid and a minimum of 24 hours holding time. For "soluble" inorganics (operationally defined as passing a 0.45 um filter), unacidified samples were passed through a 0.45\u03bcm pore size filter in the field to differentiate \u201csoluble\u201d metals from particulate and were measured after acidification with 2% nitric acid and a minimum of 24 hours holding time. A nominal standard curve from 0 to 5,000 or 50,000 ppb (depending on the inorganic of interest) was run for calibration. A standard check at 0 and 100 ppb was run every 10 samples for QAQC.', 'OrganizationID': None, 'MethodTypeCV': u'Specimen analysis', 'MethodName': u'Inorganics Measurements', 'MethodLink': None})>,
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f1043353a10>}



In [19]:

    
vars(frfaa.MethodObj)









    Out[19]:





{'MethodCode': u'Inorganics',
 'MethodDescription': u'Inorganics were measured by inductively coupled plasma/mass spectrometry (ICP/MS) following Standard Method 3125B. For total inorganic concentrations, samples were aliquoitted in the field directly from 1 L HDPE bottles to 10 mL tubes and were measured after acidification with 2% nitric acid and a minimum of 24 hours holding time. For "soluble" inorganics (operationally defined as passing a 0.45 um filter), unacidified samples were passed through a 0.45\u03bcm pore size filter in the field to differentiate \u201csoluble\u201d metals from particulate and were measured after acidification with 2% nitric acid and a minimum of 24 hours holding time. A nominal standard curve from 0 to 5,000 or 50,000 ppb (depending on the inorganic of interest) was run for calibration. A standard check at 0 and 100 ppb was run every 10 samples for QAQC.',
 'MethodID': 11,
 'MethodLink': None,
 'MethodName': u'Inorganics Measurements',
 'MethodTypeCV': u'Specimen analysis',
 'OrganizationID': None,
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f1043353f90>}

Get a Result and its Attributes

Because all of the objects are returned in a nested form, if you retrieve a result, you can interrogate it to get all of its related attributes. When a Result object is returned, it includes objects that contain information about Variable, Units, ProcessingLevel, and the related Action that created that Result.



In [20]:

    
print("------- Example of Retrieving Attributes of a Result -------")
try:
    firstResult = read.getResults()[0]
    frfa = firstResult.FeatureActionObj
    print("The following are some of the attributes for the Result retrieved: ")
    print("ResultID: {}".format(firstResult.ResultID))
    print("ResultTypeCV: {}".format(firstResult.ResultTypeCV))
    print("ValueCount: {}".format(firstResult.ValueCount))
    print("ProcessingLevel: {}".format(firstResult.ProcessingLevelObj.Definition))
    print("SampledMedium: {}".format(firstResult.SampledMediumCV))
    print("Variable: {}: {}".format(firstResult.VariableObj.VariableCode, 
                                    firstResult.VariableObj.VariableNameCV))
    print("Units: {}".format(firstResult.UnitsObj.UnitsName))
    print("SamplingFeatureID: {}".format(frfa.SamplingFeatureObj.SamplingFeatureID))
    print("SamplingFeatureCode: {}".format(frfa.SamplingFeatureObj.SamplingFeatureCode))
except Exception as e:
    print("Unable to demo example of retrieving Attributes of a Result: {}".format(e))









    



------- Example of Retrieving Attributes of a Result -------
The following are some of the attributes for the Result retrieved: 
ResultID: 1
ResultTypeCV: Measurement
ValueCount: 1
ProcessingLevel: Quality controlled data
SampledMedium: Liquid aqueous
Variable: Na_filt: Sodium, dissolved
Units: parts per billion
SamplingFeatureID: 90
SamplingFeatureCode: D1_SGW



In [21]:

    
vars(firstResult)









    Out[21]:





{'FeatureActionID': 84,
 'FeatureActionObj': <FeatureActions({'ActionID': 84, 'SamplingFeatureID': 90, 'FeatureActionID': 84})>,
 'ProcessingLevelID': 2,
 'ProcessingLevelObj': <ProcessingLevels({'Definition': u'Quality controlled data', 'Explanation': u'Quality controlled data that have passed quality assurance procedures such as routine estimation of timing and sensor calibration or visual inspection and removal of obvious errors. An example is USGS published streamflow records following parsing through USGS quality control procedures.', 'ProcessingLevelCode': u'1', 'ProcessingLevelID': 2})>,
 'ResultDateTime': datetime.datetime(2018, 2, 26, 0, 0),
 'ResultDateTimeUTCOffset': None,
 'ResultID': 1,
 'ResultTypeCV': u'Measurement',
 'ResultUUID': u'1FFB7AB0-E156-4522-944F-24AFD1040F67',
 'SampledMediumCV': u'Liquid aqueous',
 'StatusCV': u'Complete',
 'TaxonomicClassifierID': None,
 'UnitsID': 5,
 'UnitsObj': <Units({'UnitsLink': None, 'UnitsName': u'parts per billion', 'UnitsTypeCV': u'Concentration count per count', 'UnitsAbbreviation': u'ppb', 'UnitsID': 5})>,
 'ValidDateTime': None,
 'ValidDateTimeUTCOffset': None,
 'ValueCount': 1,
 'VariableID': 9,
 'VariableObj': <Variables({'VariableID': 9, 'SpeciationCV': u'Not Applicable', 'VariableNameCV': u'Sodium, dissolved', 'VariableDefinition': None, 'NoDataValue': Decimal('-9999.0000000000'), 'VariableTypeCV': u'Water quality', 'VariableCode': u'Na_filt'})>,
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f104605f750>}



In [22]:

    
vars(frfa)









    Out[22]:





{'ActionID': 84,
 'ActionObj': <Actions({'MethodID': 11, 'ActionDescription': None, 'ActionFileLink': None, 'EndDateTime': None, 'BeginDateTime': datetime.datetime(2018, 4, 26, 0, 0), 'BeginDateTimeUTCOffset': -4, 'ActionTypeCV': u'Specimen analysis', 'ActionID': 84, 'EndDateTimeUTCOffset': None})>,
 'FeatureActionID': 84,
 'SamplingFeatureID': 90,
 'SamplingFeatureObj': <Specimens({'SamplingFeatureDescription': None, 'SamplingFeatureGeotypeCV': None, 'SamplingFeatureName': None, 'ElevationDatumCV': u'Unknown', 'Elevation_m': None, 'SamplingFeatureUUID': u'67071B3E-28B2-4E3F-A470-CB30D6EF9121', 'SamplingFeatureTypeCV': u'Specimen', 'SamplingFeatureCode': u'D1_SGW', 'SamplingFeatureID': 90, 'FeatureGeometryWKT': None})>,
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f1043353110>}

The last block of code returns a particular Measurement Result. From that I can get the SamplingFeaureID (in this case 26) for the Specimen from which the Result was generated. But, if I want to figure out which Site the Specimen was collected at, I need to query the database to get the related Site SamplingFeature. I can use getRelatedSamplingFeatures( ) for this. Once I've got the SamplingFeature for the Site, I could get the rest of the SamplingFeature attributes.



In [23]:

    
specimen_sf_id = frfa.SamplingFeatureObj.SamplingFeatureID



In [24]:

    
# specimen-entity attributes only show up after first printing one out explicitly
frfa.SamplingFeatureObj.SpecimenTypeCV









    Out[24]:





u'Grab'



In [25]:

    
vars(frfa.SamplingFeatureObj)









    Out[25]:





{'ElevationDatumCV': u'Unknown',
 'Elevation_m': None,
 'FeatureGeometryWKT': None,
 'IsFieldSpecimen': True,
 'SamplingFeatureCode': u'D1_SGW',
 'SamplingFeatureDescription': None,
 'SamplingFeatureGeotypeCV': None,
 'SamplingFeatureID': 90,
 'SamplingFeatureName': None,
 'SamplingFeatureTypeCV': u'Specimen',
 'SamplingFeatureUUID': u'67071B3E-28B2-4E3F-A470-CB30D6EF9121',
 'SpecimenMediumCV': u'Liquid aqueous',
 'SpecimenTypeCV': u'Grab',
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f104326b9d0>}



In [26]:

    
# Pass the Sampling Feature ID of the specimen, and the relationship type
relatedSite = read.getRelatedSamplingFeatures(sfid=specimen_sf_id, 
                                              relationshiptype='Was Collected at')[0]



In [27]:

    
vars(relatedSite)









    Out[27]:





{'ElevationDatumCV': u'EGM96',
 'Elevation_m': 749.0,
 'FeatureGeometryWKT': None,
 'Latitude': 18.200178,
 'Longitude': -66.664513,
 'SamplingFeatureCode': u'SystemD1_SGW',
 'SamplingFeatureDescription': u'System D1 Surface water, as far upstream as possible',
 'SamplingFeatureGeotypeCV': None,
 'SamplingFeatureID': 1,
 'SamplingFeatureName': u'System D1 Surface Water',
 'SamplingFeatureTypeCV': u'Site',
 'SamplingFeatureUUID': u'5BE27C7B-0A37-49F9-AF8C-075F745FA23E',
 'SiteTypeCV': u'Water-distribution system',
 'SpatialReferenceID': 4,
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f1045a40890>}

Return Results and Data Values for a Particular Site/Variable

From the list of Variables returned above and the information about the SamplingFeature I queried above, I know that VariableID = 2 for Total Phosphorus and SiteID = 1 for the Red Butte Creek site at 1300E. I can use the getResults( ) function to get all of the Total Phosphorus results for this site by passing in the VariableID and the SiteID.



In [28]:

    
siteID = relatedSite.SamplingFeatureID



In [29]:

    
results_all_at_site = read.getResults(siteid=siteID, restype="Measurement")



In [30]:

    
len(results_all_at_site)









    Out[30]:





47



In [31]:

    
vars(results_all_at_site[0])









    Out[31]:





{'FeatureActionID': 84,
 'FeatureActionObj': <FeatureActions({'ActionID': 84, 'SamplingFeatureID': 90, 'FeatureActionID': 84})>,
 'ProcessingLevelID': 2,
 'ProcessingLevelObj': <ProcessingLevels({'Definition': u'Quality controlled data', 'Explanation': u'Quality controlled data that have passed quality assurance procedures such as routine estimation of timing and sensor calibration or visual inspection and removal of obvious errors. An example is USGS published streamflow records following parsing through USGS quality control procedures.', 'ProcessingLevelCode': u'1', 'ProcessingLevelID': 2})>,
 'ResultDateTime': datetime.datetime(2018, 2, 26, 0, 0),
 'ResultDateTimeUTCOffset': None,
 'ResultID': 1,
 'ResultTypeCV': u'Measurement',
 'ResultUUID': u'1FFB7AB0-E156-4522-944F-24AFD1040F67',
 'SampledMediumCV': u'Liquid aqueous',
 'StatusCV': u'Complete',
 'TaxonomicClassifierID': None,
 'UnitsID': 5,
 'UnitsObj': <Units({'UnitsLink': None, 'UnitsName': u'parts per billion', 'UnitsTypeCV': u'Concentration count per count', 'UnitsAbbreviation': u'ppb', 'UnitsID': 5})>,
 'ValidDateTime': None,
 'ValidDateTimeUTCOffset': None,
 'ValueCount': 1,
 'VariableID': 9,
 'VariableObj': <Variables({'VariableID': 9, 'SpeciationCV': u'Not Applicable', 'VariableNameCV': u'Sodium, dissolved', 'VariableDefinition': None, 'NoDataValue': Decimal('-9999.0000000000'), 'VariableTypeCV': u'Water quality', 'VariableCode': u'Na_filt'})>,
 '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7f104605f750>}



In [32]:

    
res_vars = []
for r in results_all_at_site:
    res_vars.append([r.ResultID, r.ResultDateTime, r.VariableID, 
                     r.VariableObj.VariableCode, r.VariableObj.VariableNameCV, r.VariableObj.VariableDefinition])



In [33]:

    
# print out a count of number of results for each variable, plus the date range
# Do this by ingesting into a data frame first



In [34]:

    
res_vars_df = pd.DataFrame(res_vars, columns=['ResultID', 'ResultDateTime', 'VariableID', 'VariableCode', 'VariableNameCV', 'VariableDefinition'])



In [35]:

    
res_vars_df.head()









    Out[35]:







  
    
      
      ResultID
      ResultDateTime
      VariableID
      VariableCode
      VariableNameCV
      VariableDefinition
    
  
  
    
      0
      1
      2018-02-26
      9
      Na_filt
      Sodium, dissolved
      None
    
    
      1
      84
      2018-02-26
      10
      Mg_filt
      Magnesium, dissolved
      None
    
    
      2
      167
      2018-02-26
      11
      Al_filt
      Aluminum, dissolved
      None
    
    
      3
      250
      2018-02-26
      12
      Si_filt
      Silicon, dissolved
      None
    
    
      4
      333
      2018-02-26
      13
      P_filt
      Phosphorus, dissolved
      None



In [36]:

    
res_vars_df.VariableCode.value_counts()









    Out[36]:





pH               2
Free Cl2         2
Total Cl2        2
Fe_filt          1
Si_filt          1
Na_tot           1
Cu_filt          1
Cu_tot           1
S_tot            1
Cl_filt          1
Fe_tot           1
Mg_tot           1
Ba_filt          1
Cr_tot           1
InitialTemp      1
S_filt           1
Al_tot           1
Mn_filt          1
FirstDrawTemp    1
Al_filt          1
Turbidity        1
FinalTemp        1
Si_tot           1
Mg_filt          1
Zn_filt          1
Ba_tot           1
P_tot            1
K_tot            1
Cl_tot           1
Br_tot           1
P_filt           1
Ca_tot           1
Pb_tot           1
Br_filt          1
Cr_filt          1
Pb_filt          1
As_filt          1
Zn_tot           1
As_tot           1
Ca_filt          1
Mn_tot           1
K_filt           1
Na_filt          1
DO               1
Name: VariableCode, dtype: int64

Retrieve the Result (Data) Values, Then Create a Quick Time Series Plot of the Data

Now I can retrieve all of the data values associated with the list of Results I just retrieved. In ODM2, water chemistry measurements are stored as "Measurement" results. Each "Measurement" Result has a single data value associated with it. So, for convenience, the getResultValues( ) function allows you to pass in a list of ResultIDs so you can get the data values for all of them back in a Pandas data frame object, which is easier to work with. Once I've got the data in a Pandas data frame object, I can use the plot( ) function directly on the data frame to create a quick visualization.

10/5/2018. NOTE CURRENT ISSUE REGARDING ValueDateTime RETURNED BY read.getResultValues. There seems to be an unexpected behavior with the data type returned for ValueDateTime for SQLite databases. It should be a datetime, but it's currently a string. This is being investigated. For now, we are converting to a datetime manually in cells 25 and 27 via the statement:

resultValues['ValueDateTime'] = pd.to_datetime(resultValues['ValueDateTime'])

This problem is present in odm2api version 0.7.1, but was not present in Nov. 2017



In [37]:

    
# function that encapsulates the `VariableID`, `getResults` and `getResultValues` queries
def get_results_and_values(siteid, variablecode):
    v = variables_df[variables_df['VariableCode'] == variablecode]
    variableID = v.index[0]
    
    results = read.getResults(siteid=siteid, variableid=variableID, restype="Measurement")
    resultIDList = [x.ResultID for x in results]
    
    # Get all of the data values for the Results in the list created above
    # Call getResultValues, which returns a Pandas Data Frame with the data
    resultValues = read.getResultValues(resultids=resultIDList, lowercols=False)
    resultValues['ValueDateTime'] = pd.to_datetime(resultValues['ValueDateTime'])
    
    return resultValues, results



In [43]:

    
resultValues, results = get_results_and_values(siteID, 'pH')



In [44]:

    
resultValues.head()









    Out[44]:







  
    
      
      ValueID
      ResultID
      DataValue
      ValueDateTime
      ValueDateTimeUTCOffset
    
  
  
    
      0
      2989
      2989
      6.03
      2018-02-26
      -5
    
    
      1
      3476
      3476
      7.03
      2018-02-26
      -5



In [45]:

    
result_select = results[0]



In [46]:

    
# Plot the time sequence of Measurement Result Values 
ax = resultValues.plot(x='ValueDateTime', y='DataValue', title=relatedSite.SamplingFeatureName,
                       kind='line', use_index=True, style='o')
ax.set_ylabel("{0} ({1})".format(result_select.VariableObj.VariableNameCV, 
                                 result_select.UnitsObj.UnitsAbbreviation))
ax.set_xlabel('Date/Time')
ax.legend().set_visible(False)



In [47]:

    
# results_faam = lambda results, i: results[i].FeatureActionObj.ActionObj.MethodObj
method = result_select.FeatureActionObj.ActionObj.MethodObj

print("METHOD:  {0} ({1})".format(method.MethodName, method.MethodDescription))









    



METHOD:  Stagnant pH (Measured using Orion 100-Series pH meter calibrated before each use in pH 4,7. and 10 buffers with automatic temperature control.)



In [ ]:

	NoDataValue	SpeciationCV	VariableCode	VariableDefinition	VariableNameCV	VariableTypeCV	_sa_instance_state
VariableID
1	-9999.0000000000	Not Applicable	pH	None	pH	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
2	-9999.0000000000	Not Applicable	DO	None	Oxygen, dissolved	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
3	-9999.0000000000	Not Applicable	FirstDrawTemp	None	Temperature	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
4	-9999.0000000000	Not Applicable	Free Cl2	None	Chlorine	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
5	-9999.0000000000	Not Applicable	Total Cl2	None	Chlorine	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
6	-9999.0000000000	Not Applicable	Turbidity	None	Turbidity	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
7	-9999.0000000000	Not Applicable	InitialTemp	None	Temperature	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
8	-9999.0000000000	Not Applicable	FinalTemp	None	Temperature	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
9	-9999.0000000000	Not Applicable	Na_filt	None	Sodium, dissolved	Water quality	<sqlalchemy.orm.state.InstanceState object at ...
10	-9999.0000000000	Not Applicable	Mg_filt	None	Magnesium, dissolved	Water quality	<sqlalchemy.orm.state.InstanceState object at ...

	PersonFirstName	PersonID	PersonLastName	_sa_instance_state
0	William	1	Rhoads	<sqlalchemy.orm.state.InstanceState object at ...
1	Kelsey	2	Pieper	<sqlalchemy.orm.state.InstanceState object at ...
2	Graciela	3	Ramírez Toro	<sqlalchemy.orm.state.InstanceState object at ...
3	Jeffrey	4	Parks	<sqlalchemy.orm.state.InstanceState object at ...

	ElevationDatumCV	Elevation_m	FeatureGeometryWKT	Latitude	Longitude	SamplingFeatureCode	SamplingFeatureDescription	SamplingFeatureGeotypeCV	SamplingFeatureID	SamplingFeatureName	SamplingFeatureTypeCV	SamplingFeatureUUID	SiteTypeCV	SpatialReferenceID	_sa_instance_state	geometry
0	EGM96	749.0	None	18.200178	-66.664513	SystemD1_SGW	System D1 Surface water, as far upstream as po...	None	1	System D1 Surface Water	Site	5BE27C7B-0A37-49F9-AF8C-075F745FA23E	Water-distribution system	4	<sqlalchemy.orm.state.InstanceState object at ...	POINT (-66.664513 18.200178)
1	EGM96	749.0	None	18.200178	-66.664513	SystemD1_Res	System D1 Surface water, at the point it is co...	None	2	System D1 Reservoir	Site	C279AE16-3EBE-419A-9ACF-ADCCA0E22205	Water-distribution system	4	<sqlalchemy.orm.state.InstanceState object at ...	POINT (-66.664513 18.200178)
2	EGM96	749.0	None	18.200178	-66.664513	SystemD1_HT	System D1 Surface water, from a holding/settli...	None	3	System D1 Holding Tank	Site	1C5BD9E3-10AA-4207-A71A-190030CEDCCE	Water-distribution system	4	<sqlalchemy.orm.state.InstanceState object at ...	POINT (-66.664513 18.200178)
3	EGM96	749.0	None	18.200178	-66.664513	SystemD1_PV	System D1 Surface water, from a pipe vent valv...	None	4	System D1 Pipe Vent	Site	B972B4CB-FF82-44A2-BC55-E53CD0B13B1B	Water-distribution system	4	<sqlalchemy.orm.state.InstanceState object at ...	POINT (-66.664513 18.200178)
4	EGM96	749.0	None	18.200178	-66.664513	SystemD1_ST	System D1 Surface water, from setting tank (on...	None	5	System D1 Settling Tank	Site	33FA9DB6-D749-40A8-977F-A696F26E78A1	Water-distribution system	4	<sqlalchemy.orm.state.InstanceState object at ...	POINT (-66.664513 18.200178)

	ResultID	ResultDateTime	VariableID	VariableCode	VariableNameCV	VariableDefinition
0	1	2018-02-26	9	Na_filt	Sodium, dissolved	None
1	84	2018-02-26	10	Mg_filt	Magnesium, dissolved	None
2	167	2018-02-26	11	Al_filt	Aluminum, dissolved	None
3	250	2018-02-26	12	Si_filt	Silicon, dissolved	None
4	333	2018-02-26	13	P_filt	Phosphorus, dissolved	None