This data is compiled to digitally observe the Sauk-Suiattle Watershed, powered by HydroShare.
Use this Jupyter Notebook to:
Migrate data sets from prior data download events,
Compute daily, monthly, and annual temperature and precipitation statistics,
Visualize precipitation results relative to the forcing data,
Visualize the time-series trends among the gridded cells using different gridded data products.
To run this notebook, we must import several libraries. These are listed in order of 1) Python standard libraries, 2) the hs_utils library, which provides functions for interacting with HydroShare, including resource querying, downloading, and creation, and 3) the observatory_gridded_hydromet library that is downloaded with this notebook.
In [ ]:
#conda install -c conda-forge basemap-data-hires --yes
In [ ]:
# data processing
import os
import pandas as pd, numpy as np, dask, json
import ogh
import geopandas as gpd

# data migration library
from utilities import hydroshare

# plotting libraries
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

# spatial plotting
import fiona
import shapely.ops
from shapely.geometry import MultiPolygon, shape, point, box, Polygon
from mpl_toolkits.basemap import Basemap

# silence warnings
import warnings
warnings.filterwarnings("ignore")
In [ ]:
# initialize ogh_meta
meta_file = dict(ogh.ogh_meta())
sorted(meta_file.keys())
In [ ]:
sorted(meta_file['dailymet_livneh2013'].keys())
Establish a secure connection with HydroShare by instantiating the hydroshare class that is defined within hs_utils. In addition to connecting with HydroShare, this command also sets and prints environment variables for several parameters that will be useful for saving work back to HydroShare.
In [ ]:
notebookdir = os.getcwd()
hs=hydroshare.hydroshare()
homedir = hs.getContentPath(os.environ["HS_RES_ID"])
os.chdir(homedir)
print('Data will be loaded from and saved to: ' + homedir)
If you are curious about where the data is downloaded, click on the Jupyter Notebook dashboard icon to return to the file system view. The homedir directory printed above is where you can find the data and contents you download to a HydroShare JupyterHub server. At the end of this work session, you can migrate this data to the HydroShare iRODS server as a Generic Resource.
This example uses a shapefile with the watershed boundary of the Sauk-Suiattle Basin, which is stored in HydroShare at the following url: https://www.hydroshare.org/resource/c532e0578e974201a0bc40a37ef2d284/.
The data for our processing routines can be retrieved using the getResourceFromHydroShare function by passing in the global identifier from the url above. In the next cell, we download this resource from HydroShare, and identify that the points in this resource are available for downloading gridded hydrometeorology data, based on the point shapefile at https://www.hydroshare.org/resource/ef2d82bf960144b4bfb1bae6242bcc7f/, which covers the extent of North America and includes the average elevation for each 1/16 degree grid cell. The file must include columns with station numbers, latitude, longitude, and elevation; the headers of these columns must be FID, LAT, LONG_, and ELEV or RASTERVALU, respectively. The station numbers are used for the remainder of the notebook to uniquely reference data from each climate station, as well as to identify the minimum, maximum, and average elevation of all of the climate stations. The web service is currently set to a URL for the smallest overlapping geographic extent - e.g., WRF for the Columbia River Basin (to use a different extent, or data from an FTP service, treatgeoself() would need to be edited in the observatory_gridded_hydrometeorology utility).
In [ ]:
"""
Sauk
"""
# Watershed extent
hs.getResourceFromHydroShare('c532e0578e974201a0bc40a37ef2d284')
sauk = hs.content['wbdhub12_17110006_WGS84_Basin.shp']
In [ ]:
# locate the mapping files from use case 1
mappingfile1 = os.path.join(homedir,'Sauk_mappingfile.csv')
mappingfile2 = os.path.join(homedir,'Elwha_mappingfile.csv')
mappingfile3 = os.path.join(homedir,'RioSalado_mappingfile.csv')
t1 = ogh.mappingfileSummary(listofmappingfiles = [mappingfile1, mappingfile2, mappingfile3],
listofwatershednames = ['Sauk-Suiattle river','Elwha river','Upper Rio Salado'],
meta_file=meta_file)
t1
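Per the column requirements described above, the mapping file can also be spot-checked directly with pandas. The cell below is a minimal sketch (not part of the ogh workflow) and assumes the Sauk mapping file from the prior use case is present in homedir; it confirms that the required column headers exist and summarizes the gridded-cell elevations.
In [ ]:
# minimal sketch: verify the mapping file columns and summarize gridded-cell elevations (illustration only)
mapdf_check = pd.read_csv(mappingfile1)
required = {'FID', 'LAT', 'LONG_', 'ELEV'}  # the elevation column may instead be named RASTERVALU
print('missing columns:', (required - set(mapdf_check.columns)) or 'none')
print('elevation (m): min={0:.0f}, mean={1:.1f}, max={2:.0f}'.format(
    mapdf_check['ELEV'].min(), mapdf_check['ELEV'].mean(), mapdf_check['ELEV'].max()))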
This section performs computations and generates plots of the Livneh 2013, Livneh 2016, and WRF 2014 temperature and precipitation data in order to compare them with each other and with observations. The generated plots are displayed inline in the notebook and saved as .png files in the user's home directory.
In [ ]:
# Livneh et al., 2013
dr1 = meta_file['dailymet_livneh2013']
# Salathe et al., 2014
dr2 = meta_file['dailywrf_salathe2014']
# define overlapping time window
dr = ogh.overlappingDates(date_set1=tuple([dr1['start_date'], dr1['end_date']]),
date_set2=tuple([dr2['start_date'], dr2['end_date']]))
dr
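Conceptually, the overlapping window runs from the later of the two start dates through the earlier of the two end dates. The standalone sketch below illustrates the idea with hypothetical date ranges; it is not the ogh implementation.
In [ ]:
# illustration only: the overlap of two date ranges is [max(starts), min(ends)]
from datetime import date
set1 = (date(1915, 1, 1), date(2011, 12, 31))  # hypothetical range for dataset 1
set2 = (date(1950, 1, 1), date(2010, 12, 31))  # hypothetical range for dataset 2
overlap = (max(set1[0], set2[0]), min(set1[1], set2[1]))
print(overlap)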
Data frames for each set of data are stored in a dictionary. The inputs to gridclim_dict() include the folder location and name of the hydrometeorology data, the file start and end dates, the analysis start and end dates, and the elevation band to be included in the analysis (minimum and maximum elevation).
In [ ]:
%%time
ltm_3bands = ogh.gridclim_dict(mappingfile=mappingfile1,
metadata=meta_file,
dataset='dailymet_livneh2013',
file_start_date=dr1['start_date'],
file_end_date=dr1['end_date'],
file_time_step=dr1['temporal_resolution'],
subset_start_date=dr[0],
subset_end_date=dr[1])
In [ ]:
%%time
ltm_3bands = ogh.gridclim_dict(mappingfile=mappingfile1,
metadata=meta_file,
dataset='dailyvic_livneh2013',
file_start_date=dr1['start_date'],
file_end_date=dr1['end_date'],
file_time_step=dr1['temporal_resolution'],
subset_start_date=dr[0],
subset_end_date=dr[1],
df_dict=ltm_3bands)
In [ ]:
sorted(ltm_3bands.keys())
In [ ]:
meta = meta_file['dailyvic_livneh2013']['variable_info']
meta_df = pd.DataFrame.from_dict(meta).T
meta_df.loc[['BASEFLOW','RUNOFF'],:]
In [ ]:
# total streamflow is the sum of baseflow and surface runoff
ltm_3bands['STREAMFLOW_dailyvic_livneh2013'] = (ltm_3bands['BASEFLOW_dailyvic_livneh2013'] +
                                                ltm_3bands['RUNOFF_dailyvic_livneh2013'])
ltm_3bands['STREAMFLOW_dailyvic_livneh2013']
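The sum above assumes that the baseflow and runoff dataframes share the same date index and gridded-cell columns. The check below is a minimal sketch to confirm that alignment; it is not part of the ogh workflow.
In [ ]:
# minimal sketch: confirm BASEFLOW and RUNOFF align on dates and gridded cells (illustration only)
bf = ltm_3bands['BASEFLOW_dailyvic_livneh2013']
ro = ltm_3bands['RUNOFF_dailyvic_livneh2013']
print('same shape:', bf.shape == ro.shape)
print('same dates:', bf.index.equals(ro.index))
print('same cells:', bf.columns.equals(ro.columns))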
In [ ]:
"""
Sauk-Suiattle
"""
# Watershed extent
hs.getResourceFromHydroShare('c532e0578e974201a0bc40a37ef2d284')
sauk = hs.content['wbdhub12_17110006_WGS84_Basin.shp']
"""
Elwha
"""
# Watershed extent
hs.getResourceFromHydroShare('4aff8b10bc424250b3d7bac2188391e8')
elwha = hs.content["elwha_ws_bnd_wgs84.shp"]
"""
Upper Rio Salado
"""
# Watershed extent
hs.getResourceFromHydroShare('5c041d95ceb64dce8eb85d2a7db88ed7')
riosalado = hs.content['UpperRioSalado_delineatedBoundary.shp']
In [ ]:
# generate surface area for each gridded cell
def computegcSurfaceArea(shapefile, spatial_resolution, vardf):
    """
    Data-driven computation of gridded-cell surface area using the list of gridded-cell centroids

    shapefile: (str) path to the study site shapefile used to select the UTM boundary
    spatial_resolution: (float) the spatial resolution in the degree coordinate reference system, e.g., 1/16
    vardf: (dataframe) input dataframe that contains FID, LAT, and LONG_ references for each gridded-cell centroid

    return: (mean surface area in square meters, standard deviation of surface area)
    """
    # ensure projection into WGS84 longlat values
    ogh.reprojShapefile(shapefile)

    # generate the figure axis
    fig = plt.figure(figsize=(2, 2), dpi=500)
    ax1 = plt.subplot2grid((1, 1), (0, 0))

    # calculate the bounding box based on the watershed shapefile
    watershed = gpd.read_file(shapefile)
    watershed['watershed'] = 'watershed'
    watershed = watershed.dissolve(by='watershed')

    # extract the area centroid, bounding box info, and dimension shape
    lon0, lat0 = np.array(watershed.centroid.iloc[0])
    minx, miny, maxx, maxy = watershed.bounds.iloc[0]

    # generate a transverse Mercator projection
    m = Basemap(projection='tmerc', resolution='h', ax=ax1, lat_0=lat0, lon_0=lon0,
                llcrnrlon=minx, llcrnrlat=miny, urcrnrlon=maxx, urcrnrlat=maxy)

    # generate gridded-cell bounding boxes around each centroid
    midpt_dist = spatial_resolution/2
    cat = vardf.T.reset_index(level=[1, 2]).rename(columns={'level_1': 'LAT', 'level_2': 'LONG_'})
    geometry = cat.apply(lambda x:
                         shapely.ops.transform(m, box(x['LONG_']-midpt_dist, x['LAT']-midpt_dist,
                                                      x['LONG_']+midpt_dist, x['LAT']+midpt_dist)), axis=1)

    # compute each gridded-cell area in square meters
    gc_area = geometry.apply(lambda x: x.area)
    plt.gcf().clear()
    return (gc_area.mean(), gc_area.std())
In [ ]:
gcSA = computegcSurfaceArea(shapefile=sauk, spatial_resolution=1/16, vardf=ltm_3bands['STREAMFLOW_dailyvic_livneh2013'])
gcSA
In [ ]:
# convert mm/s streamflow to volumetric cfs, then compute monthly 10% exceedance values
df_dict = ltm_3bands
objname = 'STREAMFLOW_dailyvic_livneh2013'
dataset = objname.split('_', 1)[1]
gridcell_area = gcSA[0]  # mean gridded-cell surface area (m^2) from computegcSurfaceArea
exceedance = 10

# convert mm/s to m/s
mmps = df_dict[objname]
mps = mmps*0.001

# multiply streamflow (m/s) by the gridded-cell surface area (m^2) to produce volumetric streamflow (m^3/s)
cms = mps.multiply(gridcell_area)

# convert m^3/s to cfs; multiply by (3.28084)^3
cfs = cms.multiply((3.28084)**3)

# output to df_dict
df_dict['cfs_'+objname] = cfs

# time-group to month-yearly streamflow volumetric values
monthly_cfs = cfs.groupby(pd.TimeGrouper('M')).sum()
monthly_cfs.index = pd.Series(monthly_cfs.index).apply(lambda x: x.strftime('%Y-%m'))

# output to df_dict
df_dict['monthly_cfs_'+objname] = monthly_cfs

# prepare for exceedance computations
row_indices = pd.Series(monthly_cfs.index).map(lambda x: pd.datetime.strptime(x, '%Y-%m').month)
months = range(1, 13)
Exceed = pd.DataFrame()

# for each month
for eachmonth in months:
    month_index = row_indices[row_indices == eachmonth].index
    month_res = monthly_cfs.iloc[month_index, :].reset_index(drop=True)

    # generate gridded-cell-specific 10% exceedance probability values (the 90th percentile)
    exceed = pd.DataFrame(month_res.apply(lambda x: np.percentile(x, 100 - exceedance), axis=0)).T

    # append to the dataframe
    Exceed = pd.concat([Exceed, exceed])

# set the index to month order
Exceed = Exceed.set_index(np.array(months))

# output to df_dict
df_dict['EXCEED{0}_{1}'.format(exceedance, dataset)] = Exceed
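For context, the 10% exceedance value is the flow that is equaled or exceeded in 10% of the months, which corresponds to the 90th percentile of the monthly series. The toy example below is an illustration only and is not part of the workflow.
In [ ]:
# illustration only: a 10% exceedance value equals the 90th percentile of the series
toy = pd.Series([5, 12, 18, 25, 40, 63, 80, 95, 120, 150])
print(np.percentile(toy, 100 - 10))  # value exceeded in roughly 10% of the months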
In [ ]:
def monthlyExceedence_cfs(df_dict,
                          daily_streamflow_dfname,
                          gridcell_area,
                          exceedance=10,
                          start_date=None,
                          end_date=None):
    """
    df_dict: (dict) dictionary of dataframes generated by gridclim_dict()
    daily_streamflow_dfname: (str) key of the daily streamflow dataframe in mm/s (row_index: date, col_index: gridded cell ID)
    gridcell_area: (float) gridded-cell surface area in square meters
    exceedance: (int) exceedance probability as a percentage, e.g., 10
    start_date: (datetime) analysis start date; defaults to the start of the record
    end_date: (datetime) analysis end date; defaults to the end of the record
    """
    dataset = daily_streamflow_dfname.split('_', 1)[1]

    # subset the daily streamflow dataframe to the analysis period
    mmps = df_dict[daily_streamflow_dfname]
    if not isinstance(start_date, type(None)):
        mmps = mmps.loc[start_date:, :]
    if not isinstance(end_date, type(None)):
        mmps = mmps.loc[:end_date, :]

    # convert mm/s to m/s
    mps = mmps*0.001

    # multiply streamflow (m/s) by the gridded-cell surface area (m^2) to produce volumetric streamflow (m^3/s)
    cms = mps.multiply(gridcell_area)

    # convert m^3/s to cfs; multiply by (3.28084)^3
    cfs = cms.multiply((3.28084)**3)
    df_dict['cfs_'+daily_streamflow_dfname] = cfs

    # time-group to month-yearly streamflow volumetric values
    monthly_cfs = cfs.groupby(pd.TimeGrouper('M')).sum()
    monthly_cfs.index = pd.Series(monthly_cfs.index).apply(lambda x: x.strftime('%Y-%m'))
    df_dict['monthly_cfs_'+daily_streamflow_dfname] = monthly_cfs

    # compute the gridded-cell-specific exceedance values for each calendar month
    row_indices = pd.Series(monthly_cfs.index).map(lambda x: pd.datetime.strptime(x, '%Y-%m').month)
    months = range(1, 13)
    Exceed = pd.DataFrame()
    for eachmonth in months:
        month_index = row_indices[row_indices == eachmonth].index
        month_res = monthly_cfs.iloc[month_index, :].reset_index(drop=True)
        exceed = pd.DataFrame(month_res.apply(lambda x: np.percentile(x, 100 - exceedance), axis=0)).T
        Exceed = pd.concat([Exceed, exceed])
    Exceed = Exceed.set_index(np.array(months))
    df_dict['EXCEED{0}_{1}'.format(exceedance, dataset)] = Exceed
    return df_dict

# apply the function to the Livneh 2013 VIC streamflow using the mean gridded-cell surface area
ltm_3bands = monthlyExceedence_cfs(df_dict=ltm_3bands,
                                   daily_streamflow_dfname='STREAMFLOW_dailyvic_livneh2013',
                                   gridcell_area=gcSA[0],
                                   exceedance=10)

# For reference, the MATLAB routine below computes the same monthly exceedance quantiles:
# function [Qex] = monthlyExceedence_cfs(file,startyear,endyear)
# % Load data from specified file
# data = load(file);
# Y=data(:,1);
# MO=data(:,2);
# D=data(:,3);
# t = datenum(Y,MO,D);
# %%%
# % startyear=data(1,1);
# % endyear=data(length(data),1);
# Qnode = data(:,4);
# % Time control indices that cover selected period
# d_start = datenum(startyear,10,01,23,0,0); % 1 hour early to catch record for that day
# d_end = datenum(endyear,09,30,24,0,0);
# idx = find(t>=d_start & t<=d_end);
# exds=(1:19)./20; % Exceedence probabilities from 0.05 to 0.95
# mos=[10,11,12,1:9];
# Qex(1:19,1:12)=0 ; % initialize array
# for imo=1:12;
# mo=mos(imo);
# [Y, M, D, H, MI, S] = datevec(t(idx));
# ind=find(M==mo); % find all flow values in that month in the period identified
# Q1=Qnode(idx(ind)); % input is in cfs
# for iex=1:19
# Qex(iex,imo)=quantile(Q1,1-exds(iex));
# end
# end
# end
In [ ]:
ltm_3bands['cfs_STREAMFLOW_dailyvic_livneh2013']
In [ ]:
ltm_3bands['monthly_cfs_STREAMFLOW_dailyvic_livneh2013']
In [ ]:
ltm_3bands['EXCEED10_dailyvic_livneh2013']
In [ ]:
# loop through each month to map the gridded 10% exceedance probability values
for eachmonth in range(1,13):
monthlabel = pd.datetime.strptime(str(eachmonth), '%m')
ogh.renderValuesInPoints(vardf=ltm_3bands['EXCEED10_dailyvic_livneh2013'],
vardf_dateindex=eachmonth,
shapefile=sauk.replace('.shp','_2.shp'),
outfilepath='sauk{0}exceed10.png'.format(monthlabel.strftime('%b')),
plottitle='Sauk {0} 10% Exceedance Probability'.format(monthlabel.strftime('%B')),
colorbar_label='cubic feet per second',
cmap='seismic_r')
In [ ]:
%%time
month=3
monthlabel = pd.datetime.strptime(str(month), '%m')
ogh.renderValuesInPoints(vardf=ltm_3bands['month_PRECIP_dailymet_livneh2013'],
vardf_dateindex=month,
shapefile=sauk.replace('.shp','_2.shp'),
outfilepath=os.path.join(homedir, 'SaukPrecip{0}.png'.format(monthlabel.strftime('%b'))),
plottitle='Sauk-Suiattle watershed'+'\nPrecipitation in '+ monthlabel.strftime('%B'),
colorbar_label='Average monthly precipitation (meters)',
spatial_resolution=1/16, margin=0.5, epsg=3857,
basemap_image='Demographics/USA_Social_Vulnerability_Index',
cmap='gray_r')
month=6
monthlabel = pd.datetime.strptime(str(month), '%m')
ogh.renderValuesInPoints(vardf=ltm_3bands['month_PRECIP_dailymet_livneh2013'],
vardf_dateindex=month,
shapefile=sauk.replace('.shp','_2.shp'),
outfilepath=os.path.join(homedir, 'SaukPrecip{0}.png'.format(monthlabel.strftime('%b'))),
plottitle='Sauk-Suiattle watershed'+'\nPrecipitation in '+ monthlabel.strftime('%B'),
colorbar_label='Average monthly precipitation (meters)',
spatial_resolution=1/16, margin=0.5, epsg=3857,
basemap_image='ESRI_StreetMap_World_2D',
cmap='gray_r')
month=9
monthlabel = pd.datetime.strptime(str(month), '%m')
ogh.renderValuesInPoints(vardf=ltm_3bands['month_PRECIP_dailymet_livneh2013'],
vardf_dateindex=month,
shapefile=sauk.replace('.shp','_2.shp'),
outfilepath=os.path.join(homedir, 'SaukPrecip{0}.png'.format(monthlabel.strftime('%b'))),
plottitle='Sauk-Suiattle watershed'+'\nPrecipitation in '+ monthlabel.strftime('%B'),
colorbar_label='Average monthly precipitation (meters)',
spatial_resolution=1/16, margin=0.5, epsg=3857,
basemap_image='ESRI_Imagery_World_2D',
cmap='gray_r')
month=12
monthlabel = pd.datetime.strptime(str(month), '%m')
ogh.renderValuesInPoints(vardf=ltm_3bands['month_PRECIP_dailymet_livneh2013'],
vardf_dateindex=month,
shapefile=sauk.replace('.shp','_2.shp'),
outfilepath=os.path.join(homedir, 'SaukPrecip{0}.png'.format(monthlabel.strftime('%b'))),
plottitle='Sauk-Suiattle watershed'+'\nPrecipitation in '+ monthlabel.strftime('%B'),
colorbar_label='Average monthly precipitation (meters)',
spatial_resolution=1/16, margin=0.5, epsg=3857,
basemap_image='Elevation/World_Hillshade',
cmap='seismic_r')
In [ ]:
for month in [3, 6, 9, 12]:
monthlabel = pd.datetime.strptime(str(month), '%m')
outfile='SaukLivnehPrecip{0}.png'.format(monthlabel.strftime('%b'))
ax1 = ogh.renderValuesInPoints(vardf=ltm_3bands['month_PRECIP_dailymet_livneh2013'],
vardf_dateindex=month,
shapefile=sauk.replace('.shp','_2.shp'),
basemap_image='ESRI_Imagery_World_2D',
cmap='seismic_r',
plottitle='Sauk-Suiattle watershed'+'\nPrecipitation in '+monthlabel.strftime('%B'),
colorbar_label='Average monthly precipitation (meters)',
outfilepath=os.path.join(homedir, outfile))
In [ ]:
%%time
ltm_3bands = ogh.gridclim_dict(mappingfile=mappingfile1,
metadata=meta_file,
dataset='dailywrf_salathe2014',
colvar=None,
file_start_date=dr2['start_date'],
file_end_date=dr2['end_date'],
file_time_step=dr2['temporal_resolution'],
subset_start_date=dr[0],
subset_end_date=dr[1],
df_dict=ltm_3bands)
In [ ]:
for month in [3, 6, 9, 12]:
monthlabel = pd.datetime.strptime(str(month), '%m')
outfile='SaukSalathePrecip{0}.png'.format(monthlabel.strftime('%b'))
ax1 = ogh.renderValuesInPoints(vardf=ltm_3bands['month_PRECIP_dailywrf_salathe2014'],
vardf_dateindex=month,
shapefile=sauk.replace('.shp','_2.shp'),
basemap_image='ESRI_Imagery_World_2D',
cmap='seismic_r',
plottitle='Sauk-Suiattle watershed'+'\nPrecipitation in '+monthlabel.strftime('%B'),
colorbar_label='Average monthly precipitation (meters)',
outfilepath=os.path.join(homedir, outfile))
In [ ]:
def plot_meanTmin(dictionary, loc_name, start_date, end_date):
# Plot 1: Monthly temperature analysis of Livneh data
    # if neither the Livneh nor the WRF minimum-temperature summaries are available, skip the plot
    if ('meanmonth_temp_min_liv2013_met_daily' not in dictionary.keys()) and \
       ('meanmonth_temp_min_wrf2014_met_daily' not in dictionary.keys()):
        return

    # generate month indices
    wy_index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    wy_numbers = [10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    month_strings = ['Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']

    # initiate the plot object
    fig, ax = plt.subplots(1, 1, figsize=(10, 6))

    if 'meanmonth_temp_min_liv2013_met_daily' in dictionary.keys():
        # Liv2013
        plt.plot(wy_index, dictionary['meanmonth_temp_min_liv2013_met_daily'][wy_numbers], 'r-', linewidth=1, label='Liv Temp min')

    if 'meanmonth_temp_min_wrf2014_met_daily' in dictionary.keys():
        # WRF2014
        plt.plot(wy_index, dictionary['meanmonth_temp_min_wrf2014_met_daily'][wy_numbers], 'b-', linewidth=1, label='WRF Temp min')

    if 'meanmonth_temp_min_livneh2013_wrf2014bc_met_daily' in dictionary.keys():
        # bias-corrected WRF2014
        plt.plot(wy_index, dictionary['meanmonth_temp_min_livneh2013_wrf2014bc_met_daily'][wy_numbers], 'g-', linewidth=1, label='WRFbc Temp min')
# add reference line at y=0
plt.plot([1, 12],[0, 0], 'k-',linewidth=1)
plt.ylabel('Temp (C)',fontsize=14)
plt.xlabel('Month',fontsize=14)
plt.xlim(1,12);
plt.xticks(wy_index, month_strings);
plt.tick_params(labelsize=12)
plt.legend(loc='best')
plt.grid(which='both')
plt.title(str(loc_name)+'\nMinimum Temperature\n Years: '+str(start_date.year)+'-'+str(end_date.year)+'; Elevation: '+str(dictionary['analysis_elev_min'])+'-'+str(dictionary['analysis_elev_max'])+'m', fontsize=16)
plt.savefig('monthly_Tmin'+str(loc_name)+'.png')
plt.show()
If you want to plot observed SNOTEL point precipitation or temperature with the gridded climate data, set the option to 'Y'. Give the name of the SNOTEL file and the name to be used in the figure legends. File format: Daily SNOTEL Data Report - Historic - By individual SNOTEL site, standard sensors (https://www.wcc.nrcs.usda.gov/snow/snotel-data.html).
In [ ]:
# Sauk
SNOTEL_file = os.path.join(homedir,'ThunderBasinSNOTEL.txt')
SNOTEL_station_name='Thunder Creek'
SNOTEL_file_use_colsnames = ['Date','Air Temperature Maximum (degF)', 'Air Temperature Minimum (degF)','Air Temperature Average (degF)','Precipitation Increment (in)']
SNOTEL_station_elev=int(4320/3.281) # meters
SNOTEL_obs_daily = ogh.read_daily_snotel(file_name=SNOTEL_file,
usecols=SNOTEL_file_use_colsnames,
delimiter=',',
header=58)
# generate the start and stop date
SNOTEL_obs_start_date=SNOTEL_obs_daily.index[0]
SNOTEL_obs_end_date=SNOTEL_obs_daily.index[-1]
# peek
SNOTEL_obs_daily.head(5)
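The SNOTEL report uses degrees Fahrenheit and inches, while the gridded products are in degrees Celsius and millimeters. The conversion below is a minimal sketch for later comparisons and assumes that read_daily_snotel preserves the column names listed above; it is not part of the ogh workflow.
In [ ]:
# minimal sketch: convert SNOTEL observations to metric units (assumes the column names above are preserved)
SNOTEL_obs_metric = pd.DataFrame(index=SNOTEL_obs_daily.index)
for col in ['Air Temperature Maximum (degF)', 'Air Temperature Minimum (degF)', 'Air Temperature Average (degF)']:
    SNOTEL_obs_metric[col.replace('degF', 'degC')] = (SNOTEL_obs_daily[col] - 32.0) * 5.0/9.0
SNOTEL_obs_metric['Precipitation Increment (mm)'] = SNOTEL_obs_daily['Precipitation Increment (in)'] * 25.4
SNOTEL_obs_metric.head(5)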
In [ ]:
COOP_file=os.path.join(homedir, 'USC00455678.csv') # Sauk
COOP_station_name='Mt Vernon'
COOP_file_use_colsnames = ['DATE','PRCP','TMAX', 'TMIN','TOBS']
COOP_station_elev=int(4.3) # meters
COOP_obs_daily = ogh.read_daily_coop(file_name=COOP_file,
usecols=COOP_file_use_colsnames,
delimiter=',',
header=0)
# generate the start and stop date
COOP_obs_start_date=COOP_obs_daily.index[0]
COOP_obs_end_date=COOP_obs_daily.index[-1]
# peek
COOP_obs_daily.head(5)
In [ ]:
#initiate new dictionary with original data
ltm_0to3000 = ogh.gridclim_dict(metadata=meta_file,
mappingfile=mappingfile1,
dataset='dailymet_livneh2013',
file_start_date=dr1['start_date'],
file_end_date=dr1['end_date'],
subset_start_date=dr[0],
subset_end_date=dr[1])
ltm_0to3000 = ogh.gridclim_dict(metadata=meta_file,
mappingfile=mappingfile1,
dataset='dailywrf_salathe2014',
file_start_date=dr2['start_date'],
file_end_date=dr2['end_date'],
subset_start_date=dr[0],
subset_end_date=dr[1],
df_dict=ltm_0to3000)
sorted(ltm_0to3000.keys())
In [ ]:
# read in the mappingfile
mappingfile = mappingfile1
mapdf = pd.read_csv(mappingfile)
# select station by first FID
firstStation = ogh.findStationCode(mappingfile=mappingfile, colvar='FID', colvalue=0)
# select station by elevation
maxElevStation = ogh.findStationCode(mappingfile=mappingfile, colvar='ELEV', colvalue=mapdf.loc[:,'ELEV'].max())
medElevStation = ogh.findStationCode(mappingfile=mappingfile, colvar='ELEV', colvalue=mapdf.loc[:,'ELEV'].median())
minElevStation = ogh.findStationCode(mappingfile=mappingfile, colvar='ELEV', colvalue=mapdf.loc[:,'ELEV'].min())
# print(firstStation, mapdf.iloc[0].ELEV)
# print(maxElevStation, mapdf.loc[:,'ELEV'].max())
# print(medElevStation, mapdf.loc[:,'ELEV'].median())
# print(minElevStation, mapdf.loc[:,'ELEV'].min())
# let's compare monthly averages for TMAX using livneh, salathe, and the salathe-corrected livneh
comp = ['month_TMAX_dailymet_livneh2013',
'month_TMAX_dailywrf_salathe2014']
obj = dict()
for eachkey in ltm_0to3000.keys():
if eachkey in comp:
obj[eachkey] = ltm_0to3000[eachkey]
panel_obj = pd.Panel.from_dict(obj)
panel_obj
In [ ]:
comp = ['meanmonth_TMAX_dailymet_livneh2013',
'meanmonth_TMAX_dailywrf_salathe2014']
obj = dict()
for eachkey in ltm_0to3000.keys():
if eachkey in comp:
obj[eachkey] = ltm_0to3000[eachkey]
df_obj = pd.DataFrame.from_dict(obj)
df_obj
In [ ]:
# pick one summary key and parse its naming convention: <temporal resolution>_<variable>_<dataset>_<publication>
each = comp[0]
t_res, var, dataset, pub = each.rsplit('_', 3)
print(t_res, var, dataset, pub)
In [ ]:
ylab_var = meta_file['_'.join([dataset, pub])]['variable_info'][var]['desc']
ylab_unit = meta_file['_'.join([dataset, pub])]['variable_info'][var]['units']
print('{0} {1} ({2})'.format(t_res, ylab_var, ylab_unit))
In [ ]:
%%time
comp = [['meanmonth_TMAX_dailymet_livneh2013','meanmonth_TMAX_dailywrf_salathe2014'],
['meanmonth_PRECIP_dailymet_livneh2013','meanmonth_PRECIP_dailywrf_salathe2014']]
wy_numbers=[10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
month_strings=[ 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']
fig = plt.figure(figsize=(20,5), dpi=500)
ax1 = plt.subplot2grid((2, 2), (0, 0), colspan=1)
ax2 = plt.subplot2grid((2, 2), (1, 0), colspan=1)
# monthly
for eachsumm in df_obj.columns:
ax1.plot(df_obj[eachsumm])
ax1.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), fancybox=True, shadow=True, ncol=2, fontsize=10)
plt.show()
In [ ]:
df_obj[each].index.map(lambda x: x + 2)
In [ ]:
# compare monthly TMAX at the lowest-elevation gridded cell
comp = ['month_TMAX_dailymet_livneh2013', 'month_TMAX_dailywrf_salathe2014']
fig, ax = plt.subplots()
lws = [3, 10, 3, 3]
styles = ['b--', 'go-', 'y--', 'ro-']
for col, style, lw in zip(comp, styles, lws):
    panel_obj.xs(key=(minElevStation[0][0], minElevStation[0][1], minElevStation[0][2]),
                 axis=2)[col].plot(style=style, lw=lw, ax=ax, legend=True)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), fancybox=True, shadow=True, ncol=2)
fig.show()
fig, ax = plt.subplots()
lws=[3, 10, 3, 3]
styles=['b--','go-','y--','ro-']
for col, style, lw in zip(comp, styles, lws):
panel_obj.xs(key=(maxElevStation[0][0], maxElevStation[0][1], maxElevStation[0][2]),
axis=2)[col].plot(style=style, lw=lw, ax=ax, legend=True)
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), fancybox=True, shadow=True, ncol=2)
fig.show()
In [ ]:
vic_dr1 = meta_file['dailyvic_livneh2013']
vic_dr2 = meta_file['dailyvic_livneh2015']

# define the overlapping time window for the two VIC hydrology products
vic_dr = ogh.overlappingDates(tuple([vic_dr1['start_date'], vic_dr1['end_date']]),
                              tuple([vic_dr2['start_date'], vic_dr2['end_date']]))

vic_ltm_3bands = ogh.gridclim_dict(mappingfile=mappingfile,
                                   metadata=meta_file,
                                   dataset='dailyvic_livneh2013',
                                   file_start_date=vic_dr1['start_date'],
                                   file_end_date=vic_dr1['end_date'],
                                   file_time_step=vic_dr1['temporal_resolution'],
                                   subset_start_date=vic_dr[0],
                                   subset_end_date=vic_dr[1])
In [ ]:
sorted(vic_ltm_3bands.keys())
Using the hs_utils library, the results of the geoprocessing steps above can be saved back into HydroShare. First, define all of the required metadata for resource creation, i.e., title, abstract, keywords, and content files. In addition, we must define the type of resource that will be created, in this case genericresource.
Note: Make sure you save the notebook at this point, so that all notebook changes will be saved into the new HydroShare resource.
In [ ]:
#execute this cell to list the content of the directory
!ls -lt
Create list of files to save to HydroShare. Verify location and names.
In [ ]:
climate2013_tar = 'livneh2013.tar.gz'
climate2015_tar = 'livneh2015.tar.gz'
wrf_tar = 'salathe2014.tar.gz'

!tar -zcf {climate2013_tar} livneh2013
!tar -zcf {climate2015_tar} livneh2015
!tar -zcf {wrf_tar} salathe2014
In [ ]:
ThisNotebook = 'Observatory_Sauk_TreatGeoSelf.ipynb'  # check the name for consistency
mappingfile = 'Sauk_mappingfile.csv'
files = [ThisNotebook, mappingfile, climate2013_tar, climate2015_tar, wrf_tar]
In [ ]:
# for each file downloaded onto the server folder, move to a new HydroShare Generic Resource
title = 'Results from testing out the TreatGeoSelf utility'
abstract = 'This is the output from the TreatGeoSelf utility integration notebook.'
keywords = ['Sauk', 'climate', 'Landlab','hydromet','watershed']
rtype = 'genericresource'
# create the new resource
resource_id = hs.createHydroShareResource(abstract,
title,
keywords=keywords,
resource_type=rtype,
content_files=files,
public=False)