
Import water supply data & create supply table

Here we download the raw supply data for a set of sample years from the downscaled CMIP5 hydrology projections (link).

These data include monthly estimates of runoff at a 1/8th degree spatial resolution across the US for the period 1950 to 2099. Estimates are provided for 21 different climate projection ensembles applied to the Variable Infiltration Capacity (VIC) Macroscale Hydrologic Model (link); see the PDF document for a complete list. For demonstration purposes, this project uses the National Center for Atmospheric Research CCSM4 RCP 2.6 projection ensemble as the base data for water supply figures.

The steps involved include:

  • Download monthly runoff (total_runoff), precipitation (pr), evapotranspiration (et), and soil moisture (smc) data, in NetCDF format, from a central data repository (link) for a given sample year (e.g. 2000).
  • For the year:

    • Extract each parameter's monthly data from the downloaded NetCDF file into a 3-dimensional NumPy array (time, latitude, longitude).

    • Collapse the time dimension (months) into annual sums, resulting in a 2-dimensional array for each parameter, i.e. a single annual value for each 1/8th degree coordinate pair: rows = latitudes, columns = longitudes.

    • Re-label columns as longitude values and insert a column of latitude values, then melt the table into a listing of lat, lon, and value (see the sketch after this list).

    • Spatially join state FIPS codes to the data frame, using a county shapefile stored in the data folder.

  • Summarize supply values on FIPS to create a table that can be joined to other county level data:

    YEAR  FIPS   Runoff
    2000  01001  0
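
To make the melt step concrete, here is a minimal sketch (toy values, not real runoff data) showing how a small lat/lon grid collapses into the three-column listing:

In [ ]:
#Toy 2x3 grid: rows = latitudes, columns = longitudes (values are made up)
import pandas as pd
dfGrid = pd.DataFrame([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
dfGrid.columns = [-100.0, -99.875, -99.75]  #Re-label columns as longitudes
dfGrid['LAT'] = [35.0, 35.125]              #Insert a column of latitudes
dfTidy = pd.melt(dfGrid, id_vars=['LAT'], var_name='LON', value_name='total_runoff')
print(dfTidy)  #-> 6 rows of LAT, LON, total_runoff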

In [2]:
#Import libraries
import sys, os, glob, time, datetime, urllib
import numpy as np
import pandas as pd
import geopandas
import netCDF4

Set inputs and outputs

The required inputs include the years to process and the location of the county feature class (used to sample and summarize the 1/8th degree data to the county level).


In [3]:
#Sample years to process
years = (2000, 2005, 2010)

#Set filename locations
countyFN = '../Data/cb_2016_us_county_5m.shp'

#Output file locations
tidyFN = '../Data/SupplyTableTidy.csv'
outputFN = '../Data/SupplyTable.csv'

The code blocks below pull the runoff and related parameter data from the CMIP5 data ftp server as individual NetCDF4 (nc) files. Each nc file stores monthly values (n=12) across a 1/8th degree geographic grid (463 x 222). Monthly values are summed to create a data frame ('dfParam') where columns represent longitude, rows represent latitude, and the value is the runoff in mm/year. This data frame, in turn, is melted to generate a new data frame ('df') listing lat/long pairs and the parameter value associated with that location.

After each parameter data frame is created, they are joined together to create a listing of lat/long pairs followed by the mm/year of runoff, precipitation, evapotranspiration, and soil moisture content, respectively, for the given year. These steps are wrapped in a function, getSupplyData, so each sample year can be processed with a single call.
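
The time collapse itself is just a sum over the first (time) axis of the NetCDF array; a minimal sketch with random stand-in data:

In [ ]:
#Stand-in for one parameter's monthly grid: 12 months x 3 lats x 4 lons (random values)
import numpy as np
monthly = np.random.rand(12, 3, 4)

#Summing over axis 0 (time) leaves one annual value per lat/lon cell
annual = monthly.sum(axis=0)
print(annual.shape)  #-> (3, 4)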


In [ ]:
#Get URL templates for the NCAR RCP 2.6 scenario ensembles: runoff(ro), precipitation(pr), evapotranspiration(et), soil moisture (sm)
#NOTE: the pr/et/smc filenames are assumed to follow the same naming pattern as the runoff file
baseURL = 'ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/cmip5/hydro/BCSD_mon_VIC_nc/ccsm4_rcp26_r1i1p1/'
roURL = baseURL + "conus_c5.ccsm4_rcp26_r1i1p1.monthly.total_runoff.{}.nc"
prURL = baseURL + "conus_c5.ccsm4_rcp26_r1i1p1.monthly.pr.{}.nc"
etURL = baseURL + "conus_c5.ccsm4_rcp26_r1i1p1.monthly.et.{}.nc"
smURL = baseURL + "conus_c5.ccsm4_rcp26_r1i1p1.monthly.smc.{}.nc"

In [ ]:
#Save the URL templates as a dictionary (for clearer scripting)
paramDict = {'runoff': roURL, 'precip': prURL, 'et': etURL, 'soil': smURL}

In [5]:
#These lines fix an issue with slow network connections
import socket
socket.setdefaulttimeout(30)

def getSupplyData(year):
    '''Download each parameter's NetCDF file for the given year, collapse the
       monthly grids into annual sums, and return a dataframe listing lat/lon
       pairs with one column per parameter.
    '''
    #Loop through the parameters in a fixed order (runoff first, so its melted
    # dataframe can seed the combined year dataframe below)
    for param in ('runoff', 'et', 'precip', 'soil'):
        print "->Downloading {} data for year {}".format(param, year),

        #Build the URL for this parameter and year, then retrieve the file from the ftp server
        url = paramDict[param].format(year)
        urllib.urlretrieve(url, "tmpData.nc")

        #Convert to netCDF object
        nc = netCDF4.Dataset("tmpData.nc", mode='r')

        #Get the parameter name and its values (the last variable in the file)
        param_name = nc.variables.keys()[-1]
        param_vals = nc.variables.values()[-1]

        #Collapse the monthly values into a single 2d (lat x lon) data frame
        dfParam = pd.DataFrame(param_vals[:,:,:].sum(axis=0))

        #Create latitude and longitude arrays (identical in every file, so they
        # are simply re-read on each pass)
        dfLats = pd.DataFrame(nc.variables['latitude'][:])
        dfLons = pd.DataFrame(nc.variables['longitude'][:])

        #Close the nc object
        nc.close()

        #Delete the nc file
        os.remove("tmpData.nc")

        #Update
        urllib.urlcleanup()
        print "  ...**complete!**"

        #Melt the data into a 3 column, 2d data frame
        '''At this point, the dfParam data frame contains columns for each 1/8d longitude
           and rows for each 1/8d of latitude. The 'melt' procedure below collapses this into
           a three column table of lat,lon,value for the current parameter (e.g. runoff)
        '''
        dfParam.columns = dfLons[0].values.tolist() #Set column names to longitudes
        dfParam['LAT'] = dfLats[0].values.tolist()  #Add column of latitudes
        df = pd.melt(dfParam, id_vars=['LAT'], var_name='LON', value_name=param_name)

        #Append to dataframe, if not the first element
        '''Each attribute will become its own column in the final dataframe. So after
           assembling the first dataframe (runoff), we can just append the others to it.
           The FIPS codes are joined later (see the addFIPS function below) so we can
           summarize by states or counties.
        '''
        if param == 'runoff': #If it's the first in the series of runoff, et, precip, soil moisture...
            #Copy to the year dataframe
            dfYear = df.copy(deep=True)
        else:
            #Add column to the year data frame
            dfYear[param_name] = df[param_name]

    #Add year to the master data frame
    dfYear.insert(1, 'YEAR', year)

    #Return the dataframe
    return dfYear

In [4]:
#Function for geocoding FIPS using geopandas
def addFIPS(df,countyFN):
    '''This function uses geopandas to convert lat/lon columns into a point object and then
       spatially join this point object with a shapefile of counties to extract the FIPS code
       for each location in the dataframe.
    '''
    #Import modules; NOTE this function requires geopandas and shapely
    import geopandas as gpd
    from geopandas.tools import sjoin
    from shapely.geometry import Point
    
    #Add a geometry field to the data frame, setting the value as a new Shapely point object
    df['geometry'] = df.apply(lambda z: Point(z.LON, z.LAT), axis=1)
    
    #Create a geopandas dataframe from the dataframe created above
    gdfPoints = gpd.GeoDataFrame(df)
    
    #Create a geopandas dataframe from the counties file
    gdfPolygons = gpd.GeoDataFrame.from_file(countyFN)
    
    #Tag the points with the polygons' coordinate system (both are geographic
    #lat/lon coordinates, so no reprojection is needed)
    gdfPoints.crs = gdfPolygons.crs
    
    #Execute the spatial join of polygon attributes to the point objects
    dfMerged=sjoin(gdfPoints, gdfPolygons, how='left', op='within')
    
    #Compute total area from the area of land and water
    dfMerged['Area'] = dfMerged.ALAND + dfMerged.AWATER
    
    #Drop unneeded columns
    dfMerged.drop(['geometry','index_right','ALAND','AWATER','AFFGEOID','COUNTYFP','COUNTYNS','LSAD'],axis=1,inplace=True)
    
    #Drop rows with no data (usually falling outside the US)
    dfMerged.dropna(inplace=True)
    
    #Return the merged dataframe
    return dfMerged
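
To see what the spatial join contributes, here is a minimal sjoin sketch with one made-up polygon and point (the GEOID and runoff values are hypothetical; the real call uses the county shapefile):

In [ ]:
#Toy spatial join: one 'county' polygon and one sample point inside it
import geopandas as gpd
from geopandas.tools import sjoin
from shapely.geometry import Point, Polygon

gdfPolygons = gpd.GeoDataFrame({'GEOID': ['01001']},
                               geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])
gdfPoints = gpd.GeoDataFrame({'total_runoff': [42.0]}, geometry=[Point(1, 1)])
gdfPoints.crs = gdfPolygons.crs

#The point falls within the polygon, so it inherits the GEOID attribute
print(sjoin(gdfPoints, gdfPolygons, how='left', op='within'))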

In [6]:
#Retrieve data frames for each sample year, using the function above
df2000 = getSupplyData(2000)
df2005 = getSupplyData(2005)
df2010 = getSupplyData(2010)


->Downloading runoff data for year 2000   ...**complete!**
->Downloading et data for year 2000   ...**complete!**
->Downloading precip data for year 2000   ...**complete!**
->Downloading soil data for year 2000   ...**complete!**
->Downloading runoff data for year 2005   ...**complete!**
->Downloading et data for year 2005   ...**complete!**
->Downloading precip data for year 2005   ...**complete!**
->Downloading soil data for year 2005   ...**complete!**
->Downloading runoff data for year 2010   ...**complete!**
->Downloading et data for year 2010   ...**complete!**
->Downloading precip data for year 2010   ...**complete!**
->Downloading soil data for year 2010   ...**complete!**

In [ ]:
#Use the addFIPS function to add FIPS values to the table
df2000 = addFIPS(df2000,countyFN)
df2005 = addFIPS(df2005,countyFN) 
df2010 = addFIPS(df2010,countyFN)

In [ ]:
#Concatenate the tables
dfAllYears = pd.concat([df2000,df2005,df2010],ignore_index=True)

In [ ]:
#Rename FIPS columns
dfAllYears.rename(columns={'GEOID':'FIPS','STATEFP':'STATEFIPS'},inplace=True)
dfAllYears.to_csv(tidyFN,index=False,encoding='utf8')

In [ ]:
#Compute mean values for each county
dfCounty = dfAllYears.groupby(['YEAR','STATEFIPS','FIPS','Area'])[['total_runoff','pr','et','smc']].mean()

#Convert indexes back to columns
dfCounty.reset_index(inplace=True)

#Convert mm/year * county area (m2) into MGal/day - to match the water use data
'''[mm] / 1000 = m; m * [m2] = m3; m3 / 3785 = MGal (1 MGal ~ 3,785 m3); / 365 converts per year to per day'''
for param in ('total_runoff','pr','et','smc'):
    dfCounty[param] = (dfCounty[param] / 1000.0) * dfCounty.Area / 3785.0 / 365.0
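
As a quick sanity check of the conversion, take a hypothetical county of 1,500 km2 (1.5e9 m2) receiving 300 mm of runoff in a year:

In [ ]:
#Hypothetical county: 1,500 km2 with 300 mm/year of runoff
mm_per_year = 300.0
area_m2 = 1.5e9

m3_per_year = (mm_per_year / 1000.0) * area_m2  #-> 4.5e8 m3/year
mgal_per_day = m3_per_year / 3785.0 / 365.0     #-> ~326 MGal/day
print(mgal_per_day)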

In [ ]:
#Compute supply as precip - evapotranspiration
dfCounty['Supply'] = dfCounty.pr - dfCounty.et

In [ ]:
#Remove Area field
dfCounty.drop(['Area'],axis=1,inplace=True)

In [ ]:
#Write the table to the file
dfCounty.to_csv(outputFN,encoding='utf8',index=False)

In [ ]:
dfCounty.head()

In [ ]:
#Summarize by state
dfState = dfCounty.groupby(['YEAR','STATEFIPS'])['Supply'].sum()

In [ ]:
stateSummary = dfState.unstack().T
stateSummary.to_csv('../Data/StateSupply.csv')
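
The unstack/transpose step pivots the (YEAR, STATEFIPS)-indexed series into a state-by-year table; a minimal sketch with made-up values:

In [ ]:
#Toy series indexed by (YEAR, STATEFIPS), mirroring the groupby result above
import pandas as pd
idx = pd.MultiIndex.from_product([[2000, 2005], ['01', '06']],
                                 names=['YEAR', 'STATEFIPS'])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

#unstack() moves STATEFIPS to the columns; .T makes states the rows and years the columns
print(s.unstack().T)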

In [ ]:
#The us package converts state FIPS codes to state names
import us

In [ ]:
stateSummary.reset_index(inplace=True)
#Convert state FIPS codes to state objects (rendered as state names in the CSV);
#the column is named 'MGALperDay' so it doubles as a units label in the output table
stateSummary['MGALperDay'] = stateSummary['STATEFIPS'].apply(lambda x: us.states.lookup(x))
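
For reference, us.states.lookup accepts FIPS codes (as well as names and abbreviations) and returns a State object that renders as the state name, e.g.:

In [ ]:
print(us.states.lookup('06'))       #-> California
print(us.states.lookup('06').abbr)  #-> CA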

In [ ]:
stateSummary[['MGALperDay',2000,2005,2010]].to_csv('../Data/StateSupply.csv',index=False)