CZOData Example 1: Read and plot CZO_DisplayFile_v1 with Pandas. Read from a "CZO Display File v1", convert to Pandas DataFrame and plot time series. Writen by Anthony Aufdenkampe, Friday Dec. 13, 2013.


In [4]:
# Import all required Python libraries and modules
import pandas as pd
import matplotlib.pyplot as plt

In [5]:
# Create a list of file paths for the CZO Display Files (v1) to read
#   Examples here available at http://czo.stroudcenter.org/data/ or http://criticalzone.org/christina/data/

file_paths = ['/Users/aaufdenkampe/Documents/Python/EnviroDataScripts/CZODisplayParsePlot/ExampleData/CRB_WCC_STAGEFLOW_2011.csv', 
              '/Users/aaufdenkampe/Documents/Python/EnviroDataScripts/CZODisplayParsePlot/ExampleData/CRB_WCC_STAGEFLOW_2012.csv'
              ]

In [6]:
# A For loop that reads each file using the Pandas "read_csv" function, 
#   then appends the resulting DataFrame object to a list called "data_frames".

data_frames = []
for file_path in file_paths:
    df = pd.read_csv(file_path, header=0, skipinitialspace=True, skiprows=[1], index_col=0, na_values=[-9999], parse_dates=True)
    data_frames.append(df)

In [7]:
# Concatenate all the DataFrames in the "data_frames" list into a single DataFrame
df = pd.concat(data_frames)

In [8]:
df


Out[8]:
Gage Height (ft) from Continuous record Gage Height (ft) Discharge (cfs) from Continuous record Discharge (cfs)
Date Time (EST)
2011-01-01 00:00:00 3.056 NaN 2.9 NaN
2011-01-01 00:15:00 3.056 NaN 2.9 NaN
2011-01-01 00:30:00 3.055 NaN 2.89 NaN
2011-01-01 00:45:00 3.055 NaN 2.88 NaN
2011-01-01 01:00:00 3.057 NaN 2.91 NaN
2011-01-01 01:15:00 3.057 NaN 2.91 NaN
2011-01-01 01:30:00 3.056 NaN 2.91 NaN
2011-01-01 01:45:00 3.055 NaN 2.89 NaN
2011-01-01 02:00:00 3.054 NaN 2.88 NaN
2011-01-01 02:15:00 3.054 NaN 2.87 NaN
2011-01-01 02:30:00 3.054 NaN 2.87 NaN
2011-01-01 02:45:00 3.056 NaN 2.9 NaN
2011-01-01 03:00:00 3.055 NaN 2.89 NaN
2011-01-01 03:15:00 3.054 NaN 2.88 NaN
2011-01-01 03:30:00 3.055 NaN 2.89 NaN
2011-01-01 03:45:00 3.055 NaN 2.89 NaN
2011-01-01 04:00:00 3.055 NaN 2.89 NaN
2011-01-01 04:15:00 3.056 NaN 2.9 NaN
2011-01-01 04:30:00 3.054 NaN 2.88 NaN
2011-01-01 04:45:00 3.053 NaN 2.86 NaN
2011-01-01 05:00:00 3.054 NaN 2.87 NaN
2011-01-01 05:15:00 3.053 NaN 2.85 NaN
2011-01-01 05:30:00 3.055 NaN 2.88 NaN
2011-01-01 05:45:00 3.053 NaN 2.86 NaN
2011-01-01 06:00:00 3.054 NaN 2.87 NaN
2011-01-01 06:15:00 3.054 NaN 2.87 NaN
2011-01-01 06:30:00 3.054 NaN 2.88 NaN
2011-01-01 06:45:00 3.053 NaN 2.86 NaN
2011-01-01 07:00:00 3.054 NaN 2.88 NaN
2011-01-01 07:15:00 3.054 NaN 2.87 NaN
... ... ... ... ...
2012-12-31 21:30:00 3.034 NaN 1.95 NaN
2012-12-31 21:35:00 3.034 NaN 1.96 NaN
2012-12-31 21:40:00 3.035 NaN 1.97 NaN
2012-12-31 21:45:00 3.033 NaN 1.95 NaN
2012-12-31 21:50:00 3.034 NaN 1.95 NaN
2012-12-31 21:55:00 3.036 NaN 1.97 NaN
2012-12-31 22:00:00 3.034 NaN 1.95 NaN
2012-12-31 22:05:00 3.033 NaN 1.94 NaN
2012-12-31 22:10:00 3.034 NaN 1.95 NaN
2012-12-31 22:15:00 3.032 NaN 1.94 NaN
2012-12-31 22:20:00 3.033 NaN 1.94 NaN
2012-12-31 22:25:00 3.034 NaN 1.96 NaN
2012-12-31 22:30:00 3.034 NaN 1.96 NaN
2012-12-31 22:35:00 3.033 NaN 1.95 NaN
2012-12-31 22:40:00 3.033 NaN 1.94 NaN
2012-12-31 22:45:00 3.033 NaN 1.94 NaN
2012-12-31 22:50:00 3.033 NaN 1.94 NaN
2012-12-31 22:55:00 3.033 NaN 1.94 NaN
2012-12-31 23:00:00 3.033 NaN 1.95 NaN
2012-12-31 23:05:00 3.034 NaN 1.95 NaN
2012-12-31 23:10:00 3.033 NaN 1.94 NaN
2012-12-31 23:15:00 3.033 NaN 1.94 NaN
2012-12-31 23:20:00 3.033 NaN 1.94 NaN
2012-12-31 23:25:00 3.036 NaN 1.98 NaN
2012-12-31 23:30:00 3.033 NaN 1.94 NaN
2012-12-31 23:35:00 3.033 NaN 1.94 NaN
2012-12-31 23:40:00 3.033 NaN 1.94 NaN
2012-12-31 23:45:00 3.033 NaN 1.94 NaN
2012-12-31 23:50:00 3.033 NaN 1.94 NaN
2012-12-31 23:55:00 3.034 NaN 1.95 NaN

199843 rows × 4 columns


In [9]:
df.index


Out[9]:
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 00:15:00',
               '2011-01-01 00:30:00', '2011-01-01 00:45:00',
               '2011-01-01 01:00:00', '2011-01-01 01:15:00',
               '2011-01-01 01:30:00', '2011-01-01 01:45:00',
               '2011-01-01 02:00:00', '2011-01-01 02:15:00',
               ...
               '2012-12-31 23:10:00', '2012-12-31 23:15:00',
               '2012-12-31 23:20:00', '2012-12-31 23:25:00',
               '2012-12-31 23:30:00', '2012-12-31 23:35:00',
               '2012-12-31 23:40:00', '2012-12-31 23:45:00',
               '2012-12-31 23:50:00', '2012-12-31 23:55:00'],
              dtype='datetime64[ns]', name=u'Date Time (EST)', length=199843, freq=None)

In [10]:
df.index = df.index.tz_localize('EST')

In [11]:
df.index


Out[11]:
DatetimeIndex(['2011-01-01 00:00:00-05:00', '2011-01-01 00:15:00-05:00',
               '2011-01-01 00:30:00-05:00', '2011-01-01 00:45:00-05:00',
               '2011-01-01 01:00:00-05:00', '2011-01-01 01:15:00-05:00',
               '2011-01-01 01:30:00-05:00', '2011-01-01 01:45:00-05:00',
               '2011-01-01 02:00:00-05:00', '2011-01-01 02:15:00-05:00',
               ...
               '2012-12-31 23:10:00-05:00', '2012-12-31 23:15:00-05:00',
               '2012-12-31 23:20:00-05:00', '2012-12-31 23:25:00-05:00',
               '2012-12-31 23:30:00-05:00', '2012-12-31 23:35:00-05:00',
               '2012-12-31 23:40:00-05:00', '2012-12-31 23:45:00-05:00',
               '2012-12-31 23:50:00-05:00', '2012-12-31 23:55:00-05:00'],
              dtype='datetime64[ns, EST]', name=u'Date Time (EST)', length=199843, freq=None)

In [12]:
df.dtypes


Out[12]:
Gage Height (ft) from Continuous record     object
Gage Height (ft)                           float64
Discharge (cfs) from Continuous record      object
Discharge (cfs)                            float64
dtype: object

In [13]:
df['Gage Height (ft) from Continuous record'] = pd.to_numeric(df['Gage Height (ft) from Continuous record'], errors='coerce')
df['Discharge (cfs) from Continuous record'] = pd.to_numeric(df['Discharge (cfs) from Continuous record'], errors='coerce')

In [14]:
df.dtypes


Out[14]:
Gage Height (ft) from Continuous record    float64
Gage Height (ft)                           float64
Discharge (cfs) from Continuous record     float64
Discharge (cfs)                            float64
dtype: object

In [15]:
df.head(n=5)


Out[15]:
Gage Height (ft) from Continuous record Gage Height (ft) Discharge (cfs) from Continuous record Discharge (cfs)
Date Time (EST)
2011-01-01 00:00:00-05:00 3.056 NaN 2.90 NaN
2011-01-01 00:15:00-05:00 3.056 NaN 2.90 NaN
2011-01-01 00:30:00-05:00 3.055 NaN 2.89 NaN
2011-01-01 00:45:00-05:00 3.055 NaN 2.88 NaN
2011-01-01 01:00:00-05:00 3.057 NaN 2.91 NaN

In [16]:
df.columns


Out[16]:
Index([u'Gage Height (ft) from Continuous record', u'Gage Height (ft)',
       u'Discharge (cfs) from Continuous record', u'Discharge (cfs)'],
      dtype='object')

In [17]:
%matplotlib inline

In [24]:
df.plot()


Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x122fea850>

In [35]:
ax = df['Discharge (cfs) from Continuous record'].plot(title=file_path, style='b', logy=True, ylim=(1,1000), legend=True)
ax.set_ylabel(u'Discharge (cfs) from Continuous record', color='b')

ax2 = df['Gage Height (ft) from Continuous record'].plot(secondary_y=True, style='g', legend=True)
ax2.set_ylabel(u'Gage Height (ft) from Continuous record', color='g')


Out[35]:
Text(0,0.5,u'Gage Height (ft) from Continuous record')

Export to HydroShare Time Series compatible CSV

The data frame needs to be conversted to a csv format that can be read into HydyroShare. See https://help.hydroshare.org/hydroshare-resource-types/time-series/understanding-what-file-types-can-be-uploaded-into-a-time-series-resource/


In [68]:
df_export = df[['Gage Height (ft) from Continuous record', 'Discharge (cfs) from Continuous record']]
df_export.index.names = ['ValueDateTime']

In [69]:
df_export = df_export.rename(index=str, columns={"Discharge (cfs) from Continuous record": "discharge_cfs", "Gage Height (ft) from Continuous record": "stage_ft"})

In [74]:
df_export.head(n=5)


Out[74]:
stage_ft discharge_cfs
ValueDateTime
2011-01-01 00:00:00-05:00 3.056 2.90
2011-01-01 00:15:00-05:00 3.056 2.90
2011-01-01 00:30:00-05:00 3.055 2.89
2011-01-01 00:45:00-05:00 3.055 2.88
2011-01-01 01:00:00-05:00 3.057 2.91

In [75]:
df_export.to_csv('/Users/aaufdenkampe/Documents/Python/EnviroDataScripts/CZODisplayParsePlot/ExampleData/CRB_WCC_STAGEFLOW_from_df.csv')

In [ ]:
### Unfortunately, I can't get the CSV file to automatically parse into a HydroShare Time Series resource!