In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from importlib import reload
pd.set_option('display.max_rows', 370)
pd.set_option('display.max_columns', 90)
pd.set_option('display.width', 200)
In [2]:
import hypsometry
reload(hypsometry)
filename = "modscag_gf_by_year/v01/IN_Hunza_at_Danyour.0100m.modscag_gf_covered_area_by_elev.day.2001to2001.v2.asc"
v01 = hypsometry.Hypsometry()
v01.read( filename, verbose=True )
In [4]:
v01.data.drop?
In [ ]:
import hypsometry
reload(hypsometry)
filename = "modscag_gf_by_year/v02/IN_Hunza_at_Danyour.0100m.modscag_gf_covered_area_by_elev.day.2001to2001.txt"
v02 = hypsometry.Hypsometry()
v02.read( filename, verbose=True )
In [ ]:
fig, ax = plt.subplots(2,1)
v01.data.loc['2001-06-01'].plot( ax=ax[0], title='SCAv01', kind='barh' )
v02.data.loc['2001-06-01'].plot( ax=ax[1], title='SCAv02', kind='barh' )
In [ ]:
diff = v02.data.loc['2001-01-01'].values - v01.data.loc['2001-01-01'].values
fig, ax = plt.subplots(1,1)
plt.plot( diff )
#diff
In [ ]:
diff = v02.data['3400.0'].values - v01.data['3400.'].values
fig, ax = plt.subplots(1,1)
plt.plot(diff)
We have defined a standard CHARIS hypsometry file that we use for inputs and outputs in the CHARIS melt models. It is a simple, human-readable ASCII format. This notebook walks through the steps to read a snow-cover hypsometry file into python and examine the data in the file.
In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
My hypsometry data format consists of:
1) 0 or more comment lines (beginning with #)
2) a first data line that reports how many elevation bands are in the file
3) a second data line that reports the lower bound of each elevation band (in meters)
4) remaining lines of the file, each containing:
yyyy mm dd doy {list of data values, one for each elevation band}
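As a concrete illustration, here is a minimal sketch of such a file, built with made-up elevations and values (the real files are larger; the band bounds and data shown are hypothetical):

```python
# Build a tiny hypothetical hypsometry file: 2 comment lines,
# a band-count line, a band-lower-bound line, then daily records
# of yyyy mm dd doy followed by one value per elevation band.
sample = """\
# Example CHARIS hypsometry file (hypothetical values)
# Units: snow-covered area per 100 m elevation band
3
3400. 3500. 3600.
2001 01 01 001 0.0 1.2 3.4
2001 01 02 002 0.1 1.3 3.5
"""

with open("example.sca_by_elev.txt", "w") as f:
    f.write(sample)

lines = open("example.sca_by_elev.txt").readlines()
print(lines[2].strip())  # number of elevation bands
print(lines[3].split())  # lower bound of each band, as strings
```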
Here is the beginning of the file we will be reading:
Read the file elevation information first:
Count and skip comments.
Read the first 2 lines of data to get the elevations to use for headers:
In [ ]:
filename = 'test.sca_by_elev.txt'
import re
with open( filename ) as f:
    lines = f.readlines()
num_comments = 0
regex_leading_comment = re.compile( r'#' )
for line in lines:
    part = regex_leading_comment.match( line )
    if part is None:
        break
    else:
        num_comments += 1
print( str( num_comments ) + " comments found" )
print( "Next lines are:" )
print( lines[ num_comments ] )
print( lines[ num_comments + 1 ] )
So I have the number of elevation bands, and one long string listing the bottom elevation of each band. Next, I want to split this long string into a list of individual strings that I will eventually use for column headers:
In [ ]:
el_labels = lines[ num_comments + 1 ].split()
#el_labels
I add column headers for the date part of each record:
In [ ]:
names = [ 'yyyy', 'mm', 'dd', 'doy' ] + el_labels
names
Now read the data part of the file into a DataFrame, and give the reader the names of the columns to use:
N.B. Use header=None so that nothing in the file is treated as a header, and pass in the names array for column headers.
N.B. Tell it to skip the comments and the two leading rows before reading real data.
In [ ]:
df = pd.read_table( filename, sep=r"\s+", skiprows=num_comments+2, header=None, names=names, index_col='doy' )
df
Preview the data part of the DataFrame by selecting a subset of the elevation-band columns:
In [ ]:
print( df[names[50:]].head() )
Retrieve specific rows by label with the loc indexer:
In [ ]:
df.loc[4]
In [ ]:
df.columns
In [ ]:
type( df['yyyy'] )
In [ ]:
df['yyyy'] # equivalent to df.yyyy
In [ ]:
new = df.loc[4:]
new
In [ ]:
new.reindex(np.arange(365))
In [ ]: