In [1]:
%pylab inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from imp import reload
pd.set_option('display.max_rows', 370)
pd.set_option('display.max_columns', 90)
pd.set_option('display.width', 200)


Populating the interactive namespace from numpy and matplotlib

In [2]:
import hypsometry
reload(hypsometry)
filename = "modscag_gf_by_year/v01/IN_Hunza_at_Danyour.0100m.modscag_gf_covered_area_by_elev.day.2001to2001.v2.asc"
v01 = hypsometry.Hypsometry()
v01.read( filename, verbose=True )


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-a3dbb3633e8f> in <module>()
      3 filename = "modscag_gf_by_year/v01/IN_Hunza_at_Danyour.0100m.modscag_gf_covered_area_by_elev.day.2001to2001.v2.asc"
      4 v01 = hypsometry.Hypsometry()
----> 5 v01.read( filename, verbose=True )

/Users/brodzik/charis/ipynb/hypsometry.pyc in read(self, filename, verbose)
     94 
     95         self.data.set_index( [ 'Date' ], inplace=True )
---> 96         self.data.drop( [ 'yyyy', 'mm', 'dd' ], axis=1, inplace=True )
     97 
     98         if verbose:

TypeError: drop() got an unexpected keyword argument 'inplace'
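
The traceback comes from an older pandas release in which DataFrame.drop() did not yet accept the inplace keyword. A version-agnostic fix for line 96 of hypsometry.py (a sketch only, not necessarily how the module was actually patched) is to reassign the result, since drop() returns a new DataFrame on both old and new pandas:

# works with and without inplace support in drop()
self.data = self.data.drop( [ 'yyyy', 'mm', 'dd' ], axis=1 )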

In [4]:
v01.data.drop?

In [ ]:
import hypsometry
reload(hypsometry)
filename = "modscag_gf_by_year/v02/IN_Hunza_at_Danyour.0100m.modscag_gf_covered_area_by_elev.day.2001to2001.txt"
v02 = hypsometry.Hypsometry()
v02.read( filename, verbose=True )

In [ ]:
fig, ax = plt.subplots(2,1)
v01.data.ix['2001-06-01'].plot( ax=ax[0], title='SCAv01', kind='barh' )
v02.data.ix['2001-06-01'].plot( ax=ax[1], title='SCAv02', kind='barh' )

In [ ]:
diff = v02.data.ix['2001-01-01'].values - v01.data.ix['2001-01-01'].values
fig, ax = plt.subplots(1,1)
plt.plot( diff )
#diff

In [ ]:
diff = v02.data['3400.0'].values - v01.data['3400.'].values
fig, ax = plt.subplots(1,1)
plt.plot(diff)

Reading a CHARIS hypsometry file

We have defined a standard CHARIS hypsometry file format that we use for inputs and outputs of the CHARIS melt models. It is a simple, human-readable ASCII format. This notebook walks through the steps to read a snow-cover hypsometry file into Python and examine the data in the file.


In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

My hypsometry data format consists of:

1) zero or more comment lines (beginning with #)

2) a first data line that reports how many elevation bands are in the file

3) a second data line that reports the lower bound of each elevation band (in meters)

4) remaining lines of the file, one per day, each containing:

yyyy mm dd doy {list of data values, one for each elevation band}

Here is the beginning of the file we will be reading:

# MODSCAG_GF SCA by elevation
# elevation values are meters, at bottom of elevation band
# data values are areas, in square km
63
1400.0 1500.0 1600.0 1700.0 1800.0 1900.0 2000.0 2100.0 2200.0 2300.0 2400.0 2500.0 2600.0 2700.0 2800.0 2900.0 3000.0 3100.0 3200.0 3300.0 3400.0 3500.0 3600.0 3700.0 3800.0 3900.0 4000.0 4100.0 4200.0 4300.0 4400.0 4500.0 4600.0 4700.0 4800.0 4900.0 5000.0 5100.0 5200.0 5300.0 5400.0 5500.0 5600.0 5700.0 5800.0 5900.0 6000.0 6100.0 6200.0 6300.0 6400.0 6500.0 6600.0 6700.0 6800.0 6900.0 7000.0 7100.0 7200.0 7300.0 7400.0 7500.0 7600.0
2001 1 1 1 0.000000 0.350881 0.288029 0.107759 0.723593 1.075053 1.479878 1.888137 2.819928 6.869486 11.475485 18.770580 23.539839 37.571785 45.258991 58.313545 75.618950 98.407509 109.481323 124.540115 139.901047 159.709427 183.575012 205.393692 223.748779 255.835587 271.589386 311.170837 343.559784 384.976685 403.775177 442.533600 468.865540 520.615601 511.643433 549.194824 527.820190 544.116089 520.179016 451.494202 394.106781 308.375214 237.771515 170.135086 115.261742 86.346596 69.249130 51.323139 41.376740 38.618759 34.043415 24.366207 24.095816 21.965036 14.542485 11.176612 12.601063 8.180685 5.567430 4.136129 3.105402 1.911621 0.488112
2001 1 2 2 0.017387 1.454463 1.212285 0.466432 1.221172 1.763915 2.692399 3.270561 3.555070 7.739756 12.009012 19.845352 23.796310 37.705723 45.120823 57.757401 74.229279 96.657829 106.962242 122.813690 137.407364 156.885284 179.434433 200.203598 218.428314 249.800186 264.601166 304.201996 334.951447 374.930878 392.742737 430.429871 454.205292 503.627411 494.258118 529.328857 508.478699 524.046875 501.057251 436.041229 381.136139 298.980896 231.039047 165.357971 112.498474 84.321381 67.829727 50.306320 40.380802 37.814663 33.566395 23.861654 23.514660 21.417894 14.294745 10.944931 12.290499 8.058737 5.433397 4.107623 3.032999 1.839174 0.458876
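
The cells below read a file named test.sca_by_elev.txt, which is not included here. If you want to follow along without the CHARIS data, a sketch like the following writes a tiny stand-in file with the same layout (made-up values, only three elevation bands); note that the later cells that slice columns such as names[50:] assume the full 63-band, full-year file.


In [ ]:
# Hypothetical stand-in for the real CHARIS file: same layout, synthetic numbers.
sample = """\
# synthetic SCA by elevation
# elevation values are meters, at bottom of elevation band
# data values are areas, in square km
3
1400.0 1500.0 1600.0
2001 1 1 1 0.000000 1.500000 2.500000
2001 1 2 2 0.100000 1.600000 2.400000
"""
with open( 'test.sca_by_elev.txt', 'w' ) as f:
    f.write( sample )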

Read the file elevation information first:

Count and skip comments.

Read the first 2 lines of data to get the elevations to use for headers:


In [ ]:
filename = 'test.sca_by_elev.txt'
import re
lines = open( filename ).readlines()
num_comments = 0
regex_leading_comment = re.compile( r'#' )
for line in lines:
    part = regex_leading_comment.match( line )
    if part is None:
        break
    else:
        num_comments += 1
print( str( num_comments ) + " comments found" )
print( "Next lines are:" )
print( lines[ num_comments ] )
print( lines[ num_comments + 1 ] )

So I have the number of elevation bands, and one long string containing the bottom elevation of each band. Next, I want to split this long string into a list of individual strings that I will eventually use as column headers:


In [ ]:
el_labels = lines[ num_comments + 1 ].split()
#el_labels

I add column headers for the date part of each record:


In [ ]:
names = [ 'yyyy', 'mm', 'dd', 'doy' ] + el_labels
names

Now read the data part of the file into a DataFrame, and give the reader the names of the columns to use:

N.B. Use header=None so that nothing in the file is used for the header, and pass in the names array for the column headers.

N.B. Tell it to skip the comments and the two leading header lines before reading the real data.


In [ ]:
df = pd.read_table( filename, sep=r"\s+", skiprows=num_comments+2, header=None, names=names, index_col='doy' )
df
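
For reference, the hypsometry.read() method in the traceback above goes one step further and replaces the yyyy/mm/dd columns with a single Date index. A rough equivalent, as a sketch only (assembling dates from a dict of year/month/day columns requires a reasonably recent pandas), might look like:


In [ ]:
# Hypothetical by_date frame: index the data by calendar date instead of doy.
by_date = df.reset_index()
by_date['Date'] = pd.to_datetime( { 'year': by_date['yyyy'],
                                    'month': by_date['mm'],
                                    'day': by_date['dd'] } )
by_date = by_date.set_index( 'Date' ).drop( [ 'yyyy', 'mm', 'dd', 'doy' ], axis=1 )
by_date.head()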

The comments and header lines have already been skipped, so the DataFrame holds only the data part. Slice out a subset of the elevation columns and look at the first few rows:


In [ ]:
print( df[names[50:]].head() )

Retrieve specific rows by using the ix indexer:


In [ ]:
df.ix[4]
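
Note that .ix works in the pandas version used for this session but has since been removed from pandas. On current releases the label-based and position-based equivalents are .loc and .iloc, for example:


In [ ]:
df.loc[4]    # row whose index label (doy) equals 4
df.iloc[3]   # fourth row by position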

In [ ]:
df.columns

In [ ]:
type( df['yyyy'] )

In [ ]:
df['yyyy'] # equivalent to df.yyyy

In [ ]:
new = df.ix[4:]
new

In [ ]:
new.reindex(np.arange(365))

In [ ]: