In [1]:
import numpy as np
import pandas as pd
import matplotlib
import scipy
import scipy.stats
import json
import requests
from pprint import pprint
from ggplot import *
import csv
In [2]:
"string "*4
Out[2]:
In [3]:
'String'[4]
Out[3]:
In [4]:
len('strings')
Out[4]:
In [5]:
'Search String'.find('r')
Out[5]:
In [6]:
'Search String'.find('S', 2)
Out[6]:
In [7]:
'Search String'.find('S', 8)
Out[7]:
In [8]:
'String'[1:4]
Out[8]:
In [9]:
'String'.split('i')
Out[9]:
In [10]:
ord('b')
Out[10]:
In [11]:
chr(98)
Out[11]:
In [12]:
str(45)
Out[12]:
We can create a sequence of integers by writing range(start, stop, step). Note that in Python 3, range() returns a lazy range object rather than a list (Python 2 returned a list); wrap it in list() to materialize the values.
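A quick sketch (assuming Python 3, where range() is lazy):
list(range(0, 10, 3))    # [0, 3, 6, 9]
list(range(0, -10, -1))  # [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
list(range(1, 0))        # [] -- empty, because start >= stop with a positive step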
In [13]:
range(0, 10, 3)
Out[13]:
In [14]:
range(0, -10, -1)
Out[14]:
In [15]:
range(1, 0)
Out[15]:
In [16]:
range(10)
Out[16]:
In [17]:
range(1, 11)
Out[17]:
In [18]:
range(10)
Out[18]:
Lists are mutable, so you can modify them in place.
In [19]:
lst = [1, 2, 3, 4, 5, 6]  # avoid naming a variable "list": it shadows the built-in type
lst[2] = 33
lst
Out[19]:
In [20]:
lst = [1, 2, 3, 4, 5, 6]
list1 = lst[2:6]  # slicing returns a new list
list1
Out[20]:
In [21]:
lst = [1, 2, 3, 4, 5, 6]
lst.append('element')
lst
Out[21]:
In [22]:
list1 = [1, 2, 3]
list2 = [4, 5]
list1+list2
Out[22]:
In [23]:
lst = [1, 2, 3, 4, 'x', 'a']
lst.pop(4)  # removes and returns the element at index 4
Out[23]:
In [24]:
lst = [1, 2, 3, 4, 'x', 'a']
lst.pop()  # with no argument, removes and returns the last element
Out[24]:
In [25]:
[].pop()  # raises IndexError: pop from empty list (hence no Out below)
In [26]:
lst = [1, 2, 3, 4, 'x', 'a']
lst.index('x')
Out[26]:
In [27]:
list1 = [1, 2, 3, 4, 'x', 'a']
'a' not in list1, 10 not in list1
Out[27]:
A dictionary provides a mapping between keys, which can be values of any immutable (hashable) type, and values, which can be anything. Because a dictionary is implemented with a hash table, lookup time does not increase (significantly) as the number of keys grows.
Constructing a dictionary. A dictionary is a set of zero or more key-value pairs, surrounded by curly braces:
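Beyond lookup, a quick sketch of the other common operations (all standard dict behavior):
d = {'a': 1}
d['b'] = 2           # insert a new key
d['a'] = 10          # update an existing key
del d['b']           # remove a key
d.get('missing', 0)  # lookup with a default instead of raising KeyError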
In [28]:
x = { 'key1': 'apple', 'key2': 'banana', 'key3': 'pear'}
x['key3']
Out[28]:
In [29]:
x = { 'key1': 'apple', 'key2': 'banana', 'key3': 'pear'}
'key4' in x, 'key2' in x
Out[29]:
In [30]:
import time
time.perf_counter()  # time.clock() was deprecated and removed in Python 3.8; perf_counter() is the modern equivalent
Out[30]:
In [31]:
?except
Pandas extends Python's basic data structures and allows better memory management for data analysis. Its DataFrame is more useful than R's data.frame structure, and there is also the one-dimensional Series(). Check this link to understand pandas data structures better: http://pandas.pydata.org/pandas-docs/stable/dsintro.html
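A quick illustration of both structures (the values below are made up):
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])  # one-dimensional labeled array
df = pd.DataFrame({'unit': ['R001', 'R002'], 'entries': [97, 15]})  # two-dimensional labeled table
s['b'], df['entries'].sum()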
In [32]:
import pandas as pd
frame = pd.read_csv('turnstile_data_master_with_weather.csv')  # read_csv already returns a DataFrame; no need to wrap it in pd.DataFrame()
frame[:2]
Out[32]:
In [33]:
frame[['Hour', 'UNIT']][:2]
Out[33]:
In [34]:
import statsmodels.api as sm
X = frame[['Hour']]
X = sm.add_constant(X)
X
model = sm.OLS(frame['ENTRIESn_hourly'], X)
results = model.fit()
b0, b1 = results.params[0], results.params[1]
b0, b1
prediction = b0 + frame['Hour']*b1
prediction[:2]
Out[34]:
In [35]:
print('maximum per hour =', frame['ENTRIESn_hourly'].max())
print('minimum per hour =', frame['ENTRIESn_hourly'].min())
print('average per hour =', frame['ENTRIESn_hourly'].mean())
print('median per hour =', frame['ENTRIESn_hourly'].median())
Below I will try to subset a pandas dataframe in many ways.
In [36]:
frame['ENTRIESn_hourly'][5:20] # Select one column and slice rows 5 to 19 by position.
Out[36]:
In [37]:
UNIT_counts = frame['UNIT'].value_counts()
print(UNIT_counts[0:5]) # Listed in descending order of count by default.
# Note that the output is a Series, not a data frame.
UNIT_counts[0:10].mean()
Out[37]:
In [38]:
clean_frame = frame['UNIT'].fillna('Missing') ## Fill the cells in the "UNIT" column that are missing.
clean_frame[clean_frame == ''] = 'Unknown' ## Assign a value to the cells that hold the empty string ''.
Now we are going to plot a graph.
In [39]:
%matplotlib inline
## Without this line of code, the plot will not show up.
import matplotlib.pyplot as plt
UNIT_counts[:20].plot(kind='barh', rot=0)
Out[39]:
In [40]:
results = pd.Series([x.split('-')[2] for x in frame.DATEn.dropna()])
results[:6]
Out[40]:
In [42]:
frame.DATEn.dropna()[:10]
Out[42]:
In [43]:
frame.DATEn.max(), frame.DATEn.min() ## Attribute access (frame.DATEn) returns the column as a Series.
Out[43]:
In [44]:
len(frame[frame.DATEn.notnull()]), len(frame), len(frame.DATEn.dropna())
Out[44]:
In [45]:
x = np.where(frame['DATEn'].str.contains('11-05-15'), 'May', 'Not May') # labels rows whose date string contains '11-05-15', i.e. 2011-05-15
x
Out[45]:
In [46]:
frame.iloc[0] ## Shows the first row of the data frame (.ix is deprecated; use .iloc for positional access).
Out[46]:
In [47]:
m = frame.pivot_table(values='meantempi', index='UNIT', columns='rain', aggfunc='mean')
m[:5]
Out[47]:
In [48]:
newtable = frame.groupby('UNIT').size() ## Here .size() counts the number of times a certain UNIT appeared in the dataframe.
## This can be used to find frequencies of each UNIT. The output is a Series and is not a data frame.
newtable[:5]
Out[48]:
In [49]:
y = frame.groupby(['ENTRIESn_hourly', x])
y
agg_counts = y.size().unstack().fillna(0)
agg_counts[:10]
Out[49]:
In [50]:
busyunits = newtable.index[newtable >= 173] # units that appear at least 173 times; the mask must be aligned to newtable, not frame
mean_ratings = m.loc[busyunits] # .ix is deprecated; use .loc for label-based access
You can merge two files using the pandas function "merge", e.g. data = pd.merge(pd.merge(ratings, users), movies). Here we merge three data frames: ratings, users, and movies. Pandas infers the join keys from the shared column names, and the resulting data frame "data" contains all the variables that each of the tables had separately.
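A minimal sketch with made-up frames (the column names here are illustrative assumptions, not from the turnstile data):
ratings = pd.DataFrame({'user_id': [1, 2], 'movie_id': [10, 10], 'rating': [4, 5]})
users = pd.DataFrame({'user_id': [1, 2], 'age': [25, 32]})
movies = pd.DataFrame({'movie_id': [10], 'title': ['Metropolis']})
data = pd.merge(pd.merge(ratings, users), movies)  # joins on the shared columns: first user_id, then movie_id
data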
In [51]:
import csv
# Create file input object f_in to work with in_data.csv
f_in = open('in_data.csv', 'r')
# Create file output object f_out to write to the new out_data.csv
f_out = open('out_data.csv', 'w')
# Create csv reader and writer based on our file objects
reader_in = csv.reader(f_in, delimiter=',')
writer_out = csv.writer(f_out, delimiter=',')
# Skip the first line because it contains headers
next(reader_in)  # reader_in.next() is Python 2 only
for line in reader_in:
    type_chocolate = line[0]
    # Each input line holds two records in the format:
    # type_choco, batch_id, cocoa, milk, sugar
    line_1 = [type_chocolate, line[1], line[2], line[3], line[4]]
    line_2 = [type_chocolate, line[5], line[6], line[7], line[8]]
    writer_out.writerow(line_1)
    writer_out.writerow(line_2)
f_in.close()
f_out.close()
You can open a remote file from a URL in Python as follows:
In [52]:
from urllib.request import urlopen  # urllib2 is Python 2 only; urllib.request is the Python 3 module
url = 'http://www.ats.ucla.edu/stat/data/hsb2.csv'
fp = urlopen(url)
df = pd.read_csv(fp)  # read_csv accepts a file-like object (or the URL directly)
df[:4]
Out[52]:
Let's try to do some analysis. I want to load a CSV file and analyze it. The file is located at '/media/anirban/Ubuntu/Downloads/turnstile_data_master_with_weather.csv'.
In [53]:
df = pd.read_csv('turnstile_data_master_with_weather.csv')
The code below does not work yet; I still need to figure out how to generate the plot.
In [54]:
def entries_histogram(turnstile_weather):
    '''
    Before we perform any analysis, it might be useful to take a
    look at the data we're hoping to analyze. More specifically, let's
    examine the hourly entries in our NYC subway data and determine what
    distribution the data follows. This data is stored in a dataframe
    called turnstile_weather under the ['ENTRIESn_hourly'] column.

    Let's plot two histograms on the same axes to show hourly
    entries when raining vs. when not raining. Here's an example of how
    to plot histograms with pandas and matplotlib:
    turnstile_weather['column_to_graph'].hist()

    Your histogram may look similar to the bar graph in the instructor notes below.

    You can read a bit about using matplotlib and pandas to plot histograms here:
    http://pandas.pydata.org/pandas-docs/stable/visualization.html#histograms

    You can see the information contained within the turnstile weather data here:
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv
    '''
    plt.figure()
    turnstile_weather.hist(column='ENTRIESn_hourly', by='rain', bins=100)  # histograms of hourly entries, split into rain / no-rain panels
    # turnstile_weather['ENTRIESn_hourly'].hist()  # single histogram over all hourly entries
    return plt
print(entries_histogram(df))
In [55]:
def mann_whitney_plus_means(turnstile_weather):
    '''
    This function will consume the turnstile_weather dataframe containing
    our final turnstile weather data.

    You will want to take the means and run the Mann-Whitney U-test on the
    ENTRIESn_hourly column in the turnstile_weather dataframe.

    This function should return:
        1) the mean of entries with rain
        2) the mean of entries without rain
        3) the Mann-Whitney U-statistic and p-value comparing the number of entries
           with rain and the number of entries without rain

    You should feel free to use scipy's Mann-Whitney implementation, and you
    might also find it useful to use numpy's mean function.

    Here are the functions' documentation:
    http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html
    http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html

    You can look at the final turnstile weather data at the link below:
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv
    '''
    ### YOUR CODE HERE ###
    turnstile_weather1 = turnstile_weather.loc[turnstile_weather['rain'] == 0]  # dry observations
    turnstile_weather2 = turnstile_weather.loc[turnstile_weather['rain'] == 1]  # rainy observations
    with_rain_mean = turnstile_weather2['ENTRIESn_hourly'].mean()
    without_rain_mean = turnstile_weather1['ENTRIESn_hourly'].mean()
    U, p = scipy.stats.mannwhitneyu(turnstile_weather1['ENTRIESn_hourly'],
                                    turnstile_weather2['ENTRIESn_hourly'])
    return with_rain_mean, without_rain_mean, U, p
mann_whitney_plus_means(df)
Out[55]:
In [56]:
import numpy as np
import pandas
from ggplot import *

"""
In this question, you need to:
1) implement the compute_cost() and gradient_descent() procedures
2) Select features (in the predictions procedure) and make predictions.
"""

def normalize_features(df):
    """
    Normalize the features in the data set.
    """
    mu = df.mean()
    sigma = df.std()
    if (sigma == 0).any():
        raise Exception("One or more features had the same value for all samples, and thus could " + \
                        "not be normalized. Please do not include features with only a single value " + \
                        "in your model.")
    df_normalized = (df - df.mean()) / df.std()
    return df_normalized, mu, sigma

def compute_cost(features, values, theta):
    """
    Compute the cost function given a set of features / values,
    and the values for our thetas.

    This can be the same code as the compute_cost function in the lesson #3 exercises,
    but feel free to implement your own.
    """
    m = len(values)
    sum_of_square_errors = np.square(np.dot(features, theta) - values).sum()
    cost = sum_of_square_errors / (2 * m)
    return cost

def gradient_descent(features, values, theta, alpha, num_iterations):
    """
    Perform gradient descent given a data set with an arbitrary number of features.

    This can be the same gradient descent code as in the lesson #3 exercises,
    but feel free to implement your own.
    """
    m = len(values)
    cost_history = []
    for i in range(num_iterations):
        predicted_values = np.dot(features, theta)
        theta = theta - alpha / m * np.dot((predicted_values - values), features)
        cost = compute_cost(features, values, theta)
        cost_history.append(cost)
    return theta, pandas.Series(cost_history)

def predictions(dataframe):
    '''
    The NYC turnstile data is stored in a pandas dataframe called weather_turnstile.
    Using the information stored in the dataframe, let's predict the ridership of
    the NYC subway using linear regression with gradient descent.

    You can download the complete turnstile weather dataframe here:
    https://www.dropbox.com/s/meyki2wl9xfa7yk/turnstile_data_master_with_weather.csv

    Your prediction should have an R^2 value of 0.40 or better.
    You need to experiment using various input features contained in the dataframe.
    We recommend that you don't use the EXITSn_hourly feature as an input to the
    linear model because we cannot use it as a predictor: we cannot use exit
    counts as a way to predict entry counts.

    Note: Due to the memory and CPU limitation of our Amazon EC2 instance, we will
    give you a random subset (~15%) of the data contained in
    turnstile_data_master_with_weather.csv. You are encouraged to experiment with
    the full data set on your own computer, locally.

    If you'd like to view a plot of your cost history, uncomment the call to
    plot_cost_history below. The slowdown from plotting is significant, so if you
    are timing out, the first thing to do is to comment out the plot command again.

    If you receive a "server has encountered an error" message, that means you are
    hitting the 30-second limit that's placed on running your program. Try using a
    smaller number for num_iterations if that's the case.

    If you are using your own algorithm/models, see if you can optimize your code so
    that it runs faster.
    '''
    # Select Features (try different features!)
    features = dataframe[['Hour']]
    # features = dataframe[['rain', 'precipi', 'Hour']]

    # Add UNIT to features using dummy variables
    dummy_units = pandas.get_dummies(dataframe['UNIT'], prefix='unit')
    features = features.join(dummy_units)

    # Values
    values = dataframe['ENTRIESn_hourly']
    m = len(values)

    features, mu, sigma = normalize_features(features)
    features['ones'] = np.ones(m)  # Add a column of 1s (y intercept)

    # Convert features and values to numpy arrays
    features_array = np.array(features)
    values_array = np.array(values)

    # Set values for alpha, number of iterations.
    alpha = 0.1  # please feel free to change this value
    num_iterations = 75  # please feel free to change this value

    # Initialize theta, perform gradient descent
    theta_gradient_descent = np.zeros(len(features.columns))
    theta_gradient_descent, cost_history = gradient_descent(features_array,
                                                            values_array,
                                                            theta_gradient_descent,
                                                            alpha,
                                                            num_iterations)
    plot = None
    # -------------------------------------------------
    # Uncomment the next line to see your cost history
    # -------------------------------------------------
    plot = plot_cost_history(alpha, cost_history)
    #
    # Please note, there is a possibility that plotting
    # this in addition to your calculation will exceed
    # the 30 second limit on the compute servers.

    predictions = np.dot(features_array, theta_gradient_descent)
    return predictions, plot

def plot_cost_history(alpha, cost_history):
    """This function is for viewing the plot of your cost history.
    You can run it by uncommenting this

        plot_cost_history(alpha, cost_history)

    call in predictions.

    If you want to run this locally, you should print the return value
    from this function.
    """
    cost_df = pandas.DataFrame({
        'Cost_History': cost_history,
        'Iteration': range(len(cost_history))
    })
    return ggplot(cost_df, aes('Iteration', 'Cost_History')) + \
        geom_point() + ggtitle('Cost History for alpha = %.3f' % alpha)
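The docstring above targets an R^2 of 0.40 or better, but the notebook never computes it. A minimal sketch of the coefficient of determination (compute_r_squared is my own hypothetical helper, not part of the course code):
def compute_r_squared(data, predictions):
    # R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals
    # and SS_tot is the total sum of squares around the mean of the observed data
    data = np.asarray(data)
    predictions = np.asarray(predictions)
    ss_res = np.sum((data - predictions) ** 2)
    ss_tot = np.sum((data - np.mean(data)) ** 2)
    return 1 - ss_res / ss_tot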
In [57]:
import numpy as np
import scipy
import matplotlib.pyplot as plt

def plot_residuals(turnstile_weather, predictions):
    '''
    Using the same methods that we used to plot a histogram of entries
    per hour for our data, why don't you make a histogram of the residuals
    (that is, the difference between the original hourly entry data and the predicted values).

    Try different binwidths for your histogram.

    Based on this residual histogram, do you have any insight into how our model
    performed? Reading a bit on this webpage might be useful:
    http://www.itl.nist.gov/div898/handbook/pri/section2/pri24.htm
    '''
    plt.figure()
    (turnstile_weather['ENTRIESn_hourly'] - predictions).hist(bins=30)
    return plt
In [58]:
frame = pandas.read_csv('turnstile_data_master_with_weather.csv')
frame[:2]
frame.dtypes
#newtable = (frame.groupby(frame['Hour'], 'ENTRIESn_hourly').mean())
Out[58]:
In [59]:
d = pandas.DataFrame(frame.groupby('Hour').sum())
d.index
ggplot(d, aes(d.index, 'ENTRIESn_hourly')) + geom_line() + xlab('Time of day (0000 to 2300 hrs)')
Out[59]:
In [73]:
d = pandas.DataFrame(frame.groupby('UNIT').sum())
d = d.sort_values(by='ENTRIESn_hourly')  # .sort(columns=...) is deprecated; use sort_values
d = d[d['ENTRIESn_hourly'] > 800000]  # keep only the busiest units
#d['ENTRIESn_hourly']
d = pd.DataFrame({'UNIT': d.index, 'entries': d['ENTRIESn_hourly']})  # d is already sorted; no need to re-sort
#d['UNIT']
#d['UNIT'].astype(basestring)
#d.dtypes
ggplot(d, aes('UNIT', 'entries')) + geom_bar(stat="identity")  # stat="identity" plots the values as given; stat="bar" is not a valid stat
Out[73]:
In [61]:
d
Out[61]: