In [8]:
%matplotlib inline
import sys
import numpy as np
import pandas as pd
import pandas.io.data
import matplotlib.pyplot as plt
print sys.version
print "Pandas:", pd.__version__


2.7.11 |Continuum Analytics, Inc.| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]
Pandas: 0.17.0

Check this out.

pd.read_html reads the HTML tables on a page into a list of DataFrames. Here we take the second table, select its second column, drop the first row, and use the remaining values as the column names for df.


In [11]:
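the_url = 'http://stat-computing.org/dataexpo/2009/the-data.html'
# Combine the Year, Month and DayofMonth columns into a single parsed 'Date' column.
datedict = {'Date': ['Year', 'Month', 'DayofMonth']}
# Column names: second table on the page, second column, first row skipped.
col_names = pd.read_html(the_url)[1][1][1:]
df = pd.read_csv('../june_airplane_data.csv', header=None, parse_dates=datedict, names=col_names)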
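The dict passed to parse_dates asks read_csv to merge three columns into one Date column. The same idea can be sketched as a separate step after loading. This is only an illustration: the three-row sample, the shortened column list, and the names `raw`, `sample` and `parts` are hypothetical and stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical headerless sample standing in for the real CSV.
raw = u"1994,6,5,US\n1994,6,6,US\n1994,6,7,US\n"
names = ['Year', 'Month', 'DayofMonth', 'UniqueCarrier']

# header=None tells read_csv the file has no header row; names supplies one.
sample = pd.read_csv(io.StringIO(raw), header=None, names=names)

# pd.to_datetime can assemble dates from a frame with year/month/day columns.
parts = sample[['Year', 'Month', 'DayofMonth']].copy()
parts.columns = ['year', 'month', 'day']
sample['Date'] = pd.to_datetime(parts)
```

Building the date after loading leaves the original Year, Month and DayofMonth columns in place, which can be convenient for later grouping.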

In [10]:
df.head()


Out[10]:
Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum ... TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
0 1994 6 5 7 1542 1540 1819 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
1 1994 6 6 1 1549 1540 1831 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
2 1994 6 7 2 1540 1540 1803 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
3 1994 6 8 3 1541 1540 1808 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN
4 1994 6 9 4 1541 1540 1835 1815 US 236 ... NaN NaN 0 NaN 0 NaN NaN NaN NaN NaN

5 rows × 29 columns


In [12]:
# Bin edges in HHMM form; pd.cut treats each bin as right-closed, e.g. (600, 1200].
ranges = [0, 600, 1200, 1800, 2400]
labels = ['Early Morning', 'Morning', 'Early Afternoon', 'Evening']

In [13]:
df['DepTime2'] = pd.cut(df.DepTime, ranges, labels=labels).astype('category')

In [16]:
df['ArrTime2'] = pd.cut(df.ArrTime, ranges, labels=labels).astype('category')

In [17]:
df.ArrTime2.head()


Out[17]:
0    Evening
1    Evening
2    Evening
3    Evening
4    Evening
Name: ArrTime2, dtype: category
Categories (4, object): [Early Morning < Morning < Early Afternoon < Evening]
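To see how the bin edges behave, here is a small sketch that reuses the same ranges and labels on a handful of made-up times (the `times` series is hypothetical, chosen to hit the edges). By default pd.cut builds right-closed intervals, so a value equal to the lowest edge falls outside every bin unless include_lowest=True is passed.

```python
import pandas as pd

ranges = [0, 600, 1200, 1800, 2400]
labels = ['Early Morning', 'Morning', 'Early Afternoon', 'Evening']

# Made-up HHMM values that sit on or near the bin edges.
times = pd.Series([0, 600, 601, 1542, 2400])

# Default: bins are (0, 600], (600, 1200], (1200, 1800], (1800, 2400].
binned = pd.cut(times, ranges, labels=labels)
# 0 -> NaN, 600 -> Early Morning, 601 -> Morning,
# 1542 -> Early Afternoon, 2400 -> Evening

# include_lowest=True also places the value 0 in the first bin.
binned_incl = pd.cut(times, ranges, labels=labels, include_lowest=True)
# 0 -> Early Morning
```

Whether a time of exactly 0 occurs in the real data is not shown here, so treat the include_lowest variant as an option to check rather than a required fix.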

In [19]:
df[['DepTime2', 'ArrTime2']].describe()


Out[19]:
           DepTime2         ArrTime2
count        423805           422641
unique            4                4
top     (600, 1200]  Early Afternoon
freq         163514           156806
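describe reports only the most frequent category. For the full distribution across bins, value_counts is one option. The sketch below uses a hypothetical six-value sample rather than the real df, so the counts are illustrative only.

```python
import pandas as pd

ranges = [0, 600, 1200, 1800, 2400]
labels = ['Early Morning', 'Morning', 'Early Afternoon', 'Evening']

# Made-up departure times, binned the same way as DepTime2.
sample = pd.Series(pd.cut([530, 745, 915, 1310, 1542, 2105], ranges, labels=labels))

# One count per category instead of just the top one.
counts = sample.value_counts()
```

On the real data the equivalent call would be df.DepTime2.value_counts().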