In [10]:
import pandas as pd
import os
import seaborn as sns
%matplotlib inline

Pedestrian counts

Get the data

http://bit.ly/1vm5nJ5

  • pedestrian_count_bridge.csv
  • pedestrian_count_station.csv

Things to do

  • Load the data and make sure it has a datetime index
  • Consolidate data into one dataframe
  • Plot a couple location counts together
  • Plot daily totals for pedestrians
  • Plot Monthly totals in individual sub-plots
  • Q) Which location has had the most traffic this year?
  • Q) Which location had the least traffic June?
  • Q) Which location had the most overall traffic at 8am on a weekday?
  • Plot a Histogram of total pedestrians per by hour on 24-7-2013 in Flagstaff Station
  • Q) Which location has had the biggest percentage change year on yearfrom 2012->2013
  • Q) What is the overall busiest day of the week for foot traffic
  • Q) What was the single busiest day, when and where
  • Q) Make a boxplot of the daily counts at each location

Load the data and make sure it has a datetime index


In [69]:
bridge = pd.read_csv('pedestrian_count_bridge.csv')
bridge = bridge.set_index(pd.DatetimeIndex(bridge.pop('datetime')))
station = pd.read_csv('pedestrian_count_station.csv')
station = station.set_index(pd.DatetimeIndex(station.pop('datetime')))

or


In [70]:
bridge = pd.read_csv('pedestrian_count_bridge.csv', index_col=0, parse_dates=[0])
station = pd.read_csv('pedestrian_count_station.csv', index_col=0, parse_dates=[0])

In [71]:
station.head()


Out[71]:
Flagstaff Station Southern Cross Station Flinders St Station Underpass
datetime
2012-01-01 00:00:00 157 325 3726
2012-01-01 01:00:00 105 246 3082
2012-01-01 02:00:00 39 131 2784
2012-01-01 03:00:00 34 135 1915
2012-01-01 04:00:00 26 56 959

In [21]:
bridge.head()


Out[21]:
Webb Bridge Princes Bridge Sandridge Bridge
datetime
2012-01-01 00:00:00 325 4171 479
2012-01-01 01:00:00 169 2843 1011
2012-01-01 02:00:00 52 2153 1346
2012-01-01 03:00:00 58 967 1034
2012-01-01 04:00:00 18 404 532

Consolidate data into one dataframe


In [27]:
all_dat = bridge.join(station, how='inner')

In [72]:
all_dat.head()


Out[72]:
Webb Bridge Princes Bridge Sandridge Bridge Flagstaff Station Southern Cross Station Flinders St Station Underpass
datetime
2012-01-01 00:00:00 325 4171 479 157 325 3726
2012-01-01 01:00:00 169 2843 1011 105 246 3082
2012-01-01 02:00:00 52 2153 1346 39 131 2784
2012-01-01 03:00:00 58 967 1034 34 135 1915
2012-01-01 04:00:00 18 404 532 26 56 959

Plot a couple location counts together


In [73]:
bridge.plot()


Out[73]:
<matplotlib.axes._subplots.AxesSubplot at 0x11798de50>

Plot daily totals for pedestrians


In [74]:
bridge.resample('D', how='sum').plot()


Out[74]:
<matplotlib.axes._subplots.AxesSubplot at 0x1088c6110>

Plot Monthly totals in individual sub-plots


In [75]:
bridge.resample('M', how='sum').plot(subplots=True, figsize=(12, 10))


Out[75]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x11436eed0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x116c4fa50>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x1173b4650>], dtype=object)

Q) Which one has had the most traffic this year?

A) Princes Bridge


In [76]:
all_dat['2014':].sum().sort(inplace=False)


Out[76]:
Webb Bridge                       20148205
Sandridge Bridge                  47711031
Southern Cross Station            64873641
Flagstaff Station                 96613925
Flinders St Station Underpass    106842183
Princes Bridge                   123709670
dtype: float64

Q) Which location had the least traffic June?

A) Webb Bridge


In [77]:
all_dat['2014-06'].sum().sort(inplace=False)
# OR all_dat[all_dat.index.month == 6]


Out[77]:
Webb Bridge                       2150769
Sandridge Bridge                  5155049
Southern Cross Station            7294983
Flagstaff Station                10243709
Flinders St Station Underpass    11657484
Princes Bridge                   13526645
dtype: float64

Q) Which location had the most overall traffic at 8am on a weekday?

A) Flagstaff Station


In [78]:
all_dat[(all_dat.index.weekday < 5)
        & (all_dat.index.hour == 8)].sum().sort(inplace=False)


Out[78]:
Webb Bridge                       263917
Sandridge Bridge                  501395
Princes Bridge                    792703
Southern Cross Station           1360759
Flinders St Station Underpass    1381658
Flagstaff Station                2518030
dtype: float64

Plot a Histogram of total pedestrians per by hour for July 2013 in Flagstaff Station


In [79]:
dt = station.ix['2014-07']['Flagstaff Station']
dt.groupby(dt.index.hour).sum().plot(kind='bar')


Out[79]:
<matplotlib.axes._subplots.AxesSubplot at 0x118783990>

Q) Which location has had the biggest percentage change year on yearfrom 2012->2013

A) Flinders St


In [93]:
changes = (all_dat.resample('A', how='sum').diff()['2013'].values
           / all_dat.resample('A', how='sum')['2012'].values) * 100
pd.DataFrame(changes, columns=all_dat.columns)


Out[93]:
Webb Bridge Princes Bridge Sandridge Bridge Flagstaff Station Southern Cross Station Flinders St Station Underpass
0 13.952665 11.12876 6.574006 0.877868 14.384528 -22.142738

Q) What is the overall busiest day of the week for foot traffic

A) Friday!


In [100]:
all_dat.sum(axis=1).groupby(all_dat.index.dayofweek).sum().sort(inplace=False, ascending=False)


Out[100]:
6    55048427
5    55299765
2    77188589
0    79280516
3    82554956
1    85241115
4    88645828
dtype: float64

Q) What was the single busiest day, when and where

A) 2014-04-25 Princes Bridge


In [129]:
day_dat = all_dat.groupby(all_dat.index.date).sum()

In [133]:
for c in day_dat.columns:
    dt = day_dat[c].idxmax()
    mx = day_dat[c].max()
    print '{}\t{:>35}\t{:9.2f}'.format(dt, c, mx)


2014-04-15	                        Webb Bridge	137280.00
2014-04-25	                     Princes Bridge	1253640.00
2014-09-26	                   Sandridge Bridge	311760.00
2014-10-04	                  Flagstaff Station	770448.00
2014-05-06	             Southern Cross Station	454944.00
2014-09-26	      Flinders St Station Underpass	723288.00

Make a boxplot of the daily counts at each location


In [158]:
import pylab as plt
import numpy as np
fig, ax = plt.subplots(1, figsize=(12, 10))
_ = day_dat.boxplot(grid=False, ax=ax)
ax.set_ylim(0, np.percentile(day_dat.values.flatten(), 82.5))
ax.set_xticklabels(day_dat.columns, rotation=90)


Out[158]:
[<matplotlib.text.Text at 0x11bc1f950>,
 <matplotlib.text.Text at 0x1198134d0>,
 <matplotlib.text.Text at 0x11bf47410>,
 <matplotlib.text.Text at 0x11bf47b90>,
 <matplotlib.text.Text at 0x11bc1fad0>,
 <matplotlib.text.Text at 0x10fd137d0>]

In [146]:
np.percentile(day_dat.values.flatten(), 90)


Out[146]:
413932.80000000005

In [ ]: