Workflow for a data analysis project

An exploratory analysis of Fremont Bridge bike data in Seattle, WA


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'

from module_code.data import get_fremont_data

In [2]:
df = get_fremont_data()

In [3]:
df.head()


Out[3]:
West East Total
Date
2012-10-03 00:00:00 4.0 9.0 13.0
2012-10-03 01:00:00 4.0 6.0 10.0
2012-10-03 02:00:00 1.0 1.0 2.0
2012-10-03 03:00:00 2.0 3.0 5.0
2012-10-03 04:00:00 6.0 1.0 7.0

In [4]:
df.resample('W').sum().plot()  # weekly totals; the default styling looked ugly, hence the seaborn style set above


Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e4797b8>

In [5]:
# Resample to daily totals, then take a 365-day rolling sum.
ax = df.resample('D').sum().rolling(365).sum().plot()


The 365-day rolling sum of daily totals shows the annual trend.
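
To make the rolling sum concrete, here is a minimal sketch on synthetic data. The `toy` frame and its constant 10 riders per hour are assumptions made up for illustration and are not part of the Fremont dataset.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts (assumption): a constant 10 riders per hour
# over 2013 and 2014, two non-leap years, so 730 days in total.
index = pd.date_range("2013-01-01", periods=730 * 24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"Total": np.full(len(index), 10.0)}, index=index)

daily = toy.resample("D").sum()    # 24 hours * 10 riders = 240 per day
annual = daily.rolling(365).sum()  # total over the trailing 365 days

# The window needs 365 full days, so the first 364 rows are NaN.
n_missing = int(annual["Total"].isna().sum())
first_valid = annual["Total"].first_valid_index()
first_value = annual["Total"].loc[first_valid]
```

Each point on the resulting curve is the ridership of the preceding 365 days, which is why seasonal swings cancel out and only the year-over-year trend remains.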


In [6]:
ax = df.resample('D').sum().rolling(365).sum().plot()
ax.set_ylim(0, None)


Out[6]:
(0, 1059460.05)

In [7]:
df.groupby(df.index.time).mean().plot()


Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d5b3438>

The west side is busiest in the morning, while the east side is busiest in the evening.

People ride into the city in the morning and out of the city in the afternoon.
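
The time-of-day grouping can be checked on a small synthetic example. The data below is an assumption built so that the west side peaks at 08:00 and the east side at 17:00. It only illustrates how `groupby(df.index.time)` averages across days.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): 7 days of hourly counts with one peak hour
# per side and a flat background of 5 riders otherwise.
index = pd.date_range("2013-01-01", periods=7 * 24, freq=pd.Timedelta(hours=1))
hours = index.hour
toy = pd.DataFrame(
    {
        "West": np.where(hours == 8, 100.0, 5.0),
        "East": np.where(hours == 17, 100.0, 5.0),
    },
    index=index,
)

# Average over all days for each time of day: 24 rows, one per hour.
by_time = toy.groupby(toy.index.time).mean()

# idxmax returns the time of day with the highest average count.
west_peak = by_time["West"].idxmax()
east_peak = by_time["East"].idxmax()
```

Here `west_peak` and `east_peak` are `datetime.time` values, so the peak hours can be read directly from the grouped frame instead of from the plot.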


In [8]:
pivoted = df.pivot_table('Total', index=df.index.time, columns=df.index.date)
pivoted.iloc[:5,:5]


Out[8]:
2012-10-03 2012-10-04 2012-10-05 2012-10-06 2012-10-07
00:00:00 13.0 18.0 11.0 15.0 11.0
01:00:00 10.0 3.0 8.0 15.0 17.0
02:00:00 2.0 9.0 7.0 9.0 3.0
03:00:00 5.0 3.0 4.0 3.0 6.0
04:00:00 7.0 8.0 9.0 5.0 3.0
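
As a rough illustration of the shape of this pivot, the sketch below builds the same kind of table from three days of made-up hourly counts. The names `toy` and `toy_pivot` are hypothetical and the values are not real data.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): 3 days of hourly counts, numbered 0..71.
index = pd.date_range("2013-01-01", periods=3 * 24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"Total": np.arange(len(index), dtype=float)}, index=index)

# Rows are times of day and columns are calendar dates, so each column
# holds the 24 hourly counts of one day.
toy_pivot = toy.pivot_table("Total", index=toy.index.time, columns=toy.index.date)

n_rows, n_cols = toy_pivot.shape
midnight_day_two = toy_pivot.iloc[0, 1]  # count at 00:00 on the second day
```

Because `DataFrame.plot` draws one line per column, plotting a table of this shape gives one line per day.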

In [9]:
pivoted.plot(legend=False, alpha=0.01)


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e46f048>

In [12]:
get_fremont_data??