Question 1: Average Temp by Season

Compute the average temperature by season ('season_desc'). (The temperatures are numbers between 0 and 1, but don't worry about that. Let's say that's the Shellman temperature scale.)


In [4]:
from pandas import Series, DataFrame

In [5]:
import pandas as pd

In [6]:
import numpy as np

In [7]:
weather = pd.read_table('daily_weather.tsv')

In [8]:
stations = pd.read_table('stations.tsv')

In [9]:
usage = pd.read_table('usage_2012.tsv')

In [10]:
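# Assumption: the raw season_desc labels are shifted by one season, so map each raw label to the corrected one.
newseasons = {'Summer': 'Spring', 'Spring': 'Winter', 'Fall': 'Summer', 'Winter': 'Fall'}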

In [11]:
weather['season_desc'] = weather['season_desc'].map(newseasons)
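
`Series.map` with a dict looks up every value and returns NaN for any label missing from the dict. Because In [11] overwrites `season_desc` in place, running that cell a second time shifts the labels again. Below is a minimal sketch of a re-run-safe variant. It assumes, as the notebook does, that the raw labels are off by one season. `relabel_seasons` and the `season_raw` column are hypothetical names.

```python
import pandas as pd

# Mapping from the notebook (raw label -> corrected label), based on the notebook's
# assumption that the raw season_desc labels are off by one season.
NEWSEASONS = {'Summer': 'Spring', 'Spring': 'Winter', 'Fall': 'Summer', 'Winter': 'Fall'}


def relabel_seasons(df, mapping=NEWSEASONS, col='season_desc', raw_col='season_raw'):
    """Return a copy of df with corrected season labels in `col`.

    The original labels are saved once in `raw_col` (a hypothetical column name), and the
    mapping is always applied to that saved copy, so calling this twice is harmless.
    """
    out = df.copy()
    if raw_col not in out.columns:
        out[raw_col] = out[col]
    out[col] = out[raw_col].map(mapping)
    # Series.map turns any label missing from the dict into NaN; fail loudly instead.
    unmapped = out[col].isna() & out[raw_col].notna()
    if unmapped.any():
        raise ValueError(f"unmapped season labels: {sorted(out.loc[unmapped, raw_col].unique())}")
    return out


demo = pd.DataFrame({'season_desc': ['Summer', 'Fall', 'Winter', 'Spring']})
once = relabel_seasons(demo)
twice = relabel_seasons(once)
print(once['season_desc'].tolist())  # ['Spring', 'Summer', 'Fall', 'Winter']
```

Keeping the raw labels in their own column trades a little memory for being able to re-run the cell safely and to audit the relabeling later.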

In [12]:
weather.groupby('season_desc')['temp'].mean()


Out[12]:
season_desc
Fall      0.419368
Spring    0.554557
Summer    0.711445
Winter    0.321700
Name: temp, dtype: float64
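
The result above is sorted alphabetically (Fall, Spring, Summer, Winter). To read it in calendar order, one option is to reindex the grouped means with an explicit season list. This is a small sketch on made-up numbers. `SEASON_ORDER` and `mean_by_season` are illustrative names, and starting the cycle at Winter is an arbitrary choice.

```python
import pandas as pd

# Calendar order to display seasons in (the starting season is an arbitrary choice).
SEASON_ORDER = ['Winter', 'Spring', 'Summer', 'Fall']


def mean_by_season(df, value='temp', season_col='season_desc'):
    """Mean of `value` per season, listed in SEASON_ORDER rather than alphabetically.

    Seasons missing from df show up as NaN instead of being silently dropped.
    """
    return df.groupby(season_col)[value].mean().reindex(SEASON_ORDER)


demo = pd.DataFrame({
    'season_desc': ['Fall', 'Winter', 'Summer', 'Spring', 'Winter'],
    'temp': [0.4, 0.2, 0.7, 0.5, 0.4],  # made-up values
})
means = mean_by_season(demo)
print(means)
```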

Question 2: Number of Rentals by Month

Several of the columns represent dates or datetimes, but by default pd.read_table reads them as plain strings. This makes it hard to (for example) compute the number of rentals by month. Fix the dates and compute the number of rentals by month.
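
Date columns can also be parsed while reading. `pd.read_table` accepts the same `parse_dates` argument as `pd.read_csv`, so something like `pd.read_table('daily_weather.tsv', parse_dates=['date'])` should work, assuming the column is called `date` as in In [13]. The sketch uses an in-memory TSV with made-up values. It also shows that grouping by `.dt.month` pools the same month across years, while `.dt.to_period('M')` keeps each year-month separate. Which one you want depends on whether the data covers more than one year.

```python
import io

import pandas as pd

# Made-up stand-in for daily_weather.tsv; only the column names 'date' and
# 'total_riders' come from the notebook.
tsv = io.StringIO(
    "date\ttotal_riders\n"
    "2012-01-30\t100\n"
    "2012-01-31\t120\n"
    "2012-02-01\t90\n"
    "2013-01-15\t50\n"
)

# parse_dates converts the listed columns to datetime64 while reading.
weather_demo = pd.read_csv(tsv, sep='\t', parse_dates=['date'])
print(weather_demo.dtypes)

# Rentals per calendar month, pooling years (like the notebook's Month column).
by_month = weather_demo.groupby(weather_demo['date'].dt.month)['total_riders'].sum()

# Rentals per year-month, which keeps January 2012 and January 2013 apart.
by_year_month = weather_demo.groupby(weather_demo['date'].dt.to_period('M'))['total_riders'].sum()
print(by_year_month)
```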


In [13]:
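# Convert the date strings to datetime64, then extract the month number (1-12).
weather['date'] = pd.to_datetime(weather['date'])
weather['Month'] = weather['date'].dt.month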

In [14]:
weather.groupby('Month')['total_riders'].sum()


Out[14]:
Month
1      96744
2     103137
3     164875
4     174224
5     195865
6     202830
7     203607
8     214503
9     218573
10    198841
11    152664
12    123713
Name: total_riders, dtype: int64

Question 3: Rental Variance by Temperature

Investigate how the number of rentals varies with temperature. Is this trend constant across seasons? Across months?
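
Both halves of the question come down to computing one correlation per group. Below is a minimal sketch that returns a single number per group instead of a 2x2 matrix per group. `corr_by_group` is a hypothetical helper, and the data is made up.

```python
import pandas as pd


def corr_by_group(df, key, x='temp', y='total_riders'):
    """Pearson correlation between columns x and y within each group of `key`.

    Returns one number per group instead of a 2x2 matrix per group.
    """
    return df.groupby(key)[[x, y]].apply(lambda g: g[x].corr(g[y]))


# Made-up data: riders rise with temperature in Winter and fall with it in Summer.
demo = pd.DataFrame({
    'season_desc': ['Winter'] * 3 + ['Summer'] * 3,
    'temp': [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    'total_riders': [10, 20, 30, 30, 20, 10],
})
r = corr_by_group(demo, 'season_desc')
print(r)
```

On the notebook's data, `corr_by_group(weather, 'season_desc')` and `corr_by_group(weather, 'Month')` would give one value per season and per month. Groups with few days give noisy correlations, so small differences between groups should not be over-read.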


In [15]:
weather[['temp', 'total_riders']].corr()


Out[15]:
                  temp  total_riders
temp          1.000000      0.713793
total_riders  0.713793      1.000000

In [16]:
weather[['temp', 'total_riders', 'Month']].groupby('Month').corr()


Out[16]:
                         temp  total_riders
Month
1     temp           1.000000      0.689495
      total_riders   0.689495      1.000000
2     temp           1.000000      0.716206
      total_riders   0.716206      1.000000
3     temp           1.000000      0.735575
      total_riders   0.735575      1.000000
4     temp           1.000000      0.533387
      total_riders   0.533387      1.000000
5     temp           1.000000      0.065599
      total_riders   0.065599      1.000000
6     temp           1.000000     -0.330884
      total_riders  -0.330884      1.000000
7     temp           1.000000     -0.184704
      total_riders  -0.184704      1.000000
8     temp           1.000000      0.288264
      total_riders   0.288264      1.000000
9     temp           1.000000     -0.418753
      total_riders  -0.418753      1.000000
10    temp           1.000000      0.466666
      total_riders   0.466666      1.000000
11    temp           1.000000      0.511232
      total_riders   0.511232      1.000000
12    temp           1.000000      0.690062
      total_riders   0.690062      1.000000

The trend is not constant across months. Temperature and ridership are strongly positively correlated from December through March (0.69 to 0.74), but the correlation weakens or turns negative from May through September (-0.42 to 0.29). This suggests that once it is already warm, warmer days no longer bring more riders.

Question 4: User Data

There are various types of users in the usage data sets. What sorts of things can you say about how they use the bikes differently?


In [19]:
weather[['temp', 'no_casual_riders', 'no_reg_riders']].rename(columns={'no_casual_riders': 'Non-Regulars', 'no_reg_riders': 'Regulars'}).corr()


Out[19]:
                  temp  Non-Regulars  Regulars
temp          1.000000      0.542253  0.607425
Non-Regulars  0.542253      1.000000  0.274984
Regulars      0.607425      0.274984  1.000000

Both groups ride more on warmer days. The correlation with temperature is somewhat stronger for Regulars (0.61) than for Non-Regulars (0.54).
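
A correlation shows that both counts rise with temperature, but not whether the mix of riders changes. One way to check that is to compute the casual riders' share of all riders within temperature bins. Below is a minimal sketch with made-up counts, using `pd.qcut` to make equal-sized bins. The helper name and the number of bins are arbitrary choices.

```python
import pandas as pd


def casual_share_by_temp(df, bins=4):
    """Fraction of all riders who are casual, within each temperature quantile bin."""
    temp_bin = pd.qcut(df['temp'], q=bins)
    sums = df.groupby(temp_bin, observed=True)[['no_casual_riders', 'no_reg_riders']].sum()
    return sums['no_casual_riders'] / sums.sum(axis=1)


# Made-up daily counts for four days.
demo = pd.DataFrame({
    'temp': [0.1, 0.2, 0.3, 0.4],
    'no_casual_riders': [10, 10, 40, 40],
    'no_reg_riders': [90, 90, 60, 60],
})
share = casual_share_by_temp(demo, bins=2)
print(share)  # lower-temperature bin: 0.1, higher-temperature bin: 0.4
```

Quantile bins put roughly the same number of days in each bin, so no bin's share rests on just a handful of days.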


In [17]:
weather[['is_work_day', 'no_casual_riders', 'no_reg_riders']].rename(columns={'is_work_day': 'Is_Workday', 'no_casual_riders': 'Non-Regulars', 'no_reg_riders': 'Regulars'}).corr()


Out[17]:
              Is_Workday  Non-Regulars  Regulars
Is_Workday      1.000000     -0.539919  0.437003
Non-Regulars   -0.539919      1.000000  0.274984
Regulars        0.437003      0.274984  1.000000

Regulars ride more on working days (r = 0.44 with Is_Workday), while Non-Regulars ride more on non-working days (r = -0.54).
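
The correlations give the direction of the effect, not its size. Averaging each group's daily count separately for working and non-working days shows how large the gap is. Below is a sketch on made-up counts. `mean_riders_by_flag` is an illustrative helper, and the column names are the ones used in In [17].

```python
import pandas as pd


def mean_riders_by_flag(df, flag):
    """Average daily casual and registered riders for each value of a 0/1 flag column."""
    cols = ['no_casual_riders', 'no_reg_riders']
    means = df.groupby(flag)[cols].mean()
    return means.rename(columns={'no_casual_riders': 'Non-Regulars', 'no_reg_riders': 'Regulars'})


# Made-up counts: two non-working days (0) and two working days (1).
demo = pd.DataFrame({
    'is_work_day': [0, 0, 1, 1],
    'no_casual_riders': [300, 500, 100, 100],
    'no_reg_riders': [1000, 1200, 2000, 2200],
})
m = mean_riders_by_flag(demo, 'is_work_day')
print(m)
```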


In [18]:
weather[['is_holiday', 'no_casual_riders', 'no_reg_riders']].rename(columns={'is_holiday': 'Is_Holiday', 'no_casual_riders': 'Non-Regulars', 'no_reg_riders': 'Regulars'}).corr()


Out[18]:
              Is_Holiday  Non-Regulars   Regulars
Is_Holiday       1.00000      0.029720  -0.164190
Non-Regulars     0.02972      1.000000   0.274984
Regulars        -0.16419      0.274984   1.000000

Holidays are only weakly correlated with ridership: slightly negative for Regulars (r = -0.16) and close to zero for Non-Regulars (r = 0.03).
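
With a 0/1 column such as `is_holiday`, the Pearson correlation is the point-biserial correlation: r = (m1 - m0) * sqrt(p * (1 - p)) / s_y. Here p is the share of flagged days, m1 and m0 are the mean counts with and without the flag, and s_y is the population standard deviation of the counts. The factor sqrt(p * (1 - p)) shrinks toward zero when flagged days are rare. As a result, a small r can coexist with a sizeable difference in average ridership, so counting the flagged days and comparing group means is a useful complement. Below is a sketch on made-up data that checks the formula against pandas' `Series.corr`. `point_biserial` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def point_biserial(flag, y):
    """Pearson r between a 0/1 flag and y, via the point-biserial formula.

    r = (m1 - m0) * sqrt(p * (1 - p)) / s_y, where p is the share of flagged rows,
    m1/m0 are the means of y with/without the flag, and s_y is the population (ddof=0)
    standard deviation of y.
    """
    flag = np.asarray(flag, dtype=bool)
    y = np.asarray(y, dtype=float)
    p = flag.mean()
    m1, m0 = y[flag].mean(), y[~flag].mean()
    return (m1 - m0) * np.sqrt(p * (1 - p)) / y.std()  # ndarray.std defaults to ddof=0


# Made-up data: 2 holidays out of 8 days, with lower registered ridership on holidays.
demo = pd.DataFrame({
    'is_holiday': [0, 0, 0, 0, 1, 0, 0, 1],
    'no_reg_riders': [2000, 2100, 1900, 2050, 1200, 1950, 2000, 1500],
})
r_formula = point_biserial(demo['is_holiday'], demo['no_reg_riders'])
r_pandas = demo['is_holiday'].corr(demo['no_reg_riders'])
print(r_formula, r_pandas)  # the two values agree
print(demo['is_holiday'].sum(), 'flagged days')
```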