Question 1: Average Temp by Season

Compute the average temperature by season ('season_desc'). (The temperatures are numbers between 0 and 1, but don't worry about that. Let's say that's the Shellman temperature scale.)



In [4]:

    
from pandas import Series, DataFrame



In [5]:

    
import pandas as pd



In [6]:

    
import numpy as np



In [7]:

    
weather = pd.read_table('daily_weather.tsv')



In [8]:

    
stations = pd.read_table('stations.tsv')



In [9]:

    
usage = pd.read_table('usage_2012.tsv')



In [10]:

    
newseasons = {'Summer': 'Spring', 'Spring': 'Winter', 'Fall': 'Summer', 'Winter': 'Fall'}



In [11]:

    
weather['season_desc'] = weather['season_desc'].map(newseasons)



In [12]:

    
# Mean temperature per season; keyword arguments make the roles of the columns explicit.
pd.pivot_table(weather, values='temp', index='season_desc', aggfunc='mean')









    Out[12]:





season_desc
Fall      0.419368
Spring    0.554557
Summer    0.711445
Winter    0.321700
Name: temp, dtype: float64
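
The same per-season average can also be computed with a groupby. The following is a minimal sketch, assuming a frame with the same 'season_desc' and 'temp' columns; the `sample` frame and its values are made up for illustration and stand in for daily_weather.tsv.

```python
import pandas as pd

# Hypothetical stand-in for daily_weather.tsv; the values are invented.
sample = pd.DataFrame({
    'season_desc': ['Winter', 'Winter', 'Summer', 'Summer'],
    'temp': [0.25, 0.35, 0.75, 0.65],
})

# Group rows by season, then average the temperature within each group.
# The result is a Series indexed by season name.
avg_temp = sample.groupby('season_desc')['temp'].mean()
print(avg_temp)
```

A groupby returns a Series here, which is convenient when only one column is being aggregated.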

Question 2: Number of Rentals by Month

Several of the columns represent dates or datetimes, but out of the box pd.read_table reads them as plain strings. This makes it hard to (for example) compute the number of rentals by month. Fix the dates and compute the number of rentals by month.



In [13]:

    
weather['Month'] = pd.DatetimeIndex(weather.date).month



In [14]:

    
# Total riders per calendar month.
pd.pivot_table(weather, values='total_riders', index='Month', aggfunc='sum')









    Out[14]:





Month
1      96744
2     103137
3     164875
4     174224
5     195865
6     202830
7     203607
8     214503
9     218573
10    198841
11    152664
12    123713
Name: total_riders, dtype: int64
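
Another way to fix the dates is to convert the column itself to datetimes and group on its month. This is a sketch only: the `sample` frame is invented, and it assumes the 'date' column holds ISO-style strings such as '2012-01-15', which may not match the real file.

```python
import pandas as pd

# Hypothetical stand-in for daily_weather.tsv; dates and counts are invented.
sample = pd.DataFrame({
    'date': ['2012-01-01', '2012-01-15', '2012-02-03'],
    'total_riders': [100, 150, 200],
})

# Convert the string column to datetime64 so the .dt accessor becomes available.
sample['date'] = pd.to_datetime(sample['date'])

# Sum riders per calendar month (1 = January, ..., 12 = December).
monthly = sample.groupby(sample['date'].dt.month)['total_riders'].sum()
print(monthly)
```

Converting the column in place keeps the full date available for later questions, whereas a separate 'Month' column only keeps the month number.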

Question 3: Rental Variance by Temperature

Investigate how the number of rentals varies with temperature. Is this trend constant across seasons? Across months?



In [15]:

    
pd.concat([weather['temp'], weather['total_riders']], axis=1).corr()









    Out[15]:






  
    
      
                  temp  total_riders
temp          1.000000      0.713793
total_riders  0.713793      1.000000



In [16]:

    
weather[['temp', 'total_riders', 'Month']].groupby('Month').corr()









    Out[16]:






  
    
      
      
                         temp  total_riders
Month
1     temp           1.000000      0.689495
      total_riders   0.689495      1.000000
2     temp           1.000000      0.716206
      total_riders   0.716206      1.000000
3     temp           1.000000      0.735575
      total_riders   0.735575      1.000000
4     temp           1.000000      0.533387
      total_riders   0.533387      1.000000
5     temp           1.000000      0.065599
      total_riders   0.065599      1.000000
6     temp           1.000000     -0.330884
      total_riders  -0.330884      1.000000
7     temp           1.000000     -0.184704
      total_riders  -0.184704      1.000000
8     temp           1.000000      0.288264
      total_riders   0.288264      1.000000
9     temp           1.000000     -0.418753
      total_riders  -0.418753      1.000000
10    temp           1.000000      0.466666
      total_riders   0.466666      1.000000
11    temp           1.000000      0.511232
      total_riders   0.511232      1.000000
12    temp           1.000000      0.690062
      total_riders   0.690062      1.000000
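
The question also asks about seasons. A per-season version of the same correlation could be sketched as follows; the `sample` frame is invented and deliberately built so the two seasons show opposite trends, and the helper `temp_rider_corr` is a hypothetical name, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-in for daily_weather.tsv; the values are invented.
sample = pd.DataFrame({
    'season_desc': ['Winter'] * 3 + ['Summer'] * 3,
    'temp': [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    'total_riders': [10, 20, 30, 90, 80, 70],
})

def temp_rider_corr(group):
    """Pearson correlation between temperature and ridership in one group."""
    return group['temp'].corr(group['total_riders'])

# One correlation coefficient per season, collected into a Series.
by_season = pd.Series({
    season: temp_rider_corr(group)
    for season, group in sample.groupby('season_desc')
})
print(by_season)
```

Returning a single coefficient per group avoids the repeated 2x2 matrices that `groupby(...).corr()` produces, which makes the seasons easier to compare at a glance.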

Question 4: User Data

There are various types of users in the usage data sets. What sorts of things can you say about how they use the bikes differently?



In [19]:

    
pd.concat([weather['temp'], weather['no_casual_riders'], weather['no_reg_riders']], axis=1, keys=['temp', 'Non-Regulars', 'Regulars']).corr()









    Out[19]:






  
    
      
                  temp  Non-Regulars  Regulars
temp          1.000000      0.542253  0.607425
Non-Regulars  0.542253      1.000000  0.274984
Regulars      0.607425      0.274984  1.000000

Both groups ride more as the temperature rises; the correlation is slightly stronger for Regulars (0.61) than for Non-Regulars (0.54).



In [17]:

    
pd.concat([weather['is_work_day'], weather['no_casual_riders'], weather['no_reg_riders']], axis=1, keys=['Is_Workday', 'Non-Regulars', 'Regulars']).corr()









    Out[17]:






  
    
      
              Is_Workday  Non-Regulars  Regulars
Is_Workday      1.000000     -0.539919  0.437003
Non-Regulars   -0.539919      1.000000  0.274984
Regulars        0.437003      0.274984  1.000000

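Regular ridership is positively correlated with working days (0.44), while Non-Regular ridership is negatively correlated (-0.54): Regulars ride more on working days, Non-Regulars on non-working days.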
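
A correlation with a 0/1 flag can be hard to read, so comparing group means directly may be clearer. This is a sketch under assumptions: the column names are those used in the cells here, but the `sample` values are invented and 'is_work_day' is assumed to be coded as 0 or 1.

```python
import pandas as pd

# Hypothetical stand-in for daily_weather.tsv; the values are invented.
sample = pd.DataFrame({
    'is_work_day': [1, 1, 0, 0],
    'no_casual_riders': [10, 20, 50, 70],
    'no_reg_riders': [200, 220, 90, 110],
})

# Average daily riders of each type on working days (1) and other days (0).
means = sample.groupby('is_work_day')[['no_casual_riders', 'no_reg_riders']].mean()
print(means)
```

The resulting table has one row per day type and one column per rider type, so the difference between the groups is expressed in riders per day rather than as a coefficient.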



In [18]:

    
pd.concat([weather['is_holiday'], weather['no_casual_riders'], weather['no_reg_riders']], axis=1, keys=['Is_Holiday', 'Non-Regulars', 'Regulars']).corr()









    Out[18]:






  
    
      
              Is_Holiday  Non-Regulars   Regulars
Is_Holiday       1.00000      0.029720  -0.164190
Non-Regulars     0.02972      1.000000   0.274984
Regulars        -0.16419      0.274984   1.000000


Holidays correlate only weakly with ridership: slightly negative for Regulars (-0.16) and close to zero for Non-Regulars (0.03).