1. Compute the average temperature by season ('season_desc'). (The temperatures are numbers between 0 and 1, but don't worry about that. Let's say that's the Shellman temperature scale.)

In [1]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame

In [2]:
weather = pd.read_table('daily_weather.tsv')

In [3]:
weather.groupby('season_desc').agg({'temp': np.mean})


Out[3]:
temp
season_desc
Fall 0.711445
Spring 0.321700
Summer 0.554557
Winter 0.419368

In [4]:
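# Remap the season labels; the trailing underscore stops a later replace from re-matching an already remapped label.
fix = weather.replace("Fall", "Summer_").replace("Summer", "Spring_").replace("Winter", "Fall_").replace("Spring", "Winter_")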

In [6]:
# Note: this still groups the original `weather`, not `fix`, so the output matches Out[3].
weather.groupby('season_desc').agg({'temp': np.mean})


Out[6]:
temp
season_desc
Fall 0.711445
Spring 0.321700
Summer 0.554557
Winter 0.419368
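
The chained replace calls in In [4] can also be written as a single mapping. The sketch below is only an illustration: the toy rows are made up, and the mapping direction is copied from the replace chain above rather than verified against the data. It groups the relabeled frame so the new labels actually appear in the result.

```python
import pandas as pd

# Hypothetical toy rows; the notebook itself reads daily_weather.tsv.
toy = pd.DataFrame({
    "season_desc": ["Fall", "Fall", "Summer", "Winter", "Spring"],
    "temp": [0.70, 0.72, 0.55, 0.42, 0.32],
})

# Same relabeling as the chained replace calls, expressed as one mapping.
# Series.map looks up each original value once, so no placeholder
# underscores are needed to avoid re-matching a renamed label.
relabel = {"Fall": "Summer", "Summer": "Spring", "Winter": "Fall", "Spring": "Winter"}
fixed = toy.assign(season_desc=toy["season_desc"].map(relabel))

# Group the relabeled frame (not the original) so the new labels show up.
season_means = fixed.groupby("season_desc")["temp"].mean()
print(season_means)
```

Because each row is mapped in one pass, the labels come out without the trailing underscores that the chained version leaves behind.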
2. Several of the columns represent dates or datetimes, but out of the box pd.read_table won't parse them as such. This makes it hard to compute, for example, the number of rentals by month. Fix the dates and compute the number of rentals by month.

In [9]:
weather['months'] = pd.DatetimeIndex(weather.date).month

In [10]:
weather.groupby('months').agg({'total_riders': np.sum})


Out[10]:
total_riders
months
1 96744
2 103137
3 164875
4 174224
5 195865
6 202830
7 203607
8 214503
9 218573
10 198841
11 152664
12 123713
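
An alternative to building a DatetimeIndex after loading is to parse the dates while reading the file. This is a minimal sketch under assumptions: the three-row TSV string is a made-up stand-in for daily_weather.tsv, and only the `date` and `total_riders` column names come from the notebook.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for daily_weather.tsv (tab-separated).
tsv = "date\ttotal_riders\n2012-01-01\t100\n2012-01-15\t150\n2012-02-01\t200\n"

# parse_dates asks the reader to convert the named column to datetime64.
toy = pd.read_csv(io.StringIO(tsv), sep="\t", parse_dates=["date"])

# With a real datetime column, the .dt accessor exposes the month directly,
# so no helper 'months' column is required.
rentals_by_month = toy.groupby(toy["date"].dt.month)["total_riders"].sum()
print(rentals_by_month)
```

Grouping on `toy["date"].dt.month` keeps the frame unchanged; adding a `months` column, as In [9] does, is equally valid when the month is reused later.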

3. Investigate how the number of rentals varies with temperature. Is this trend constant across seasons? Across months?

In [ ]:
weather[['total_riders', 'temp']].corr()


In [11]:
weather[['total_riders', 'temp', 'months']].groupby('months').corr()


Out[11]:
                         temp  total_riders
months
1      temp          1.000000      0.689495
       total_riders  0.689495      1.000000
2      temp          1.000000      0.716206
       total_riders  0.716206      1.000000
3      temp          1.000000      0.735575
       total_riders  0.735575      1.000000
4      temp          1.000000      0.533387
       total_riders  0.533387      1.000000
5      temp          1.000000      0.065599
       total_riders  0.065599      1.000000
6      temp          1.000000     -0.330884
       total_riders -0.330884      1.000000
7      temp          1.000000     -0.184704
       total_riders -0.184704      1.000000
8      temp          1.000000      0.288264
       total_riders  0.288264      1.000000
9      temp          1.000000     -0.418753
       total_riders -0.418753      1.000000
10     temp          1.000000      0.466666
       total_riders  0.466666      1.000000
11     temp          1.000000      0.511232
       total_riders  0.511232      1.000000
12     temp          1.000000      0.690062
       total_riders  0.690062      1.000000
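
The grouped correlation matrix repeats each coefficient twice and pads it with 1.0 diagonals. One way to get a single temp-versus-riders coefficient per month is sketched below. The toy data is invented purely to make the example runnable, and `temp_rider_corr` is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data: month 1 rises with temperature, month 6 falls.
toy = pd.DataFrame({
    "months": [1, 1, 1, 6, 6, 6],
    "temp": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    "total_riders": [10, 20, 30, 60, 50, 40],
})

def temp_rider_corr(group):
    """Pearson correlation between temp and total_riders within one group."""
    return group["temp"].corr(group["total_riders"])

# Selecting the two columns first means each group passed to the helper
# contains only the data needed for the correlation.
corr_by_month = toy.groupby("months")[["temp", "total_riders"]].apply(temp_rider_corr)
print(corr_by_month)
```

The result is a Series indexed by month, which is easier to scan or plot than the stacked matrix; the same pattern should work with `season_desc` as the grouping key.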

In [ ]:
weather[['total_riders', 'temp', 'season_desc']].groupby('season_desc').corr()


In [12]:
weather[['no_casual_riders', 'no_reg_riders', 'temp']].corr()


Out[12]:
                  no_casual_riders  no_reg_riders      temp
no_casual_riders          1.000000       0.274984  0.542253
no_reg_riders             0.274984       1.000000  0.607425
temp                      0.542253       0.607425  1.000000

4. There are various types of users in the usage data sets. What sorts of things can you say about how they use the bikes differently?


In [13]:
weather[['no_casual_riders', 'no_reg_riders']].corr()


Out[13]:
                  no_casual_riders  no_reg_riders
no_casual_riders          1.000000       0.274984
no_reg_riders             0.274984       1.000000

In [16]:
weather[['is_holiday', 'total_riders']].sum()


Out[16]:
is_holiday           11
total_riders    2049576
dtype: int64

In [15]:
weather[['is_holiday', 'total_riders']].corr()


Out[15]:
              is_holiday  total_riders
is_holiday      1.000000     -0.118134
total_riders   -0.118134      1.000000
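
Correlating `is_holiday` with `total_riders` does not separate the two rider types. A possible next step for question 4 is to compare average casual and registered ridership on holidays versus other days. This is a sketch only: the numbers below are invented and say nothing about the real data; the column names are the ones used in the notebook, and `casual_share` is a hypothetical derived column.

```python
import pandas as pd

# Hypothetical toy rows: three ordinary days and one holiday.
toy = pd.DataFrame({
    "is_holiday": [0, 0, 0, 1],
    "no_casual_riders": [100, 120, 80, 300],
    "no_reg_riders": [900, 880, 920, 500],
})

# Mean riders of each type, split by holiday flag.
by_holiday = toy.groupby("is_holiday")[["no_casual_riders", "no_reg_riders"]].mean()

# Fraction of the average day's riders who are casual users.
by_holiday["casual_share"] = by_holiday["no_casual_riders"] / (
    by_holiday["no_casual_riders"] + by_holiday["no_reg_riders"]
)
print(by_holiday)
```

With only 11 holidays in the data (Out[16]), any difference found this way on the real file rests on a small sample and should be read cautiously.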
