Compute the average temperature by season ('season_desc'). (The temperatures are numbers between 0 and 1, but don't worry about that. Let's say that's the Shellman temperature scale.)
I get
season_desc
Fall 0.711445
Spring 0.321700
Summer 0.554557
Winter 0.419368
This clearly looks wrong: Fall comes out warmest and Spring coldest. Figure out what's wrong with the original data and fix it.
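One way to diagnose this kind of problem is to cross-tabulate the season labels against calendar months. A minimal sketch on a toy frame; the dates and labels below are invented for illustration and are not taken from daily_weather.tsv:

```python
import pandas as pd

# Toy stand-in for the weather data (invented rows, for illustration only).
toy = pd.DataFrame({
    'date': ['2012-01-15', '2012-04-15', '2012-07-15', '2012-10-15'],
    'season_desc': ['Spring', 'Summer', 'Fall', 'Winter'],
})

# Derive the calendar month from the date column.
toy['month'] = pd.to_datetime(toy['date']).dt.month

# Count how many rows each season label has in each calendar month.
months_by_label = pd.crosstab(toy['season_desc'], toy['month'])
print(months_by_label)
```

If a label such as 'Spring' lines up with January, the labels are shifted relative to the calendar.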
In [22]:
from pandas import DataFrame, Series
In [23]:
import pandas as pd
In [24]:
import numpy as np
In [25]:
weather_data = pd.read_table('data/daily_weather.tsv')
In [26]:
season_mapping = {'Spring': 'Winter', 'Winter': 'Fall', 'Fall': 'Summer', 'Summer': 'Spring'}
In [27]:
def fix_seasons(x):
    return season_mapping[x]
In [28]:
weather_data['season_desc'] = weather_data['season_desc'].apply(fix_seasons)
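The same relabeling can also be written by passing the dictionary straight to Series.map. A sketch on invented labels; one difference worth knowing is that map yields NaN for a label missing from the dictionary, whereas a plain dictionary lookup raises KeyError:

```python
import pandas as pd

season_mapping = {'Spring': 'Winter', 'Winter': 'Fall', 'Fall': 'Summer', 'Summer': 'Spring'}

# Invented labels standing in for weather_data['season_desc'].
labels = pd.Series(['Spring', 'Summer', 'Fall', 'Winter'])

# map() looks each value up in the dictionary; unmapped values would become NaN.
fixed = labels.map(season_mapping)
print(fixed.tolist())
```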
In [29]:
weather_data.pivot_table(index='season_desc', values='temp', aggfunc='mean')
Out[29]:
In this case a pivot table is not really required; a simple groupby followed by mean() will do the job.
In [30]:
weather_data.groupby('season_desc')['temp'].mean()
Out[30]:
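To check that the two approaches agree, here is a small sketch on invented temperatures. The only difference is the shape of the result: pivot_table returns a DataFrame, groupby plus mean() returns a Series.

```python
import pandas as pd

# Invented values, for illustration only.
toy = pd.DataFrame({
    'season_desc': ['Winter', 'Winter', 'Summer', 'Summer'],
    'temp': [0.2, 0.4, 0.7, 0.9],
})

via_pivot = toy.pivot_table(index='season_desc', values='temp', aggfunc='mean')
via_groupby = toy.groupby('season_desc')['temp'].mean()

# Largest absolute difference between the two results, season by season.
max_diff = (via_pivot['temp'] - via_groupby).abs().max()
print(max_diff)
```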
In [31]:
weather_data['Month'] = pd.DatetimeIndex(weather_data.date).month
In [32]:
weather_data.groupby('Month')['total_riders'].sum()
Out[32]:
In [33]:
pd.concat([weather_data['temp'], weather_data['total_riders']], axis=1).corr()
Out[33]:
Check how correlation between temp and total riders varies across months.
In [34]:
weather_data[['total_riders', 'temp', 'Month']].groupby('Month').corr()
Out[34]:
Check how correlation between temp and total riders varies across seasons.
In [35]:
weather_data[['total_riders', 'temp', 'season_desc']].groupby('season_desc').corr()
Out[35]:
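groupby(...).corr() returns a full correlation matrix per group, so the temp versus total_riders coefficient appears twice in every block. If only that one number per group is of interest, it can be pulled out directly. A sketch on invented values (the name temp_vs_riders is just a label chosen here):

```python
import pandas as pd

# Invented values, for illustration only.
toy = pd.DataFrame({
    'season_desc': ['Winter'] * 3 + ['Summer'] * 3,
    'temp': [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    'total_riders': [10, 20, 30, 90, 80, 70],
})

# One Pearson coefficient per season instead of a 2x2 matrix per season.
per_season = {
    season: grp['temp'].corr(grp['total_riders'])
    for season, grp in toy.groupby('season_desc')
}
temp_rider_corr = pd.Series(per_season, name='temp_vs_riders')
print(temp_rider_corr)
```

The same loop works for the monthly version by grouping on 'Month' instead.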
Investigate total riders by month versus average monthly temp.
In [36]:
month_riders = weather_data.groupby('Month')['total_riders'].sum()
In [37]:
month_avg_temp = weather_data.groupby('Month')['temp'].mean()
In [38]:
pd.concat([month_riders, month_avg_temp], axis=1)
Out[38]:
Investigate total riders by season versus average seasonal temp.
In [39]:
season_riders = weather_data.groupby('season_desc')['total_riders'].sum()
In [40]:
season_temp = weather_data.groupby('season_desc')['temp'].mean()
In [41]:
pd.concat([season_riders, season_temp], axis=1)
Out[41]:
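The two groupby calls plus concat can also be collapsed into a single named aggregation. A sketch on invented values; the output column name avg_temp is chosen here for illustration:

```python
import pandas as pd

# Invented values, for illustration only.
toy = pd.DataFrame({
    'season_desc': ['Winter'] * 3 + ['Summer'] * 3,
    'temp': [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    'total_riders': [10, 20, 30, 90, 80, 70],
})

# Sum the riders and average the temperature in one pass per season.
summary = toy.groupby('season_desc').agg(
    total_riders=('total_riders', 'sum'),
    avg_temp=('temp', 'mean'),
)
print(summary)
```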
Investigate correlations between casual and registered riders on work days and holidays.
In [42]:
weather_data[['no_casual_riders', 'no_reg_riders', 'is_work_day', 'is_holiday']].corr()
Out[42]:
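The full matrix also includes rider versus rider and flag versus flag entries. To look only at how each rider type relates to each day flag, the relevant block can be sliced out with .loc. A sketch on invented values, assuming the flags are stored as 0/1:

```python
import pandas as pd

# Invented values, for illustration only; flags assumed to be 0/1 integers.
toy = pd.DataFrame({
    'no_casual_riders': [120, 30, 40, 150, 35],
    'no_reg_riders': [200, 400, 420, 180, 410],
    'is_work_day': [0, 1, 1, 0, 1],
    'is_holiday': [0, 0, 0, 1, 0],
})

corr = toy.corr()

# Keep rider columns as rows and day flags as columns.
cross = corr.loc[['no_casual_riders', 'no_reg_riders'],
                 ['is_work_day', 'is_holiday']]
print(cross)
```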
Investigate correlations between casual and registered riders and windspeed.
In [43]:
weather_data[['no_casual_riders', 'no_reg_riders', 'windspeed']].corr()
Out[43]:
In [44]:
usage = pd.read_table('data/usage_2012.tsv')
Compare average rental duration between customer types.
In [45]:
usage.groupby('cust_type')['duration_mins'].mean()
Out[45]:
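A mean alone can be skewed by a few very long rentals, so it may help to look at the median and the group size alongside it. A sketch on invented durations; the cust_type values 'Casual' and 'Registered' are assumed here and may not match the labels in usage_2012.tsv:

```python
import pandas as pd

# Invented values, for illustration only; customer type labels are assumed.
toy_usage = pd.DataFrame({
    'cust_type': ['Casual', 'Casual', 'Casual',
                  'Registered', 'Registered', 'Registered'],
    'duration_mins': [30, 50, 40, 10, 12, 14],
})

# Mean, median and number of rentals per customer type.
stats = toy_usage.groupby('cust_type')['duration_mins'].agg(['mean', 'median', 'count'])
print(stats)
```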