Estimate a regression using the Capital Bikeshare data
We'll be working with a dataset from Capital Bikeshare that was used in a Kaggle competition (data dictionary).
Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people are able to rent a bike from one location and return it to a different place on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.
The data generated by these systems makes them attractive for researchers because the duration of travel, departure location, arrival location, and time elapsed are explicitly recorded. Bike sharing systems therefore function as a sensor network, which can be used for studying mobility in a city. In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.
In [1]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
# read the data and set the datetime as the index
import zipfile
with zipfile.ZipFile('../datasets/bikeshare.csv.zip', 'r') as z:
    f = z.open('bikeshare.csv')
    bikes = pd.read_csv(f, index_col='datetime', parse_dates=True)
# "count" is a method, so it's best to name that column something else
bikes.rename(columns={'count':'total'}, inplace=True)
bikes.head()
Out[1]:
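The reason for renaming the column can be seen in a small sketch. The frame below is a made-up two-row stand-in (the name `toy` and its values are illustrative, not taken from the bikeshare data): attribute access on a column called `count` resolves to the DataFrame method rather than the column, while the renamed column is reachable either way.

```python
import pandas as pd

# Hypothetical two-row frame used only to illustrate the naming clash
toy = pd.DataFrame({'temp': [9.84, 9.02], 'count': [16, 40]})

# Attribute access finds the DataFrame.count method, not the column
print(callable(toy.count))

# Bracket access still returns the column as a Series
print(type(toy['count']).__name__)

# After renaming, attribute access returns the column as expected
renamed = toy.rename(columns={'count': 'total'})
print(list(renamed.total))
```

Bracket access would have worked without the rename, so the rename is mainly a convenience that keeps `bikes.total` unambiguous.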
In [2]:
bikes.shape
Out[2]:
In [3]:
# Pandas scatter plot
bikes.plot(kind='scatter', x='temp', y='total', alpha=0.2)
Out[3]:
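One possible next step toward the regression named in the title is a simple least-squares line of `total` on `temp`. The sketch below is only an assumption about how that might be done: the helper `fit_simple_regression` is a hypothetical name, it uses `np.polyfit` rather than any particular modeling library, and the `toy` frame holds made-up numbers that lie exactly on a line so the result is easy to check.

```python
import numpy as np
import pandas as pd

def fit_simple_regression(df, x_col, y_col):
    """Fit y = intercept + slope * x by ordinary least squares."""
    # np.polyfit returns coefficients from the highest power down
    slope, intercept = np.polyfit(df[x_col], df[y_col], deg=1)
    return slope, intercept

# Made-up data lying exactly on total = 10 + 5 * temp (not the real dataset)
toy = pd.DataFrame({'temp': [0.0, 10.0, 20.0, 30.0],
                    'total': [10.0, 60.0, 110.0, 160.0]})

slope, intercept = fit_simple_regression(toy, 'temp', 'total')
print(round(slope, 2), round(intercept, 2))
```

With the notebook's data loaded, the same helper could be called as `fit_simple_regression(bikes, 'temp', 'total')`; the slope would then be read as the estimated change in rentals per one-unit change in `temp`.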