In [1]:
URL = "https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD"
In [2]:
from urllib.request import urlretrieve
urlretrieve(URL, "Fremont.csv")
Out[2]:
In [3]:
!head Fremont.csv
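For repeated runs it can help to skip the download when the file is already on disk. A minimal sketch of a caching helper (the function name `download_if_needed` and the `force` flag are my own, not from the transcript):

```python
import os
from urllib.request import urlretrieve

URL = "https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD"

def download_if_needed(url=URL, filename="Fremont.csv", force=False):
    """Download the CSV only if it isn't already cached locally."""
    if force or not os.path.exists(filename):
        urlretrieve(url, filename)
    return filename
```

Calling `download_if_needed()` twice then only hits the network once.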
In [4]:
import pandas as pd
data = pd.read_csv("Fremont.csv")
data.head()
Out[4]:
In [5]:
data = pd.read_csv("Fremont.csv", index_col="Date", parse_dates=True)
data.head()
Out[5]:
In [6]:
%matplotlib inline
data.index = pd.to_datetime(data.index)
data.plot()
Out[6]:
In [7]:
data.resample('W').sum().plot()
Out[7]:
In [8]:
import matplotlib.pyplot as plt
plt.style.use("seaborn")
data.resample("W").sum().plot()
Out[8]:
In [9]:
data.columns = ["West", "East"]
data.resample("W").sum().plot()
Out[9]:
Let's look for an annual trend: growth or decline in ridership over time.
Let's try a rolling window: a 365-day rolling sum.
In [10]:
data.resample("D").sum().rolling(365).sum().plot()
Out[10]:
The curves don't go all the way down to zero, so let's set the y-axis limits from zero to the current maximum (None).
In [11]:
ax = data.resample("D").sum().rolling(365).sum().plot()
ax.set_ylim(0, None)
Out[11]:
There seems to be an offset between the west and east sidewalks. Let's add their total and plot all three to compare trends.
In [12]:
data["Total"] = data["West"] + data["East"]
ax = data.resample("D").sum().rolling(365).sum().plot()
ax.set_ylim(0, None)
Out[12]:
The east and west side trends mirror each other, so the total bike rides across the bridge hover around 1 million per year and have been fairly stable over the last couple of years, to within a couple of percent.
Let's group by time of day, take its mean, and plot it.
In [13]:
data.groupby(data.index.time).mean().plot()
Out[13]:
Let's look at the whole data set this way, not just the average. We'll build a pivot table.
In [14]:
pivoted = data.pivot_table("Total", index=data.index.time, columns=data.index.date)
pivoted.iloc[:5, :5]
Out[14]:
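To make the reshaping concrete, here is the same pivot on a tiny synthetic hourly series (hypothetical data, not the bridge counts):

```python
import pandas as pd

# Two days of hourly counts (synthetic, for illustration only)
idx = pd.date_range("2020-01-01", periods=48, freq="h")
df = pd.DataFrame({"Total": range(48)}, index=idx)

# Same reshaping as above: rows become times of day, columns become dates
pv = df.pivot_table("Total", index=df.index.time, columns=df.index.date)
print(pv.shape)  # (24, 2): 24 times of day x 2 dates
```

Each cell holds the count for one (time-of-day, date) pair, which is exactly the layout the plots below rely on.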
We now have a 2-D data set: each column is a day and each row is a time of day. Let's turn the legend off and plot it.
In [15]:
pivoted.plot(legend=False)
Out[15]:
Let's lower the opacity (alpha) so the overlapping lines are easier to see.
In [16]:
pivoted.plot(legend=False, alpha=0.01)
Out[16]:
Let's do a quick "Restart & Run All" to check that the analysis reproduces end-to-end.
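The same check can be scripted: `jupyter nbconvert --execute` re-runs every cell top-to-bottom and fails if any cell errors. A sketch, assuming the notebook filename used later in this transcript:

```python
import subprocess
from pathlib import Path

notebook = "JupyterWorkflow.ipynb"  # assumed filename

# Equivalent of "Restart & Run All": re-execute all cells in place,
# raising CalledProcessError if any cell fails.
cmd = ["jupyter", "nbconvert", "--to", "notebook",
       "--execute", "--inplace", notebook]

if Path(notebook).exists():
    subprocess.run(cmd, check=True)
else:
    print(f"{notebook} not found; skipping")
```

This is handy in CI, where an interactive "Restart & Run All" isn't available.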
Let's upload this to GitHub. Create a new repository, initialize it with a README, add a Python .gitignore and an MIT license, then create the repository.
Copy the repository's HTTPS clone URL.
Open a terminal in the same directory we are working in.
git clone {whatever-the-copied-thing-is}
cd JupyterWorkflow to see that the README and license are there.
cd .. and mv JupyterWorkflow.ipynb JupyterWorkflow to move the notebook into the repository.
cd JupyterWorkflow
git status to see that the newly moved notebook is still untracked.
git add JupyterWorkflow.ipynb
git commit -m "Add initial analysis notebook"
git push origin master
git status
The notebook should be on GitHub now.
We now have the data file Fremont.csv in our GitHub repository. That's okay for now because this data set is small, but with a larger data set we'd want to make sure we don't accidentally commit the data to the repository.
Open up the .gitignore file that we created while setting up the GitHub repository.
Add this to the bottom so git ignores the data file:
Fremont.csv
git add .gitignore
git commit -m "Add data to gitignore"
git status
git push origin master