First we import some libraries. Libraries (such as matplotlib, os, pandas and urllib) extend Python's base functionality with functions written for specific purposes. Some libraries provide functions that calculate the minimum, maximum, mean, standard deviation and so on. Others focus on making graphs and charts, so you can present your findings quickly and cleanly.
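To make this concrete, here is a minimal sketch of a library doing that kind of work. The numbers are made up purely for illustration and have nothing to do with the bridge data.

```python
import pandas as pd

# Made-up hourly counts, for illustration only (not the Fremont data)
counts = pd.Series([12, 30, 45, 28, 10])

summary = {
    "min": counts.min(),
    "max": counts.max(),
    "mean": counts.mean(),
    "std": counts.std(),  # sample standard deviation (pandas divides by n - 1)
}
print(summary)
```

Each of these is a single function call on the pandas object, which is exactly the kind of convenience a library gives you.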
In the following, we use:
Matplotlib, which lets you make plots and graphs
Pandas, the Python library that contains a lot of data analysis functionality
urllib, a Python library with functions for pulling data from websites
os, a Python library for working with the operating system, used here to check whether a file already exists
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn')  # renamed to 'seaborn-v0_8' in Matplotlib 3.6; use that name on newer versions
In the cell below, we first import some more libraries. Next, we define a variable called URL, which holds the web address of our data set. The data set contains the number of cyclists crossing the Fremont Bridge in Seattle in each direction every hour, since 2012 as far as I know.
We then define a function called get_fremont_data (remember, a function is a bunch of code that does one specific thing). It downloads the data set from the given URL, but only if the file is not already on disk or if force_download is True. It then reads the file into a table called data, renames the columns, adds a Total column, and returns the table so we can use it!
In [13]:
import os
import pandas as pd
from urllib.request import urlretrieve

URL = 'https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD'

def get_fremont_data(filename='Fremont.csv', url=URL, force_download=False):
    # Download only when forced or when no local copy exists yet
    if force_download or not os.path.exists(filename):
        urlretrieve(url, filename)
    # Read the local copy (not a hard-coded name), with the parsed 'Date' column as index
    data = pd.read_csv(filename, index_col='Date', parse_dates=True)
    # Assumes the file has exactly two count columns, in this order
    data.columns = ['Cyclists Westwards', 'Cyclists Eastwards']
    data['Total'] = data['Cyclists Westwards'] + data['Cyclists Eastwards']
    return data
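The "download only if needed" part of the function can be tried out without touching the internet. The sketch below isolates that check. Note that get_cached and fake_fetch are hypothetical names made up for this illustration; fake_fetch stands in for urlretrieve and simply writes a tiny file.

```python
import os
import tempfile

def get_cached(filename, fetch, force_download=False):
    """Call fetch(filename) only if the file is missing or a refresh is forced."""
    if force_download or not os.path.exists(filename):
        fetch(filename)
    return filename

calls = []  # records every time the (fake) download actually runs

def fake_fetch(path):
    # Hypothetical stand-in for urlretrieve: writes a small file and logs the call
    calls.append(path)
    with open(path, "w") as f:
        f.write("Date,West,East\n")

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "Fremont.csv")
    get_cached(target, fake_fetch)                       # file missing: fetches
    get_cached(target, fake_fetch)                       # file present: skipped
    get_cached(target, fake_fetch, force_download=True)  # forced: fetches again

print(len(calls))
```

Three calls lead to only two downloads, which is why rerunning the notebook does not pull the whole data set again each time.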
It is important to note that defining a function does not run it: nothing happens until you actually call the function. Calling a function is easy: just type its name followed by parentheses and assign the output to a variable name (which we creatively call data here).
If we called the function without assigning it to a variable name, the notebook would still show the table you see below, but we would not be able to do anything with the table, as it has no name we can refer to.
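The difference between defining, calling, and assigning can be seen in a tiny sketch. The function make_greeting is a hypothetical example invented here; it only records that it ran and returns a string.

```python
log = []  # filled in only when the function body actually runs

def make_greeting(name="data"):
    # Hypothetical example function: notes that it ran, then returns a string
    log.append("ran")
    return "hello, " + name

runs_after_definition = len(log)  # still 0: defining a function does not run it

result = make_greeting()  # body runs now; the return value is kept in result
make_greeting()           # body runs again, but the return value is thrown away

print(runs_after_definition, len(log), result)
```

In a notebook, the value of the last expression in a cell is displayed automatically, which is why an unassigned call still shows its output even though you cannot reuse it afterwards.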
In [16]:
data = get_fremont_data()
data
Out[16]:
So great! We have now imported our data set, and we know it exists because we can show it. To find out what is actually interesting about the cyclists, we can plot the weekly sum of cyclists in each direction, as well as the total number of cyclists.
Interesting to see is the wave pattern (roughly a sinusoid) in the number of cyclists over each year. Apparently, far fewer people take the bicycle in the winter than in the summer. This isn't that surprising.
In [17]:
data.resample('W').sum().plot();
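To see what resample('W').sum() does to the rows, here is a small sketch on synthetic data (exactly one cyclist per hour for two weeks), so the result is easy to check by hand. The series is made up and is not the Fremont data.

```python
import pandas as pd

# Synthetic hourly counts: 14 days starting on a Monday, 1 cyclist every hour
index = pd.date_range("2021-01-04", periods=14 * 24, freq=pd.Timedelta(hours=1))
hourly = pd.Series(1, index=index)

# 'W' groups the rows into calendar weeks ending on Sunday, then we add them up
weekly = hourly.resample("W").sum()
print(weekly)
```

Each weekly value is 7 x 24 = 168, and each row is labelled with the Sunday that ends the week.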
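The graph below shows the rolling 365-day sum of cyclists, which smooths out the seasonal wave and leaves only the year-to-year trend. It is less interesting, so feel free to skip it.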
In [18]:
ax = data.resample('D').sum().rolling(365).sum().plot();
ax.set_ylim(0,None)
Out[18]:
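Why does a rolling sum over 365 days flatten the yearly wave? A window that is exactly one season long always contains one full cycle, so the ups and downs cancel. The sketch below shows this on a made-up series with a repeating "season" of 4 steps instead of 365 days.

```python
import pandas as pd

# Made-up series with a repeating 4-step season: 1, 3, 5, 3, 1, 3, 5, 3, ...
seasonal = pd.Series([1, 3, 5, 3] * 3)

# A window exactly one season long always covers one full cycle
rolled = seasonal.rolling(4).sum()
print(rolled.tolist())
```

The first three values are NaN because the window is not yet full; after that every value is the same, so only a change in the overall level would show up in the plot.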
The graph below shows the average number of cyclists for each hour of the day, and the pattern looks like the daily commute! The number of cyclists going west is much higher in the morning than in the late afternoon. The opposite is true for the number of cyclists going east.
Apparently, people live east of the bridge and work west of it!
In [19]:
data.groupby(data.index.time).mean().plot();
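Grouping by data.index.time puts all rows with the same clock time in one group, regardless of the date, and then averages each group. A minimal sketch on synthetic data (three days, with counts chosen so the averages are easy to verify) looks like this:

```python
import datetime
import pandas as pd

# Synthetic data: 3 days of hourly rows, starting 2021-01-04
index = pd.date_range("2021-01-04", periods=3 * 24, freq=pd.Timedelta(hours=1))
# Made-up counts: the hour number times 1, 2 and 3 on the three days
values = [ts.hour * (ts.day - 3) for ts in index]
counts = pd.Series(values, index=index)

# One group per time of day; each group holds that hour from all three days
by_time = counts.groupby(counts.index.time).mean()
print(by_time.loc[datetime.time(8, 0)])
```

For 08:00 the three values are 8, 16 and 24, so the average is 16. The result has 24 rows, one for each hour of the day.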
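We now create a new table (a pivot table) with one row for every hour of the day and one column for every date, so that each day can be plotted as its own line. The cell below shows the first five rows and columns.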
In [20]:
pivoted = data.pivot_table('Total', index=data.index.time, columns = data.index.date)
pivoted.iloc[:5, :5]
Out[20]:
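The shape of such a pivot table is easiest to understand on a small example. The sketch below uses three days of synthetic hourly data (made-up counts, not the Fremont data) and the same pivot_table call.

```python
import datetime
import pandas as pd

# Synthetic data: 3 days of hourly rows with a made-up 'Total' column
index = pd.date_range("2021-01-04", periods=3 * 24, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"Total": [ts.hour * (ts.day - 3) for ts in index]}, index=index)

# Rows become times of day, columns become dates
demo_pivot = demo.pivot_table("Total", index=demo.index.time, columns=demo.index.date)
print(demo_pivot.shape)
print(demo_pivot.loc[datetime.time(8, 0), datetime.date(2021, 1, 5)])
```

The long list of 72 hourly rows turns into a 24 x 3 grid, where each cell holds the count for one hour on one date.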
This last plot shows a bit more information. It draws one line for every day over the past couple of years, giving the number of cyclists per hour of that day. It again shows the two peaks, one in the morning and one in the late afternoon. This represents the daily commute.
However, in between the peaks, we can see a wave occurring in the middle of the day. These are probably people going out for a bike ride in the middle of the day during the weekends!
In [21]:
pivoted.plot(legend=False, alpha = 0.02);
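One way to check the weekend guess is to split the columns of the pivot table into weekdays and weekends and compare the average daily profile of each. The sketch below does this on synthetic data in which the weekend bump is built in on purpose, so it only demonstrates the technique; it says nothing about the real bridge counts. The helper synthetic_total is a hypothetical function made up for this example.

```python
import datetime
import pandas as pd

index = pd.date_range("2021-01-04", periods=14 * 24, freq=pd.Timedelta(hours=1))

def synthetic_total(ts):
    # Made-up shape: commute peaks on weekdays, a midday bump on weekends
    if ts.dayofweek < 5:
        return 100 if ts.hour in (8, 17) else 10
    return 60 if ts.hour == 13 else 10

demo = pd.DataFrame({"Total": [synthetic_total(ts) for ts in index]}, index=index)
demo_pivot = demo.pivot_table("Total", index=demo.index.time, columns=demo.index.date)

# Columns are dates; Monday is 0, so 5 and 6 are Saturday and Sunday
is_weekend = pd.to_datetime(demo_pivot.columns).dayofweek >= 5
weekday_profile = demo_pivot.loc[:, ~is_weekend].mean(axis=1)
weekend_profile = demo_pivot.loc[:, is_weekend].mean(axis=1)

print(weekday_profile.idxmax(), weekend_profile.idxmax())
```

Applied to the real pivoted table, the same split would show whether the midday wave really comes from the weekend columns.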
Are cyclists crossing a bridge really that interesting? No, I couldn't care less. But it is an intuitive way to look at a large data set and to see how you can fairly easily extract information about people's daily and weekly cycling behaviour.
You can apply the same techniques to factory processes (machine settings vs. productivity), or to crime in a city and where the police can best be stationed at which times of day, depending on the day of the week, the month, or the year.
If I find something that is more up your alley, I would be happy to see whether I can pull something interesting out of it.