Forked from the tutorial at EuroScipy 2015 by Joris Van den Bossche (Ghent University, Belgium)
Licensed under CC BY 4.0 Creative Commons
If you want to follow along, this is a notebook that you can view or run yourself. It requires pandas >= 0.15.2 (the easiest solution is to install Anaconda).

Some imports:
In [7]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.options.display.max_rows = 8
In [2]:
data = pd.read_csv('data/airbase_data.csv', index_col=0, parse_dates=True, na_values='-9999')
In [3]:
data
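This is a minimal sketch of what the `index_col`, `parse_dates` and `na_values` arguments do. It reads a tiny in-memory CSV instead of the real file. The station names and values are hypothetical. The only assumption is that the layout matches `data/airbase_data.csv`: a datetime first column, one column per station, and -9999 for missing values.

```python
import io

import pandas as pd

# Hypothetical miniature file: a datetime first column, one column per
# (made-up) station, and -9999 marking missing measurements.
csv_text = """datetime,STATION_A,STATION_B
1999-01-01 00:00,25.0,-9999
1999-01-01 01:00,-9999,31.5
1999-01-01 02:00,18.2,29.0
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,        # use the first column as the row index
    parse_dates=True,   # parse that index into a DatetimeIndex
    na_values="-9999",  # treat the sentinel value as missing (NaN)
)

print(df)
print(df.isnull().sum())  # number of missing values per column
```

Because the index is a `DatetimeIndex`, the time-based slicing, resampling and grouping used in the following cells all work directly on it.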
With pandas, we can go from reading this dataset to answering questions about it in a few lines of code:
Does the air pollution show a decreasing trend over the years?
In [4]:
# Annual mean per station from 1999 on (use 'YE' instead of 'A' in pandas >= 2.2)
data.loc['1999':].resample('A').mean().plot(ylim=[0, 100])
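In recent pandas versions, `resample()` returns a deferred object, so the aggregation (here `.mean()`) has to be called explicitly. The sketch below shows the same trend computation on a hypothetical daily series that drifts from 60 down to 40. The numbers are illustrative, not real measurements. It uses the `"YS"` (year start) alias, which labels each bin with January 1 instead of the year end.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for one station, 1997-2002, drifting slowly
# downward from 60 to 40 (illustrative numbers, not real measurements).
index = pd.date_range("1997-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({"STATION_A": np.linspace(60, 40, len(index))}, index=index)

# Keep 1999 onwards (partial string indexing on the DatetimeIndex), then
# downsample to one mean value per calendar year.
annual = df.loc["1999":].resample("YS").mean()
print(annual)
```

Calling `annual.plot(ylim=[0, 100])` would then draw the same kind of trend plot as the cell above (this requires matplotlib).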
How many exceedances of the limit values?
In [5]:
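# Boolean frame: True where a value exceeds the limit value of 200
exceedances = data > 200
# Count the exceedances per station and per year (True counts as 1)
exceedances = exceedances.groupby(exceedances.index.year).sum()
ax = exceedances.loc[2005:].plot(kind='bar')
# Dashed reference line at 18 exceedances per year
ax.axhline(18, color='k', linestyle='--')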
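The counting trick in the cell above relies on two steps. First, comparing a DataFrame with a scalar produces a boolean DataFrame. Second, summing booleans counts the `True` values. Here is a small, fully deterministic sketch with hypothetical values for two made-up stations, spanning the 2005/2006 year boundary.

```python
import pandas as pd

# Hypothetical values for two stations around a year boundary.
index = pd.date_range("2005-12-30", periods=4, freq="D")
df = pd.DataFrame(
    {"STATION_A": [250, 150, 210, 220], "STATION_B": [100, 205, 50, 60]},
    index=index,
)

# Element-wise comparison gives a boolean DataFrame of the same shape...
above = df > 200
# ...and summing booleans per year counts the True values.
per_year = above.groupby(above.index.year).sum()
print(per_year)
# STATION_A: 1 exceedance in 2005, 2 in 2006
# STATION_B: 1 exceedance in 2005, 0 in 2006
```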
What is the difference in diurnal profile between weekdays and weekend?
In [6]:
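# Day of the week (Monday=0, ..., Sunday=6) and a weekend flag
data['weekday'] = data.index.weekday
data['weekend'] = data['weekday'].isin([5, 6])
# Mean per hour of day at station FR04012; one column for weekdays (False), one for weekends (True)
data_weekend = data.groupby(['weekend', data.index.hour])['FR04012'].mean().unstack(level=0)
data_weekend.plot()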
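The next sketch makes the group-then-unstack step concrete. It runs on hypothetical hourly data built so that the value equals the hour of day on weekdays and the hour plus 10 on weekend days, which makes the resulting profiles easy to check. The station name and the date range are assumptions chosen for illustration. The grouping line mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data for two full weeks, starting on Monday 2015-01-05.
# The value equals the hour of day, plus 10 on weekend days.
index = pd.date_range("2015-01-05", periods=14 * 24, freq=pd.offsets.Hour())
is_weekend = np.asarray(index.weekday) >= 5
df = pd.DataFrame(
    {"STATION_A": np.asarray(index.hour) + np.where(is_weekend, 10, 0)},
    index=index,
)

# Flag weekend days, group by that flag and the hour of day,
# then move the flag from the row index to the columns.
df["weekday"] = df.index.weekday
df["weekend"] = df["weekday"].isin([5, 6])
profile = df.groupby(["weekend", df.index.hour])["STATION_A"].mean().unstack(level=0)
profile.columns = ["weekday", "weekend"]  # the unstacked labels were False, True
print(profile.head())
```

`unstack(level=0)` turns the outer grouping key (`weekend`) into columns. The result has one row per hour and one profile per column, which is the shape `.plot()` expects for drawing two lines.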
We will come back to these examples and build them up step by step.
For data-intensive work in Python, the pandas library has become essential.
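What is pandas?

Pandas can be thought of as R's data.frame in Python. Its documentation: http://pandas.pydata.org/pandas-docs/stable/

Among other things, it offers tools for working with missing data (.dropna(), pd.isnull()), merging and joining (concat, join), grouping (groupby functionality) and reshaping (stack, pivot).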
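Below is a quick tour of the functions named above. It runs on hypothetical toy tables; the station names, regions and values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables, only to show the calls named above.
a = pd.DataFrame({"station": ["A", "B"], "value": [1.0, np.nan]})
b = pd.DataFrame({"station": ["C"], "value": [3.0]})

# Missing data: detect NaN with pd.isnull, drop incomplete rows with dropna.
missing = pd.isnull(a["value"])   # False for A, True for B
clean = a.dropna()                # only the row for station A remains

# Merging and joining: concat stacks tables, join aligns on the index.
combined = pd.concat([a, b], ignore_index=True)
regions = pd.DataFrame({"region": ["north", "south", "south"]}, index=["A", "B", "C"])
joined = combined.set_index("station").join(regions)

# Grouping: mean value per region (NaN is skipped by default).
per_region = joined.groupby("region")["value"].mean()

# Reshaping: pivot turns long data into a wide table, stack goes back.
long = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-02"],
    "station": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
wide = long.pivot(index="date", columns="station", values="value")
back_to_long = wide.stack()       # Series indexed by (date, station)
```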