Copyright 2019 Allen B. Downey
MIT License: https://opensource.org/licenses/MIT
In [77]:
%matplotlib inline
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
This article suggests that a smooth curve is a better way to show noisy polling data over time.
Here's their before and after:
And here's their data:
In [78]:
df = pd.read_csv('Economist_brexit.csv', header=3, parse_dates=[0])
df.index = df['Date']
df.head()
Out[78]:
In [79]:
df.tail()
Out[79]:
The following function uses StatsModels to fit a smooth curve through a time series with LOWESS (locally weighted scatterplot smoothing) and returns the result as a Pandas Series.
In [80]:
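from statsmodels.nonparametric.smoothers_lowess import lowess

def make_lowess(series):
    """Smooth a time series with LOWESS and return a Pandas Series."""
    endog = series.values        # observed values (y)
    exog = series.index.values   # the index supplies the x values
    # lowess returns an array with two columns: sorted x and fitted y
    smooth = lowess(endog, exog)
    index, data = np.transpose(smooth)
    return pd.Series(data, index=pd.to_datetime(index))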
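To see how much the result depends on the smoothing window, here is a minimal sketch on synthetic data (made up for illustration, not the Economist polls). The helper `make_lowess_frac` is a hypothetical variant of `make_lowess`, not part of StatsModels: it converts the dates to days since the first observation and passes along `frac`, the `lowess` parameter that sets the fraction of the data used for each local fit (the default is 2/3).

```python
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic stand-in for noisy polling data (assumption: a slow
# linear drift plus Gaussian noise)
rng = np.random.default_rng(17)
dates = pd.date_range('2016-08-01', periods=100, freq='W')
trend = np.linspace(46, 42, len(dates))
noisy = pd.Series(trend + rng.normal(0, 2, len(dates)), index=dates)

def make_lowess_frac(series, frac=2/3):
    """Hypothetical variant of make_lowess that exposes `frac`."""
    # Days since the first observation, as a plain float x-axis
    days = (series.index - series.index[0]).days.to_numpy(dtype=float)
    # return_sorted=False yields fitted values in the original row order
    fitted = lowess(series.to_numpy(), days, frac=frac,
                    return_sorted=False)
    return pd.Series(fitted, index=series.index)

smooth_wide = make_lowess_frac(noisy)              # default window
smooth_narrow = make_lowess_frac(noisy, frac=0.2)  # follows the data more closely
```

A smaller `frac` tracks short-term wiggles and a larger one irons them out, so plotting `smooth_wide` and `smooth_narrow` over `noisy` is a quick way to judge how much of a curve's shape comes from the data and how much from the choice of window.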
Here's what the graph looks like.
In [81]:
options = dict(marker='o', linewidth=0, alpha=0.3, label='')
df['% responding right'].plot(color='C0', **options)
df['% responding wrong'].plot(color='C1', **options)
right = make_lowess(df['% responding right'])
right.plot(label='Right')
wrong = make_lowess(df['% responding wrong'])
wrong.plot(label='Wrong')
plt.legend();