Using statsmodels lowess

MIT License: https://opensource.org/licenses/MIT



In [77]:

    
%matplotlib inline

import numpy as np
import pandas as pd

import random

import matplotlib.pyplot as plt

This article suggests that a smooth curve is a better way to show noisy polling data over time.

Here's their before and after:

And here's their data:



In [78]:

    
df = pd.read_csv('Economist_brexit.csv', header=3, parse_dates=[0])
df.index = df['Date']
df.head()









    Out[78]:







  
    
      
      Date
      % responding right
      % responding wrong
    
    
      Date
      
      
      
    
  
  
    
      2016-02-08
      2016-02-08
      46
      42
    
    
      2016-09-08
      2016-09-08
      45
      44
    
    
      2016-08-17
      2016-08-17
      46
      43
    
    
      2016-08-23
      2016-08-23
      45
      43
    
    
      2016-08-31
      2016-08-31
      47
      44



In [79]:

    
df.tail()









    Out[79]:







  
    
      
      Date
      % responding right
      % responding wrong
    
    
      Date
      
      
      
    
  
  
    
      2018-08-13
      2018-08-13
      43
      47
    
    
      2018-08-14
      2018-08-14
      43
      45
    
    
      2018-08-21
      2018-08-21
      41
      47
    
    
      2018-08-29
      2018-08-29
      42
      47
    
    
      2018-04-09
      2018-04-09
      42
      48

The following function uses StatsModels to put a smooth curve through a time series (and stuff the results back into a Pandas Series)



In [80]:

    
from statsmodels.nonparametric.smoothers_lowess import lowess

def make_lowess(series):
    endog = series.values
    exog = series.index.values

    smooth = lowess(endog, exog)
    index, data = np.transpose(smooth)
    
    return pd.Series(data, index=pd.to_datetime(index))

Here's what the graph looks like.



In [81]:

    
options = dict(marker='o', linewidth=0, alpha=0.3, label='')

df['% responding right'].plot(color='C0', **options)
df['% responding wrong'].plot(color='C1', **options)

right = make_lowess(df['% responding right'])
right.plot(label='Right')

wrong = make_lowess(df['% responding wrong'])
wrong.plot(label='Wrong')

plt.legend();



In [ ]:



In [ ]:

	Date	% responding right	% responding wrong
Date
2016-02-08	2016-02-08	46	42
2016-09-08	2016-09-08	45	44
2016-08-17	2016-08-17	46	43
2016-08-23	2016-08-23	45	43
2016-08-31	2016-08-31	47	44

	Date	% responding right	% responding wrong
Date
2018-08-13	2018-08-13	43	47
2018-08-14	2018-08-14	43	45
2018-08-21	2018-08-21	41	47
2018-08-29	2018-08-29	42	47
2018-04-09	2018-04-09	42	48