Example of the SuperSmoother

This notebook gives an example of how to use the Python implementation of Friedman's SuperSmoother, available at http://github.com/jakevdp/supersmoother/

The supersmoother is a non-parametric, locally-linear smoother in which the size of the local neighborhood is adaptively tuned to the characteristics of the data. It was introduced by J. H. Friedman in his 1984 paper "A Variable Span Smoother".
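
To get a feel for the idea, here is a minimal numpy sketch of a single fixed-span, locally-linear smooth. The local_linear_smooth helper here is purely illustrative (it is not the package's code), and it glosses over the edge handling and efficiency tricks of the real implementation:

In [ ]:
import numpy as np

def local_linear_smooth(t, y, tfit, span=0.2):
    """Illustrative fixed-span local-linear smooth (not the package's code)."""
    t, y = np.asarray(t), np.asarray(y)
    k = max(2, int(span * len(t)))  # number of points in each local neighborhood
    yfit = np.empty_like(np.asarray(tfit, dtype=float))
    for i, t0 in enumerate(tfit):
        # take the k points nearest t0 and fit a straight line through them
        nearest = np.argsort(np.abs(t - t0))[:k]
        slope, intercept = np.polyfit(t[nearest], y[nearest], 1)
        yfit[i] = slope * t0 + intercept
    return yfit

# e.g. yfit = local_linear_smooth(x, y, np.linspace(0, 1, 500), span=0.2)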


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Use seaborn for plotting defaults.
# This can be safely commented-out if seaborn is not installed
import seaborn; seaborn.set()

In [2]:
# Install the package
# source at http://github.com/jakevdp/supersmoother
# or use ``pip install supersmoother``
from supersmoother import SuperSmoother, LinearSmoother

Generating the Data


In [3]:
def make_test_set(N=200, rseed_x=None, rseed_y=None):
    """Code to generate the test set from Friedman 1984"""
    rng_x = np.random.RandomState(rseed_x)
    rng_y = np.random.RandomState(rseed_y)
    x = rng_x.rand(N)
    dy = x
    y = np.sin(2 * np.pi * (1 - x) ** 2) + dy * rng_y.randn(N)
    return x, y, dy

In [4]:
# Generate and visualize the data
t, y, dy = make_test_set(rseed_x=0, rseed_y=1)
plt.errorbar(t, y, dy, fmt='o', alpha=0.3);


This data is generated the same way as in Friedman's paper. What makes it a nice test case is that the error level varies with $x$ (here $\sigma_y = x$), and the second derivative of the true model changes appreciably over the range.
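
For reference, we can overlay the noiseless model $y = \sin(2\pi(1 - x)^2)$ used in make_test_set on the data:

In [ ]:
# Overlay the true (noiseless) model from make_test_set
plt.errorbar(t, y, dy, fmt='o', alpha=0.3)
tgrid = np.linspace(0, 1, 1000)
plt.plot(tgrid, np.sin(2 * np.pi * (1 - tgrid) ** 2), '-k', lw=2);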

Fitting the SuperSmoother

Here is how to fit the supersmoother to the data:


In [5]:
# fit the supersmoother model
model = SuperSmoother()
model.fit(t, y, dy)

# find the smoothed fit to the data
tfit = np.linspace(0, 1, 1000)
yfit = model.predict(tfit)

Now we'll visualize this smoothed curve:


In [6]:
# Show the smoothed model of the data
plt.errorbar(t, y, dy, fmt='o', alpha=0.3)
plt.plot(tfit, yfit, '-k');


Exploring the component smooths

The supersmoother is based on three initial smooths, in which the size of each local neighborhood is a fixed fraction $f$ of the total dataset. In analogy with audio frequencies, Friedman calls these the tweeter $(f = 0.05)$, the midrange $(f = 0.2)$, and the woofer $(f = 0.5)$. We can visualize these individual fits here:


In [7]:
plt.errorbar(t, y, dy, fmt='o', alpha=0.3)

for smooth in model.primary_smooths:
    plt.plot(tfit, smooth.predict(tfit),
             label='span = {0:.2f}'.format(smooth.span))
plt.legend();
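
Each of the primary smooths above is a fixed-span linear smooth, so we can also construct one directly with the LinearSmoother class imported earlier. This sketch assumes LinearSmoother accepts a span keyword argument, consistent with the smooth.span attribute used in the legend above:

In [ ]:
# Fit a single fixed-span linear smooth directly
# (assumes a ``span`` constructor argument matching the ``smooth.span``
#  attribute used above)
midrange = LinearSmoother(span=0.2)
midrange.fit(t, y, dy)

plt.errorbar(t, y, dy, fmt='o', alpha=0.3)
plt.plot(tfit, midrange.predict(tfit), '-k');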


The final supersmoother estimate uses cross-validation to select the best span at each point in the dataset; the selected spans are then themselves smoothed. We can plot these smoothed span values as follows:


In [8]:
t = np.linspace(0, 1, 1000)
plt.plot(t, model.span(t))
plt.xlabel('t')
plt.ylabel('smoothed span value');


These spans are fit to one particular realization of the data. Friedman used 1000 realizations of the noise to get a better estimate of how the selected span varies; we'll do the same here:


In [9]:
N = 1000
span = span2 = 0  # running sums of span and span^2

tfit = np.linspace(0, 1, 100)

for rseed in range(N):
    # new noise realization each time; the x values stay fixed
    t, y, dy = make_test_set(rseed_x=0, rseed_y=rseed)
    model = SuperSmoother().fit(t, y, dy)
    span += model.span(tfit)
    span2 += model.span(tfit) ** 2
    
mean = span / N
std = np.sqrt(span2 / N - mean ** 2)
plt.plot(tfit, mean)
plt.fill_between(tfit, mean - std, mean + std, alpha=0.3)
plt.xlabel('t')
plt.ylabel('resulting span');


Bass Enhancement

The degree of smoothing can be tuned with the bass enhancement parameter. This is a number $\alpha$ between 0 and 10, with larger values producing a smoother curve ($\alpha = 10$ is the smoothest).


In [10]:
rng = np.random.RandomState(0)
t = rng.rand(200)
dy = 0.5
y = np.sin(5 * np.pi * t ** 2) + dy * rng.randn(200)

tfit = np.linspace(0, 1, 1000)  # define the fit grid within this cell
plt.errorbar(t, y, dy, fmt='o', alpha=0.3)

for alpha in [0, 8, 10]:
    smoother = SuperSmoother(alpha=alpha)
    smoother.fit(t, y, dy)
    plt.plot(tfit, smoother.predict(tfit),
             label='alpha = {0}'.format(alpha))
plt.legend(loc=2);


The effect of the bass enhancement is not linear in $\alpha$: a change from 0 to 1 has much less effect than a change from 9 to 10.
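
One rough way to see this nonlinearity is to compare the mean selected span for a few values of $\alpha$; this sketch assumes that model.span (the same method plotted earlier) reflects the bass-enhanced spans:

In [ ]:
# Compare the mean selected span for several alpha values
# (assumes model.span() returns the bass-enhanced span at each point)
for alpha in [0, 1, 9, 10]:
    model = SuperSmoother(alpha=alpha).fit(t, y, dy)
    print('alpha = {0:2d}: mean span = {1:.3f}'.format(alpha,
                                                       model.span(tfit).mean()))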