Stationarity and detrending (ADF/KPSS)

Stationarity means that the statistical properties of a time series, i.e. its mean, variance, and autocovariance, do not change over time. Many statistical models require a stationary series in order to produce reliable predictions.
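As a rough illustration of this definition, the sketch below simulates many independent paths of white noise (stationary) and of a random walk (non-stationary), then compares the variance across paths at an early and a late time step. The number of paths, the path length, and the seed are arbitrary choices made for this example and are not part of the sunspots analysis.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n_paths, n_steps = 2000, 200     # illustrative sizes (assumptions)

# White noise: independent draws with constant mean and variance.
white_noise = rng.normal(size=(n_paths, n_steps))
# Random walk: cumulative sum of independent shocks along the time axis.
random_walk = rng.normal(size=(n_paths, n_steps)).cumsum(axis=1)


def variance_ratio(paths, early=49, late=199):
    """Ratio of the cross-path variance at a late step to that at an early step."""
    return paths[:, late].var() / paths[:, early].var()


wn_ratio = variance_ratio(white_noise)
rw_ratio = variance_ratio(random_walk)
print(f"white noise variance ratio: {wn_ratio:.2f}")
print(f"random walk variance ratio: {rw_ratio:.2f}")
```

For white noise the ratio should stay close to 1. For the random walk the variance grows roughly in proportion to the number of steps, so the ratio between step 200 and step 50 should be close to 4.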

Two statistical tests are used here to check the stationarity of a time series: the Augmented Dickey-Fuller ("ADF") test and the Kwiatkowski-Phillips-Schmidt-Shin ("KPSS") test. A method for converting a non-stationary time series into a stationary one is also demonstrated.

This first cell imports standard packages and sets plots to appear inline.



In [ ]:

    
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

The Sunspots dataset is used. It contains yearly (1700-2008) data on sunspots from the National Geophysical Data Center.



In [ ]:

    
sunspots = sm.datasets.sunspots.load_pandas().data

Some preprocessing is carried out on the data. A date index covering the years 1700 to 2008 is created, and the "YEAR" column, which is then redundant, is dropped.



In [ ]:

    
sunspots.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del sunspots["YEAR"]

The data is now plotted.



In [ ]:

    
sunspots.plot(figsize=(12,8))

ADF test

The ADF test is used to determine the presence of a unit root in the series, and hence helps in understanding whether the series is stationary. The null and alternative hypotheses of this test are:

Null Hypothesis: The series has a unit root.

Alternative Hypothesis: The series has no unit root.

If the null hypothesis cannot be rejected, the test provides no evidence against a unit root, which suggests that the series may be non-stationary.
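To make the idea of a unit root concrete, the sketch below estimates the coefficient rho in y_t = rho * y_{t-1} + e_t by least squares, once for a simulated random walk (rho = 1, a unit root) and once for a stationary AR(1) process (rho = 0.5). This is an illustration only and not the ADF test itself, which uses a richer regression and non-standard critical values. The helper names `simulate_ar1` and `estimate_ar1_coefficient`, the seed, and the sample size are assumptions made for this example.

```python
import numpy as np


def simulate_ar1(rho, n, rng):
    """Simulate y_t = rho * y_{t-1} + e_t with standard normal shocks and y_0 = 0."""
    shocks = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + shocks[t]
    return y


def estimate_ar1_coefficient(y):
    """Least-squares slope of y_t on y_{t-1}, without an intercept."""
    lagged, current = y[:-1], y[1:]
    return float(lagged @ current / (lagged @ lagged))


rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
rho_unit_root = estimate_ar1_coefficient(simulate_ar1(1.0, 2000, rng))
rho_stationary = estimate_ar1_coefficient(simulate_ar1(0.5, 2000, rng))
print(f"random walk estimate:      {rho_unit_root:.3f}")
print(f"stationary AR(1) estimate: {rho_stationary:.3f}")
```

The estimate for the random walk should lie very close to 1, while the estimate for the stationary process should lie near 0.5.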

A function is created to carry out the ADF test on a time series.



In [ ]:

    
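from statsmodels.tsa.stattools import adfuller


def adf_test(timeseries):
    """Run the ADF test and print the results as a labelled Series."""
    print('Results of Dickey-Fuller Test:')
    # autolag='AIC' chooses the lag length that minimises the AIC.
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    # dftest[4] maps each significance level ('1%', '5%', '10%') to its critical value.
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)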

KPSS test

KPSS is another test for checking the stationarity of a time series. The null and alternative hypotheses of the KPSS test are the opposite of those of the ADF test.

Null Hypothesis: The process is trend stationary (with regression='c', as used below, stationary around a constant level).

Alternative Hypothesis: The series has a unit root (series is not stationary).
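For reference, the KPSS test is usually described through a decomposition of the series into a deterministic trend, a random walk, and a stationary error. The symbols below follow the common textbook presentation and do not appear elsewhere in this notebook.

```latex
y_t = \xi t + r_t + \varepsilon_t, \qquad r_t = r_{t-1} + u_t, \qquad u_t \sim \text{i.i.d.}(0, \sigma_u^2)

H_0 : \sigma_u^2 = 0, \qquad H_1 : \sigma_u^2 > 0
```

Under the null hypothesis the random-walk component r_t is constant, so the series is stationary around the trend. Setting \xi = 0 gives stationarity around a constant level.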

A function is created to carry out the KPSS test on a time series.



In [ ]:

    
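from statsmodels.tsa.stattools import kpss


def kpss_test(timeseries):
    """Run the KPSS test and print the results as a labelled Series."""
    print('Results of KPSS Test:')
    # regression='c' tests stationarity around a constant level. nlags='auto'
    # selects the lag length from the data; nlags=None is deprecated in
    # recent statsmodels releases.
    kpsstest = kpss(timeseries, regression='c', nlags='auto')
    kpss_output = pd.Series(kpsstest[0:3], index=['Test Statistic', 'p-value', 'Lags Used'])
    # kpsstest[3] maps each significance level to its critical value.
    for key, value in kpsstest[3].items():
        kpss_output['Critical Value (%s)' % key] = value
    print(kpss_output)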

The ADF test gives the following results: the test statistic, the p-value, and the critical values at the 1%, 5%, and 10% significance levels.

The ADF test is now applied to the data.



In [ ]:

    
adf_test(sunspots['SUNACTIVITY'])

Based on a significance level of 0.05 and the p-value of the ADF test, the null hypothesis cannot be rejected. Hence, the series is treated as non-stationary.

The KPSS test gives the following results: the test statistic, the p-value, and the critical values at the 1%, 5%, and 10% significance levels.

The KPSS test is now applied to the data.



In [ ]:

    
kpss_test(sunspots['SUNACTIVITY'])

Based on a significance level of 0.05 and the p-value of the KPSS test, the null hypothesis cannot be rejected. Hence, the series is stationary according to the KPSS test.

It is better to apply both tests, so that the conclusion about stationarity does not rest on a single test. Possible outcomes of applying these stationarity tests are as follows:

Case 1: Both tests conclude that the series is not stationary - The series is not stationary.
Case 2: Both tests conclude that the series is stationary - The series is stationary.
Case 3: KPSS indicates stationarity and ADF indicates non-stationarity - The series is trend stationary. The trend needs to be removed to make the series stationary. The detrended series is then checked for stationarity.
Case 4: KPSS indicates non-stationarity and ADF indicates stationarity - The series is difference stationary. Differencing is used to make the series stationary. The differenced series is then checked for stationarity.
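The four cases can be written as a small decision helper. The sketch below is hypothetical: the function name `interpret_stationarity`, its arguments, and the default significance level of 0.05 are assumptions made for illustration, and the p-values passed to it are made up rather than taken from the sunspots results.

```python
def interpret_stationarity(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Map ADF and KPSS p-values to one of the four cases listed above."""
    # ADF null: unit root. Rejecting it (p < alpha) indicates stationarity.
    adf_stationary = adf_pvalue < alpha
    # KPSS null: stationarity. Failing to reject (p >= alpha) indicates stationarity.
    kpss_stationary = kpss_pvalue >= alpha

    if not adf_stationary and not kpss_stationary:
        return "Case 1: not stationary"
    if adf_stationary and kpss_stationary:
        return "Case 2: stationary"
    if kpss_stationary:
        return "Case 3: trend stationary (remove the trend)"
    return "Case 4: difference stationary (apply differencing)"


# Made-up p-values, used only to exercise each branch.
for adf_p, kpss_p in [(0.30, 0.01), (0.01, 0.10), (0.30, 0.10), (0.01, 0.01)]:
    print(adf_p, kpss_p, "->", interpret_stationarity(adf_p, kpss_p))
```

Keeping the two null hypotheses explicit in the comments helps avoid the common mistake of reading a small KPSS p-value as evidence of stationarity.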

Here, because the ADF and KPSS results differ (KPSS indicates stationarity while ADF does not), the series is inferred to be trend stationary rather than stationary. The series can be detrended by differencing or by model fitting.
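As a minimal sketch of detrending by model fitting, the example below fits a straight line to a synthetic series with `np.polyfit` and subtracts the fitted values. The synthetic data, the linear form of the trend, and the variable names are assumptions made for illustration; the rest of this notebook applies differencing to the sunspots data instead.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only
t = np.arange(200)
# Synthetic series: linear trend (intercept 5, slope 0.5) plus unit-variance noise.
series = 5.0 + 0.5 * t + rng.normal(size=t.size)

# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(t, series, deg=1)
fitted_trend = intercept + slope * t

# The detrended series is what remains after removing the fitted trend.
detrended = series - fitted_trend
print(f"estimated slope: {slope:.3f}")
print(f"mean of detrended series: {detrended.mean():.2e}")
```

The residuals of a least-squares fit with an intercept average to zero, so the detrended series fluctuates around zero. Whether a straight line is an adequate trend model depends on the data at hand.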

Detrending by Differencing

Differencing is one of the simplest methods for detrending a time series. A new series is constructed in which the value at each time step is the difference between the original observation and the observation at the previous time step.

Differencing is applied to the data and the result is plotted.
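A small worked example may help. The toy values below are made up for illustration; the sketch shows that subtracting the shifted series by hand gives the same result as the built-in pandas method `Series.diff`.

```python
import pandas as pd

# Toy series, used only to show the arithmetic.
values = pd.Series([3.0, 5.0, 4.0, 8.0])

# Manual differencing: each value minus the previous one.
manual_diff = values - values.shift(1)
# Built-in equivalent with the default lag of one period.
builtin_diff = values.diff()

print(builtin_diff.tolist())
```

The first entry is NaN because the first observation has no predecessor; the remaining entries are 2, -1, and 4. This is why `dropna()` is called before plotting or testing a differenced series.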



In [ ]:

    
sunspots['SUNACTIVITY_diff'] = sunspots['SUNACTIVITY'] - sunspots['SUNACTIVITY'].shift(1)
sunspots['SUNACTIVITY_diff'].dropna().plot(figsize=(12,8))

The ADF test is now applied to these detrended values to check for stationarity.



In [ ]:

    
adf_test(sunspots['SUNACTIVITY_diff'].dropna())

Based on the p-value of the ADF test, there is evidence for rejecting the null hypothesis in favor of the alternative. Hence, the differenced series is stationary according to the ADF test.

The KPSS test is now applied to these detrended values to check for stationarity.



In [ ]:

    
kpss_test(sunspots['SUNACTIVITY_diff'].dropna())

Based on the p-value of the KPSS test, the null hypothesis cannot be rejected. Hence, the differenced series is stationary according to the KPSS test as well.

Conclusion

Two tests for checking the stationarity of a time series are used, namely the ADF test and the KPSS test. Detrending is carried out by differencing, which converts the trend-stationary series into a stationary one. A suitable forecasting model can now be applied to the stationary series.