Online Retail

Data Set Information:

This is a transnational data set that contains all transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based, registered, non-store online retailer. The company mainly sells unique all-occasion gifts. Many of its customers are wholesalers.

Attribute Information:

  • InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with letter 'c', it indicates a cancellation.
  • StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
  • Description: Product (item) name. Nominal.
  • Quantity: The quantities of each product (item) per transaction. Numeric.
  • InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.
  • UnitPrice: Unit price. Numeric, product price per unit in pounds sterling.
  • CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
  • Country: Country name. Nominal, the name of the country where each customer resides.
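
The cancellation rule described for InvoiceNo can be sketched as a small helper. This is only an illustration: the function name `is_cancellation` and the sample invoice numbers are hypothetical, and the check is case-insensitive on the assumption that the prefix may appear as either 'c' or 'C' in the file.

```python
def is_cancellation(invoice_no):
    """Return True if the invoice number starts with 'c' or 'C'."""
    # Invoice numbers may be read as integers or strings, so normalise first
    return str(invoice_no).strip().upper().startswith('C')


# Illustrative values only (not taken from the data set)
samples = [536365, '536366', 'C536379', 'c536383']
flags = [is_cancellation(s) for s in samples]
print(flags)
```

With the DataFrame loaded below, the same helper could be applied as `online_retail['InvoiceNo'].map(is_cancellation)` to separate cancellations from ordinary sales.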

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import warnings
import itertools
import operator
import statsmodels.api as sm



In [2]:
online_retail = pd.read_excel('data/Online Retail.xlsx')

In [3]:
online_retail.describe()


Out[3]:
Quantity UnitPrice CustomerID
count 541909.000000 541909.000000 406829.000000
mean 9.552250 4.611114 15287.690570
std 218.081158 96.759853 1713.600303
min -80995.000000 -11062.060000 12346.000000
25% 1.000000 1.250000 13953.000000
50% 3.000000 2.080000 15152.000000
75% 10.000000 4.130000 16791.000000
max 80995.000000 38970.000000 18287.000000

In [4]:
online_retail.head()


Out[4]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom

In [5]:
online_retail['InvoiceDate'] = online_retail['InvoiceDate'].astype('datetime64[ns]')
online_retail['TotalPrice'] = online_retail['Quantity'] * online_retail['UnitPrice']

In [6]:
online_retail.head()


Out[6]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country TotalPrice
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom 15.30
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom 22.00
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom 20.34

In [7]:
online_retail.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 9 columns):
InvoiceNo      541909 non-null object
StockCode      541909 non-null object
Description    540455 non-null object
Quantity       541909 non-null int64
InvoiceDate    541909 non-null datetime64[ns]
UnitPrice      541909 non-null float64
CustomerID     406829 non-null float64
Country        541909 non-null object
TotalPrice     541909 non-null float64
dtypes: datetime64[ns](1), float64(3), int64(1), object(4)
memory usage: 37.2+ MB

In [8]:
(online_retail['CustomerID'].isnull()).any()


Out[8]:
True

In [9]:
online_retail[online_retail['CustomerID'].isnull()]


Out[9]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country TotalPrice
622 536414 22139 NaN 56 2010-12-01 11:52:00 0.00 NaN United Kingdom 0.00
1443 536544 21773 DECORATIVE ROSE BATHROOM BOTTLE 1 2010-12-01 14:32:00 2.51 NaN United Kingdom 2.51
1444 536544 21774 DECORATIVE CATS BATHROOM BOTTLE 2 2010-12-01 14:32:00 2.51 NaN United Kingdom 5.02
1445 536544 21786 POLKADOT RAIN HAT 4 2010-12-01 14:32:00 0.85 NaN United Kingdom 3.40
1446 536544 21787 RAIN PONCHO RETROSPOT 2 2010-12-01 14:32:00 1.66 NaN United Kingdom 3.32
1447 536544 21790 VINTAGE SNAP CARDS 9 2010-12-01 14:32:00 1.66 NaN United Kingdom 14.94
1448 536544 21791 VINTAGE HEADS AND TAILS CARD GAME 2 2010-12-01 14:32:00 2.51 NaN United Kingdom 5.02
1449 536544 21801 CHRISTMAS TREE DECORATION WITH BELL 10 2010-12-01 14:32:00 0.43 NaN United Kingdom 4.30
1450 536544 21802 CHRISTMAS TREE HEART DECORATION 9 2010-12-01 14:32:00 0.43 NaN United Kingdom 3.87
1451 536544 21803 CHRISTMAS TREE STAR DECORATION 11 2010-12-01 14:32:00 0.43 NaN United Kingdom 4.73
1452 536544 21809 CHRISTMAS HANGING TREE WITH BELL 1 2010-12-01 14:32:00 2.51 NaN United Kingdom 2.51
1453 536544 21810 CHRISTMAS HANGING STAR WITH BELL 3 2010-12-01 14:32:00 2.51 NaN United Kingdom 7.53
1454 536544 21811 CHRISTMAS HANGING HEART WITH BELL 1 2010-12-01 14:32:00 2.51 NaN United Kingdom 2.51
1455 536544 21821 GLITTER STAR GARLAND WITH BELLS 1 2010-12-01 14:32:00 7.62 NaN United Kingdom 7.62
1456 536544 21822 GLITTER CHRISTMAS TREE WITH BELLS 1 2010-12-01 14:32:00 4.21 NaN United Kingdom 4.21
1457 536544 21823 PAINTED METAL HEART WITH HOLLY BELL 2 2010-12-01 14:32:00 2.98 NaN United Kingdom 5.96
1458 536544 21844 RED RETROSPOT MUG 2 2010-12-01 14:32:00 5.91 NaN United Kingdom 11.82
1459 536544 21851 LILAC DIAMANTE PEN IN GIFT BOX 1 2010-12-01 14:32:00 4.21 NaN United Kingdom 4.21
1460 536544 21870 I CAN ONLY PLEASE ONE PERSON MUG 1 2010-12-01 14:32:00 3.36 NaN United Kingdom 3.36
1461 536544 21871 SAVE THE PLANET MUG 5 2010-12-01 14:32:00 3.36 NaN United Kingdom 16.80
1462 536544 21874 GIN AND TONIC MUG 1 2010-12-01 14:32:00 3.36 NaN United Kingdom 3.36
1463 536544 21879 HEARTS GIFT TAPE 1 2010-12-01 14:32:00 1.66 NaN United Kingdom 1.66
1464 536544 21884 CAKES AND BOWS GIFT TAPE 1 2010-12-01 14:32:00 1.66 NaN United Kingdom 1.66
1465 536544 21888 BINGO SET 1 2010-12-01 14:32:00 7.62 NaN United Kingdom 7.62
1466 536544 21889 WOODEN BOX OF DOMINOES 2 2010-12-01 14:32:00 2.51 NaN United Kingdom 5.02
1467 536544 21892 TRADITIONAL WOODEN CATCH CUP GAME 3 2010-12-01 14:32:00 2.51 NaN United Kingdom 7.53
1468 536544 21894 POTTING SHED SEED ENVELOPES 1 2010-12-01 14:32:00 2.51 NaN United Kingdom 2.51
1469 536544 21911 GARDEN METAL SIGN 1 2010-12-01 14:32:00 3.36 NaN United Kingdom 3.36
1470 536544 21912 VINTAGE SNAKES & LADDERS 3 2010-12-01 14:32:00 7.62 NaN United Kingdom 22.86
1471 536544 21913 VINTAGE SEASIDE JIGSAW PUZZLES 1 2010-12-01 14:32:00 7.62 NaN United Kingdom 7.62
... ... ... ... ... ... ... ... ... ...
541511 581498 71053 WHITE MOROCCAN METAL LANTERN 1 2011-12-09 10:26:00 8.29 NaN United Kingdom 8.29
541512 581498 72349b SET/6 PURPLE BUTTERFLY T-LIGHTS 2 2011-12-09 10:26:00 4.13 NaN United Kingdom 8.26
541513 581498 79321 CHILLI LIGHTS 10 2011-12-09 10:26:00 12.46 NaN United Kingdom 124.60
541514 581498 82001s SILVER RECORD COVER FRAME 2 2011-12-09 10:26:00 7.46 NaN United Kingdom 14.92
541515 581498 82482 WOODEN PICTURE FRAME WHITE FINISH 4 2011-12-09 10:26:00 4.96 NaN United Kingdom 19.84
541516 581498 82552 WASHROOM METAL SIGN 1 2011-12-09 10:26:00 2.46 NaN United Kingdom 2.46
541517 581498 82580 BATHROOM METAL SIGN 1 2011-12-09 10:26:00 1.25 NaN United Kingdom 1.25
541518 581498 82581 TOILET METAL SIGN 1 2011-12-09 10:26:00 1.25 NaN United Kingdom 1.25
541519 581498 82600 N0 SINGING METAL SIGN 4 2011-12-09 10:26:00 4.13 NaN United Kingdom 16.52
541520 581498 84029E RED WOOLLY HOTTIE WHITE HEART. 4 2011-12-09 10:26:00 8.29 NaN United Kingdom 33.16
541521 581498 84032A CHARLIE+LOLA PINK HOT WATER BOTTLE 4 2011-12-09 10:26:00 5.79 NaN United Kingdom 23.16
541522 581498 84032B CHARLIE + LOLA RED HOT WATER BOTTLE 3 2011-12-09 10:26:00 3.29 NaN United Kingdom 9.87
541523 581498 84375 SET OF 20 KIDS COOKIE CUTTERS 3 2011-12-09 10:26:00 4.13 NaN United Kingdom 12.39
541524 581498 84509a SET OF 4 ENGLISH ROSE PLACEMATS 1 2011-12-09 10:26:00 7.46 NaN United Kingdom 7.46
541525 581498 84558a 3D DOG PICTURE PLAYING CARDS 1 2011-12-09 10:26:00 5.79 NaN United Kingdom 5.79
541526 581498 84832 ZINC WILLIE WINKIE CANDLE STICK 26 2011-12-09 10:26:00 1.63 NaN United Kingdom 42.38
541527 581498 84968e SET OF 16 VINTAGE BLACK CUTLERY 1 2011-12-09 10:26:00 24.96 NaN United Kingdom 24.96
541528 581498 84970s HANGING HEART ZINC T-LIGHT HOLDER 1 2011-12-09 10:26:00 2.08 NaN United Kingdom 2.08
541529 581498 84997a CHILDRENS CUTLERY POLKADOT GREEN 2 2011-12-09 10:26:00 8.29 NaN United Kingdom 16.58
541530 581498 84997b CHILDRENS CUTLERY RETROSPOT RED 3 2011-12-09 10:26:00 8.29 NaN United Kingdom 24.87
541531 581498 84997d CHILDRENS CUTLERY POLKADOT PINK 1 2011-12-09 10:26:00 8.29 NaN United Kingdom 8.29
541532 581498 85038 6 CHOCOLATE LOVE HEART T-LIGHTS 1 2011-12-09 10:26:00 4.13 NaN United Kingdom 4.13
541533 581498 85048 15CM CHRISTMAS GLASS BALL 20 LIGHTS 1 2011-12-09 10:26:00 16.63 NaN United Kingdom 16.63
541534 581498 85049a TRADITIONAL CHRISTMAS RIBBONS 5 2011-12-09 10:26:00 3.29 NaN United Kingdom 16.45
541535 581498 85049e SCANDINAVIAN REDS RIBBONS 4 2011-12-09 10:26:00 3.29 NaN United Kingdom 13.16
541536 581498 85099B JUMBO BAG RED RETROSPOT 5 2011-12-09 10:26:00 4.13 NaN United Kingdom 20.65
541537 581498 85099C JUMBO BAG BAROQUE BLACK WHITE 4 2011-12-09 10:26:00 4.13 NaN United Kingdom 16.52
541538 581498 85150 LADIES & GENTLEMEN METAL SIGN 1 2011-12-09 10:26:00 4.96 NaN United Kingdom 4.96
541539 581498 85174 S/4 CACTI CANDLES 1 2011-12-09 10:26:00 10.79 NaN United Kingdom 10.79
541540 581498 DOT DOTCOM POSTAGE 1 2011-12-09 10:26:00 1714.17 NaN United Kingdom 1714.17

135080 rows × 9 columns


In [10]:
# Index by invoice timestamp so that revenue (the sum of TotalPrice) can be aggregated per day
online_retail.set_index('InvoiceDate', inplace=True)

In [11]:
# http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
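# min_count=1 keeps days with no transactions as NaN (newer pandas versions would otherwise return 0)
y = online_retail['TotalPrice'].resample('D').sum(min_count=1)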

In [12]:
y.head()


Out[12]:
InvoiceDate
2010-12-01    58635.56
2010-12-02    46207.28
2010-12-03    45620.46
2010-12-04         NaN
2010-12-05    31383.95
Freq: D, Name: TotalPrice, dtype: float64

In [13]:
# Back-fill days with no transactions using the next day that has data
y = y.bfill()
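
To make the back-fill step concrete, here is a minimal sketch on a five-day series that reuses the daily totals printed above. The variable names `toy` and `filled` are introduced only for this illustration.

```python
import pandas as pd

# First five daily totals from the output above; 2010-12-04 has no transactions
idx = pd.date_range('2010-12-01', periods=5, freq='D')
toy = pd.Series([58635.56, 46207.28, 45620.46, float('nan'), 31383.95], index=idx)

# bfill copies the next available value backwards into each gap
filled = toy.bfill()
print(filled)
```

Back-filling borrows a value from the future, so the filled day is not a real observation. Depending on the goal, filling with 0 (no sales that day) may be an equally defensible choice.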

In [14]:
y.head()


Out[14]:
InvoiceDate
2010-12-01    58635.56
2010-12-02    46207.28
2010-12-03    45620.46
2010-12-04    31383.95
2010-12-05    31383.95
Freq: D, Name: TotalPrice, dtype: float64

In [15]:
y.isnull().any()


Out[15]:
False

In [16]:
y.plot(figsize=(15,6))
plt.show()



In [17]:
p = d = q = range(0, 2)

pdq = list(itertools.product(p, d, q))

# Seasonal period in time steps (days, since y is daily); try adjusting `s`
s = 30
seasonal_pdq = [(x[0], x[1], x[2], s) for x in list(itertools.product(p, d, q))]
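
As a quick check on the size of the search space, the sketch below rebuilds the same grid and counts the candidate models. The name `n_models` is introduced here for illustration.

```python
import itertools

p = d = q = range(0, 2)
s = 30

# Every (p, d, q) triple with each term in {0, 1}
pdq = list(itertools.product(p, d, q))

# The same triples with the seasonal period appended
seasonal_pdq = [(P, D, Q, s) for P, D, Q in itertools.product(p, d, q)]

# Each non-seasonal order is paired with each seasonal order
n_models = len(pdq) * len(seasonal_pdq)
print(len(pdq), len(seasonal_pdq), n_models)
```

Widening any range, for example to `range(0, 3)`, grows the grid quickly, so the fitting loop becomes correspondingly slower.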

In [18]:
print('Example of parameter combination for Seasonal ARIMA')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[2]))


Example of parameter combination for Seasonal ARIMA
SARIMAX: (0, 0, 1) x (0, 0, 1, 30)
SARIMAX: (0, 1, 0) x (0, 1, 0, 30)

In [19]:
warnings.filterwarnings('ignore')
history = {}
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                           order=param,
                                           seasonal_order=param_seasonal,
                                           enforce_stationarity=False,
                                           enforce_invertibility=False)
            
            results = mod.fit()
            history[(param, param_seasonal)] = results.aic
            print('ARIMA{}x{} - AIC: {}'.format(param, param_seasonal, results.aic))
        except Exception:
            # Skip parameter combinations that fail to fit
            continue


ARIMA(0, 0, 0)x(0, 0, 1, 30) - AIC: 8018.168502154193
ARIMA(0, 0, 0)x(0, 1, 1, 30) - AIC: 7048.507276551258
ARIMA(0, 0, 0)x(1, 0, 0, 30) - AIC: 7832.814107767612
ARIMA(0, 0, 0)x(1, 0, 1, 30) - AIC: 7686.6921297533245
ARIMA(0, 0, 0)x(1, 1, 0, 30) - AIC: 7104.051753703679
ARIMA(0, 0, 0)x(1, 1, 1, 30) - AIC: 7045.340255483797
ARIMA(0, 0, 1)x(0, 0, 0, 30) - AIC: 8607.222842705485
ARIMA(0, 0, 1)x(0, 0, 1, 30) - AIC: 7884.9780744830805
ARIMA(0, 0, 1)x(0, 1, 0, 30) - AIC: 7762.412152127949
ARIMA(0, 0, 1)x(0, 1, 1, 30) - AIC: 6979.385626247506
ARIMA(0, 0, 1)x(1, 0, 0, 30) - AIC: 7900.793936469681
ARIMA(0, 0, 1)x(1, 0, 1, 30) - AIC: 7822.7291175681885
ARIMA(0, 0, 1)x(1, 1, 0, 30) - AIC: 7056.905746870598
ARIMA(0, 0, 1)x(1, 1, 1, 30) - AIC: 6979.228527386424
ARIMA(0, 1, 0)x(0, 0, 1, 30) - AIC: 7653.66752668981
ARIMA(0, 1, 0)x(0, 1, 1, 30) - AIC: 7065.365236445079
ARIMA(0, 1, 0)x(1, 0, 0, 30) - AIC: 7674.3499734171
ARIMA(0, 1, 0)x(1, 0, 1, 30) - AIC: 7655.216490504715
ARIMA(0, 1, 0)x(1, 1, 0, 30) - AIC: 7132.71535254097
ARIMA(0, 1, 0)x(1, 1, 1, 30) - AIC: 7034.341675104406
ARIMA(0, 1, 1)x(0, 0, 0, 30) - AIC: 8190.495335664507
ARIMA(0, 1, 1)x(0, 0, 1, 30) - AIC: 7516.923218258148
ARIMA(0, 1, 1)x(0, 1, 0, 30) - AIC: 7781.783993373989
ARIMA(0, 1, 1)x(0, 1, 1, 30) - AIC: 6996.61072573717
ARIMA(0, 1, 1)x(1, 0, 0, 30) - AIC: 7561.96702056592
ARIMA(0, 1, 1)x(1, 0, 1, 30) - AIC: 7517.341200598579
ARIMA(0, 1, 1)x(1, 1, 0, 30) - AIC: 7093.438720358266
ARIMA(0, 1, 1)x(1, 1, 1, 30) - AIC: 6996.843571751471
ARIMA(1, 0, 0)x(0, 0, 0, 30) - AIC: 8319.929893920249
ARIMA(1, 0, 0)x(0, 0, 1, 30) - AIC: 7657.386983458688
ARIMA(1, 0, 0)x(0, 1, 0, 30) - AIC: 7792.9998490845355
ARIMA(1, 0, 0)x(0, 1, 1, 30) - AIC: 7000.10070880895
ARIMA(1, 0, 0)x(1, 0, 0, 30) - AIC: 7657.228618675041
ARIMA(1, 0, 0)x(1, 0, 1, 30) - AIC: 7642.75519014865
ARIMA(1, 0, 0)x(1, 1, 0, 30) - AIC: 7037.821457504029
ARIMA(1, 0, 0)x(1, 1, 1, 30) - AIC: 6999.02238510636
ARIMA(1, 0, 1)x(0, 0, 0, 30) - AIC: 8214.338314723327
ARIMA(1, 0, 1)x(0, 0, 1, 30) - AIC: 7538.269279956821
ARIMA(1, 0, 1)x(0, 1, 0, 30) - AIC: 7763.963408232448
ARIMA(1, 0, 1)x(0, 1, 1, 30) - AIC: 6976.478919897044
ARIMA(1, 0, 1)x(1, 0, 0, 30) - AIC: 7560.896570218099
ARIMA(1, 0, 1)x(1, 0, 1, 30) - AIC: 7540.269119028846
ARIMA(1, 0, 1)x(1, 1, 0, 30) - AIC: 7033.096660792293
ARIMA(1, 0, 1)x(1, 1, 1, 30) - AIC: 6975.350879017582
ARIMA(1, 1, 0)x(0, 0, 0, 30) - AIC: 8297.863944904744
ARIMA(1, 1, 0)x(0, 0, 1, 30) - AIC: 7625.504056827524
ARIMA(1, 1, 0)x(0, 1, 0, 30) - AIC: 7899.762048850454
ARIMA(1, 1, 0)x(0, 1, 1, 30) - AIC: 7000.244874771306
ARIMA(1, 1, 0)x(1, 0, 0, 30) - AIC: 7627.379324247141
ARIMA(1, 1, 0)x(1, 0, 1, 30) - AIC: 7626.312784413969
ARIMA(1, 1, 0)x(1, 1, 0, 30) - AIC: 7120.279955358235
ARIMA(1, 1, 0)x(1, 1, 1, 30) - AIC: 7055.866720706452
ARIMA(1, 1, 1)x(0, 0, 0, 30) - AIC: 8176.326156774695
ARIMA(1, 1, 1)x(0, 0, 1, 30) - AIC: 7511.891854164342
ARIMA(1, 1, 1)x(0, 1, 0, 30) - AIC: 7764.548737250791
ARIMA(1, 1, 1)x(0, 1, 1, 30) - AIC: 6977.994960674036
ARIMA(1, 1, 1)x(1, 0, 0, 30) - AIC: 7536.00457764349
ARIMA(1, 1, 1)x(1, 0, 1, 30) - AIC: 7513.255179336516
ARIMA(1, 1, 1)x(1, 1, 0, 30) - AIC: 7042.881430168845
ARIMA(1, 1, 1)x(1, 1, 1, 30) - AIC: 6977.85554343104

Get the combination that yields the minimum AIC


In [20]:
sorted_x = sorted(history.items(), key=operator.itemgetter(1))

In [21]:
param, param_seasonal =  sorted_x[0][0][0], sorted_x[0][0][1]

In [22]:
print(param)
print(param_seasonal)


(1, 0, 1)
(1, 1, 1, 30)
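
An equivalent way to pick the winner, shown here only as a sketch, is `min` with a key function, which avoids sorting the whole dictionary. The `history` dictionary below is a small subset of the AIC values printed by the grid search, and `best_order` / `best_seasonal_order` are names introduced for this example.

```python
# Subset of the (order, seasonal_order) -> AIC results printed above
history = {
    ((1, 0, 1), (1, 1, 1, 30)): 6975.350879017582,
    ((1, 0, 1), (0, 1, 1, 30)): 6976.478919897044,
    ((1, 1, 1), (1, 1, 1, 30)): 6977.85554343104,
    ((0, 0, 1), (0, 0, 0, 30)): 8607.222842705485,
}

# min over the keys, ranked by their AIC value
best_order, best_seasonal_order = min(history, key=history.get)
print(best_order, best_seasonal_order)
```

Note that the top three candidates differ by less than 3 AIC points, so the ranking among them should not be over-interpreted.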

In [23]:
model = sm.tsa.statespace.SARIMAX(y,
                         order = param,
                         seasonal_order=param_seasonal,
                         enforce_stationarity=False,
                         enforce_invertibility=False)

results = model.fit()

In [24]:
print(results.summary())


                                 Statespace Model Results                                 
==========================================================================================
Dep. Variable:                         TotalPrice   No. Observations:                  374
Model:             SARIMAX(1, 0, 1)x(1, 1, 1, 30)   Log Likelihood               -3482.675
Date:                            Fri, 25 Aug 2017   AIC                           6975.351
Time:                                    15:30:14   BIC                           6994.972
Sample:                                12-01-2010   HQIC                          6983.141
                                     - 12-09-2011                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.3974      0.160      2.478      0.013       0.083       0.712
ma.L1          0.1162      0.168      0.690      0.490      -0.214       0.446
ar.S.L30      -0.1614      0.110     -1.472      0.141      -0.376       0.054
ma.S.L30      -0.6465      0.092     -7.016      0.000      -0.827      -0.466
sigma2      4.173e+08    1.2e-10   3.48e+18      0.000    4.17e+08    4.17e+08
===================================================================================
Ljung-Box (Q):                      263.04   Jarque-Bera (JB):               109.98
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               3.10   Skew:                             0.79
Prob(H) (two-sided):                  0.00   Kurtosis:                         5.44
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 4.08e+34. Standard errors may be unstable.
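
The information criteria in the summary can be reproduced by hand from the log likelihood. The sketch below uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, and assumes k = 5 estimated parameters (the five rows of the coefficient table) and n = 374 observations; both are read off the table above rather than taken from the fitted object.

```python
import math

log_likelihood = -3482.675  # "Log Likelihood" in the summary
k = 5                       # assumed: ar.L1, ma.L1, ar.S.L30, ma.S.L30, sigma2
n = 374                     # "No. Observations" in the summary

aic = 2 * k - 2 * log_likelihood
bic = k * math.log(n) - 2 * log_likelihood

print(round(aic, 3), round(bic, 3))
```

Both values agree with the reported AIC (6975.351) and BIC (6994.972) up to the rounding of the log likelihood.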

In [25]:
# http://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.plot_diagnostics.html
results.plot_diagnostics(lags=1, figsize=(15,6))
plt.show()


Validating Forecasts

  • one-step ahead forecast
  • dynamic forecast
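
The difference between the two modes can be illustrated with a toy AR(1) model. This is a simplified sketch, not how statsmodels computes its predictions internally: the coefficient `phi`, the observations, and the start index are all made up for the example.

```python
phi = 0.5                                   # hypothetical AR(1) coefficient
observed = [10.0, 8.0, 12.0, 9.0, 11.0]     # made-up observations
start = 2                                   # index where prediction begins

# One-step ahead: every prediction uses the actual previous observation
one_step = [phi * observed[t - 1] for t in range(start, len(observed))]

# Dynamic: after the start point, each prediction feeds on the previous prediction
dynamic = []
previous = observed[start - 1]
for _ in range(start, len(observed)):
    previous = phi * previous
    dynamic.append(previous)

print(one_step)
print(dynamic)
```

Because the dynamic forecast never sees new observations after the start point, its errors can accumulate, which is why it is usually the harder test of a model.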

One-step ahead forecast


In [26]:
start_date = '2011-05-02'
pred = results.get_prediction(start=pd.to_datetime(start_date), dynamic=False)

In [27]:
pred_ci = pred.conf_int()

In [28]:
ax = y['2011':].plot(label='observed')

pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7)

ax.fill_between(pred_ci.index,
               pred_ci.iloc[:, 0],
               pred_ci.iloc[:, 1], color='k',
               alpha=.2)

ax.set_xlabel('Date')
ax.set_ylabel('Sum of Total Price')

plt.legend()

plt.show()



In [29]:
y_forecasted = pred.predicted_mean
y_truth = y[start_date:]

mse = ((y_forecasted - y_truth) ** 2).mean()

print('The Mean Squared Error of our forecasts is {}'.format(round(mse,2))) 
#TODO: to check more


The Mean Squared Error of our forecasts is 318295109.63

Dynamic forecast


In [30]:
pred_dynamic = results.get_prediction(start=pd.to_datetime(start_date), dynamic=True)

pred_dynamic_ci = pred_dynamic.conf_int()

In [31]:
ax = y['2011':].plot(label='observed')

pred_dynamic.predicted_mean.plot(ax=ax, label='Dynamic Forecast', alpha=.7)

ax.fill_between(pred_dynamic_ci.index,
               pred_dynamic_ci.iloc[:, 0],
               pred_dynamic_ci.iloc[:, 1], color='k',
               alpha=.2)

ax.fill_betweenx(ax.get_ylim(), pd.to_datetime(start_date), y.index[-1], alpha=.1, zorder=-1)

ax.set_xlabel('Date')
ax.set_ylabel('Sum of Total Price')

plt.legend()

plt.show()



In [32]:
y_forecasted = pred_dynamic.predicted_mean
y_truth = y[start_date:]
mse = ((y_forecasted - y_truth) ** 2).mean()

print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))


The Mean Squared Error of our forecasts is 547949899.57

The one-step-ahead forecast yields a lower MSE (about 3.18e8) than the dynamic forecast (about 5.48e8).
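
MSE is expressed in squared pounds, which is hard to read. As a small sketch, taking the square root of the two MSE values printed above gives the root mean squared error (RMSE) in the same unit as the daily totals. The variable names are introduced for this example.

```python
import math

mse_one_step = 318295109.63  # printed by the one-step ahead cell
mse_dynamic = 547949899.57   # printed by the dynamic cell

# RMSE is in the same unit as y (pounds sterling per day)
rmse_one_step = math.sqrt(mse_one_step)
rmse_dynamic = math.sqrt(mse_dynamic)

print(round(rmse_one_step, 2), round(rmse_dynamic, 2))
```

This puts the typical error at roughly 17,800 per day for the one-step-ahead forecast and roughly 23,400 per day for the dynamic forecast.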

Visualizing Forecasts


In [33]:
pred_uc = results.get_forecast(steps=30)

pred_ci = pred_uc.conf_int()

In [34]:
ax = y.plot(label='observed', figsize=(15,10))

pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
               pred_ci.iloc[:, 0],
               pred_ci.iloc[:, 1],
               color='k',
               alpha=.25)

ax.set_xlabel('Date')
ax.set_ylabel('Sum of Total Price')

plt.legend()
plt.show()


