Window functions

  • rolling window --> how did I do over the last three days? --> re-checked every day
  • expanding window --> all data is equally relevant --> old or new
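
As a minimal sketch of the difference between the two, assuming a made-up series of five daily scores (the name scores and its values are illustrative only), the rolling mean looks back three days while the expanding mean uses everything seen so far:

```python
import pandas as pd

# Hypothetical daily scores, used only for illustration.
scores = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
                   index=pd.date_range('2016-07-01', periods=5, freq='D'))

# Rolling window: mean of the last three days, re-evaluated every day.
# The first two days have fewer than three observations, so they are NaN.
rolling_mean = scores.rolling(window=3).mean()

# Expanding window: mean of everything seen so far, old and new alike.
expanding_mean = scores.expanding(min_periods=1).mean()

print(pd.DataFrame({'rolling_3': rolling_mean, 'expanding': expanding_mean}))
```

The rolling column forgets old values as the window slides forward; the expanding column never does.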

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

# set the default figure size for every plot in this notebook
plt.rcParams['figure.figsize'] = (19, 6)

In [2]:
import numpy as np
import pandas as pd

what is special about time series

  • data points relate to one another
  • they are not independent
  • so we compare and relate them
  • one way to do that is to look at how they change
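
One way to look at how values change is to subtract the previous observation from the current one. The sketch below assumes a small made-up series (values is a hypothetical name) and compares a manual shift-and-subtract with the built-in .diff():

```python
import pandas as pd

# Hypothetical daily values, used only for illustration.
values = pd.Series([10.0, 12.0, 11.0, 15.0],
                   index=pd.date_range('2016-07-01', periods=4, freq='D'))

# Change from the previous day: today's value minus yesterday's value.
# The first day has no predecessor, so its change is NaN.
change = values - values.shift(1)

# .diff() is the built-in shorthand for the same calculation.
same_as_diff = change.equals(values.diff())

print(change)
print(same_as_diff)
```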

In [9]:
ts = pd.Series(np.random.randn(20), 
             pd.date_range('7/1/16', 
                           freq='D', 
                           periods=20))
# shift by one period --> here one day
ts_lagged = ts.shift()

In [10]:
plt.plot(ts, color='blue')
plt.plot(ts_lagged, color='red')


Out[10]:
[<matplotlib.lines.Line2D at 0x7f60d9b302e8>]

In [11]:
ts2 = pd.Series(np.random.randn(20), 
             pd.date_range('7/1/16', 
                           freq='H', 
                           periods=20))
# hourly data --> shift by 5 periods, i.e. 5 hours
ts2_lagged = ts2.shift(5)

In [12]:
plt.plot(ts2, color='blue')
plt.plot(ts2_lagged, color='red')


Out[12]:
[<matplotlib.lines.Line2D at 0x7f60d7a36e10>]

In [13]:
# a negative shift moves values 5 hours earlier (a lead rather than a lag)
ts3_lagged = ts2.shift(-5)
plt.plot(ts2, color='blue')
plt.plot(ts2_lagged, color='red')
plt.plot(ts3_lagged, color='black')


Out[13]:
[<matplotlib.lines.Line2D at 0x7f60d79a0240>]

moving aggregate measures of time series

  • window functions are like aggregate functions, but computed over a sliding subset of rows
  • they can be used in conjunction with .resample()
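
To make the comparison with aggregate functions concrete, here is a small sketch, assuming a made-up five-value series s: each rolling result is an ordinary aggregate (sum, max) taken over the three most recent rows, which can be checked by slicing the windows out by hand.

```python
import pandas as pd

# Hypothetical values, used only for illustration.
s = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0])

# Each output row is an ordinary aggregate over the 3 most recent rows.
rolled = s.rolling(window=3).agg(['sum', 'max'])

# The same sums computed by hand, slicing out each full window.
manual_sum = [s.iloc[i - 2:i + 1].sum() for i in range(2, len(s))]

print(rolled)
print(manual_sum)
```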

In [25]:
df = pd.DataFrame( np.random.randn(600, 3), 
                  index = pd.date_range('5/1/16', 
                                        freq='D', 
                                        periods=600), 
                  columns = ['A', 'B', 'C'])
df.head()


Out[25]:
                   A         B         C
2016-05-01 -0.678741  0.143138  0.395669
2016-05-02  2.179080 -0.964481 -1.561436
2016-05-03  0.557114  0.328156 -0.837427
2016-05-04  0.260505  0.173312  1.125810
2016-05-05  0.423245 -0.995782 -1.106271

In [26]:
df.plot()


Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d7694e10>

In [27]:
df.index


Out[27]:
DatetimeIndex(['2016-05-01', '2016-05-02', '2016-05-03', '2016-05-04',
               '2016-05-05', '2016-05-06', '2016-05-07', '2016-05-08',
               '2016-05-09', '2016-05-10',
               ...
               '2017-12-12', '2017-12-13', '2017-12-14', '2017-12-15',
               '2017-12-16', '2017-12-17', '2017-12-18', '2017-12-19',
               '2017-12-20', '2017-12-21'],
              dtype='datetime64[ns]', length=600, freq='D')

In [28]:
# the index is daily, so a window of 20 rows spans 20 days
# an integer window is simply a row count
r = df.rolling(window=20)
r


Out[28]:
Rolling [window=20,center=False,axis=0]

In [29]:
df['A'].plot(color='red')
r.mean()['A'].plot(color='blue')


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d7611160>

In [34]:
r.count().head()


Out[34]:
              A    B    C
2016-05-01  1.0  1.0  1.0
2016-05-02  2.0  2.0  2.0
2016-05-03  3.0  3.0  3.0
2016-05-04  4.0  4.0  4.0
2016-05-05  5.0  5.0  5.0

In [35]:
r.A.count().head()


Out[35]:
2016-05-01    1.0
2016-05-02    2.0
2016-05-03    3.0
2016-05-04    4.0
2016-05-05    5.0
Freq: D, Name: A, dtype: float64

In [42]:
r.quantile(.5).plot()


Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d75a4fd0>

In [49]:
r.agg( [ 'sum', 'var' ])[15:25]


Out[49]:
                   A                   B                   C
                 sum       var       sum       var       sum       var
2016-05-16       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-17       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-18       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-19       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-20  2.761720  1.164650 -2.715167  1.209857 -0.954722  1.135784
2016-05-21  3.482911  1.128646 -2.164679  1.241170 -1.919482  1.137293
2016-05-22  1.024378  0.912003 -1.523217  1.203928 -0.944793  1.034439
2016-05-23  0.535064  0.897917 -3.010987  1.251282 -2.052959  1.188015
2016-05-24 -0.343793  0.914912 -3.901344  1.260566 -2.123595  1.179131
2016-05-25 -0.820566  0.904173 -3.081535  1.225072 -1.466113  1.131530

rolling custom function

  • .apply()
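
As a hedged sketch of how .apply() can be used, the example below defines a hypothetical helper, window_range, and runs it over every 3-row window of a made-up series. Passing raw=True hands each window to the function as a NumPy array; this is an illustration, not the only way to write a custom window function.

```python
import pandas as pd

def window_range(values):
    """Spread between the largest and smallest value in one window."""
    return values.max() - values.min()

# Hypothetical values, used only for illustration.
s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

# raw=True passes each window as a NumPy array rather than a Series.
# Windows with fewer than 3 rows produce NaN.
spread = s.rolling(window=3).apply(window_range, raw=True)

print(spread)
```

The function must reduce each window to a single number, because that number becomes the output value for the row at the end of the window.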

In [47]:
# raw=True passes each window as a NumPy array, so x[1] and x[2] are positional
df.rolling(window=10, center=False).apply(lambda x: x[1] / x[2], raw=True)[1:30]


Out[47]:
                   A          B         C
2016-05-02       NaN        NaN       NaN
2016-05-03       NaN        NaN       NaN
2016-05-04       NaN        NaN       NaN
2016-05-05       NaN        NaN       NaN
2016-05-06       NaN        NaN       NaN
2016-05-07       NaN        NaN       NaN
2016-05-08       NaN        NaN       NaN
2016-05-09       NaN        NaN       NaN
2016-05-10  3.911370  -2.939098  1.864565
2016-05-11  2.138593   1.893435 -0.743843
2016-05-12  0.615494  -0.174046 -1.017662
2016-05-13  0.543635   0.689320  3.373102
2016-05-14  3.758492   0.713652 -0.202137
2016-05-15  0.357502   5.999597 15.406560
2016-05-16  0.262833  -1.575695  0.077220
2016-05-17  4.320381  -0.178223  3.192568
2016-05-18  1.282391  -2.430427 -0.374619
2016-05-19 -0.411389  -0.600225  3.534808
2016-05-20  0.469168   0.501648  0.325519
2016-05-21  4.086939  -0.982241 -1.024264
2016-05-22  4.716901   1.344210 -2.812373
2016-05-23  0.378724 -26.836457 -0.771306
2016-05-24  3.668431  -0.078190 -0.278707
2016-05-25 -0.057940  -4.504863 -1.064075
2016-05-26 -0.750776   0.584616 -1.055829
2016-05-27  8.160375  -0.099540 -1.916392
2016-05-28 -5.108121   3.258866 -1.306131
2016-05-29 -0.151903  -2.147320  0.969909
2016-05-30 -4.121747   0.278558  0.301578

rolling window over monthly data, generated from daily data


In [50]:
ts_long = pd.Series(np.random.rand(200), 
                    pd.date_range('7/1/16', 
                                  freq='D', 
                                  periods=200))
ts_long.head()


Out[50]:
2016-07-01    0.333773
2016-07-02    0.726211
2016-07-03    0.608773
2016-07-04    0.676437
2016-07-05    0.243666
Freq: D, dtype: float64

In [53]:
# resample to monthly means, then average over a rolling window of 3 months
ts_long.resample('M').mean().rolling(window=3).mean()


Out[53]:
2016-07-31         NaN
2016-08-31         NaN
2016-09-30    0.496872
2016-10-31    0.507792
2016-11-30    0.534600
2016-12-31    0.530523
2017-01-31    0.544490
Freq: M, dtype: float64
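
The two steps in the chained call can be separated to see what each one does. This is a sketch under stated assumptions: the data is seeded random noise, the names daily, monthly and smoothed are illustrative, and the 'MS' (month start) alias is used instead of 'M' only because it is spelled the same way across pandas releases.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the run is repeatable
daily = pd.Series(rng.random(200),
                  index=pd.date_range('2016-07-01', periods=200, freq='D'))

# Step 1: collapse the daily values to one mean per calendar month.
# 'MS' labels each month by its first day instead of its last.
monthly = daily.resample('MS').mean()

# Step 2: average each month with the two months before it.
# The first two months lack a full window, so they are NaN.
smoothed = monthly.rolling(window=3).mean()

print(smoothed)
```

The first non-NaN entry of smoothed is the plain average of the first three monthly means.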

In [54]:
ts_long.resample('M').mean().rolling(window=3).mean().plot()


Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d748b860>

expanding window


In [55]:
df.expanding(min_periods=1).mean()[1:5]


Out[55]:
                   A         B         C
2016-05-02  0.750169 -0.410671 -0.582884
2016-05-03  0.685818 -0.164396 -0.667731
2016-05-04  0.579489 -0.079969 -0.219346
2016-05-05  0.548241 -0.263131 -0.396731
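
An expanding mean is a running total divided by the number of observations seen so far. The sketch below, assuming a made-up four-value series s, checks that equivalence by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical values, used only for illustration.
s = pd.Series([2.0, 4.0, 6.0, 8.0])

expanding_mean = s.expanding(min_periods=1).mean()

# Running total divided by the number of observations seen so far.
by_hand = s.cumsum() / np.arange(1, len(s) + 1)

print(expanding_mean.tolist())
print(by_hand.tolist())
```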

In [56]:
df.expanding(min_periods=1).mean().plot()


Out[56]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d751d2e8>

exponentially weighted moving average
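
Unlike a rolling mean, an exponentially weighted mean keeps every past value but gives older ones geometrically smaller weights. As a rough sketch, assuming a made-up four-value series and adjust=False (chosen here because its recursion is the easiest to write out; the cells in this section use adjust=True, which weights the early values differently), the result can be reproduced by hand using alpha = 2 / (span + 1):

```python
import numpy as np
import pandas as pd

# Hypothetical values, used only for illustration.
s = pd.Series([1.0, 2.0, 3.0, 4.0])
span = 3
alpha = 2.0 / (span + 1)  # relation between span and the smoothing factor

# adjust=False uses the recursion y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
ewma = s.ewm(span=span, adjust=False).mean()

# The same recursion written out by hand, starting from the first value.
by_hand = [s.iloc[0]]
for x in s.iloc[1:]:
    by_hand.append((1 - alpha) * by_hand[-1] + alpha * x)

print(ewma.tolist())
print(by_hand)
```

A larger span gives a smaller alpha, so the average reacts more slowly to new values.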


In [65]:
ts_ewma = pd.Series(np.random.rand(1000), 
                    pd.date_range('7/1/16', 
                                  freq='D', 
                                  periods=1000))
# the freq argument is dropped: current pandas .ewm() no longer accepts it
ts_ewma.ewm(span=60, min_periods=0, adjust=True).mean().plot()


Out[65]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d72ceb38>

In [64]:
# compare the exponentially weighted mean with a plain 60-day rolling mean
ts_ewma.ewm(span=60, min_periods=0, adjust=True).mean().plot()
ts_ewma.rolling(window=60).mean().plot()


Out[64]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f60d72e8f98>
