Window functions

rolling window --> fixed-size window that slides along the data: "how did I do over the last three days?", recomputed every day
expanding window --> window grows from the start: all data so far is equally relevant, old or new



In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt

# wide default figure size for every plot below
plt.rcParams['figure.figsize'] = (19, 6)



In [2]:

    
import numpy as np
import pandas as pd

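what is special about time series:

data points relate to one another; they are not independent
so we compare and relate them
one way to do that is to look at how they change from one period to the next, e.g. by comparing the series with a shifted (lagged) copy of itself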
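A minimal sketch of "looking at how they change": subtracting the series lagged by one period gives the period-over-period change, which is what Series.diff() computes. The four values are made up so the result can be checked by eye.

```python
import numpy as np
import pandas as pd

# made-up daily values, chosen so the differences are easy to read
ts = pd.Series([1.0, 3.0, 6.0, 10.0],
               index=pd.date_range('2016-07-01', periods=4, freq='D'))

# absolute change from the previous day: x_t - x_{t-1}
change = ts - ts.shift(1)
assert change.equals(ts.diff())   # diff() is the built-in shortcut

# relative change from the previous day: x_t / x_{t-1} - 1
rel_change = ts / ts.shift(1) - 1

print(change.tolist())               # [nan, 2.0, 3.0, 4.0]
print(rel_change.round(3).tolist())  # [nan, 2.0, 1.0, 0.667]
```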



In [9]:

    
ts = pd.Series(np.random.randn(20), 
             pd.date_range('7/1/16', 
                           freq='D', 
                           periods=20))
# shift by one period --> here one day (daily frequency)
ts_lagged = ts.shift()
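shift() moves the values while the timestamps stay fixed, so the first row has nothing to take and becomes NaN. For comparison, passing freq= moves the timestamps instead and keeps every value. A small sketch with made-up values:

```python
import pandas as pd

ts = pd.Series([10, 20, 30],
               index=pd.date_range('2016-07-01', periods=3, freq='D'))

# shift(): index fixed, values move down one row; the first row gets NaN
# (so the integer series becomes float)
lagged = ts.shift()
print(lagged.tolist())          # [nan, 10.0, 20.0]

# shift(freq=...): values stay together, the timestamps move one day later
moved_index = ts.shift(1, freq='D')
print(moved_index.index[0])     # 2016-07-02 00:00:00
print(moved_index.tolist())     # [10, 20, 30]
```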



In [10]:

    
plt.plot(ts, color='blue')
plt.plot(ts_lagged, color='red')












In [11]:

    
ts2 = pd.Series(np.random.randn(20), 
             pd.date_range('7/1/16', 
                           freq='H', 
                           periods=20))
# one period is one hour here --> shift(5) moves the values 5 periods (5 hours) later
ts2_lagged = ts2.shift(5)
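With hourly data a shift of 5 periods is a 5-hour lag, and a negative shift (used in a later cell) is a 5-hour lead. A sketch on hypothetical values 0..19, so each value equals its position and the offsets are easy to read:

```python
import numpy as np
import pandas as pd

# hypothetical hourly series whose values 0..19 equal their positions
idx = pd.date_range('2016-07-01', periods=20, freq=pd.Timedelta(hours=1))
ts2 = pd.Series(np.arange(20, dtype=float), index=idx)

lag = ts2.shift(5)    # value from 5 hours earlier; first 5 rows are NaN
lead = ts2.shift(-5)  # value from 5 hours later; last 5 rows are NaN

t = idx[10]
print(ts2.loc[t], lag.loc[t], lead.loc[t])   # 10.0 5.0 15.0
print(lag.isna().sum(), lead.isna().sum())   # 5 5
```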



In [12]:

    
plt.plot(ts2, color='blue')
plt.plot(ts2_lagged, color='red')












In [13]:

    
# a negative shift is a lead: each point gets the value from 5 hours later
ts3_lagged = ts2.shift(-5)
plt.plot(ts2, color='blue')
plt.plot(ts2_lagged, color='red')
plt.plot(ts3_lagged, color='black')










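moving aggregate measures of time series

window functions are like aggregate functions (sum, mean, var, ...), except that each result is computed over a window of neighbouring rows instead of the whole series
they can be used in conjunction with .resample()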
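To make "like aggregate functions" concrete: a rolling mean is the ordinary mean applied to the last `window` rows at each position. A minimal check on made-up numbers:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('2016-05-01', periods=5, freq='D'))
window = 3

rolled = s.rolling(window=window).mean()

# by hand: mean of the `window` rows ending at position i,
# NaN until a full window is available
manual = pd.Series(
    [s.iloc[i - window + 1:i + 1].mean() if i >= window - 1 else np.nan
     for i in range(len(s))],
    index=s.index)

np.testing.assert_allclose(rolled.values, manual.values)
print(rolled.tolist())   # [nan, nan, 2.0, 3.0, 4.0]
```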



In [25]:

    
df = pd.DataFrame( np.random.randn(600, 3), 
                  index = pd.date_range('5/1/16', 
                                        freq='D', 
                                        periods=600), 
                  columns = ['A', 'B', 'C'])
df.head()



In [26]:

    
df.plot()












In [27]:

    
df.index









    Out[27]:





DatetimeIndex(['2016-05-01', '2016-05-02', '2016-05-03', '2016-05-04',
               '2016-05-05', '2016-05-06', '2016-05-07', '2016-05-08',
               '2016-05-09', '2016-05-10',
               ...
               '2017-12-12', '2017-12-13', '2017-12-14', '2017-12-15',
               '2017-12-16', '2017-12-17', '2017-12-18', '2017-12-19',
               '2017-12-20', '2017-12-21'],
              dtype='datetime64[ns]', length=600, freq='D')



In [28]:

    
# window=20 is a row count, not a time span; it equals 20 days here
# only because the index has one row per day (freq='D')
r = df.rolling(window=20)
r









    Out[28]:





Rolling [window=20,center=False,axis=0]
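Because an integer window counts rows, it matches 20 days only while there is exactly one row per day. pandas also accepts a time offset such as '20D' as the window. The sketch below uses made-up data with one missing day to show where the two kinds of window differ:

```python
import pandas as pd

# made-up daily data with one missing day: 2016-05-03 is absent
idx = pd.to_datetime(['2016-05-01', '2016-05-02', '2016-05-04', '2016-05-05'])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# integer window: the last 2 rows, whatever dates they fall on
by_rows = s.rolling(window=2).sum()

# offset window: all rows in the 2 days ending at each timestamp,
# i.e. (t - 2 days, t]; min_periods defaults to 1 for offset windows
by_time = s.rolling('2D').sum()

print(by_rows.tolist())   # [nan, 3.0, 5.0, 7.0]
print(by_time.tolist())   # [1.0, 3.0, 3.0, 7.0]
```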



In [29]:

    
df['A'].plot(color='red')
r.mean()['A'].plot(color='blue')












In [34]:

    
r.count().head()



In [35]:

    
r.A.count().head()









    Out[35]:





2016-05-01    1.0
2016-05-02    2.0
2016-05-03    3.0
2016-05-04    4.0
2016-05-05    5.0
Freq: D, Name: A, dtype: float64
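The counts above show that the first 19 windows hold fewer than 20 rows. Whether an aggregate returns a value for such a partial window is controlled by min_periods; for an integer window it defaults to the window size, so the first window - 1 results are NaN. (The handling of partial windows by count() in particular has changed between pandas versions.) A sketch with a 3-row window on made-up numbers:

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 6.0, 8.0],
              index=pd.date_range('2016-05-01', periods=4, freq='D'))

# default: min_periods == window, so incomplete windows give NaN
strict = s.rolling(window=3).mean()

# min_periods=1: average whatever rows the window holds so far
lenient = s.rolling(window=3, min_periods=1).mean()

print(strict.tolist())   # [nan, nan, 4.0, 6.0]
print(lenient.tolist())  # [2.0, 3.0, 4.0, 6.0]
```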



In [42]:

    
r.quantile(.5).plot()












In [49]:

    
r.agg(['sum', 'var'])[15:25]
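Passing a list to .agg() runs every function on every window, and the result gets two column levels, (column, function), which is the layout of the sum/var table at the end of this notebook. Rolling var is the sample variance (ddof=1), like Series.var(). A check on made-up numbers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 4.0, 7.0],
                   'B': [0.0, 1.0, 0.0, 1.0]},
                  index=pd.date_range('2016-05-01', periods=4, freq='D'))

out = df.rolling(window=3).agg(['sum', 'var'])

# two-level columns: one (column, function) pair per result
print(out.columns.tolist())
# [('A', 'sum'), ('A', 'var'), ('B', 'sum'), ('B', 'var')]

# the last window of A is [2, 4, 7]; its rolling var matches the
# sample variance (ddof=1) computed directly
last_window = df['A'].iloc[-3:]
assert np.isclose(out[('A', 'var')].iloc[-1], last_window.var(ddof=1))
print(out[('A', 'sum')].iloc[-1])   # 13.0
```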

rolling custom function --> .apply() runs any function that turns one window into a single number



In [47]:

    
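# raw=True passes each window as a NumPy array, so x[1] and x[2] are
# positional: the 2nd and 3rd values of every 10-row window
df.rolling(window=10, center=False).apply(lambda x: x[1] / x[2], raw=True)[1:30]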
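Any function that maps one window to one number works with .apply(). With raw=True each window arrives as a NumPy array (raw=False passes a Series instead). A sketch computing the rolling range (max minus min) on made-up values, checked against the built-in rolling max and min:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
              index=pd.date_range('2016-05-01', periods=6, freq='D'))

def window_range(x):
    # with raw=True, x is a 1-D NumPy array holding one window
    return x.max() - x.min()

rng = s.rolling(window=3).apply(window_range, raw=True)
print(rng.tolist())   # [nan, nan, 3.0, 3.0, 4.0, 8.0]

# same result from the built-in rolling max and min (usually faster than apply)
builtin = s.rolling(window=3).max() - s.rolling(window=3).min()
assert np.allclose(rng.iloc[2:], builtin.iloc[2:])
```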

rolling window over monthly data built from daily data: resample to monthly means first, then roll



In [50]:

    
ts_long = pd.Series(np.random.rand(200), 
                    pd.date_range('7/1/16', 
                                  freq='D', 
                                  periods=200))
ts_long.head()









    Out[50]:





2016-07-01    0.333773
2016-07-02    0.726211
2016-07-03    0.608773
2016-07-04    0.676437
2016-07-05    0.243666
Freq: D, dtype: float64



In [53]:

    
# downsample the daily data to monthly means, then take a rolling
# mean over 3 months at a time (newer pandas spells month-end 'ME')
ts_long.resample('M').mean().rolling(window=3).mean()









    Out[53]:





2016-07-31         NaN
2016-08-31         NaN
2016-09-30    0.496872
2016-10-31    0.507792
2016-11-30    0.534600
2016-12-31    0.530523
2017-01-31    0.544490
Freq: M, dtype: float64
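Resampling first means the 3-month window averages three monthly means, so every month counts once whatever its length. A sketch with hypothetical data (each value is its day number) shows this can differ from the plain mean of the underlying days. It uses the 'MS' (month start) alias, which works across pandas versions, whereas the 'M' in the cell above is spelled 'ME' in newer pandas.

```python
import numpy as np
import pandas as pd

# hypothetical daily data, July to September 2016 (31 + 31 + 30 days);
# each value is simply the day number 0, 1, ..., 91
days = pd.date_range('2016-07-01', '2016-09-30', freq='D')
daily = pd.Series(np.arange(len(days), dtype=float), index=days)

# downsample to monthly means ('MS' labels each month by its first day),
# then take a rolling mean over 3 months
monthly = daily.resample('MS').mean()
three_month = monthly.rolling(window=3).mean()

print(monthly.tolist())                # [15.0, 46.0, 76.5]
print(round(three_month.iloc[-1], 4))  # 45.8333  (mean of the monthly means)
print(daily.mean())                    # 45.5     (mean of all 92 days)
```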



In [54]:

    
ts_long.resample('M').mean().rolling(window=3).mean().plot()










expanding window --> each value aggregates every row from the start up to that point



In [55]:

    
df.expanding(min_periods=1).mean()[1:5]
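An expanding mean at row t averages every row from the start up to t, so it equals the running sum divided by the number of rows seen so far. A check on made-up numbers:

```python
import numpy as np
import pandas as pd

s = pd.Series([4.0, 2.0, 6.0, 0.0],
              index=pd.date_range('2016-05-01', periods=4, freq='D'))

exp_mean = s.expanding(min_periods=1).mean()

# same thing by hand: running total / number of points seen so far
manual = s.cumsum() / np.arange(1, len(s) + 1)

print(exp_mean.tolist())  # [4.0, 3.0, 4.0, 3.0]
```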



In [56]:

    
df.expanding(min_periods=1).mean().plot()










exponentially weighted moving average (EWMA)
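A minimal sketch of what ewm(span=..., adjust=True) computes, with span=3 chosen only so the weights are easy to follow: alpha = 2 / (span + 1), and each value is a weighted average of all points so far, where the point k steps back gets weight (1 - alpha) ** k. With adjust=False pandas uses the recursive form instead. The values are made up.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 10.0],
              index=pd.date_range('2016-07-01', periods=4, freq='D'))
span = 3
alpha = 2 / (span + 1)   # span -> smoothing factor; 0.5 here

# adjust=True: weighted average of all points so far, where the point
# k steps in the past gets weight (1 - alpha) ** k
def ewma_adjusted(values, alpha):
    k = np.arange(len(values))[::-1]   # 0 for the newest point
    w = (1 - alpha) ** k
    return np.sum(w * values) / np.sum(w)

ewma = x.ewm(span=span, adjust=True).mean()
manual = [ewma_adjusted(x.values[:i + 1], alpha) for i in range(len(x))]
assert np.allclose(ewma.values, manual)

# adjust=False: the recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t
rec = x.ewm(span=span, adjust=False).mean()
y = [x.iloc[0]]
for v in x.iloc[1:]:
    y.append((1 - alpha) * y[-1] + alpha * v)
assert np.allclose(rec.values, y)

print(ewma.round(4).tolist())   # [1.0, 1.6667, 2.4286, 6.4667]
print(rec.round(4).tolist())    # [1.0, 1.5, 2.25, 6.125]
```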



In [65]:

    
ts_ewma = pd.Series(np.random.rand(1000), 
                    pd.date_range('7/1/16', 
                                  freq='D', 
                                  periods=1000))
# freq='D' removed: ewm() no longer accepts it, and the data is already daily
ts_ewma.ewm(span=60, min_periods=0, adjust=True).mean().plot()












In [64]:

    
# EWMA (span=60) vs. a plain 60-day rolling mean on the same data
ts_ewma.ewm(span=60, min_periods=0, adjust=True).mean().plot()
ts_ewma.rolling(window=60).mean().plot()
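One way to see the difference between the two curves: on a hypothetical step from 0 to 1, the EWMA (span=60) moves toward the new level sooner than the 60-day rolling mean, but it still keeps some weight on the old level after the rolling window has fully moved past the step.

```python
import numpy as np
import pandas as pd

# hypothetical step: 100 days at 0, then 100 days at 1
step = pd.Series(np.r_[np.zeros(100), np.ones(100)],
                 index=pd.date_range('2016-07-01', periods=200, freq='D'))

ewma = step.ewm(span=60, adjust=True).mean()
sma = step.rolling(window=60).mean()

early = 109   # 10 days into the step: the rolling window holds 10 ones
late = 159    # 60 days into the step: the rolling window holds only ones

print(round(sma.iloc[early], 3), round(ewma.iloc[early], 3))
print(round(sma.iloc[late], 3), round(ewma.iloc[late], 3))
```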












Out[25] (df.head()):

                   A         B         C
2016-05-01 -0.678741  0.143138  0.395669
2016-05-02  2.179080 -0.964481 -1.561436
2016-05-03  0.557114  0.328156 -0.837427
2016-05-04  0.260505  0.173312  1.125810
2016-05-05  0.423245 -0.995782 -1.106271

Out[49] (r.agg(['sum', 'var'])[15:25]):

                   A                   B                   C
                 sum       var       sum       var       sum       var
2016-05-16       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-17       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-18       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-19       NaN       NaN       NaN       NaN       NaN       NaN
2016-05-20  2.761720  1.164650 -2.715167  1.209857 -0.954722  1.135784
2016-05-21  3.482911  1.128646 -2.164679  1.241170 -1.919482  1.137293
2016-05-22  1.024378  0.912003 -1.523217  1.203928 -0.944793  1.034439
2016-05-23  0.535064  0.897917 -3.010987  1.251282 -2.052959  1.188015
2016-05-24 -0.343793  0.914912 -3.901344  1.260566 -2.123595  1.179131
2016-05-25 -0.820566  0.904173 -3.081535  1.225072 -1.466113  1.131530

Out[47] (df.rolling(window=10, center=False).apply(lambda x: x[1] / x[2])[1:30]):

                    A          B          C
2016-05-02        NaN        NaN        NaN
2016-05-03        NaN        NaN        NaN
2016-05-04        NaN        NaN        NaN
2016-05-05        NaN        NaN        NaN
2016-05-06        NaN        NaN        NaN
2016-05-07        NaN        NaN        NaN
2016-05-08        NaN        NaN        NaN
2016-05-09        NaN        NaN        NaN
2016-05-10   3.911370  -2.939098   1.864565
2016-05-11   2.138593   1.893435  -0.743843
2016-05-12   0.615494  -0.174046  -1.017662
2016-05-13   0.543635   0.689320   3.373102
2016-05-14   3.758492   0.713652  -0.202137
2016-05-15   0.357502   5.999597  15.406560
2016-05-16   0.262833  -1.575695   0.077220
2016-05-17   4.320381  -0.178223   3.192568
2016-05-18   1.282391  -2.430427  -0.374619
2016-05-19  -0.411389  -0.600225   3.534808
2016-05-20   0.469168   0.501648   0.325519
2016-05-21   4.086939  -0.982241  -1.024264
2016-05-22   4.716901   1.344210  -2.812373
2016-05-23   0.378724 -26.836457  -0.771306
2016-05-24   3.668431  -0.078190  -0.278707
2016-05-25  -0.057940  -4.504863  -1.064075
2016-05-26  -0.750776   0.584616  -1.055829
2016-05-27   8.160375  -0.099540  -1.916392
2016-05-28  -5.108121   3.258866  -1.306131
2016-05-29  -0.151903  -2.147320   0.969909
2016-05-30  -4.121747   0.278558   0.301578

Out[55] (df.expanding(min_periods=1).mean()[1:5]):

                   A         B         C
2016-05-02  0.750169 -0.410671 -0.582884
2016-05-03  0.685818 -0.164396 -0.667731
2016-05-04  0.579489 -0.079969 -0.219346
2016-05-05  0.548241 -0.263131 -0.396731