Synthetic data examples

In this notebook we build synthetic data suitable for Alphalens analysis. This is useful for understanding how Alphalens expects its input to be formatted, and it also provides a good testing environment for experimenting with Alphalens.



In [1]:

    
%matplotlib inline
    
from numpy import nan
from pandas import (DataFrame, date_range)
import matplotlib.pyplot as plt

from alphalens.tears import (create_returns_tear_sheet,
                      create_information_tear_sheet,
                      create_turnover_tear_sheet,
                      create_summary_tear_sheet,
                      create_full_tear_sheet,
                      create_event_returns_tear_sheet,
                      create_event_study_tear_sheet)

from alphalens.utils import get_clean_factor_and_forward_returns



In [2]:

    
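#
# build price
#
# 50 daily observations, 2015-01-10 through 2015-02-28 (inclusive)
price_index = date_range(start='2015-1-10', end='2015-2-28')
price_index.name = 'date'
tickers = ['A', 'B', 'C', 'D', 'E', 'F']
# each ticker compounds at a constant daily rate:
# A +0.25%, B and E +0.5%, C and F flat, D -0.5%
data = [[1.0025**i, 1.005**i, 1.00**i, 0.995**i, 1.005**i, 1.00**i]
        for i in range(1, 51)]
prices = DataFrame(index=price_index, columns=tickers, data=data)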



In [3]:

    
prices.plot()









    Out[3]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f5d6d002400>



In [4]:

    
prices.head()

Now it's time to build the events DataFrame, the input we will give to Alphalens.

Alphalens calculates statistics only for those dates where the input DataFrame has values (not NaN). To run the performance analysis on specific dates and securities (as in an event study), we have to make sure the input DataFrame contains valid values only for those date/security combinations where the event happens. All other values in the DataFrame must be NaN or absent.
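To make that layout concrete, here is a minimal sketch (with made-up values, not this notebook's data) of how a wide table that uses NaN for "no event" becomes a long Series indexed by date and asset, holding only the event entries. The explicit `dropna()` is an assumption-free safeguard: the default NaN handling of `stack()` differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative wide table: one row per date, one column per asset,
# NaN wherever no event happened (values are made up for this sketch).
dates = pd.date_range("2015-01-15", periods=3, name="date")
wide = pd.DataFrame(
    [[1.0, np.nan, np.nan],
     [np.nan, np.nan, np.nan],
     [np.nan, 3.0, 2.0]],
    index=dates,
    columns=["A", "B", "C"],
)

# Long format: one entry per (date, asset) pair.
# dropna() removes the "no event" cells regardless of pandas version.
events = wide.stack().dropna()
events.index.names = ["date", "asset"]

print(events)
```

The date with no events (2015-01-16 in this sketch) simply does not appear in the result, which is what "not present" means in the paragraph above.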

Also, make sure the event values are positive if you intend to go long on the events (the magnitude doesn't matter, only the sign), and use negative values if you intend to go short. This affects the cumulative returns plots.
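As a small illustration of the sign convention, the sketch below turns unsigned event marks into signed ones. The `direction` mapping (+1 for long, -1 for short) and the sample values are hypothetical and introduced only for this example.

```python
import pandas as pd

# Two illustrative event marks, indexed by (date, asset).
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2015-01-15"), "A"), (pd.Timestamp("2015-01-16"), "D")],
    names=["date", "asset"],
)
marks = pd.Series([1.0, 7.0], index=index)

# Hypothetical trade direction per asset: +1 to go long, -1 to go short.
direction = pd.Series({"A": 1.0, "D": -1.0})

# Look up the direction of each entry's asset and apply it as the sign.
signs = marks.index.get_level_values("asset").map(direction)
signed_events = marks.abs() * signs.to_numpy()

print(signed_events)
```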

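Let's create the event DataFrame, where we "mark" (with any value) each day on which a security's price falls below $30. In this synthetic example the marks are entered by hand rather than derived from the prices built above.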
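For reference, marks of this kind could also be derived from a price table with a boolean mask. The following is only a sketch under stated assumptions: it uses its own hypothetical prices around $30 (the synthetic prices in this notebook stay close to 1), it reads "falls below" as "is below the threshold on that day", and the names `threshold`, `below` and `marks` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical prices hovering around the threshold.
dates = pd.date_range("2015-01-10", periods=5, name="date")
sample_prices = pd.DataFrame(
    {"A": [31.0, 30.5, 29.8, 29.9, 30.2],
     "B": [28.0, 31.0, 32.0, 29.0, 33.0]},
    index=dates,
)

threshold = 30.0
below = sample_prices < threshold

# 1.0 where the price is below the threshold, NaN everywhere else.
marks = below.astype(float).where(below)

# Long format with only the marked (date, asset) pairs.
sample_events = marks.stack().dropna()
sample_events.index.names = ["date", "asset"]

print(sample_events)
```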



In [5]:

    
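#
# build factor
#
# 30 daily rows, 2015-01-15 through 2015-02-13; in the table below each
# row is one date and each column one ticker (nan means "no event")
factor_index = date_range(start='2015-1-15', end='2015-2-13')
factor_index.name = 'date'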

event = DataFrame(index=factor_index, columns=tickers,
                   data=[[1, nan, nan, nan, nan, nan],
                         [4, nan, nan, 7, nan, nan],
                         [nan, nan, nan, nan, nan, nan],
                         [nan, 3, nan, 2, nan, nan],
                         [1, nan, nan, nan, nan, nan],
                         [nan, nan, 2, nan, nan, nan],
                         [nan, nan, nan, 2, nan, nan],
                         [nan, nan, nan, 1, nan, nan],
                         [2, nan, nan, nan, nan, nan],
                         [nan, nan, nan, nan, 5, nan],
                         [nan, nan, nan, 2, nan, nan],
                         [nan, nan, nan, nan, nan, nan],
                         [2, nan, nan, nan, nan, nan],
                         [nan, nan, nan, nan, nan, 5],
                         [nan, nan, nan, 1, nan, nan],
                         [nan, nan, nan, nan, 4, nan],
                         [5, nan, nan, 4, nan, nan],
                         [nan, nan, nan, 3, nan, nan],
                         [nan, nan, nan, 4, nan, nan],
                         [nan, nan, 2, nan, nan, nan],
                         [5, nan, nan, nan, nan, nan],
                         [nan, 1, nan, nan, nan, nan],
                         [nan, nan, nan, nan, 4, nan],
                         [0, nan, nan, nan, nan, nan],
                         [nan, 5, nan, nan, nan, 4],
                         [nan, nan, nan, nan, nan, nan],
                         [nan, nan, 5, nan, nan, 3],
                         [nan, nan, 1, 2, 3, nan],
                         [nan, nan, nan, 5, nan, nan],
                         [nan, nan, 1, nan, 3, nan]]).stack()
factor_groups = {'A': 'Group1', 'B': 'Group2', 'C': 'Group1', 'D': 'Group2', 'E': 'Group1', 'F': 'Group2'}



In [6]:

    
event.head(10)









    Out[6]:





date         
2015-01-15  A    1.0
2015-01-16  A    4.0
            D    7.0
2015-01-18  B    3.0
            D    2.0
2015-01-19  A    1.0
2015-01-20  C    2.0
2015-01-21  D    2.0
2015-01-22  D    1.0
2015-01-23  A    2.0
dtype: float64



In [7]:

    
event_data = get_clean_factor_and_forward_returns(event, prices,
                                                  quantiles=None, bins=1,
                                                  periods=(1, 2, 3, 4, 5, 10, 15), filter_zscore=None)









    



Dropped 2.9% entries from factor data: 0.0% in forward returns computation and 2.9% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!



In [8]:

    
event_data.head(10)









    Out[8]:






  
    
      
      
		1D	2D	3D	4D	5D	10D	15D	factor	factor_quantile
date	asset
2015-01-15	A	0.0025	0.005006	0.007519	0.010038	0.012563	0.025283	0.038163	1.0	1.0
2015-01-16	A	0.0025	0.005006	0.007519	0.010038	0.012563	0.025283	0.038163	4.0	1.0
2015-01-16	D	-0.0050	-0.009975	-0.014925	-0.019850	-0.024751	-0.048890	-0.072431	7.0	1.0
2015-01-18	B	0.0050	0.010025	0.015075	0.020151	0.025251	0.051140	0.077683	3.0	1.0
2015-01-18	D	-0.0050	-0.009975	-0.014925	-0.019850	-0.024751	-0.048890	-0.072431	2.0	1.0
2015-01-19	A	0.0025	0.005006	0.007519	0.010038	0.012563	0.025283	0.038163	1.0	1.0
2015-01-20	C	0.0000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	2.0	1.0
2015-01-21	D	-0.0050	-0.009975	-0.014925	-0.019850	-0.024751	-0.048890	-0.072431	2.0	1.0
2015-01-22	D	-0.0050	-0.009975	-0.014925	-0.019850	-0.024751	-0.048890	-0.072431	1.0	1.0
2015-01-23	A	0.0025	0.005006	0.007519	0.010038	0.012563	0.025283	0.038163	2.0	1.0
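Because each synthetic price compounds at a constant daily rate r, the forward returns in the table above can be checked by hand: an N-day forward return is (1 + r)^N - 1. The sketch below runs that check for tickers A (r = 0.0025) and D (r = -0.005); `forward_return` is a helper name introduced here for illustration and is not part of Alphalens.

```python
def forward_return(daily_rate, periods):
    """Return over `periods` days for a price compounding at `daily_rate` per day."""
    return (1.0 + daily_rate) ** periods - 1.0


# Same horizons as the `periods` argument passed to Alphalens above.
periods = (1, 2, 3, 4, 5, 10, 15)

# Rounded to six decimals, like the values displayed in the table.
returns_a = [round(forward_return(0.0025, n), 6) for n in periods]
returns_d = [round(forward_return(-0.005, n), 6) for n in periods]

print(returns_a)
print(returns_d)
```

The two lists agree with the 1D through 15D columns shown for A and D, and they do not depend on the event date, since the daily rate is constant.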



In [9]:

    
create_event_study_tear_sheet(event_data, prices, avgretplot=(5, 10))









    



Quantiles Statistics

	min	max	mean	std	count	count %
factor_quantile
1.0	1.0	7.0	3.058824	1.613225	34	100.0








    





<matplotlib.figure.Figure at 0x7f5d6cf9e128>






    












    









    





<matplotlib.figure.Figure at 0x7f5d6ad3ca90>

    Out[4]:

	A	B	C	D	E	F
date
2015-01-10	1.002500	1.005000	1.0	0.995000	1.005000	1.0
2015-01-11	1.005006	1.010025	1.0	0.990025	1.010025	1.0
2015-01-12	1.007519	1.015075	1.0	0.985075	1.015075	1.0
2015-01-13	1.010038	1.020151	1.0	0.980150	1.020151	1.0
2015-01-14	1.012563	1.025251	1.0	0.975249	1.025251	1.0
