In this notebook we will build synthetic data suitable for Alphalens analysis. This is useful for understanding how Alphalens expects its input to be formatted, and it also provides a good testing environment for experimenting with Alphalens.
In [1]:
%matplotlib inline
from numpy import nan
from pandas import (DataFrame, date_range)
import matplotlib.pyplot as plt
from alphalens.tears import (create_returns_tear_sheet,
                             create_information_tear_sheet,
                             create_turnover_tear_sheet,
                             create_summary_tear_sheet,
                             create_full_tear_sheet,
                             create_event_returns_tear_sheet,
                             create_event_study_tear_sheet)
from alphalens.utils import get_clean_factor_and_forward_returns
In [2]:
#
# build prices: six tickers, each following a constant daily
# growth/decay rate over the 50 calendar days in price_index
#
price_index = date_range(start='2015-1-10', end='2015-2-28')
price_index.name = 'date'
tickers = ['A', 'B', 'C', 'D', 'E', 'F']
data = [[1.0025**i, 1.005**i, 1.00**i, 0.995**i, 1.005**i, 1.00**i]
        for i in range(1, 51)]
prices = DataFrame(index=price_index, columns=tickers, data=data)
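As a quick sanity check (not part of the original notebook), note that the 50 rows produced by the list comprehension must line up with the 50 calendar days spanned by `price_index`, otherwise the `DataFrame` constructor raises a shape error:

```python
import pandas as pd

# date_range is inclusive of both endpoints: Jan 10-31 (22 days)
# plus Feb 1-28 (28 days) gives exactly 50 rows.
price_index = pd.date_range(start='2015-1-10', end='2015-2-28')
data = [[1.0025**i, 1.005**i, 1.00**i, 0.995**i, 1.005**i, 1.00**i]
        for i in range(1, 51)]
print(len(price_index), len(data))  # 50 50
```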
In [3]:
prices.plot()
Out[3]:
In [4]:
prices.head()
Out[4]:
Now it's time to build the events DataFrame, the input we will give to Alphalens.
Alphalens computes statistics only for the dates where the input DataFrame has values (i.e. not NaN). So, to run the performance analysis on specific dates and securities (as in an event study), we have to make sure the input DataFrame contains valid values only for those date/security combinations where the event happens. Every other entry must be NaN or absent.
Also, make sure the event values are positive if you intend to go long on the events (the actual value doesn't matter, only the sign) and negative if you intend to go short. This affects the cumulative returns plots.
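For example (a minimal pandas-only sketch, not part of the original notebook), stacking a wide date-by-ticker DataFrame produces the MultiIndexed (date, asset) Series that Alphalens expects, and the NaN cells simply disappear from it:

```python
import numpy as np
import pandas as pd

# Two dates, two tickers, but only two real events: a long event on
# 'A' and a short (negative) event on 'B'. The NaN cells are ignored.
idx = pd.date_range('2015-01-15', periods=2, name='date')
wide = pd.DataFrame({'A': [1.0, np.nan], 'B': [np.nan, -1.0]}, index=idx)

# .dropna() is redundant on older pandas, where stack() already drops
# NaNs, but keeps the example stable on newer pandas versions.
events = wide.stack().dropna()
print(events)  # only the two non-NaN (date, asset) cells remain
```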
Let's create the event DataFrame, where we "mark" (with an arbitrary value) each day a security's price falls below $30.
In [5]:
#
# build factor
#
factor_index = date_range(start='2015-1-15', end='2015-2-13')
factor_index.name = 'date'
event = DataFrame(index=factor_index, columns=tickers,
                  data=[[1, nan, nan, nan, nan, nan],
                        [4, nan, nan, 7, nan, nan],
                        [nan, nan, nan, nan, nan, nan],
                        [nan, 3, nan, 2, nan, nan],
                        [1, nan, nan, nan, nan, nan],
                        [nan, nan, 2, nan, nan, nan],
                        [nan, nan, nan, 2, nan, nan],
                        [nan, nan, nan, 1, nan, nan],
                        [2, nan, nan, nan, nan, nan],
                        [nan, nan, nan, nan, 5, nan],
                        [nan, nan, nan, 2, nan, nan],
                        [nan, nan, nan, nan, nan, nan],
                        [2, nan, nan, nan, nan, nan],
                        [nan, nan, nan, nan, nan, 5],
                        [nan, nan, nan, 1, nan, nan],
                        [nan, nan, nan, nan, 4, nan],
                        [5, nan, nan, 4, nan, nan],
                        [nan, nan, nan, 3, nan, nan],
                        [nan, nan, nan, 4, nan, nan],
                        [nan, nan, 2, nan, nan, nan],
                        [5, nan, nan, nan, nan, nan],
                        [nan, 1, nan, nan, nan, nan],
                        [nan, nan, nan, nan, 4, nan],
                        [0, nan, nan, nan, nan, nan],
                        [nan, 5, nan, nan, nan, 4],
                        [nan, nan, nan, nan, nan, nan],
                        [nan, nan, 5, nan, nan, 3],
                        [nan, nan, 1, 2, 3, nan],
                        [nan, nan, nan, 5, nan, nan],
                        [nan, nan, 1, nan, 3, nan]]).stack()
factor_groups = {'A': 'Group1', 'B': 'Group2', 'C': 'Group1', 'D': 'Group2', 'E': 'Group1', 'F': 'Group2'}
In [6]:
event.head(10)
Out[6]:
In [7]:
event_data = get_clean_factor_and_forward_returns(event,
                                                  prices,
                                                  quantiles=None,
                                                  bins=1,
                                                  periods=(1, 2, 3, 4, 5, 10, 15),
                                                  filter_zscore=None)
In [8]:
event_data.head(10)
Out[8]:
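With `quantiles=None` and `bins=1`, every event lands in a single bin, which is what we want for an event study. The returned `event_data` is a (date, asset) MultiIndexed DataFrame whose columns hold the forward returns for each requested period plus the factor value and its bin. A pandas-only sketch of that shape, with hypothetical return numbers (column names like `'1D'` assume daily data, as in this notebook):

```python
import pandas as pd

# Hypothetical illustration of the shape returned by
# get_clean_factor_and_forward_returns: one row per event,
# one column per forward-return period, plus factor columns.
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2015-01-15'), 'A'),
     (pd.Timestamp('2015-01-16'), 'A')],
    names=['date', 'asset'])
sketch = pd.DataFrame({'1D': [0.0025, 0.0025],
                       '5D': [0.0126, 0.0126],
                       'factor': [1.0, 4.0],
                       'factor_quantile': [1, 1]}, index=idx)
print(sketch.columns.tolist())
```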
In [9]:
# avgretplot=(5, 10) plots average cumulative returns from 5 days
# before to 10 days after each event
create_event_study_tear_sheet(event_data, prices, avgretplot=(5, 10))