Alphalens: event study

Alphalens is designed to evaluate cross-sectional signals that rank many securities each day, but we can still make use of its returns analysis functions, a subset of Alphalens, to create a meaningful event study.

An event study is a statistical method to assess the impact of a particular event on the value of a stock. In this example we will evaluate what happens to stocks whose price falls below $30.


In [1]:
%pylab inline --no-import-all
import alphalens
import pandas as pd
import numpy as np
import datetime


Populating the interactive namespace from numpy and matplotlib

In [2]:
import warnings
warnings.filterwarnings('ignore')

Below is the list of tickers defining our universe of large cap stocks.


In [3]:
tickers = [ 'ACN', 'ATVI', 'ADBE', 'AMD', 'AKAM', 'ADS', 'GOOGL', 'GOOG', 'APH', 'ADI', 'ANSS', 'AAPL',
'AVGO', 'CA', 'CDNS', 'CSCO', 'CTXS', 'CTSH', 'GLW', 'CSRA', 'DXC', 'EBAY', 'EA', 'FFIV', 'FB',
'FLIR', 'IT', 'GPN', 'HRS', 'HPE', 'HPQ', 'INTC', 'IBM', 'INTU', 'JNPR', 'KLAC', 'LRCX', 'MA', 'MCHP',
'MSFT', 'MSI', 'NTAP', 'NFLX', 'NVDA', 'ORCL', 'PAYX', 'PYPL', 'QRVO', 'QCOM', 'RHT', 'CRM', 'STX',
'AMG', 'AFL', 'ALL', 'AXP', 'AIG', 'AMP', 'AON', 'AJG', 'AIZ', 'BAC', 'BK', 'BBT', 'BRK.B', 'BLK', 'HRB',
'BHF', 'COF', 'CBOE', 'SCHW', 'CB', 'CINF', 'C', 'CFG', 'CME', 'CMA', 'DFS', 'ETFC', 'RE', 'FITB', 'BEN',
'GS', 'HIG', 'HBAN', 'ICE', 'IVZ', 'JPM', 'KEY', 'LUK', 'LNC', 'L', 'MTB', 'MMC', 'MET', 'MCO', 'MS',
'NDAQ', 'NAVI', 'NTRS', 'PBCT', 'PNC', 'PFG', 'PGR', 'PRU', 'RJF', 'RF', 'SPGI', 'STT', 'STI', 'SYF', 'TROW',
'ABT', 'ABBV', 'AET', 'A', 'ALXN', 'ALGN', 'AGN', 'ABC', 'AMGN', 'ANTM', 'BCR', 'BAX', 'BDX', 'BIIB', 'BSX',
'BMY', 'CAH', 'CELG', 'CNC', 'CERN', 'CI', 'COO', 'DHR', 'DVA', 'XRAY', 'EW', 'EVHC', 'ESRX', 'GILD', 'HCA',
'HSIC', 'HOLX', 'HUM', 'IDXX', 'ILMN', 'INCY', 'ISRG', 'IQV', 'JNJ', 'LH', 'LLY', 'MCK', 'MDT', 'MRK', 'MTD',
'MYL', 'PDCO', 'PKI', 'PRGO', 'PFE', 'DGX', 'REGN', 'RMD', 'SYK', 'TMO', 'UNH', 'UHS', 'VAR', 'VRTX', 'WAT',
'MMM', 'AYI', 'ALK', 'ALLE', 'AAL', 'AME', 'AOS', 'ARNC', 'BA', 'CHRW', 'CAT', 'CTAS', 'CSX', 'CMI', 'DE',
'DAL', 'DOV', 'ETN', 'EMR', 'EFX', 'EXPD', 'FAST', 'FDX', 'FLS', 'FLR', 'FTV', 'FBHS', 'GD', 'GE', 'GWW',
'HON', 'INFO', 'ITW', 'IR', 'JEC', 'JBHT', 'JCI', 'KSU', 'LLL', 'LMT', 'MAS', 'NLSN', 'NSC', 'NOC', 'PCAR',
'PH', 'PNR', 'PWR', 'RTN', 'RSG', 'RHI', 'ROK', 'COL', 'ROP', 'LUV', 'SRCL', 'TXT', 'TDG', 'UNP', 'UAL',
'AES', 'LNT', 'AEE', 'AEP', 'AWK', 'CNP', 'CMS', 'ED', 'D', 'DTE', 'DUK', 'EIX', 'ETR', 'ES', 'EXC']

Load the prices.


In [4]:
import pandas_datareader.data as web
pan = web.DataReader(tickers, "google", datetime.datetime(2015, 6, 1),  datetime.datetime(2017, 1, 1))

In [5]:
pan = pan.transpose(2,1,0)
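Note: the "google" data source and the pandas Panel returned by web.DataReader have both since been removed from pandas_datareader and pandas, so the cells above simply reproduce the data as originally recorded. If you want to rerun the notebook on a current stack, a minimal sketch of an equivalent download (an assumption using the third-party yfinance package, not part of the original notebook) could look like this:

# Hedged alternative, not part of the original notebook.
# yfinance returns a DataFrame with MultiIndex columns (field, ticker), so the
# open prices already form a dates x tickers frame equivalent to pan.loc[:, :, 'Open'].
import yfinance as yf  # assumed extra dependency

# note: some tickers may need remapping for this data source (e.g. 'BRK.B' -> 'BRK-B')
data = yf.download(tickers, start="2015-06-01", end="2017-01-01")
open_prices = data["Open"]  # use in place of pan.loc[:, :, 'Open'] in the cells below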

Now it's time to build the events DataFrame, the input we will give to Alphalens.

Alphalens computes statistics only for those dates where the input DataFrame has values (not NaN). So, to run the performance analysis on specific dates and securities (as in an event study), we have to make sure the input DataFrame contains valid values only for those date/security combinations where the event happens. All the other values in the DataFrame must be NaN or not present.

Also, make sure the event values are positive (the actual value doesn't matter, as long as it is positive) if you intend to go long on the events, and negative if you intend to go short. This affects the cumulative returns plots.

Let's create the events DataFrame, where we "mark" (with any value) each day a security's price falls below $30.


In [6]:
today_price = pan.loc[:,:,'Open']
yesterday_price = today_price.shift(1)
# mark each day on which a security's open price crosses below $30
events = today_price[ (today_price < 30.0) & (yesterday_price >= 30) ]
# convert to the long format Alphalens expects: a Series indexed by (date, asset)
events = events.stack()
events.index = events.index.set_names(['date', 'asset'])
events = events.astype(float)
events


Out[6]:
date        asset
2015-06-02  ETFC     29.32
2015-06-03  CA       29.96
2015-06-04  LNT      29.64
            PWR      29.85
2015-06-15  CA       29.93
2015-06-18  PWR      29.87
2015-06-25  PWR      29.63
2015-06-29  CA       29.48
2015-06-30  ETFC     29.74
            HRB      29.73
2015-07-07  INTC     29.87
2015-07-10  LNT      29.80
2015-07-17  INTC     29.66
2015-07-20  EBAY     26.89
2015-07-22  FLIR     29.62
            LNT      29.81
2015-07-24  ARNC     29.73
            CA       29.81
            ETFC     28.97
2015-07-29  LNT      29.92
2015-08-03  ARNC     29.46
2015-08-11  CSX      29.98
2015-08-20  FLIR     29.90
2015-08-24  AOS      28.94
            LNT      29.62
            NTAP     28.96
            PGR      28.73
            SCHW     28.44
2015-08-26  NTAP     29.64
            SCHW     29.87
                     ...  
2016-06-08  IVZ      29.84
2016-06-14  SYF      29.26
2016-06-16  AAL      29.80
2016-06-23  GE       29.15
2016-06-24  AAL      28.20
            ARNC     28.68
2016-06-27  FLIR     29.51
            MAS      29.77
2016-07-15  STX      29.40
2016-08-11  SCHW     29.71
2016-08-23  GE       29.95
2016-09-07  GE       29.70
2016-09-12  ARNC     28.56
2016-10-11  ARNC     29.73
2016-10-12  FLIR     29.97
2016-10-17  IVZ      29.97
2016-10-19  IVZ      29.74
2016-10-20  EBAY     29.50
2016-10-26  IVZ      29.40
2016-11-02  MAS      29.94
2016-11-04  CSCO     29.98
            PFE      29.91
2016-11-14  EXC      29.90
2016-11-17  CSCO     29.91
2016-11-23  CSCO     29.83
2016-11-29  CSCO     29.94
2016-12-06  JEC      29.96
2016-12-09  CSCO     29.98
2016-12-14  EBAY     29.85
2016-12-15  JEC      29.94
Length: 145, dtype: float64

The pricing data passed to Alphalens should contain the entry price for the assets, so it must reflect the next available price after an event was observed at a given timestamp. Those prices must not be used in the calculation of the events for that time. Always double check to ensure you are not introducing lookahead bias to your study.

The pricing data must also contain the exit price for the assets: for period 1 the price at the next timestamp will be used, for period 2 the price two timestamps later, and so on.

While Alphalens is time-frequency agnostic, in our example we build the 'pricing' DataFrame so that for each event timestamp it contains the asset's open price for the day after the event is detected; this price will be used as the asset's entry price. Also, we are not adding any additional prices, so the asset's exit prices will be the open prices of the following days (how many days depends on the 'periods' argument).
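To make the period definitions concrete, below is a rough illustrative sketch of how a period-N forward return relates to the pricing frame. Alphalens performs this computation internally, with additional index alignment and filtering, so this is only meant to show the arithmetic:

# Illustrative sketch only -- not Alphalens' actual implementation.
# pricing: DataFrame of prices, index = dates, columns = assets.
def forward_return(pricing, period):
    # return at date t = price(t + period) / price(t) - 1
    return pricing.shift(-period) / pricing - 1.0

# e.g. the 1-day forward return of every asset, which Alphalens would then look up
# for each (date, asset) pair present in 'events':
# forward_return(pricing, 1).stack().reindex(events.index)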


In [7]:
pricing = pan.loc[:,:,'Open'].iloc[1:]

Run Event Style Returns Analysis

Before running Alphalens beware of some important options:


In [8]:
# we don't want any filtering to be done

filter_zscore = None

In [9]:
# We want only one bin/quantile, so we can use either quantiles=1 or bins=1

quantiles = None
bins      = 1

# Beware that pandas versions below 0.20.0 had a few bugs in pandas.qcut and pandas.cut
# that caused a ValueError to be raised when identical values were present in the
# DataFrame and 1 quantile/bin was selected.
# As a workaround, use the custom bins range option with a range that includes all your values, e.g.

quantiles = None
bins      = [-1000000, 1000000]

In [10]:
# You don't have to set the 'long_short' option explicitly when running alphalens.tears.create_event_study_tear_sheet,
# but if you make use of other Alphalens functions make sure to set 'long_short=False'.
# If you set 'long_short=True', Alphalens will demean the forward returns, which only makes sense
# for a dollar-neutral portfolio. With an event-style signal you usually cannot build a dollar-neutral
# long/short portfolio.

long_short = False
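As a concrete, purely illustrative example of where this matters: assuming you later wanted the standard tear sheets as well, you would pass the flag explicitly. This is not needed for the event study tear sheet used below.

# Illustration only -- not executed in this notebook.
# alphalens.tears.create_full_tear_sheet(factor_data, long_short=False)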

In [11]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(events, 
                                                                   pricing, 
                                                                   quantiles=None,
                                                                   bins=1,
                                                                   periods=(1, 2, 3, 4, 5, 6, 10),
                                                                   filter_zscore=None)


Dropped 29.0% entries from factor data: 29.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
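get_clean_factor_and_forward_returns drops the event entries for which forward returns cannot be computed, and the tolerated fraction is controlled by its max_loss argument (0.35, i.e. 35%, is the default, which is why the 29% loss above passes). A hedged illustration of setting it explicitly:

# Illustration only: tighten or loosen the tolerated data loss if needed.
# factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
#     events, pricing, quantiles=None, bins=1,
#     periods=(1, 2, 3, 4, 5, 6, 10), filter_zscore=None, max_loss=0.35)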

In [12]:
alphalens.tears.create_event_study_tear_sheet(factor_data, pricing, avgretplot=(5, 10))


Quantiles Statistics

                   min    max       mean       std  count  count %
factor_quantile
1                26.89  29.99  29.645728  0.419674    103    100.0

If we wanted to analyze the performance of a short signal, we would only have to switch from positive to negative event values.


In [13]:
events = -events

In [14]:
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(events, 
                                                                   pricing, 
                                                                   quantiles=None,
                                                                   bins=1,
                                                                   periods=(1, 2, 3, 4, 5, 6, 10),
                                                                   filter_zscore=None)


Dropped 29.0% entries from factor data: 29.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!

In [15]:
alphalens.tears.create_event_study_tear_sheet(factor_data, pricing, avgretplot=(5, 10))


Quantiles Statistics

                    min     max        mean       std  count  count %
factor_quantile
1                -29.99  -26.89  -29.645728  0.419674    103    100.0