EventVestor: Index Changes

In this notebook, we'll take a look at EventVestor's Index Changes dataset, available on the Quantopian Store. This dataset spans January 1, 2007 through the current day and documents additions to and deletions from major S&P, Russell, and Nasdaq 100 indexes.

Blaze

Before we dig into the data, a word on how Quantopian Store datasets are generally accessed. They are available through an API service known as Blaze, which gives the Quantopian user a convenient interface to very large datasets.

Some of these datasets contain many millions of records, and bringing all of that data directly into Quantopian Research is not viable. Blaze lets us offer a simple querying interface and shift the burden to the server side.

It is common to use Blaze to reduce your dataset in size, convert it to Pandas, and then use Pandas for further computation, manipulation, and visualization.

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:

import pandas
from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.
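
One way to stay under the response cap is to query in smaller date windows and combine the results afterwards. The helper below is a minimal sketch of the windowing step only. `yearly_windows` is a hypothetical name, not part of Blaze or the Quantopian API, and the sketch assumes that a single calendar year of this dataset stays below 10,000 rows.

```python
import pandas as pd


def yearly_windows(start_year, end_year):
    """Return (start, end) date strings for each calendar year, end exclusive.

    Hypothetical helper: the strings are meant to be used as bounds in a
    filter on asof_date, one query per window.
    """
    windows = []
    for year in range(start_year, end_year + 1):
        start = pd.Timestamp(year=year, month=1, day=1)
        end = pd.Timestamp(year=year + 1, month=1, day=1)
        windows.append((start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d')))
    return windows


windows = yearly_windows(2013, 2015)
```

Each pair can then serve as the lower and upper bound of an asof_date filter, in the same style as the date filter used later in this notebook.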

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With the preamble in place, let's get started:


In [1]:
# import the dataset
from quantopian.interactive.data.eventvestor import index_changes
# or if you want to import the free dataset, use:
# from quantopian.interactive.data.eventvestor import index_changes_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [2]:
# Let's use Blaze's dshape attribute to understand the structure of the data
index_changes.dshape


Out[2]:
dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  index_name: ?string,
  change_type: ?string,
  change_reason: ?string,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")

In [3]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
index_changes.count()


Out[3]:
2510

In [4]:
# Let's see what the data looks like. We'll grab the first three rows.
index_changes[:3]


Out[4]:
   event_id  asof_date   trade_date  symbol  event_type    event_headline                              index_name  change_type  change_reason  event_rating  timestamp   sid
0  174074    2007-01-02  2007-01-03  BLS     Index Change  BellSouth Corp. (BLS) removed from S&P 500  S&P 500     Deletion     NaN            1             2007-01-03  948
1  174076    2007-01-02  2007-01-03  ESV     Index Change  ENSCO, Int'l (ESV) removed from S&P 400     S&P 400     Deletion     NaN            1             2007-01-03  2621
2  174071    2007-01-02  2007-01-03  ESV     Index Change  ENSCO International (ESV) added to S&P 500  S&P 500     Addition     NaN            1             2007-01-03  2621

Let's go over the columns:

  • event_id: the unique identifier for this event.
  • asof_date: EventVestor's timestamp of event capture.
  • trade_date: for event announcements made before trading ends, trade_date is the same as the event date. For announcements issued after market close, trade_date is the next market open day.
  • symbol: stock ticker symbol of the affected company.
  • event_type: this should always be Index Change.
  • event_headline: a brief description of the event.
  • index_name: name of the index affected. Values include S&P 400, S&P 500, and S&P 600.
  • change_type: whether the equity was added to or deleted from the index (Addition or Deletion).
  • change_reason: reason for addition/deletion of the equity from the index. Reasons include Acquired, Market Cap, Other.
  • event_rating: this is always 1. The meaning of this is uncertain.
  • timestamp: this is our timestamp on when we registered the data.
  • sid: the equity's unique identifier. Use this instead of the symbol. Note: this sid represents the company the shares of which are being purchased, not the acquiring entity.
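
The trade_date rule described in the list above can be sketched in pandas. This is an illustration only: `assign_trade_date` is a hypothetical helper, the 4:00 PM close is an assumption, and `BDay` skips weekends but not exchange holidays, so it will not reproduce EventVestor's field exactly.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

MARKET_CLOSE_HOUR = 16  # assumed 4:00 PM close, in the announcement's local time


def assign_trade_date(announced_at):
    """Return the first date on which an announcement could be traded."""
    ts = pd.Timestamp(announced_at)
    day = ts.normalize()
    is_weekday = ts.dayofweek < 5
    if is_weekday and ts.hour < MARKET_CLOSE_HOUR:
        # Announced while the market is still open: same-day trade date.
        return day
    # After the close (or on a weekend): roll forward to the next business day.
    return day + BDay(1)


before_close = assign_trade_date('2015-06-24 10:30')  # a Wednesday morning
after_close = assign_trade_date('2015-06-26 17:15')   # a Friday evening
```

The announcement times here are made up for illustration; only the dates correspond to rows shown in this notebook.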

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, including all our equity databases, so the datasets are easy to combine.
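
Because sid is shared across datasets, combining two of them once they are in Pandas can be a plain merge on that column. The sketch below uses two small hand-built DataFrames. The `other` frame and its `signal` column are hypothetical stand-ins for a second Store dataset, while the sids and dates in `changes` come from the sample rows shown earlier.

```python
import pandas as pd

# Two rows taken from the three-row sample output earlier in this notebook.
changes = pd.DataFrame({
    'sid': [948, 2621],
    'asof_date': pd.to_datetime(['2007-01-02', '2007-01-02']),
    'change_type': ['Deletion', 'Addition'],
})

# Hypothetical second dataset keyed on the same sid (values are made up).
other = pd.DataFrame({
    'sid': [2621, 24],
    'signal': [0.5, -0.1],
})

# An inner join keeps only the sids present in both frames.
combined = changes.merge(other, on='sid', how='inner')
```

An inner join is used here so that unmatched sids drop out; `how='left'` would instead keep every index change and leave `signal` empty where there is no match.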

We can select columns and rows with ease. Below, we'll fetch all 2015 deletions due to market cap.


In [5]:
deletions = index_changes[('2014-12-31' < index_changes['asof_date']) &
                          (index_changes['asof_date'] < '2016-01-01') &
                          (index_changes.change_type == "Deletion") &
                          (index_changes.change_reason == "Market Cap")]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
deletions.sort('asof_date')


Out[5]:
    event_id  asof_date   trade_date  symbol  event_type    event_headline                                     index_name  change_type  change_reason  event_rating  timestamp   sid
0   1885908   2015-05-21  2015-05-22  WIN     Index Change  Windstream Holdings to be Removed from S&P Mid...  S&P 400     Deletion     Market Cap     1             2015-05-22  27019
1   1894211   2015-06-24  2015-06-24  ATI     Index Change  Allegheny Technologies to be Removed from S&P ...  S&P 500     Deletion     Market Cap     1             2015-06-25  24840
2   1894270   2015-06-24  2015-06-24  SMTC    Index Change  Semtech Corp. to be Removed from S&P MidCap 40...  S&P 400     Deletion     Market Cap     1             2015-06-25  6961
3   1894266   2015-06-24  2015-06-24  BTU     Index Change  Peabody Energy Corp. to be Removed from S&P Mi...  S&P 400     Deletion     Market Cap     1             2015-06-25  22660
4   1894278   2015-06-24  2015-06-24  HSC     Index Change  Harsco Corp. to be Removed from S&P MidCap 400...  S&P 400     Deletion     Market Cap     1             2015-06-25  3686
5   1894221   2015-06-24  2015-06-24  PQ      Index Change  PetroQuest Energy to be Removed from S&P Small...  S&P 600     Deletion     Market Cap     1             2015-06-25  19326
6   1894247   2015-06-24  2015-06-24  ARO     Index Change  Aeropostale to be Removed from S&P SmallCap 60...  S&P 600     Deletion     Market Cap     1             2015-06-25  23650
7   1894217   2015-06-24  2015-06-24  UNT     Index Change  Unit Corp. to be Removed from S&P MidCap 400 I...  S&P 400     Deletion     Market Cap     1             2015-06-25  7806
8   1894258   2015-06-24  2015-06-24  ZQK     Index Change  Quiksilver to be Removed from S&P SmallCap 600...  S&P 600     Deletion     Market Cap     1             2015-06-25  6317
9   1894293   2015-06-24  2015-06-24  FXCM    Index Change  FXCM to be Removed from S&P SmallCap 600 Index     S&P 600     Deletion     Market Cap     1             2015-06-25  40531
10  1895235   2015-06-26  2015-06-29  JBHT    Index Change  J.B. Hunt Transport Services to be Removed fro...  S&P 400     Deletion     Market Cap     1             2015-06-27  4108

Now suppose we want the Blaze Data Object above as a DataFrame, filtered further down to the S&P 600, with only the sid and asof_date columns.


In [6]:
df = odo(deletions, pd.DataFrame)
df = df[df.index_name == "S&P 600"]
df = df[['sid', 'asof_date']]
df


Out[6]:
sid asof_date
1 23650 2015-06-24
4 40531 2015-06-24
6 19326 2015-06-24
9 6317 2015-06-24
12 1308 2015-06-29
15 20740 2015-07-06
16 8291 2015-07-06
19 6825 2015-07-14
20 20526 2015-07-17
22 1263 2015-07-24
23 1663 2015-07-24
24 13918 2015-07-24
27 24823 2015-08-19
28 21736 2015-09-21
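
For readers without access to the dataset, the same two steps (filter for 2015 market-cap deletions, then narrow to the S&P 600 and two columns) can be tried on a small hand-built DataFrame. The rows below are copied from the Out[5] listing above. `sample` is an illustrative stand-in rather than the full dataset, and the filter is written in Pandas rather than Blaze.

```python
import pandas as pd

# Five rows copied from the Out[5] listing (a subset of columns).
sample = pd.DataFrame({
    'asof_date': pd.to_datetime(['2015-05-21', '2015-06-24', '2015-06-24',
                                 '2015-06-24', '2015-06-26']),
    'symbol': ['WIN', 'ATI', 'PQ', 'ARO', 'JBHT'],
    'index_name': ['S&P 400', 'S&P 500', 'S&P 600', 'S&P 600', 'S&P 400'],
    'change_type': ['Deletion'] * 5,
    'change_reason': ['Market Cap'] * 5,
    'sid': [27019, 24840, 19326, 23650, 4108],
})

# Same conditions as the Blaze expression: 2015 deletions due to market cap.
in_2015 = (sample['asof_date'] > '2014-12-31') & (sample['asof_date'] < '2016-01-01')
mask = in_2015 & (sample['change_type'] == 'Deletion') & (sample['change_reason'] == 'Market Cap')

# Narrow to the S&P 600 and keep only the two columns of interest.
sp600 = sample[mask & (sample['index_name'] == 'S&P 600')][['sid', 'asof_date']]
```

The two sids that survive, 19326 and 23650, also appear among the first rows of Out[6].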