EventVestor: Stock Splits

In this notebook, we'll take a look at EventVestor's Stock Splits dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents stock splits and reverse stock splits.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:

from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:


In [3]:
# import the dataset
from quantopian.interactive.data.eventvestor import stock_splits
# or if you want to import the free dataset, use:
# from quantopian.data.eventvestor import stock_splits_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [4]:
# Let's use blaze to understand the data a bit using Blaze dshape()
stock_splits.dshape


Out[4]:
dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  split_type: ?string,
  split_factor: ?string,
  new_shares: ?float64,
  old_shares: ?float64,
  effective_date: ?datetime,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")

In [5]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
stock_splits.count()


Out[5]:
1062

In [6]:
# Let's see what the data looks like. We'll grab the first three rows.
stock_splits[:3]


Out[6]:
event_id asof_date trade_date symbol event_type event_headline split_type split_factor new_shares old_shares effective_date event_rating timestamp sid
0 61191 2007-01-09 2007-01-09 MDCI Stock Split Medical Action announces 3-for-2 stock split, ... Split 3-for-2 3 2 NaT 1 2007-01-10 4737
1 61190 2007-01-09 2007-01-09 SSI Stock Split Stage Stores announces 3-for-2 stock split, pa... Split 3-for-2 3 2 NaT 1 2007-01-10 23395
2 61189 2007-01-17 2007-01-17 APH Stock Split Amphenol announces 2-for-1 stock split, payabl... Split 2-for-1 2 1 NaT 1 2007-01-18 465

Let's go over the columns:

  • event_id: the unique identifier for this event.
  • asof_date: EventVestor's timestamp of event capture.
  • trade_date: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.
  • symbol: stock ticker symbol of the affected company.
  • event_type: this should always be Stock Split.
  • event_headline: a brief description of the event
  • split_type: stock split or reverse split
  • split_factor: the x-for-y split factor. This is equivalently expressed by new_shares and old_shares.
  • new_shares: number of new shares for x number of old shares
  • old_shares: number of old shares exchanged for the number of new shares.
  • effective_date: effective date of stock split.
  • event_rating: this is always 1. The meaning of this is uncertain.
  • timestamp: this is our timestamp on when we registered the data.
  • sid: the equity's unique identifier. Use this instead of the symbol.

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch Nike's stock splits.


In [7]:
# get apple's sid first
nike_sid = symbols('NKE').sid
splits = stock_splits[(stock_splits.sid == nike_sid)]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
splits.sort('asof_date')


Out[7]:
event_id asof_date trade_date symbol event_type event_headline split_type split_factor new_shares old_shares effective_date event_rating timestamp sid
0 61171 2007-02-15 2007-02-15 NKE Stock Split Nike announces 2-for-1 stock split Split 2-for-1 2 1 NaT 1 2007-02-16 5328
1 1509519 2012-11-15 2012-11-16 NKE Stock Split Nike Announces Two-For-One Stock Split Split 2-for-1 2 1 NaT 1 2012-11-16 5328

Now suppose we want a DataFrame of stock_splits, but limited to reverse splits only. Of those, we then want to display the split_factor, timestamp, and sid.


In [8]:
reverse = stock_splits[stock_splits.split_type == "Reverse Split"]
df = odo(reverse, pd.DataFrame)
df = df[['asof_date','split_factor','sid']]
df = df[df.sid.notnull()]
# When printing a pandas DataFrame, the head 30 and tail 30 rows are displayed. The middle is truncated.
df


Out[8]:
asof_date split_factor sid
0 2007-02-20 1-for-3 21120
3 2007-03-29 1-for-4 16607
4 2007-07-17 8-for-9 12626
5 2007-08-01 1-for-6 17914
7 2007-08-07 1-for-10 6276
8 2007-08-20 1-for-20 17504
10 2007-11-01 1-for-10 10583
11 2007-11-14 1-for-4 17799
12 2008-01-31 0-for-0 26837
13 2008-02-22 1-for-5 24074
14 2008-03-07 1-for-12 19187
17 2008-04-11 1-for-10 14420
18 2008-04-23 1-for-4 19635
20 2008-04-25 1-for-10 1365
21 2008-05-07 0-for-0 6804
22 2008-05-09 1-for-3 7121
23 2008-05-30 1-for-5 24074
24 2008-06-02 1-for-10 19682
25 2008-06-02 1-for-5 25206
26 2008-06-16 1-for-5 15881
27 2008-07-01 1-for-5 25206
28 2008-07-03 1-for-20 21291
29 2008-07-09 1-for-10 1365
30 2008-07-11 1-for-10 21113
31 2008-08-12 1-for-5 14469
32 2008-08-21 1-for-2 26470
33 2008-08-29 1-for-10 16607
34 2008-09-05 1-for-10 23635
35 2008-09-11 1-for-20 9774
36 2008-09-16 1-for-10 14420
... ... ... ...
391 2015-05-26 1-for-6 32867
392 2015-05-28 1-for-10 12765
393 2015-06-01 1-for-4 19709
394 2015-06-18 1-for-8 35162
395 2015-06-18 0-for-0 39627
396 2015-06-18 1-for-8 40461
397 2015-06-23 1-for-4 19709
398 2015-06-25 1-for-7 39627
400 2015-06-26 1-for-5 28718
401 2015-06-29 1-for-10 12765
402 2015-07-08 1-for-4 40822
403 2015-07-10 1-for-8 4982
404 2015-07-13 1-for-8 44063
405 2015-07-15 1-for-10 42820
406 2015-07-20 1-for-10 88
407 2015-07-20 1-for-10 41717
408 2015-07-21 1-for-10 40531
409 2015-07-27 1-for-10 88
410 2015-07-31 0-for-0 35162
411 2015-08-03 1-for-10 42820
412 2015-08-04 1-for-4 28076
413 2015-08-04 1-for-10 6523
414 2015-08-06 1-for-10 19074
415 2015-08-18 1-for-3 19350
416 2015-08-24 1-for-5 28416
417 2015-08-25 1-for-7 39627
418 2015-08-31 1-for-4 28076
419 2015-09-03 1-for-7 24624
420 2015-09-09 0-for-0 22253
421 2015-09-16 1-for-15 22660

361 rows × 3 columns


In [ ]: