EventVestor: Stock Splits

In this notebook, we'll take a look at EventVestor's Stock Splits dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents stock splits and reverse stock splits.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:

from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:



In [3]:

    
# import the dataset
from quantopian.interactive.data.eventvestor import stock_splits
# or if you want to import the free dataset, use:
# from quantopian.data.eventvestor import stock_splits_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd



In [4]:

    
# Let's use blaze to understand the data a bit using Blaze dshape()
stock_splits.dshape









    Out[4]:





dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  split_type: ?string,
  split_factor: ?string,
  new_shares: ?float64,
  old_shares: ?float64,
  effective_date: ?datetime,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")



In [5]:

    
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
stock_splits.count()









    Out[5]:




1062



In [6]:

    
# Let's see what the data looks like. We'll grab the first three rows.
stock_splits[:3]









    Out[6]:





  
    
      
      event_id
      asof_date
      trade_date
      symbol
      event_type
      event_headline
      split_type
      split_factor
      new_shares
      old_shares
      effective_date
      event_rating
      timestamp
      sid
    
  
  
    
      0
      61191
      2007-01-09
      2007-01-09
      MDCI
      Stock Split
      Medical Action announces 3-for-2 stock split, ...
      Split
      3-for-2
      3
      2
      NaT
      1
      2007-01-10
      4737
    
    
      1
      61190
      2007-01-09
      2007-01-09
      SSI
      Stock Split
      Stage Stores announces 3-for-2 stock split, pa...
      Split
      3-for-2
      3
      2
      NaT
      1
      2007-01-10
      23395
    
    
      2
      61189
      2007-01-17
      2007-01-17
      APH
      Stock Split
      Amphenol announces 2-for-1 stock split, payabl...
      Split
      2-for-1
      2
      1
      NaT
      1
      2007-01-18
      465

Let's go over the columns:

event_id: the unique identifier for this event.
asof_date: EventVestor's timestamp of event capture.
trade_date: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.
symbol: stock ticker symbol of the affected company.
event_type: this should always be Stock Split.
event_headline: a brief description of the event
split_type: stock split or reverse split
split_factor: the x-for-y split factor. This is equivalently expressed by new_shares and old_shares.
new_shares: number of new shares for x number of old shares
old_shares: number of old shares exchanged for the number of new shares.
effective_date: effective date of stock split.
event_rating: this is always 1. The meaning of this is uncertain.
timestamp: this is our timestamp on when we registered the data.
sid: the equity's unique identifier. Use this instead of the symbol.

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch Nike's stock splits.



In [7]:

    
# get apple's sid first
nike_sid = symbols('NKE').sid
splits = stock_splits[(stock_splits.sid == nike_sid)]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
splits.sort('asof_date')









    Out[7]:





  
    
      
      event_id
      asof_date
      trade_date
      symbol
      event_type
      event_headline
      split_type
      split_factor
      new_shares
      old_shares
      effective_date
      event_rating
      timestamp
      sid
    
  
  
    
      0
      61171
      2007-02-15
      2007-02-15
      NKE
      Stock Split
      Nike announces 2-for-1 stock split
      Split
      2-for-1
      2
      1
      NaT
      1
      2007-02-16
      5328
    
    
      1
      1509519
      2012-11-15
      2012-11-16
      NKE
      Stock Split
      Nike Announces Two-For-One Stock Split
      Split
      2-for-1
      2
      1
      NaT
      1
      2012-11-16
      5328

Now suppose we want a DataFrame of stock_splits, but limited to reverse splits only. Of those, we then want to display the split_factor, timestamp, and sid.



In [8]:

    
reverse = stock_splits[stock_splits.split_type == "Reverse Split"]
df = odo(reverse, pd.DataFrame)
df = df[['asof_date','split_factor','sid']]
df = df[df.sid.notnull()]
# When printing a pandas DataFrame, the head 30 and tail 30 rows are displayed. The middle is truncated.
df









    Out[8]:






  
    
      
      asof_date
      split_factor
      sid
    
  
  
    
      0
      2007-02-20
      1-for-3
      21120
    
    
      3
      2007-03-29
      1-for-4
      16607
    
    
      4
      2007-07-17
      8-for-9
      12626
    
    
      5
      2007-08-01
      1-for-6
      17914
    
    
      7
      2007-08-07
      1-for-10
      6276
    
    
      8
      2007-08-20
      1-for-20
      17504
    
    
      10
      2007-11-01
      1-for-10
      10583
    
    
      11
      2007-11-14
      1-for-4
      17799
    
    
      12
      2008-01-31
      0-for-0
      26837
    
    
      13
      2008-02-22
      1-for-5
      24074
    
    
      14
      2008-03-07
      1-for-12
      19187
    
    
      17
      2008-04-11
      1-for-10
      14420
    
    
      18
      2008-04-23
      1-for-4
      19635
    
    
      20
      2008-04-25
      1-for-10
      1365
    
    
      21
      2008-05-07
      0-for-0
      6804
    
    
      22
      2008-05-09
      1-for-3
      7121
    
    
      23
      2008-05-30
      1-for-5
      24074
    
    
      24
      2008-06-02
      1-for-10
      19682
    
    
      25
      2008-06-02
      1-for-5
      25206
    
    
      26
      2008-06-16
      1-for-5
      15881
    
    
      27
      2008-07-01
      1-for-5
      25206
    
    
      28
      2008-07-03
      1-for-20
      21291
    
    
      29
      2008-07-09
      1-for-10
      1365
    
    
      30
      2008-07-11
      1-for-10
      21113
    
    
      31
      2008-08-12
      1-for-5
      14469
    
    
      32
      2008-08-21
      1-for-2
      26470
    
    
      33
      2008-08-29
      1-for-10
      16607
    
    
      34
      2008-09-05
      1-for-10
      23635
    
    
      35
      2008-09-11
      1-for-20
      9774
    
    
      36
      2008-09-16
      1-for-10
      14420
    
    
      ...
      ...
      ...
      ...
    
    
      391
      2015-05-26
      1-for-6
      32867
    
    
      392
      2015-05-28
      1-for-10
      12765
    
    
      393
      2015-06-01
      1-for-4
      19709
    
    
      394
      2015-06-18
      1-for-8
      35162
    
    
      395
      2015-06-18
      0-for-0
      39627
    
    
      396
      2015-06-18
      1-for-8
      40461
    
    
      397
      2015-06-23
      1-for-4
      19709
    
    
      398
      2015-06-25
      1-for-7
      39627
    
    
      400
      2015-06-26
      1-for-5
      28718
    
    
      401
      2015-06-29
      1-for-10
      12765
    
    
      402
      2015-07-08
      1-for-4
      40822
    
    
      403
      2015-07-10
      1-for-8
      4982
    
    
      404
      2015-07-13
      1-for-8
      44063
    
    
      405
      2015-07-15
      1-for-10
      42820
    
    
      406
      2015-07-20
      1-for-10
      88
    
    
      407
      2015-07-20
      1-for-10
      41717
    
    
      408
      2015-07-21
      1-for-10
      40531
    
    
      409
      2015-07-27
      1-for-10
      88
    
    
      410
      2015-07-31
      0-for-0
      35162
    
    
      411
      2015-08-03
      1-for-10
      42820
    
    
      412
      2015-08-04
      1-for-4
      28076
    
    
      413
      2015-08-04
      1-for-10
      6523
    
    
      414
      2015-08-06
      1-for-10
      19074
    
    
      415
      2015-08-18
      1-for-3
      19350
    
    
      416
      2015-08-24
      1-for-5
      28416
    
    
      417
      2015-08-25
      1-for-7
      39627
    
    
      418
      2015-08-31
      1-for-4
      28076
    
    
      419
      2015-09-03
      1-for-7
      24624
    
    
      420
      2015-09-09
      0-for-0
      22253
    
    
      421
      2015-09-16
      1-for-15
      22660
    
  

361 rows × 3 columns



In [ ]:

	event_id	asof_date	trade_date	symbol	event_type	event_headline	split_type	split_factor	new_shares	old_shares	effective_date	event_rating	timestamp	sid
0	61191	2007-01-09	2007-01-09	MDCI	Stock Split	Medical Action announces 3-for-2 stock split, ...	Split	3-for-2	3	2	NaT	1	2007-01-10	4737
1	61190	2007-01-09	2007-01-09	SSI	Stock Split	Stage Stores announces 3-for-2 stock split, pa...	Split	3-for-2	3	2	NaT	1	2007-01-10	23395
2	61189	2007-01-17	2007-01-17	APH	Stock Split	Amphenol announces 2-for-1 stock split, payabl...	Split	2-for-1	2	1	NaT	1	2007-01-18	465

	event_id	asof_date	trade_date	symbol	event_type	event_headline	split_type	split_factor	new_shares	old_shares	effective_date	event_rating	timestamp	sid
0	61171	2007-02-15	2007-02-15	NKE	Stock Split	Nike announces 2-for-1 stock split	Split	2-for-1	2	1	NaT	1	2007-02-16	5328
1	1509519	2012-11-15	2012-11-16	NKE	Stock Split	Nike Announces Two-For-One Stock Split	Split	2-for-1	2	1	NaT	1	2012-11-16	5328

	asof_date	split_factor	sid
0	2007-02-20	1-for-3	21120
3	2007-03-29	1-for-4	16607
4	2007-07-17	8-for-9	12626
5	2007-08-01	1-for-6	17914
7	2007-08-07	1-for-10	6276
8	2007-08-20	1-for-20	17504
10	2007-11-01	1-for-10	10583
11	2007-11-14	1-for-4	17799
12	2008-01-31	0-for-0	26837
13	2008-02-22	1-for-5	24074
14	2008-03-07	1-for-12	19187
17	2008-04-11	1-for-10	14420
18	2008-04-23	1-for-4	19635
20	2008-04-25	1-for-10	1365
21	2008-05-07	0-for-0	6804
22	2008-05-09	1-for-3	7121
23	2008-05-30	1-for-5	24074
24	2008-06-02	1-for-10	19682
25	2008-06-02	1-for-5	25206
26	2008-06-16	1-for-5	15881
27	2008-07-01	1-for-5	25206
28	2008-07-03	1-for-20	21291
29	2008-07-09	1-for-10	1365
30	2008-07-11	1-for-10	21113
31	2008-08-12	1-for-5	14469
32	2008-08-21	1-for-2	26470
33	2008-08-29	1-for-10	16607
34	2008-09-05	1-for-10	23635
35	2008-09-11	1-for-20	9774
36	2008-09-16	1-for-10	14420
...	...	...	...
391	2015-05-26	1-for-6	32867
392	2015-05-28	1-for-10	12765
393	2015-06-01	1-for-4	19709
394	2015-06-18	1-for-8	35162
395	2015-06-18	0-for-0	39627
396	2015-06-18	1-for-8	40461
397	2015-06-23	1-for-4	19709
398	2015-06-25	1-for-7	39627
400	2015-06-26	1-for-5	28718
401	2015-06-29	1-for-10	12765
402	2015-07-08	1-for-4	40822
403	2015-07-10	1-for-8	4982
404	2015-07-13	1-for-8	44063
405	2015-07-15	1-for-10	42820
406	2015-07-20	1-for-10	88
407	2015-07-20	1-for-10	41717
408	2015-07-21	1-for-10	40531
409	2015-07-27	1-for-10	88
410	2015-07-31	0-for-0	35162
411	2015-08-03	1-for-10	42820
412	2015-08-04	1-for-4	28076
413	2015-08-04	1-for-10	6523
414	2015-08-06	1-for-10	19074
415	2015-08-18	1-for-3	19350
416	2015-08-24	1-for-5	28416
417	2015-08-25	1-for-7	39627
418	2015-08-31	1-for-4	28076
419	2015-09-03	1-for-7	24624
420	2015-09-09	0-for-0	22253
421	2015-09-16	1-for-15	22660