EventVestor: Spin-Offs

In this notebook, we'll take a look at EventVestor's Spin-Offs dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents corporate spin-off events.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:

from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:



In [1]:

    
# import the dataset
from quantopian.interactive.data.eventvestor import spin_offs
# or if you want to import the free dataset, use:
# from quantopian.data.eventvestor import spin_offs_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd



In [2]:

    
# Let's use blaze to understand the data a bit using Blaze dshape()
spin_offs.dshape









    Out[2]:





dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  spinoff_phase: ?string,
  spinoff_name: ?string,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")



In [3]:

    
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
spin_offs.count()









    Out[3]:




1189



In [4]:

    
# Let's see what the data looks like. We'll grab the first three rows.
spin_offs[:3]









    Out[4]:





  
    
      
      event_id
      asof_date
      trade_date
      symbol
      event_type
      event_headline
      spinoff_phase
      spinoff_name
      event_rating
      timestamp
      sid
    
  
  
    
      0
      127421
      2007-01-08
      2007-01-09
      DUK
      Spin-off
      Duke Energy completes Natural Gas business spi...
      Completes
      NaN
      1
      2007-01-09
      2351
    
    
      1
      134268
      2007-01-08
      2007-01-08
      NCR
      Spin-off
      NCR To Separate Into Two Independent Companies
      Proposal
      NaN
      1
      2007-01-09
      16389
    
    
      2
      77960
      2007-01-16
      2007-01-16
      VZ
      Spin-off
      Verizon to spin off and merge local exchange a...
      Board Approval
      NaN
      1
      2007-01-17
      21839

Let's go over the columns:

event_id: the unique identifier for this event.
asof_date: EventVestor's timestamp of event capture.
trade_date: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.
symbol: stock ticker symbol of the affected company.
event_type: this should always be Spin-off.
event_headline: a brief description of the event
spinoff_phase: values include proposal, approval, completes.
spinoff_name: name of the entity being spun off.
event_rating: this is always 1. The meaning of this is uncertain.
timestamp: this is our timestamp on when we registered the data.
sid: the equity's unique identifier. Use this instead of the symbol.

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch Yahoo's 2015 spin-offs.



In [5]:

    
# get yahoo's sid first
yahoo_sid = symbols('YHOO').sid
spinoffs = spin_offs[('2014-12-31' < spin_offs['asof_date']) & 
                                (spin_offs['asof_date'] <'2016-01-01') & 
                                (spin_offs.sid == yahoo_sid)]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
spinoffs.sort('asof_date')









    Out[5]:





  
    
      
      event_id
      asof_date
      trade_date
      symbol
      event_type
      event_headline
      spinoff_phase
      spinoff_name
      event_rating
      timestamp
      sid
    
  
  
    
      0
      1827542
      2015-01-27
      2015-01-28
      YHOO
      Spin-off
      Yahoo to Spin-Off its Alibaba Stake into Newly...
      Board Approval
      NaN
      1
      2015-01-28 00:00:00
      14848
    
    
      1
      1903562
      2015-07-17
      2015-07-20
      YHOO
      Spin-off
      Yahoo! Announces SEC Filing for Planned Spin-O...
      Updates
      Aabaco Holdings Inc.
      1
      2015-07-18 00:00:00
      14848
    
    
      2
      1937451
      2015-09-28
      2015-09-29
      YHOO
      Spin-off
      Yahoo! to Proceed Alibaba Stake Spinoff withou...
      Updates
      Alibaba Holding Group Ltd.
      1
      2015-09-29 11:14:35.314487
      14848

Now suppose we want a DataFrame of spin_offs, but only want the asof_date, spinoff_phase, and the sid.



In [6]:

    
#len(spin_offs) = ~10000, so we can convert it to a dataframe without a worry -- it's a small dataset.
df = odo(spin_offs, pd.DataFrame)
df = df[['asof_date','spinoff_phase','sid']]
df = df[df.sid.notnull()]
# When printing a pandas DataFrame, the head 30 and tail 30 rows are displayed. The middle is truncated.
df









    Out[6]:






  
    
      
      asof_date
      spinoff_phase
      sid
    
  
  
    
      0
      2007-01-08
      Completes
      2351
    
    
      1
      2007-01-08
      Proposal
      16389
    
    
      2
      2007-01-16
      Board Approval
      21839
    
    
      3
      2007-01-17
      Updates
      4758
    
    
      4
      2007-01-19
      Updates
      13373
    
    
      5
      2007-01-31
      Completes
      13373
    
    
      6
      2007-01-31
      Board Approval
      4954
    
    
      8
      2007-02-02
      Updates
      8326
    
    
      9
      2007-02-06
      Updates
      22954
    
    
      10
      2007-02-27
      Proposal
      22983
    
    
      11
      2007-02-27
      Proposal
      7449
    
    
      12
      2007-03-02
      Updates
      3443
    
    
      13
      2007-03-02
      Updates
      32880
    
    
      14
      2007-03-06
      Updates
      22983
    
    
      15
      2007-03-07
      Completes
      8326
    
    
      16
      2007-03-20
      Updates
      4954
    
    
      17
      2007-03-21
      Proposal
      630
    
    
      18
      2007-03-26
      Updates
      22954
    
    
      19
      2007-03-30
      Completes
      4954
    
    
      20
      2007-04-03
      Updates
      3443
    
    
      21
      2007-04-03
      Updates
      32880
    
    
      22
      2007-04-04
      Completes
      630
    
    
      23
      2007-04-04
      Proposal
      5025
    
    
      24
      2007-04-05
      Completes
      3443
    
    
      25
      2007-04-05
      Completes
      32880
    
    
      26
      2007-04-10
      Proposal
      18027
    
    
      28
      2007-05-15
      Proposal
      4010
    
    
      29
      2007-05-26
      Updates
      2190
    
    
      30
      2007-06-01
      Board Approval
      17080
    
    
      31
      2007-06-08
      Proposal
      7679
    
    
      ...
      ...
      ...
      ...
    
    
      1028
      2015-07-13
      Updates
      34575
    
    
      1029
      2015-07-14
      Board Approval
      9693
    
    
      1030
      2015-07-17
      Updates
      14848
    
    
      1031
      2015-07-20
      Completes
      24819
    
    
      1032
      2015-07-22
      Updates
      34575
    
    
      1034
      2015-07-24
      Updates
      34575
    
    
      1035
      2015-07-24
      Updates
      4117
    
    
      1036
      2015-07-30
      Updates
      22015
    
    
      1037
      2015-07-30
      Board Approval
      18821
    
    
      1038
      2015-07-31
      Updates
      13306
    
    
      1039
      2015-08-03
      Completes
      9693
    
    
      1040
      2015-08-03
      Proposal
      21608
    
    
      1041
      2015-08-04
      Proposal
      2248
    
    
      1042
      2015-08-06
      Proposal
      32878
    
    
      1043
      2015-08-06
      Updates
      47812
    
    
      1044
      2015-08-18
      Completes
      18821
    
    
      1045
      2015-08-27
      NaN
      22689
    
    
      1046
      2015-08-31
      Updates
      1898
    
    
      1047
      2015-09-01
      Updates
      4656
    
    
      1048
      2015-09-04
      Updates
      21608
    
    
      1049
      2015-09-08
      Board Approval
      1936
    
    
      1050
      2015-09-08
      NaN
      42176
    
    
      1051
      2015-09-11
      Updates
      34067
    
    
      1052
      2015-09-15
      Updates
      11498
    
    
      1053
      2015-09-16
      Board Approval
      460
    
    
      1054
      2015-09-22
      Proposal
      559
    
    
      1055
      2015-09-28
      NaN
      2
    
    
      1160
      2014-07-11
      Board Approval
      5249
    
    
      1187
      2015-09-28
      Completes
      7086
    
    
      1188
      2015-09-28
      Updates
      14848
    
  

929 rows × 3 columns



In [ ]:

	event_id	asof_date	trade_date	symbol	event_type	event_headline	spinoff_phase	spinoff_name	event_rating	timestamp	sid
0	127421	2007-01-08	2007-01-09	DUK	Spin-off	Duke Energy completes Natural Gas business spi...	Completes	NaN	1	2007-01-09	2351
1	134268	2007-01-08	2007-01-08	NCR	Spin-off	NCR To Separate Into Two Independent Companies	Proposal	NaN	1	2007-01-09	16389
2	77960	2007-01-16	2007-01-16	VZ	Spin-off	Verizon to spin off and merge local exchange a...	Board Approval	NaN	1	2007-01-17	21839

	event_id	asof_date	trade_date	symbol	event_type	event_headline	spinoff_phase	spinoff_name	event_rating	timestamp	sid
0	1827542	2015-01-27	2015-01-28	YHOO	Spin-off	Yahoo to Spin-Off its Alibaba Stake into Newly...	Board Approval	NaN	1	2015-01-28 00:00:00	14848
1	1903562	2015-07-17	2015-07-20	YHOO	Spin-off	Yahoo! Announces SEC Filing for Planned Spin-O...	Updates	Aabaco Holdings Inc.	1	2015-07-18 00:00:00	14848
2	1937451	2015-09-28	2015-09-29	YHOO	Spin-off	Yahoo! to Proceed Alibaba Stake Spinoff withou...	Updates	Alibaba Holding Group Ltd.	1	2015-09-29 11:14:35.314487	14848