EventVestor: Spin-Offs

In this notebook, we'll take a look at EventVestor's Spin-Offs dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents corporate spin-off events.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:

from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:


In [1]:
# import the dataset
from quantopian.interactive.data.eventvestor import spin_offs
# or if you want to import the free dataset, use:
# from quantopian.data.eventvestor import spin_offs_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [2]:
# Let's use blaze to understand the data a bit using Blaze dshape()
spin_offs.dshape


Out[2]:
dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  spinoff_phase: ?string,
  spinoff_name: ?string,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")

In [3]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
spin_offs.count()


Out[3]:
1189

In [4]:
# Let's see what the data looks like. We'll grab the first three rows.
spin_offs[:3]


Out[4]:
event_id asof_date trade_date symbol event_type event_headline spinoff_phase spinoff_name event_rating timestamp sid
0 127421 2007-01-08 2007-01-09 DUK Spin-off Duke Energy completes Natural Gas business spi... Completes NaN 1 2007-01-09 2351
1 134268 2007-01-08 2007-01-08 NCR Spin-off NCR To Separate Into Two Independent Companies Proposal NaN 1 2007-01-09 16389
2 77960 2007-01-16 2007-01-16 VZ Spin-off Verizon to spin off and merge local exchange a... Board Approval NaN 1 2007-01-17 21839

Let's go over the columns:

  • event_id: the unique identifier for this event.
  • asof_date: EventVestor's timestamp of event capture.
  • trade_date: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.
  • symbol: stock ticker symbol of the affected company.
  • event_type: this should always be Spin-off.
  • event_headline: a brief description of the event
  • spinoff_phase: values include proposal, approval, completes.
  • spinoff_name: name of the entity being spun off.
  • event_rating: this is always 1. The meaning of this is uncertain.
  • timestamp: this is our timestamp on when we registered the data.
  • sid: the equity's unique identifier. Use this instead of the symbol.

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch Yahoo's 2015 spin-offs.


In [5]:
# get yahoo's sid first
yahoo_sid = symbols('YHOO').sid
spinoffs = spin_offs[('2014-12-31' < spin_offs['asof_date']) & 
                                (spin_offs['asof_date'] <'2016-01-01') & 
                                (spin_offs.sid == yahoo_sid)]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
spinoffs.sort('asof_date')


Out[5]:
event_id asof_date trade_date symbol event_type event_headline spinoff_phase spinoff_name event_rating timestamp sid
0 1827542 2015-01-27 2015-01-28 YHOO Spin-off Yahoo to Spin-Off its Alibaba Stake into Newly... Board Approval NaN 1 2015-01-28 00:00:00 14848
1 1903562 2015-07-17 2015-07-20 YHOO Spin-off Yahoo! Announces SEC Filing for Planned Spin-O... Updates Aabaco Holdings Inc. 1 2015-07-18 00:00:00 14848
2 1937451 2015-09-28 2015-09-29 YHOO Spin-off Yahoo! to Proceed Alibaba Stake Spinoff withou... Updates Alibaba Holding Group Ltd. 1 2015-09-29 11:14:35.314487 14848

Now suppose we want a DataFrame of spin_offs, but only want the asof_date, spinoff_phase, and the sid.


In [6]:
#len(spin_offs) = ~10000, so we can convert it to a dataframe without a worry -- it's a small dataset.
df = odo(spin_offs, pd.DataFrame)
df = df[['asof_date','spinoff_phase','sid']]
df = df[df.sid.notnull()]
# When printing a pandas DataFrame, the head 30 and tail 30 rows are displayed. The middle is truncated.
df


Out[6]:
asof_date spinoff_phase sid
0 2007-01-08 Completes 2351
1 2007-01-08 Proposal 16389
2 2007-01-16 Board Approval 21839
3 2007-01-17 Updates 4758
4 2007-01-19 Updates 13373
5 2007-01-31 Completes 13373
6 2007-01-31 Board Approval 4954
8 2007-02-02 Updates 8326
9 2007-02-06 Updates 22954
10 2007-02-27 Proposal 22983
11 2007-02-27 Proposal 7449
12 2007-03-02 Updates 3443
13 2007-03-02 Updates 32880
14 2007-03-06 Updates 22983
15 2007-03-07 Completes 8326
16 2007-03-20 Updates 4954
17 2007-03-21 Proposal 630
18 2007-03-26 Updates 22954
19 2007-03-30 Completes 4954
20 2007-04-03 Updates 3443
21 2007-04-03 Updates 32880
22 2007-04-04 Completes 630
23 2007-04-04 Proposal 5025
24 2007-04-05 Completes 3443
25 2007-04-05 Completes 32880
26 2007-04-10 Proposal 18027
28 2007-05-15 Proposal 4010
29 2007-05-26 Updates 2190
30 2007-06-01 Board Approval 17080
31 2007-06-08 Proposal 7679
... ... ... ...
1028 2015-07-13 Updates 34575
1029 2015-07-14 Board Approval 9693
1030 2015-07-17 Updates 14848
1031 2015-07-20 Completes 24819
1032 2015-07-22 Updates 34575
1034 2015-07-24 Updates 34575
1035 2015-07-24 Updates 4117
1036 2015-07-30 Updates 22015
1037 2015-07-30 Board Approval 18821
1038 2015-07-31 Updates 13306
1039 2015-08-03 Completes 9693
1040 2015-08-03 Proposal 21608
1041 2015-08-04 Proposal 2248
1042 2015-08-06 Proposal 32878
1043 2015-08-06 Updates 47812
1044 2015-08-18 Completes 18821
1045 2015-08-27 NaN 22689
1046 2015-08-31 Updates 1898
1047 2015-09-01 Updates 4656
1048 2015-09-04 Updates 21608
1049 2015-09-08 Board Approval 1936
1050 2015-09-08 NaN 42176
1051 2015-09-11 Updates 34067
1052 2015-09-15 Updates 11498
1053 2015-09-16 Board Approval 460
1054 2015-09-22 Proposal 559
1055 2015-09-28 NaN 2
1160 2014-07-11 Board Approval 5249
1187 2015-09-28 Completes 7086
1188 2015-09-28 Updates 14848

929 rows × 3 columns


In [ ]: