EventVestor: Share Repurchases

In this notebook, we'll take a look at EventVestor's Share Repurchases dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents actual share repurchase announcements by companies. Note that this is different from Share Buyback Authorizations.

Blaze

Before we dig into the data, here is how you generally access Quantopian Store datasets. They are available through an API service known as Blaze, which gives the Quantopian user a convenient interface to very large datasets.

Blaze provides an important function for accessing these datasets. Some of them contain many millions of records, and bringing that data directly into Quantopian Research is not viable. Blaze lets us provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it to Pandas, and then use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:

import pandas as pd
from odo import odo
odo(expr, pd.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.
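One way to stay under the response limit is to split a large query into smaller date windows and convert each piece separately. The sketch below only generates the windows; `yearly_windows` and `ROW_LIMIT` are hypothetical names introduced for illustration, and the commented-out filter assumes the `share_repurchases` object imported later in this notebook. Whether one year of a given dataset fits under the limit is something you would need to check with `count()`.

```python
from datetime import date

ROW_LIMIT = 10000  # per-expression cap described above (name is illustrative)


def yearly_windows(first_year, last_year):
    """Yield (start, end) ISO date strings, one half-open window per year."""
    for year in range(first_year, last_year + 1):
        yield date(year, 1, 1).isoformat(), date(year + 1, 1, 1).isoformat()


windows = list(yearly_windows(2007, 2009))
for start, end in windows:
    # Hypothetical use inside the research environment (not run here):
    # chunk = share_repurchases[(share_repurchases.asof_date >= start) &
    #                           (share_repurchases.asof_date < end)]
    # if chunk.count() <= ROW_LIMIT: convert the chunk with odo
    print(start, end)
```

Half-open windows (start inclusive, end exclusive) avoid counting an event on January 1 twice.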

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With the preamble in place, let's get started:


In [1]:
# import the dataset
from quantopian.interactive.data.eventvestor import share_repurchases
# or if you want to import the free dataset, use:
# from quantopian.interactive.data.eventvestor import share_repurchases_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [2]:
# Let's use Blaze's dshape attribute to see the columns and their types
share_repurchases.dshape


Out[2]:
dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  repurchase_amount: ?float64,
  repurchase_units: ?string,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")

In [3]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
share_repurchases.count()


Out[3]:
15509

In [4]:
# Let's see what the data looks like. We'll grab the first three rows.
share_repurchases[:3]


Out[4]:
   event_id  asof_date   trade_date  symbol  event_type      event_headline                                     repurchase_amount  repurchase_units  event_rating  timestamp   sid
0  1113050   2007-01-17  2007-01-17  TESS    Buyback Update  TESSCO Tech Repurchases $1.7M Shares in 3Q 07 ...  1.7                $M                1             2007-01-18  11968
1  131345    2007-01-17  2007-01-18  WM      Buyback Update  Washington Mutual Announces $2.7B Accelerated ...  2700.0             $M                1             2007-01-18  19181
2  137183    2007-01-23  2007-01-23  RDN     Buyback Update  Radian Group Repurchased 1.5M shares for $81.1...  81.1               $M                1             2007-01-24  20276

Let's go over the columns:

  • event_id: the unique identifier for this event.
  • asof_date: EventVestor's timestamp of event capture.
  • trade_date: for event announcements made before trading ends, trade_date is the same as the event date (asof_date). For announcements issued after market close, trade_date is the next market open day.
  • symbol: stock ticker symbol of the affected company.
  • event_type: this should always be Buyback Update.
  • event_headline: a brief description of the event.
  • repurchase_amount: the amount of shares repurchased during the reported period, expressed in repurchase_units.
  • repurchase_units: the units of repurchase_amount, either millions of dollars ($M) or percent of total shares outstanding.
  • event_rating: this is always 1. The meaning of this is uncertain.
  • timestamp: this is our timestamp on when we registered the data.
  • sid: the equity's unique identifier. Use this instead of the symbol.
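To make the relationship between asof_date and trade_date concrete, here is a minimal pandas sketch built from the three sample rows in Out[4] (only a subset of the columns is copied). It assumes, based on the column descriptions above, that a trade_date later than asof_date indicates an announcement made after the market closed; the column name `next_day_trade` is introduced here for illustration only.

```python
import pandas as pd

# Three sample rows copied from Out[4] above (subset of the columns).
sample = pd.DataFrame({
    "symbol": ["TESS", "WM", "RDN"],
    "asof_date": pd.to_datetime(["2007-01-17", "2007-01-17", "2007-01-23"]),
    "trade_date": pd.to_datetime(["2007-01-17", "2007-01-18", "2007-01-23"]),
    "repurchase_amount": [1.7, 2700.0, 81.1],
    "repurchase_units": ["$M", "$M", "$M"],
})

# Flag rows whose trade_date falls after asof_date (assumed to mean the
# announcement came after the close, per the column description).
sample["next_day_trade"] = sample["trade_date"] > sample["asof_date"]

after_close = sample.loc[sample["next_day_trade"], "symbol"].tolist()
print(after_close)
```

In this sample only the Washington Mutual row has a trade_date one day after its asof_date.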

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, and the sid is standardized across all our equity databases, so the datasets are easy to combine.

We can select columns and rows with ease. Below, we'll fetch Apple's 2014 share repurchases.


In [5]:
# Get Apple's sid first
apple_sid = symbols('AAPL').sid
buybacks = share_repurchases[('2013-12-31' < share_repurchases['asof_date']) &
                             (share_repurchases['asof_date'] < '2015-01-01') &
                             (share_repurchases.sid == apple_sid)]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
buybacks.sort('asof_date')


Out[5]:
   event_id  asof_date   trade_date  symbol  event_type      event_headline                                     repurchase_amount  repurchase_units  event_rating  timestamp   sid
0  1918241   2014-01-27  2014-01-27  AAPL    Buyback Update  Apple Repurchases $5.03B Common Stock in 1Q 14     5029               $M                1             2014-01-28  24
1  1674141   2014-02-07  2014-02-07  AAPL    Buyback Update  Apple Repurchases $14B Common Stock Since 1Q 1...  14000              $M                1             2014-02-08  24
2  1918254   2014-04-23  2014-04-23  AAPL    Buyback Update  Apple Repurchases $23B Common Stock in FY 14 YTD   23000              $M                1             2014-04-24  24
3  1918258   2014-07-22  2014-07-22  AAPL    Buyback Update  Apple Repurchases $28B Common Stock in FY 14 YTD   5000               $M                1             2014-07-23  24
4  1918275   2014-10-20  2014-10-20  AAPL    Buyback Update  Apple Repurchases $45B Common Stock in FY 14       17000              $M                1             2014-10-21  24

Now suppose we want a DataFrame of the Blaze Data Object above, but only want the asof_date, repurchase_units, and the repurchase_amount.


In [6]:
df = odo(buybacks, pd.DataFrame)
df = df[['asof_date','repurchase_amount','repurchase_units']]
df


Out[6]:
   asof_date   repurchase_amount  repurchase_units
0  2014-01-27  5029               $M
1  2014-02-07  14000              $M
2  2014-04-23  23000              $M
3  2014-07-22  5000               $M
4  2014-10-20  17000              $M
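Once the data is in a DataFrame, ordinary pandas operations apply. As a small sketch, the block below rebuilds the Out[6] values by hand, keeps only rows reported in millions of dollars, and adds a column in billions. The column name `repurchase_usd_bn` is introduced here for illustration. Note that the headlines in Out[5] suggest these rows mix quarterly and year-to-date figures, so summing the column would likely double count.

```python
import pandas as pd

# Values copied from Out[6] above.
df = pd.DataFrame({
    "asof_date": pd.to_datetime(
        ["2014-01-27", "2014-02-07", "2014-04-23", "2014-07-22", "2014-10-20"]),
    "repurchase_amount": [5029.0, 14000.0, 23000.0, 5000.0, 17000.0],
    "repurchase_units": ["$M"] * 5,
})

# Keep only dollar-denominated rows; percent-of-shares rows would need
# different handling.
dollar_rows = df[df["repurchase_units"] == "$M"].copy()

# Convert millions of dollars to billions of dollars.
dollar_rows["repurchase_usd_bn"] = dollar_rows["repurchase_amount"] / 1000.0

# Row with the largest single reported amount.
largest = dollar_rows.loc[dollar_rows["repurchase_usd_bn"].idxmax()]
print(dollar_rows)
```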