EventVestor: Earnings Releases

In this notebook, we'll take a look at EventVestor's Earnings Releases dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents quarterly earnings releases.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:

from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:


In [1]:
# import the dataset
from quantopian.interactive.data.eventvestor import earnings_releases
# or if you want to import the free dataset, use:
# from quantopian.interactivedata.eventvestor import earnings_releases_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [2]:
# Let's use blaze to understand the data a bit using Blaze dshape()
earnings_releases.dshape


Out[2]:
dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  event_phase: ?string,
  fiscal_period: ?string,
  calendar_period: ?string,
  fiscal_periodend: ?datetime,
  currency: ?string,
  revenue: ?float64,
  gross_income: ?float64,
  operating_income: ?float64,
  net_income: ?float64,
  eps: ?float64,
  eps_surprisepct: ?float64,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")

In [3]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
earnings_releases.count()


Out[3]:
139427

In [4]:
# Let's see what the data looks like. We'll grab the first three rows.
earnings_releases[:3]


Out[4]:
event_id asof_date trade_date symbol event_type event_headline event_phase fiscal_period calendar_period fiscal_periodend currency revenue gross_income operating_income net_income eps eps_surprisepct event_rating timestamp sid
0 526391 2007-01-03 2007-01-04 ANGO Earnings Release AngioDynamics 2Q Net up 48% NaN 2Q 07 4Q 06 2006-12-02 $ 24.37 14.24 3.0 2.45 0.15 0 1 2007-01-04 26324
1 196507 2007-01-03 2007-01-04 BLUD Earnings Release Immucor Reports 2Q Results NaN NaN NaN NaT NaN 0.00 0.00 0.0 0.00 0.00 0 1 2007-01-04 955
2 180559 2007-01-03 2007-01-03 CALM Earnings Release CAL-MAINE FOODS REPORTS 2Q 07 RESULTS NaN 2Q 07 4Q 06 2006-12-02 $ 137.74 24.96 10.5 6.40 0.27 0 1 2007-01-04 16169

Let's go over the columns:

  • event_id: the unique identifier for this event.
  • asof_date: EventVestor's timestamp of event capture.
  • trade_date: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.
  • symbol: stock ticker symbol of the affected company.
  • event_type: this should always be Earnings Release/Earnings release.
  • event_headline: a brief description of the event
  • event_phase: the inclusion of this field is likely an error on the part of the data vendor. We're currently attempting to resolve this.
  • fiscal_period: fiscal period for the reported earnings, such as 1Q 15, 2Q 15, etc.
  • calendar_period: identifies the calendar period based on the fiscal period end date. E.g. if the fiscal period ends any time after the middle of a given calendar quarter, like 1Q 15, that calendar quarter will be assigned regardless of the fiscal quarter.
  • fiscal_periodend: the last date for the reported earnings period.
  • currency: currency used for reporting earnings.
  • revenue: revenue in millions
  • gross_income: gross income in millions
  • operating_income: operating income in millions
  • net_income: net income in millions
  • eps: earnings per share, in the reported currency
  • eps_surprisepct: the meaning of this column is presently uncertain. We're working with our data vendor to resolve this issue.
  • event_rating: this is always 1. The meaning of this is uncertain.
  • timestamp: this is our timestamp on when we registered the data.
  • sid: the equity's unique identifier. Use this instead of the symbol.

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch all of Apple's entries from 2012.


In [5]:
# get apple's sid first
aapl_sid = symbols('AAPL').sid
aapl_earnings = earnings_releases[('2011-12-31' < earnings_releases['asof_date']) & (earnings_releases['asof_date'] <'2013-01-01') & (earnings_releases.sid==aapl_sid)]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
aapl_earnings.sort('asof_date')


Out[5]:
event_id asof_date trade_date symbol event_type event_headline event_phase fiscal_period calendar_period fiscal_periodend currency revenue gross_income operating_income net_income eps eps_surprisepct event_rating timestamp sid
0 1385939 2012-01-24 2012-01-25 AAPL Earnings Release Apple 1Q 12 Net Jumps to $13B on Higher Revenues NaN 1Q 12 4Q 11 2011-12-31 $ 46333 20703 17340 13064 13.87 38.29 1 2012-01-25 24
1 1421108 2012-04-24 2012-04-25 AAPL Earnings Release Apple 2Q 12 Net Up 94% on Higher Revenues NaN 2Q 12 1Q 12 2012-03-31 $ 39186 18564 15384 11622 12.30 23.74 1 2012-04-25 24
2 1456685 2012-07-24 2012-07-25 AAPL Earnings Release Apple 3Q 12 Net Up 21% NaN 3Q 12 2Q 12 2012-06-30 $ 35023 14994 11573 8824 9.32 -10.21 1 2012-07-25 24
3 1496807 2012-10-25 2012-10-26 AAPL Earnings Release Apple 4Q 12 Net Up 24% NaN 4Q 12 3Q 12 2012-09-29 $ 35966 14401 10944 8223 8.67 -2.03 1 2012-10-26 24

Now suppose we want a DataFrame of all earnings releases with revenue over 30 billion dollars. For those earnings releases, we only want the sid and the asof_date.


In [6]:
# manipulate with Blaze first:
big_earnings = earnings_releases[earnings_releases.revenue > 40000]
# now that we've got a much smaller object (len: ~2167 rows), we can convert it to a pandas DataFrame
df = odo(big_earnings, pd.DataFrame)
df = df[['sid', 'asof_date','revenue']].dropna()
df.sort('revenue',ascending=False)


Out[6]:
sid asof_date revenue
510 26503 2013-01-24 8743000
491 26503 2012-10-26 7593000
657 26503 2015-04-22 7022000
474 26503 2012-07-26 6910000
529 26503 2013-04-22 6803000
98 7543 2008-02-05 6709983
342 26503 2010-10-21 6698000
443 26503 2012-01-27 6610000
590 7543 2014-02-04 6585044
563 26503 2013-10-17 6579000
547 26503 2013-07-19 6572000
49 7543 2007-08-03 6522637
362 26503 2011-01-21 6483000
324 26503 2010-07-22 6454000
576 7543 2013-11-06 6282166
417 26503 2011-10-20 6269000
559 7543 2013-08-02 6255319
155 7543 2008-08-07 6220000
457 26503 2012-04-24 6184000
283 26503 2010-01-20 6082000
398 26503 2011-07-21 6047000
182 7543 2008-11-06 5975275
258 26503 2009-10-15 5974000
310 26503 2010-04-22 5876000
485 7543 2012-08-03 5501573
503 7543 2012-11-05 5406781
381 26503 2011-04-18 5366000
519 7543 2013-02-05 5318752
301 7543 2010-02-04 5292890
231 26503 2009-07-16 4891000
... ... ... ...
21 3149 2007-07-13 42316
104 3149 2008-04-11 42243
13 24074 2007-04-26 42156
625 24 2014-10-20 42123
629 23052 2014-10-28 42114
642 3149 2015-01-23 42004
169 18711 2008-10-28 41723
391 7538 2011-04-29 41602
415 22899 2011-10-19 41562
423 7538 2011-10-28 41525
135 2673 2008-07-24 41500
286 3149 2010-01-22 41438
361 3149 2011-01-21 41377
266 23998 2009-10-28 41305
605 42788 2014-04-30 41099
76 2673 2007-11-08 41078
499 1091 2012-11-02 41050
500 11100 2012-11-02 41050
633 42788 2014-10-29 41048
645 23052 2015-01-29 40959
380 22899 2011-04-18 40952
139 18711 2008-07-28 40569
591 40430 2014-02-06 40485
408 7538 2011-07-29 40465
580 3149 2014-01-17 40382
673 23112 2015-07-31 40357
670 23052 2015-07-28 40277
247 23112 2009-07-31 40205
9 3149 2007-04-13 40195
377 7538 2011-02-11 40157

670 rows × 3 columns


In [ ]: