EventVestor: Earnings Releases

In this notebook, we'll take a look at EventVestor's Earnings Releases dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents quarterly earnings releases.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:

from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:



In [1]:

    
# import the dataset
from quantopian.interactive.data.eventvestor import earnings_releases
# or if you want to import the free dataset, use:
# from quantopian.interactivedata.eventvestor import earnings_releases_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd



In [2]:

    
# Let's use blaze to understand the data a bit using Blaze dshape()
earnings_releases.dshape









    Out[2]:





dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  event_phase: ?string,
  fiscal_period: ?string,
  calendar_period: ?string,
  fiscal_periodend: ?datetime,
  currency: ?string,
  revenue: ?float64,
  gross_income: ?float64,
  operating_income: ?float64,
  net_income: ?float64,
  eps: ?float64,
  eps_surprisepct: ?float64,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")



In [3]:

    
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
earnings_releases.count()









    Out[3]:




139427



In [4]:

    
# Let's see what the data looks like. We'll grab the first three rows.
earnings_releases[:3]









    Out[4]:





  
    
      
      event_id
      asof_date
      trade_date
      symbol
      event_type
      event_headline
      event_phase
      fiscal_period
      calendar_period
      fiscal_periodend
      currency
      revenue
      gross_income
      operating_income
      net_income
      eps
      eps_surprisepct
      event_rating
      timestamp
      sid
    
  
  
    
      0
      526391
      2007-01-03
      2007-01-04
      ANGO
      Earnings Release
      AngioDynamics 2Q Net up 48%
      NaN
      2Q 07
      4Q 06
      2006-12-02
      $
      24.37
      14.24
      3.0
      2.45
      0.15
      0
      1
      2007-01-04
      26324
    
    
      1
      196507
      2007-01-03
      2007-01-04
      BLUD
      Earnings Release
      Immucor Reports 2Q Results
      NaN
      NaN
      NaN
      NaT
      NaN
      0.00
      0.00
      0.0
      0.00
      0.00
      0
      1
      2007-01-04
      955
    
    
      2
      180559
      2007-01-03
      2007-01-03
      CALM
      Earnings Release
      CAL-MAINE FOODS REPORTS 2Q 07 RESULTS
      NaN
      2Q 07
      4Q 06
      2006-12-02
      $
      137.74
      24.96
      10.5
      6.40
      0.27
      0
      1
      2007-01-04
      16169

Let's go over the columns:

event_id: the unique identifier for this event.
asof_date: EventVestor's timestamp of event capture.
trade_date: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.
symbol: stock ticker symbol of the affected company.
event_type: this should always be Earnings Release/Earnings release.
event_headline: a brief description of the event
event_phase: the inclusion of this field is likely an error on the part of the data vendor. We're currently attempting to resolve this.
fiscal_period: fiscal period for the reported earnings, such as 1Q 15, 2Q 15, etc.
calendar_period: identifies the calendar period based on the fiscal period end date. E.g. if the fiscal period ends any time after the middle of a given calendar quarter, like 1Q 15, that calendar quarter will be assigned regardless of the fiscal quarter.
fiscal_periodend: the last date for the reported earnings period.
currency: currency used for reporting earnings.
revenue: revenue in millions
gross_income: gross income in millions
operating_income: operating income in millions
net_income: net income in millions
eps: earnings per share, in the reported currency
eps_surprisepct: the meaning of this column is presently uncertain. We're working with our data vendor to resolve this issue.
event_rating: this is always 1. The meaning of this is uncertain.
timestamp: this is our timestamp on when we registered the data.
sid: the equity's unique identifier. Use this instead of the symbol.

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch all of Apple's entries from 2012.



In [5]:

    
# get apple's sid first
aapl_sid = symbols('AAPL').sid
aapl_earnings = earnings_releases[('2011-12-31' < earnings_releases['asof_date']) & (earnings_releases['asof_date'] <'2013-01-01') & (earnings_releases.sid==aapl_sid)]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
aapl_earnings.sort('asof_date')









    Out[5]:





  
    
      
      event_id
      asof_date
      trade_date
      symbol
      event_type
      event_headline
      event_phase
      fiscal_period
      calendar_period
      fiscal_periodend
      currency
      revenue
      gross_income
      operating_income
      net_income
      eps
      eps_surprisepct
      event_rating
      timestamp
      sid
    
  
  
    
      0
      1385939
      2012-01-24
      2012-01-25
      AAPL
      Earnings Release
      Apple 1Q 12 Net Jumps to $13B on Higher Revenues
      NaN
      1Q 12
      4Q 11
      2011-12-31
      $
      46333
      20703
      17340
      13064
      13.87
      38.29
      1
      2012-01-25
      24
    
    
      1
      1421108
      2012-04-24
      2012-04-25
      AAPL
      Earnings Release
      Apple 2Q 12 Net Up 94% on Higher Revenues
      NaN
      2Q 12
      1Q 12
      2012-03-31
      $
      39186
      18564
      15384
      11622
      12.30
      23.74
      1
      2012-04-25
      24
    
    
      2
      1456685
      2012-07-24
      2012-07-25
      AAPL
      Earnings Release
      Apple 3Q 12 Net Up 21%
      NaN
      3Q 12
      2Q 12
      2012-06-30
      $
      35023
      14994
      11573
      8824
      9.32
      -10.21
      1
      2012-07-25
      24
    
    
      3
      1496807
      2012-10-25
      2012-10-26
      AAPL
      Earnings Release
      Apple 4Q 12 Net Up 24%
      NaN
      4Q 12
      3Q 12
      2012-09-29
      $
      35966
      14401
      10944
      8223
      8.67
      -2.03
      1
      2012-10-26
      24

Now suppose we want a DataFrame of all earnings releases with revenue over 30 billion dollars. For those earnings releases, we only want the sid and the asof_date.



In [6]:

    
# manipulate with Blaze first:
big_earnings = earnings_releases[earnings_releases.revenue > 40000]
# now that we've got a much smaller object (len: ~2167 rows), we can convert it to a pandas DataFrame
df = odo(big_earnings, pd.DataFrame)
df = df[['sid', 'asof_date','revenue']].dropna()
df.sort('revenue',ascending=False)









    Out[6]:






  
    
      
      sid
      asof_date
      revenue
    
  
  
    
      510
      26503
      2013-01-24
      8743000
    
    
      491
      26503
      2012-10-26
      7593000
    
    
      657
      26503
      2015-04-22
      7022000
    
    
      474
      26503
      2012-07-26
      6910000
    
    
      529
      26503
      2013-04-22
      6803000
    
    
      98
      7543
      2008-02-05
      6709983
    
    
      342
      26503
      2010-10-21
      6698000
    
    
      443
      26503
      2012-01-27
      6610000
    
    
      590
      7543
      2014-02-04
      6585044
    
    
      563
      26503
      2013-10-17
      6579000
    
    
      547
      26503
      2013-07-19
      6572000
    
    
      49
      7543
      2007-08-03
      6522637
    
    
      362
      26503
      2011-01-21
      6483000
    
    
      324
      26503
      2010-07-22
      6454000
    
    
      576
      7543
      2013-11-06
      6282166
    
    
      417
      26503
      2011-10-20
      6269000
    
    
      559
      7543
      2013-08-02
      6255319
    
    
      155
      7543
      2008-08-07
      6220000
    
    
      457
      26503
      2012-04-24
      6184000
    
    
      283
      26503
      2010-01-20
      6082000
    
    
      398
      26503
      2011-07-21
      6047000
    
    
      182
      7543
      2008-11-06
      5975275
    
    
      258
      26503
      2009-10-15
      5974000
    
    
      310
      26503
      2010-04-22
      5876000
    
    
      485
      7543
      2012-08-03
      5501573
    
    
      503
      7543
      2012-11-05
      5406781
    
    
      381
      26503
      2011-04-18
      5366000
    
    
      519
      7543
      2013-02-05
      5318752
    
    
      301
      7543
      2010-02-04
      5292890
    
    
      231
      26503
      2009-07-16
      4891000
    
    
      ...
      ...
      ...
      ...
    
    
      21
      3149
      2007-07-13
      42316
    
    
      104
      3149
      2008-04-11
      42243
    
    
      13
      24074
      2007-04-26
      42156
    
    
      625
      24
      2014-10-20
      42123
    
    
      629
      23052
      2014-10-28
      42114
    
    
      642
      3149
      2015-01-23
      42004
    
    
      169
      18711
      2008-10-28
      41723
    
    
      391
      7538
      2011-04-29
      41602
    
    
      415
      22899
      2011-10-19
      41562
    
    
      423
      7538
      2011-10-28
      41525
    
    
      135
      2673
      2008-07-24
      41500
    
    
      286
      3149
      2010-01-22
      41438
    
    
      361
      3149
      2011-01-21
      41377
    
    
      266
      23998
      2009-10-28
      41305
    
    
      605
      42788
      2014-04-30
      41099
    
    
      76
      2673
      2007-11-08
      41078
    
    
      499
      1091
      2012-11-02
      41050
    
    
      500
      11100
      2012-11-02
      41050
    
    
      633
      42788
      2014-10-29
      41048
    
    
      645
      23052
      2015-01-29
      40959
    
    
      380
      22899
      2011-04-18
      40952
    
    
      139
      18711
      2008-07-28
      40569
    
    
      591
      40430
      2014-02-06
      40485
    
    
      408
      7538
      2011-07-29
      40465
    
    
      580
      3149
      2014-01-17
      40382
    
    
      673
      23112
      2015-07-31
      40357
    
    
      670
      23052
      2015-07-28
      40277
    
    
      247
      23112
      2009-07-31
      40205
    
    
      9
      3149
      2007-04-13
      40195
    
    
      377
      7538
      2011-02-11
      40157
    
  

670 rows × 3 columns



In [ ]:

	event_id	asof_date	trade_date	symbol	event_type	event_headline	event_phase	fiscal_period	calendar_period	fiscal_periodend	currency	revenue	gross_income	operating_income	net_income	eps	event_rating	timestamp	sid
0	526391	2007-01-03	2007-01-04	ANGO	Earnings Release	AngioDynamics 2Q Net up 48%	NaN	2Q 07	4Q 06	2006-12-02	$	24.37	14.24	3.0	2.45	0.15	1	2007-01-04	26324
1	196507	2007-01-03	2007-01-04	BLUD	Earnings Release	Immucor Reports 2Q Results	NaN	NaN	NaN	NaT	NaN	0.00	0.00	0.0	0.00	0.00	1	2007-01-04	955
2	180559	2007-01-03	2007-01-03	CALM	Earnings Release	CAL-MAINE FOODS REPORTS 2Q 07 RESULTS	NaN	2Q 07	4Q 06	2006-12-02	$	137.74	24.96	10.5	6.40	0.27	1	2007-01-04	16169

	event_id	asof_date	trade_date	symbol	event_type	event_headline	event_phase	fiscal_period	calendar_period	fiscal_periodend	currency	revenue	gross_income	operating_income	net_income	eps	eps_surprisepct	event_rating	timestamp	sid
0	1385939	2012-01-24	2012-01-25	AAPL	Earnings Release	Apple 1Q 12 Net Jumps to $13B on Higher Revenues	NaN	1Q 12	4Q 11	2011-12-31	$	46333	20703	17340	13064	13.87	38.29	1	2012-01-25	24
1	1421108	2012-04-24	2012-04-25	AAPL	Earnings Release	Apple 2Q 12 Net Up 94% on Higher Revenues	NaN	2Q 12	1Q 12	2012-03-31	$	39186	18564	15384	11622	12.30	23.74	1	2012-04-25	24
2	1456685	2012-07-24	2012-07-25	AAPL	Earnings Release	Apple 3Q 12 Net Up 21%	NaN	3Q 12	2Q 12	2012-06-30	$	35023	14994	11573	8824	9.32	-10.21	1	2012-07-25	24
3	1496807	2012-10-25	2012-10-26	AAPL	Earnings Release	Apple 4Q 12 Net Up 24%	NaN	4Q 12	3Q 12	2012-09-29	$	35966	14401	10944	8223	8.67	-2.03	1	2012-10-26	24

	sid	asof_date	revenue
510	26503	2013-01-24	8743000
491	26503	2012-10-26	7593000
657	26503	2015-04-22	7022000
474	26503	2012-07-26	6910000
529	26503	2013-04-22	6803000
98	7543	2008-02-05	6709983
342	26503	2010-10-21	6698000
443	26503	2012-01-27	6610000
590	7543	2014-02-04	6585044
563	26503	2013-10-17	6579000
547	26503	2013-07-19	6572000
49	7543	2007-08-03	6522637
362	26503	2011-01-21	6483000
324	26503	2010-07-22	6454000
576	7543	2013-11-06	6282166
417	26503	2011-10-20	6269000
559	7543	2013-08-02	6255319
155	7543	2008-08-07	6220000
457	26503	2012-04-24	6184000
283	26503	2010-01-20	6082000
398	26503	2011-07-21	6047000
182	7543	2008-11-06	5975275
258	26503	2009-10-15	5974000
310	26503	2010-04-22	5876000
485	7543	2012-08-03	5501573
503	7543	2012-11-05	5406781
381	26503	2011-04-18	5366000
519	7543	2013-02-05	5318752
301	7543	2010-02-04	5292890
231	26503	2009-07-16	4891000
...	...	...	...
21	3149	2007-07-13	42316
104	3149	2008-04-11	42243
13	24074	2007-04-26	42156
625	24	2014-10-20	42123
629	23052	2014-10-28	42114
642	3149	2015-01-23	42004
169	18711	2008-10-28	41723
391	7538	2011-04-29	41602
415	22899	2011-10-19	41562
423	7538	2011-10-28	41525
135	2673	2008-07-24	41500
286	3149	2010-01-22	41438
361	3149	2011-01-21	41377
266	23998	2009-10-28	41305
605	42788	2014-04-30	41099
76	2673	2007-11-08	41078
499	1091	2012-11-02	41050
500	11100	2012-11-02	41050
633	42788	2014-10-29	41048
645	23052	2015-01-29	40959
380	22899	2011-04-18	40952
139	18711	2008-07-28	40569
591	40430	2014-02-06	40485
408	7538	2011-07-29	40465
580	3149	2014-01-17	40382
673	23112	2015-07-31	40357
670	23052	2015-07-28	40277
247	23112	2009-07-31	40205
9	3149	2007-04-13	40195
377	7538	2011-02-11	40157