Estimize: Analyst-by-Analyst Estimates

In this notebook, we'll take a look at Estimizes's Analyst-by-Analyst Estimates dataset, available on the Quantopian. This dataset spans January, 2010 through the current day.

This data contains a record for every estimate made by an individual on the Estimize product. By comparison, the Estimize Revisions product provides rolled-up consensus numbers for each possible earnings announcement.

In this notebook, we'll examine these detailed estimates and pull in that consensus data as well.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available using the Blaze library. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

To learn more about using Blaze and generally accessing Quantopian Store data, clone this tutorial notebook.

Free samples and limits

A few key caveats:

1) We limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

2) There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:


In [1]:
# import the free sample of the dataset
from quantopian.interactive.data.estimize import estimates_free

# or if you want to import the full dataset, use:
# from quantopian.interactive.data.estimize import estimates

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# Let's use blaze to understand the data a bit using Blaze dshape()
estimates_free.dshape


Out[2]:
dshape("""var * {
  analyst_id: ?string,
  asof_date: datetime,
  eps: ?float64,
  fiscal_quarter: ?float64,
  fiscal_year: ?float64,
  id: ?string,
  revenue: ?float64,
  symbol: ?string,
  username: ?string,
  timestamp: datetime,
  sid: ?int64
  }""")

In [3]:
# And how many rows are in this free sample?
# N.B. we're using a Blaze function to do this, not len()
estimates_free.count()


Out[3]:
120400

In [4]:
# Let's see what the data looks like. We'll grab the first three rows.
estimates_free.head(3)


Out[4]:
analyst_id asof_date eps fiscal_quarter fiscal_year id revenue symbol username timestamp sid
0 4e679bb77cb02d0b6700000f 2010-01-02 17:00:00 0.90 1 2011 4e6dee5a7cb02d2adc000014 26430 AAPL postsateventide 2010-01-02 17:20:00 24
1 4e679bb77cb02d0b67000005 2010-09-28 16:00:00 0.63 4 2010 4e6df18f7cb02d2adc000024 19530 AAPL DennisHildebrand 2010-09-28 16:20:00 24
2 4e679bb87cb02d0b6700001b 2010-09-28 16:00:00 0.71 4 2010 4e6df0977cb02d2adc00001f 20500 AAPL asymco 2010-09-28 16:20:00 24

Let's go over the columns:

  • analyst_id: the unique identifier assigned by Estimize for the person making the estimate.
  • asof_date: Estimize's timestamp of event capture.
  • eps: EPS estimate made by the analyst on the asof_date
  • fiscal_quarter: fiscal quarter for which this estimate is made, related to fiscal_year
  • fiscal_year: fiscal year for which this estimate is made, related to fiscal_quarter
  • revenue: revenue estimate made by the analyst on the asof_date
  • symbol: ticker symbol provided by Estimize for the company for whom these estimates have been made
  • username: Estimize username of the analyst making this estimate
  • timestamp: the datetime when Quantopian registered the data. For data loaded up via initial, historic loads, this timestamp is an estimate.
  • sid: the equity's unique identifier. Use this instead of the symbol. Derived by Quantopian using the symbol and our market data

We've done much of the data processing for you. Fields like asof_date and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, let's just look at the estimates made for TSLA for a particular quarter. Also, we're filtering out some spurious data:


In [5]:
stocks = symbols('TSLA')
one_quarter = estimates_free[(estimates_free.sid == stocks.sid) &
                 (estimates_free.fiscal_year == '2014') &
                 (estimates_free.fiscal_quarter == '1') &
                 (estimates_free.eps < 100)
                ]
one_quarter.head(5)


Out[5]:
analyst_id asof_date eps fiscal_quarter fiscal_year id revenue symbol username timestamp sid
0 500ee59d810f8d1c49000091 2013-05-30 19:00:13 0.08 1 2014 51a7a1bd810f8d6e27000362 500 TSLA hsctiger2009 2013-05-30 19:20:13 39840
1 4f997632810f8d1aaf0001ec 2013-09-18 14:11:45 0.35 1 2014 5239b4a1b7529b033a001ecc 700 TSLA aarkayne 2013-09-18 14:31:45 39840
2 500ee59d810f8d1c49000091 2013-09-20 21:56:58 0.33 1 2014 523cc4aab7529b009b00b0e1 620 TSLA hsctiger2009 2013-09-20 22:16:58 39840
3 500ee59d810f8d1c49000091 2013-09-26 16:40:26 0.35 1 2014 5244637ab7529bb85c033499 620 TSLA hsctiger2009 2013-09-26 17:00:26 39840
4 524990dfb7529b150a01af9a 2013-09-30 15:49:41 0.30 1 2014 52499d95b7529b6c8201d1ef 590 TSLA wjhughes 2013-09-30 16:09:41 39840

How many records do we have now?


In [6]:
one_quarter.count()


Out[6]:
136

Let's break it down by user:


In [7]:
one_quarter.username.count_values()


Out[7]:
username count
0 Analyst_7066456 3
1 Cwill 3
2 hsctiger2009 3
3 a76marine 3
4 wjbuckner 3
5 phi16 3
6 Mgspooner 3
7 Essential 2
8 Nils1975 2
9 Cassanova23 2
10 golfinguy224 2

Let's convert it over to a Pandas DataFrame so we can chart it and examine it closer


In [31]:
one_q_df = odo(one_quarter.sort('asof_date'), pd.DataFrame)
plt.plot(one_q_df.asof_date, one_q_df.eps, marker='.', linestyle='None', color='r')
plt.xlabel("As Of Date (asof_date)")
plt.ylabel("EPS Estimate")
plt.title("Analyst by Analyst EPS Estimates for TSLA")
plt.legend(["Individual Estimate"], loc=2)


Out[31]:
<matplotlib.legend.Legend at 0x7f6c66b172d0>

That's neat. But let's add in some data from another dataset -- the Estimize Revisions data. For the same timeframe, the revisions data provides each revision to the overall consensus estimates. So where estimates_free data provides every single estimate made by an individual on the Estimize site, revisions_free provides rolled up summaries of the estimates.


In [32]:
from quantopian.interactive.data.estimize import revisions_free
consensus = revisions_free[(revisions_free.sid == stocks.sid) &
                 (revisions_free.fiscal_year == '2014') &
                 (revisions_free.fiscal_quarter == '1') &
                 (revisions_free.source == 'estimize') &
                 (revisions_free.metric == 'eps')
                ]

In [33]:
consensus.head(3)


Out[33]:
count high low mean metric source standard_deviation asof_date consensus_eps_estimate consensus_revenue_estimate eps fiscal_quarter fiscal_year id name release_date revenue symbol wallstreet_eps_estimate wallstreet_revenue_estimate timestamp sid
0 112 0.42 0.04 0.210357 eps estimize 0.095823 2014-05-07 19:22:15.904000 0.210714285714286 719.256071428571 0.12 1 2014 510c4310810f8d63ab004f49 Tesla Motors, Inc. 2014-05-07 20:00:00 713.0 TSLA 0.08 693.425 2014-05-07 19:42:15.904000 39840
1 111 0.42 0.04 0.211622 eps estimize 0.095319 2014-05-07 19:08:04.281000 0.210714285714286 719.256071428571 0.12 1 2014 510c4310810f8d63ab004f49 Tesla Motors, Inc. 2014-05-07 20:00:00 713.0 TSLA 0.08 693.425 2014-05-07 19:28:04.281000 39840
2 110 0.42 0.04 0.212727 eps estimize 0.095040 2014-05-07 18:38:08.005000 0.210714285714286 719.256071428571 0.12 1 2014 510c4310810f8d63ab004f49 Tesla Motors, Inc. 2014-05-07 20:00:00 713.0 TSLA 0.08 693.425 2014-05-07 18:58:08.005000 39840

For this quick demonstration, let's just grab the consensus mean from the revisions_free data set and convert it over to Pandas. Note, we need to rename the mean column name because it causes problems otherwise:


In [13]:
consensus_df = odo(consensus[['asof_date', 'mean']].sort('asof_date'), pd.DataFrame)
consensus_df.rename(columns={'mean':'eps_mean'}, inplace=True)

Let's chart that in the same chart again so we get a trend of the mean over time, overlayed on a chart of each individual analyst estimate:


In [30]:
plt.plot(consensus_df.asof_date, consensus_df.eps_mean)
plt.plot(one_q_df.asof_date, one_q_df.eps, marker='.', linestyle='None', color='r')
plt.xlabel("As Of Date (asof_date)")
plt.ylabel("EPS Estimate")
plt.title("EPS Estimates for TSLA")
plt.legend(["Mean Estimate", "Individual Estimates"], loc=2)


Out[30]:
<matplotlib.legend.Legend at 0x7f6c66ba1dd0>