EventVestor: Contract Wins

In this notebook, we'll take a look at EventVestor's Contract Wins dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents major contract wins by companies.

Blaze

Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.

Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Helpful links:

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:

from odo import odo
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.

With preamble in place, let's get started:


In [2]:
# import the dataset
from quantopian.interactive.data.eventvestor import contract_win
# or if you want to import the free dataset, use:
# from quantopian.data.eventvestor import contract_win_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [3]:
# Let's use blaze to understand the data a bit using Blaze dshape()
contract_win.dshape


Out[3]:
dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  contract_amount: ?float64,
  amount_units: ?string,
  contract_entity: ?string,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")

In [4]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
contract_win.count()


Out[4]:
4135

In [5]:
# Let's see what the data looks like. We'll grab the first three rows.
contract_win[:3]


Out[5]:
event_id asof_date trade_date symbol event_type event_headline contract_amount amount_units contract_entity event_rating timestamp sid
0 907471 2007-01-03 2007-01-03 CECE Contract Win CECO Environmental Gets Two Orders for $55M Plus 55.0 $M NaN 1 2007-01-04 1396
1 148887 2007-01-04 2007-01-04 ATK Contract Win Alliant Techsystems Gets $90M Contract from U.... 90.0 $M U.S. Department of Homeland Security 1 2007-01-05 NaN
2 908341 2007-01-04 2007-01-04 BCRX Contract Win BioCryst Pharma Gets $102.6M Contract From US ... 102.6 $M U.S. Department of Health and Human Services 1 2007-01-05 10905

Let's go over the columns:

  • event_id: the unique identifier for this contract win.
  • asof_date: EventVestor's timestamp of event capture.
  • trade_date: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.
  • symbol: stock ticker symbol of the affected company.
  • event_type: this should always be Contract Win.
  • contract_amount: the amount of amount_units the contract is for.
  • amount_units: the currency or other units for the value of the contract. Most commonly in millions of dollars.
  • contract_entity: name of the customer, if available
  • event_rating: this is always 1. The meaning of this is uncertain.
  • timestamp: this is our timestamp on when we registered the data.
  • sid: the equity's unique identifier. Use this instead of the symbol.

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch all contract wins by Boeing. We'll display only the contract_amount, amount_units, contract_entity, and timestamp. We'll sort by date.


In [6]:
ba_sid = symbols('BA').sid
wins = contract_win[contract_win.sid == ba_sid][['timestamp', 'contract_amount','amount_units','contract_entity']].sort('timestamp')
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
wins


Out[6]:
timestamp contract_amount amount_units contract_entity
0 2007-04-19 2500 $M South Korea
1 2007-04-20 295 $M CIT Aerospace
2 2007-04-24 1600 $M Aviation Capital Group
3 2007-04-25 3600 $M Virgin Atlantic
4 2007-04-27 700 $M SpiceJet
5 2007-05-17 4700 $M TUI Group
6 2007-05-30 2400 $M Russian Airline S7
7 2007-06-01 1900 $M Ryanair Holdings PLC
8 2007-06-05 3000 $M Kuwait Airways
9 2007-06-07 500 $M Philippine Airlines
10 2007-06-19 1420 $M GE Commercial Aviation Services

Finally, suppose we want the above as a DataFrame:


In [7]:
ba_df = odo(wins, pd.DataFrame)
# Printing a pandas DataFrame displays the first 30 and last 30 items, and truncates the middle.
ba_df


Out[7]:
timestamp contract_amount amount_units contract_entity
0 2007-04-19 2500.0 $M South Korea
1 2007-04-20 295.0 $M CIT Aerospace
2 2007-04-24 1600.0 $M Aviation Capital Group
3 2007-04-25 3600.0 $M Virgin Atlantic
4 2007-04-27 700.0 $M SpiceJet
5 2007-05-17 4700.0 $M TUI Group
6 2007-05-30 2400.0 $M Russian Airline S7
7 2007-06-01 1900.0 $M Ryanair Holdings PLC
8 2007-06-05 3000.0 $M Kuwait Airways
9 2007-06-07 500.0 $M Philippine Airlines
10 2007-06-19 1420.0 $M GE Commercial Aviation Services
11 2007-06-20 8800.0 $M International Lease Finance Corp
12 2007-06-21 2700.0 $M Air France KLM
13 2007-07-01 2000.0 $M U.S. Air Force
14 2007-07-06 810.0 $M CIT Aerospace
15 2007-07-09 4000.0 $M Air Berlin
16 2007-08-03 523.0 $M AeroSvit
17 2007-08-04 1100.0 $M Air New Zealand
18 2007-08-09 1400.0 $M Cathay Pacific Airways
19 2007-08-31 3100.0 $M Norwegian Air Shuttle ASA
20 2007-09-07 3800.0 $M China Southern Airlines
21 2007-09-12 1100.0 $M US Air Force
22 2007-10-17 1500.0 $M NaN
23 2007-11-06 5000.0 $M LAN Airlines
24 2007-11-09 5200.0 $M Cathay Pacific
25 2007-11-12 3200.0 $M Emirates
26 2007-11-14 523.0 $M transavia.com
27 2007-11-23 716.0 $M KLM Royal Dutch Airlines
28 2007-12-05 1700.0 $M Lion Air
29 2007-12-11 1500.0 $M Babcock & Brown
... ... ... ... ...
191 2014-01-07 8800.0 $M flydubai
192 2014-01-21 3900.0 $M GE Capital Aviation Services
193 2014-02-06 228.0 $M Linhas Aereas de Mocambique
194 2014-02-15 357.5 $M Cargolux Airlines
195 2014-03-13 4400.0 $M SpiceJet Ltd.
196 2014-03-20 830.0 $M Comair Limited
197 2014-05-01 452.0 $M Ryanair
198 2014-05-31 1100.0 $M Japan Transocean Air
199 2014-06-17 1600.0 $M Turkish Airlines
200 2014-06-27 272.0 $M Belarus flag carrier Belavia Airlines
201 2014-07-10 56000.0 $M Emirates Airline
202 2014-07-15 980.0 $M Okay Airways Company
203 2014-07-15 2000.0 $M Avolon
204 2014-07-16 1900.0 $M Intrepid Aviation
205 2014-07-17 1890.0 $M Qatar Airways
206 2014-07-17 152.0 $M NaN
207 2014-08-26 8800.0 $M BOC Aviation
208 2014-09-18 2100.0 $M Avolon
209 2014-09-21 2100.0 $M Ethiopian Airlines
210 2014-10-07 990.0 $M Alaska Airlines
211 2014-12-02 11000.0 $M Ryanair
212 2014-12-16 438.0 $M Jetlines
213 2014-12-19 186.0 $M BOC Aviation
214 2014-12-24 3300.0 $M Kuwait Airways
215 2015-01-07 514.0 $M Air New Zealand
216 2015-01-07 1240.0 $M Qatar Airways
217 2015-01-16 3600.0 $M Air Europa
218 2015-02-13 1600.0 $M Transavia Company
219 2015-02-13 1500.0 $M Korean Air
220 2015-03-28 900.0 $M All Nippon Airways

221 rows × 4 columns