EventVestor: Clinical Trials

In this notebook, we'll take a look at EventVestor's Clinical Trials dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents announcements of key phases of clinical trials by biotech/pharmaceutical companies.

Blaze

Before we dig into the data, here is how you generally access Quantopian Store datasets. They are available through an API service known as Blaze, which gives the Quantopian user a convenient interface to very large datasets.

Blaze matters because some of these datasets contain many millions of records. Bringing that much data directly into Quantopian Research is not viable, so Blaze lets us provide a simple querying interface and shift the burden to the server side.

It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.

Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:

import pandas
from odo import odo
# expr is any Blaze expression, e.g. a filtered slice of a dataset
odo(expr, pandas.DataFrame)

Free samples and limits

One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.

There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.
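For datasets larger than the 10,000-row response limit, one way to stay under it is to query in date windows and combine the pieces afterwards. The following is a minimal sketch of that idea, not an official recipe: `yearly_windows` is a hypothetical helper, and the Blaze filter shown in the comments assumes the dataset has an `asof_date` column and that a single year fits under the limit.

```python
from datetime import date

def yearly_windows(start, end):
    """Split the half-open range [start, end) into windows of at most one calendar year."""
    windows = []
    lo = start
    while lo < end:
        # Each window ends at the next January 1, or at `end` if that comes first.
        hi = min(date(lo.year + 1, 1, 1), end)
        windows.append((lo, hi))
        lo = hi
    return windows

windows = yearly_windows(date(2007, 1, 1), date(2010, 7, 1))
for lo, hi in windows:
    # In Research, each window could drive one Blaze query (assumed usage):
    #   chunk = dataset[(dataset.asof_date >= lo) & (dataset.asof_date < hi)]
    #   frames.append(odo(chunk, pd.DataFrame))
    print(lo, hi)
```

The resulting DataFrames could then be joined with `pd.concat(frames)`. If a single year still exceeds the limit, the same approach works with shorter windows.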

With the preamble in place, let's get started:


In [2]:
# import the dataset
from quantopian.interactive.data.eventvestor import clinical_trials
# or if you want to import the free dataset, use:
# from quantopian.data.eventvestor import clinical_trials_free

# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd

In [3]:
# Let's use Blaze's dshape attribute to inspect the columns and their types
clinical_trials.dshape


Out[3]:
dshape("""var * {
  event_id: ?float64,
  asof_date: datetime,
  trade_date: ?datetime,
  symbol: ?string,
  event_type: ?string,
  event_headline: ?string,
  clinical_phase: ?string,
  clinical_scope: ?string,
  clinical_result: ?string,
  product_name: ?string,
  event_rating: ?float64,
  timestamp: datetime,
  sid: ?int64
  }""")

In [4]:
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
clinical_trials.count()


Out[4]:
9118

In [5]:
# Let's see what the data looks like. We'll grab the first three rows.
clinical_trials[:3]


Out[5]:
event_id asof_date trade_date symbol event_type event_headline clinical_phase clinical_scope clinical_result product_name event_rating timestamp sid
0 138303 2007-01-03 2007-01-03 IMCL Clinical Trials ImClone Systems Commences Patient Treatment in... Phase I NaN NaN IMC-3G3 1 2007-01-04 3871
1 138180 2007-01-04 2007-01-04 DNA Clinical Trials Genentech Announces Positive Results From Rand... Phase II NaN Positive Pertuzumab 1 2007-01-05 24847
2 952759 2007-01-04 2007-01-04 VICL Clinical Trials Vical Initiates Pivotal Phase 3 Trial of Allov... Phase III NaN NaN Allovectin-7 1 2007-01-05 8763

Let's go over the columns:

  • event_id: the unique identifier for this clinical trial announcement.
  • asof_date: EventVestor's timestamp of event capture.
  • trade_date: for announcements made before trading ends, trade_date is the same as the event date (asof_date). For announcements issued after market close, trade_date is the next market open day.
  • symbol: stock ticker symbol of the affected company.
  • event_type: this should always be Clinical Trials.
  • event_headline: a short description of the event.
  • clinical_phase: phases include 0, I, II, III, IV, and Pre-Clinical. Numbered phases appear in the data as strings such as Phase III.
  • clinical_scope: types of scope include additional indications, all indications, and limited indications.
  • clinical_result: result types include negative, partial, and positive.
  • product_name: name of the drug being investigated.
  • event_rating: this is always 1. The meaning of this is uncertain.
  • timestamp: this is our timestamp on when we registered the data.
  • sid: the equity's unique identifier. Use this instead of the symbol.
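The trade_date rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset stores dates, not announcement times, so the times passed in below are hypothetical; the 4:00 pm close is an assumption; and market holidays are ignored (only weekends are skipped). `trade_date` and `next_weekday` are illustrative names, not part of any Quantopian API.

```python
from datetime import date, datetime, time, timedelta

MARKET_CLOSE = time(16, 0)  # assumption: 4:00 pm close in exchange-local time

def next_weekday(d):
    """Return the first Monday-to-Friday date strictly after d (holidays ignored)."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def trade_date(announced_at):
    """Map an announcement datetime to the first session that can trade on it."""
    d = announced_at.date()
    if d.weekday() >= 5 or announced_at.time() >= MARKET_CLOSE:
        # Weekend or after-close announcements roll to the next weekday.
        return next_weekday(d)
    return d

print(trade_date(datetime(2007, 1, 3, 9, 0)))     # weekday, before the close
print(trade_date(datetime(2012, 12, 19, 17, 30)))  # weekday, after the close
print(trade_date(datetime(2015, 9, 27, 10, 0)))    # Sunday
```

A production version would use an exchange calendar rather than a weekday check, so that holidays and early closes are handled.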

We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store Datasets, so the datasets are easy to combine. In particular, the sid is standardized across all our equity databases.

We can select columns and rows with ease. Below, we'll fetch all Phase III announcements. We'll only display the columns for the timestamp, the sid, and the drug name.


In [6]:
phase_three = clinical_trials[clinical_trials.clinical_phase == "Phase III"][['timestamp', 'sid','product_name']].sort('timestamp')
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
phase_three


Out[6]:
timestamp sid product_name
0 2007-01-05 8763 Allovectin-7
1 2007-01-09 1416 FENTORA
2 2007-01-11 3871 ERBITUX
3 2007-01-25 8763 Allovectin-7
4 2007-02-09 24415 Xibrom
5 2007-02-23 24847 Avastin
6 2007-04-05 3871 ERBITUX (Cetuximab)
7 2007-04-11 3871 ERBITUX
8 2007-04-17 3871 ERBITUX (Cetuximab)
9 2007-04-26 23846 BEMA Fentanyl
10 2007-04-27 5847 Nuvion

Finally, suppose we want a DataFrame of GlaxoSmithKline Phase-III announcements, sorted in descending order by date:


In [7]:
gsk_sid = symbols('GSK').sid

In [8]:
gsk = clinical_trials[clinical_trials.sid==gsk_sid].sort('timestamp',ascending=False)
gsk_df = odo(gsk, pd.DataFrame)
# now filter down to the Phase III trials
gsk_df = gsk_df[gsk_df.clinical_phase=="Phase III"]
gsk_df


Out[8]:
event_id asof_date trade_date symbol event_type event_headline clinical_phase clinical_scope clinical_result product_name event_rating timestamp sid
0 1937202 2015-09-27 2015-09-28 GSK Clinical Trials GlaxoSmithKline Reports Positive Results From ... Phase III NaN Positive Anoro Ellipta 1 2015-09-29 11:17:13.121838 3242
3 1836852 2015-02-09 2015-02-09 GSK Clinical Trials GlaxoSmithKline and Theravance Initiate Phase ... Phase III NaN NaN fluticasone furoate/umeclidinium/vilanterol ( 1 2015-02-10 00:00:00 3242
4 1817331 2014-12-18 2014-12-18 GSK Clinical Trials GlaxoSmithKline Reports ZOE-50 Phase 3 Study M... Phase III NaN Positive ZOE-50 1 2014-12-19 00:00:00 3242
5 1745987 2014-07-16 2014-07-16 GSK Clinical Trials GlaxoSmithKline & Theravance Initiates Phase I... Phase III NaN NaN IMPACT 1 2014-07-17 00:00:00 3242
6 1738566 2014-06-25 2014-06-25 GSK Clinical Trials GlaxoSmithKline Initiates Phase 3 Study with E... Phase III NaN NaN Eltrombopag 1 2014-06-26 00:00:00 3242
7 1735091 2014-06-13 2014-06-13 GSK Clinical Trials GlaxoSmithKline Reports Phase 3 PETIT2 Study M... Phase III NaN Positive PETIT2 1 2014-06-14 00:00:00 3242
8 1734216 2014-06-11 2014-06-11 GSK Clinical Trials GlaxoSmithKline Reports Positive Results from ... Phase III NaN Positive Incruse Ellipta 1 2014-06-12 00:00:00 3242
10 1707265 2014-04-22 2014-04-22 GSK Clinical Trials GlaxoSmithKline and Theravance Starts Phase II... Phase III NaN NaN FF/VI 1 2014-04-23 00:00:00 3242
11 1700157 2014-04-02 2014-04-02 GSK Clinical Trials GlaxoSmithKline to Stop MAGE-A3 Cancer Immunot... Phase III NaN NaN MAGRITi 1 2014-04-03 00:00:00 3242
12 1695526 2014-03-20 2014-03-20 GSK Clinical Trials GlaxoSmithKline's MAGE-A3 Cancer Immunotherape... Phase III NaN Negative MAGE-A3 1 2014-03-21 00:00:00 3242
13 1693181 2014-03-14 2014-03-14 GSK Clinical Trials GlaxoSmithKline & Theravance Reports Positve R... Phase III NaN Positive Anoro Ellipta 1 2014-03-15 00:00:00 3242
15 1653485 2013-12-06 2013-12-06 GSK Clinical Trials GlaxoSmithKline and Theravance Announces Posit... Phase III NaN Positive Fluticasone Furoate 1 2013-12-07 00:00:00 3242
17 1647384 2013-11-12 2013-11-12 GSK Clinical Trials GlaxoSmithKline Announces Phase III Stability ... Phase III NaN Negative Darapladib 1 2013-11-13 00:00:00 3242
18 1620476 2013-09-05 2013-09-05 GSK Clinical Trials GlaxoSmithKline's MAGE-A3 Vaccine Fails to Mee... Phase III NaN Negative MAGE-A3 1 2013-09-06 00:00:00 3242
19 1521603 2012-12-19 2012-12-20 GSK Clinical Trials GlaxoSmithKline, Amicus Therapeutics Announce ... Phase III NaN Negative Migalastat HCl 1 2012-12-20 00:00:00 3242
21 1474291 2012-08-24 2012-08-24 GSK Clinical Trials GlaxoSmithKline, Theravance Complete Phase III... Phase III NaN NaN LAMA/LABA 1 2012-08-25 00:00:00 3242
22 1451483 2012-07-11 2012-07-11 GSK Clinical Trials Shionogi-ViiV Healthcare Reports Positive Init... Phase III NaN Positive ING114467 1 2012-07-12 00:00:00 3242
23 1451624 2012-07-11 2012-07-11 GSK Clinical Trials GlaxoSmithKline Reports Positive Results in Ph... Phase III NaN Positive Albiglutide 1 2012-07-12 00:00:00 3242
24 1448947 2012-07-02 2012-07-02 GSK Clinical Trials Theravance and GlaxoSmithKline Report Positive... Phase III NaN Positive LAMA/LABA 1 2012-07-03 00:00:00 3242
26 1414886 2012-04-03 2012-04-03 GSK Clinical Trials GlaxoSmithKline Reports Further Positive Resul... Phase III NaN Positive Albiglutide 1 2012-04-04 00:00:00 3242
27 1381734 2012-01-09 2012-01-09 GSK Clinical Trials GlaxoSmithKline, Theravance Report Initial Res... Phase III NaN Partial Relovair 1 2012-01-10 00:00:00 3242
28 1376242 2011-12-15 2011-12-15 GSK Clinical Trials GlaxoSmithKline and Human Genome Initiate Phas... Phase III NaN NaN BENLYSTA 1 2011-12-16 00:00:00 3242
29 1352859 2011-10-18 2011-10-18 GSK Clinical Trials GlaxoSmithKline Reports Positive Results from ... Phase III NaN Positive RTS,S 1 2011-10-19 00:00:00 3242
30 1351145 2011-10-11 2011-10-11 GSK Clinical Trials GlaxoSmithKline and Pfizer JV Initiates Phase ... Phase III NaN NaN Celsentri/Selzentry; emtricitabine/tenofovir 1 2011-10-12 00:00:00 3242
31 1336573 2011-09-12 2011-09-12 GSK Clinical Trials GlaxoSmithKline and Amicus Therapeutics Initia... Phase III NaN NaN Amigal 1 2011-09-13 00:00:00 3242
32 1335427 2011-08-15 2011-08-15 GSK Clinical Trials GlaxoSmithKline Reports Positive Results for I... Phase III NaN Positive IPX066 1 2011-08-16 00:00:00 3242
33 1332195 2011-07-26 2011-07-26 GSK Clinical Trials GlaxoSmithKline Reports Positive Data from Pro... Phase III NaN Positive ENABLE-1 1 2011-07-27 00:00:00 3242
34 1301739 2011-06-02 2011-06-02 GSK Clinical Trials GlaxoSmithKline and Theravance Announce Positi... Phase III NaN Positive Relovair 1 2011-06-03 00:00:00 3242
36 1249526 2011-02-07 2011-02-07 GSK Clinical Trials GlaxoSmithKline, Human Genome Announce Positiv... Phase III NaN Positive BENLYSTA 1 2011-02-08 00:00:00 3242
37 1249289 2011-02-03 2011-02-03 GSK Clinical Trials GSK and Theravance Announce Progression of LAM... Phase III NaN Positive GSK573719/vilanterol 1 2011-02-04 00:00:00 3242
39 1188282 2010-10-21 2010-10-21 GSK Clinical Trials GlaxoSmithKline JV Initiates Phase III Trial f... Phase III NaN NaN S/GSK1349572 1 2010-10-22 00:00:00 3242
42 1126301 2010-06-17 2010-06-17 GSK Clinical Trials GlaxoSmithKline Announces Positive Result in P... Phase III NaN Partial BENLYSTA 1 2010-06-18 00:00:00 3242
43 1126332 2010-06-17 2010-06-17 GSK Clinical Trials GlaxoSmithKline Announces Positive Phase 3 Res... Phase III NaN Positive BENLYSTA 1 2010-06-18 00:00:00 3242
44 1089424 2010-04-20 2010-04-20 GSK Clinical Trials GlaxoSmithKline, Human Genome Announce Failure... Phase III Limited Indications Negative BENLYSTA 1 2010-04-21 00:00:00 3242
46 1004093 2009-11-02 2009-11-02 GSK Clinical Trials GlaxoSmithKline Reports Positive Results in Se... Phase III NaN Positive BENLYSTA 1 2009-11-03 00:00:00 3242
47 1000032 2009-10-27 2009-10-27 GSK Clinical Trials GlaxoSmithKline Commences Phase III Horizon Pr... Phase III NaN NaN COPD 1 2009-10-28 00:00:00 3242
48 976852 2009-10-20 2009-10-20 GSK Clinical Trials GlaxoSmithKline Announces Positive Phase3 Resu... Phase III NaN Positive Belimumab 1 2009-10-21 00:00:00 3242
53 537694 2009-02-17 2009-02-17 GSK Clinical Trials GSK Initiates Phase III Programme for Novel Ty... Phase III NaN NaN GLP-1 1 2009-02-18 00:00:00 3242
56 522433 2008-12-06 2008-12-08 GSK Clinical Trials GSK, Valeant's Retigabine Reduces Seizures in ... Phase III NaN Positive Retigabine 1 2008-12-07 00:00:00 3242
70 147654 2008-02-28 2008-02-28 GSK Clinical Trials GlaxoSmithKline and XenoPort Get Positive Resu... Phase III NaN Positive XP13512 1 2008-02-29 00:00:00 3242
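Once the data is in Pandas, ordinary DataFrame operations apply. As a small sketch, the snippet below counts announcements by reported result. To keep it self-contained it builds a five-row sample by hand from rows shown in the output above; in Research you would run the same calls on gsk_df directly. The variable names `sample` and `result_counts` are illustrative.

```python
import pandas as pd

# Hand-copied subset of the GSK Phase III rows shown above (illustration only).
sample = pd.DataFrame({
    "asof_date": pd.to_datetime([
        "2015-09-27", "2015-02-09", "2014-12-18", "2014-03-20", "2012-01-09",
    ]),
    "product_name": [
        "Anoro Ellipta", "fluticasone furoate/umeclidinium/vilanterol (",
        "ZOE-50", "MAGE-A3", "Relovair",
    ],
    "clinical_result": ["Positive", None, "Positive", "Negative", "Partial"],
})

# Filtering a DataFrame leaves gaps in the index; renumber the rows 0..n-1.
sample = sample.reset_index(drop=True)

# Count announcements per result; dropna=False also counts rows with no result,
# which in this dataset are typically trial initiations rather than readouts.
result_counts = sample["clinical_result"].value_counts(dropna=False)
print(result_counts)
```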
