Reporting of clinical trial results became mandatory for many trials in 2008. However, this paper and this investigation both find that substantial numbers of clinical trials have not reported results, even trials for which the FDAAA makes reporting mandatory.
This notebook examines how many trials on ClinicalTrials.gov have had their results publicly reported. We use a broader definition than the FDAAA of a trial that should report its results. We count a trial as eligible for our analysis if:

- it is an interventional trial;
- its overall status is 'Completed';
- it is Phase 2 or later;
- its completion date falls between 1 January 2006 and two years before today;
- it has no results disposition (i.e. no certification to delay results) filed.
We then classify it as overdue if it has no summary results attached on ClinicalTrials.gov, and no results on PubMed that are linked by NCT ID (see below).
This is substantially broader than the FDAAA, which covers only US-based trials of FDA-approved drugs. However, we think all trials should report their results, not just US-based trials or trials of FDA-approved drugs. In addition, the FDAAA requires results to be reported within 12 months of completion, while we allow 24 months.
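As a minimal sketch (the field names and dict layout here are illustrative, not the notebook's actual data structures), the eligibility and overdue rules amount to a pair of predicates:

```python
from datetime import datetime

START = datetime(2006, 1, 1)  # earliest completion date we consider

def is_due_results(trial, cutoff):
    """True if a trial should have reported results by `cutoff`.

    Mirrors the filters applied to the dataframe later in this notebook:
    completed, interventional, Phase 2+, finished between START and
    `cutoff` (two years ago), with no results disposition filed.
    """
    return (trial['overall_status'] == 'Completed'
            and trial['study_type'].startswith('Interventional')
            and START <= trial['completion_date'] <= cutoff
            and trial['phase'] >= 2
            and trial['disposition_date'] is None)

def is_overdue(trial, cutoff):
    # Due results, but none on ClinicalTrials.gov and none linked on PubMed.
    return (is_due_results(trial, cutoff)
            and trial['results_date'] is None
            and not trial['pubmed_results'])
```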
ClinicalTrials.gov supplies notes on how to find studies with results, and on results more generally.
In [1]:
import csv
from datetime import datetime
from dateutil.relativedelta import relativedelta
import glob
from pprint import pprint
from slugify import slugify
import sqlite3
import numpy as np
import pandas as pd
import utils
The raw XML trial summaries from ClinicalTrials.gov are supplied as a single very large zip file, containing more than 200,000 XML files. This section assumes that these have already been downloaded and unzipped into the search_result directory.
Extract the fields of interest from the XML summaries, and save them to a CSV file, which we'll use as our source data for the rest of this exercise. ClinicalTrials.gov supplies field definitions.
Toggle REGENERATE_SUMMARY to False for the purposes of development.
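For reference, here is a minimal sketch of the kind of per-record extraction that `utils.extract_ctgov_xml` performs, assuming the legacy ClinicalTrials.gov XML layout (`<clinical_study>` root, `<id_info><nct_id>`); the real helper pulls out many more fields and handles missing elements:

```python
import xml.etree.ElementTree as ET

def extract_basic_fields(xml_text):
    """Pull a few headline fields out of one ClinicalTrials.gov record."""
    root = ET.fromstring(xml_text)
    def text(path):
        el = root.find(path)
        return el.text if el is not None else ''
    return {
        'nct_id': text('id_info/nct_id'),
        'title': text('brief_title'),
        'overall_status': text('overall_status'),
        'phase': text('phase'),
    }

sample = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>An example trial</brief_title>
  <overall_status>Completed</overall_status>
  <phase>Phase 2</phase>
</clinical_study>"""
```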
In [2]:
fname = './data/trials.csv'
REGENERATE_SUMMARY = False  # Set to True to re-extract from the XML files.
if REGENERATE_SUMMARY:
files = glob.glob('./search_result/*.xml')
print len(files), 'files found'
fieldnames = ['nct_id', 'title', 'overall_status',
'study_type', 'completion_date',
'lead_sponsor', 'lead_sponsor_class',
'collaborator', 'collaborator_class',
'phase', 'locations', 'has_drug_intervention', 'drugs',
'disposition_date', 'results_date',
'enrollment']
trials = csv.DictWriter(open(fname, 'wb'), fieldnames=fieldnames)
trials.writeheader()
for i, f in enumerate(files):
if i % 50000 == 0:
print i, f
text = open(f, 'r').read()
data = utils.extract_ctgov_xml(text)
trials.writerow(data)
print 'done'
In [17]:
dtype = {'has_drug_intervention': bool,
'phase': str}
converters = {'enrollment': lambda x: x and int(x) or 0}
datefields = ['completion_date', 'results_date', 'disposition_date']
df = pd.read_csv(fname,
parse_dates=datefields,
infer_datetime_format=True,
keep_default_na=False,
na_values=None,
converters=converters,
dtype=dtype)
In [18]:
df['phase_normalised'] = df.phase.apply(utils.normalise_phase)
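`utils.normalise_phase` converts the free-text `phase` field into an integer so it can be filtered numerically below. A plausible sketch (how the real helper treats combined phases like 'Phase 2/Phase 3' is an assumption; here we take the highest phase named):

```python
import re

def normalise_phase(phase):
    """Map e.g. 'Phase 2', 'Phase 2/Phase 3' or 'N/A' to an integer.

    Combined phases map to the highest phase named; strings with no
    phase number (e.g. 'N/A') map to 0.
    """
    numbers = [int(n) for n in re.findall(r'\d', phase or '')]
    return max(numbers) if numbers else 0
```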
In [19]:
df.tail()
Out[19]:
In [20]:
startdate = datetime.strptime('01 January 2006', '%d %B %Y')
cutoff = datetime.now() - relativedelta(years=2)
print 'Cutoff date', cutoff
df['is_completed'] = (df.overall_status == 'Completed') & \
(df.study_type.str.startswith('Interventional')) & \
(df.completion_date >= startdate) & \
(df.completion_date <= cutoff) & \
(df.phase_normalised >= 2) & \
(df.disposition_date.isnull())
df['is_overdue'] = (df.is_completed & \
df.results_date.isnull())
df_completed = df[df.is_completed]
df_overdue = df[df.is_overdue]
print len(df), 'total trials found'
print len(df[~df.disposition_date.isnull()]), 'trials have dispositions filed'
print len(df_completed), 'are completed and due results, by our definition'
print len(df[df.is_completed & ~df.results_date.isnull()]), \
'trials due results have submitted results on clinicaltrials.gov'
print len(df_overdue), \
'trials due results have not submitted results on clinicaltrials.gov'
If trials have reported their results on PubMed, and if it's possible to find them on PubMed using a linked NCT ID, then we count those trials as having submitted results.
So, for all trials that we regard as completed and due results, and that haven't already reported results on clinicaltrials.gov, we search PubMed, looking for the NCT ID either as a Secondary Source ID, or in the title/abstract. We look for anything published between the completion date and now, that doesn't have the words "study protocol" in the title, and that is classified as results of a trial (using the "therapy" clinical keyword, broad version).
At the time of writing, about 9,000 of the 34,000 trials have results on PubMed. An example of an NCT ID with results on PubMed: NCT02460380. (TODO: Update this).
Note 1: we know from the BMJ paper that there are trials that do have results on PubMed, but that aren't linked using the NCT ID. The BMJ authors found these using a manual search. Some examples: NCT00002762: 19487378, NCT00002879: 18470909, NCT00003134: 19066728, NCT00003596: 18430910. We do not count these as reported, because such results can only be found via an exhaustive manual search. We only count results as published for our purposes if they are either (i) submitted on ClinicalTrials.gov or (ii) retrievable on PubMed using the NCT ID. See more on this below.
Note 2: we know there are some trials that have results PMIDs directly in ClinicalTrials.gov, in the results_reference field of the XML. After discussion with Jess here, and Annice at ClinicalTrials.gov, I decided that these references are too often meaningless to be useful: much of the time they aren't truly results, but studies from years earlier.
Note 3: we also experimented with retrieving the results using the narrow version of the "therapy" clinical keyword, and using no clinical keyword at all. We evaluated these using multiple PubMed matches as a surrogate measure for false identification. At the time of writing (2016/10/24), we had examined 34,677 trial registry IDs: the broad keyword yielded 7,815 matches with 1,706 multiple matches; the narrow keyword yielded 6,448 matches with 1,238 multiple matches; and no keyword yielded 7,981 matches with 1,860 multiple matches. We chose the broad keyword for our final results.
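As a quick check on those figures, the multiple-match rate (our surrogate for false identification) implied by each strategy:

```python
def multiple_match_rate(matches, multiple):
    """Per cent of PubMed matches that were multiple matches."""
    return 100.0 * multiple / matches

# (matches, multiple matches) per strategy, from the 2016/10/24 run.
strategies = {'broad': (7815, 1706),
              'narrow': (6448, 1238),
              'no keyword': (7981, 1860)}
for name, (matches, multiple) in sorted(strategies.items()):
    print('%-10s %.1f%% multiple matches' % (
        name, multiple_match_rate(matches, multiple)))
```

The narrow filter gives the lowest multiple-match rate but also the fewest matches overall; as noted above, we settled on the broad filter.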
In [21]:
# Store results locally.
conn = sqlite3.connect('./data/trials-abstract.db')
cur = conn.cursor()
c = "CREATE TABLE IF NOT EXISTS trials(nct_id TEXT PRIMARY KEY, "
c += "pubmed_results INT, pubmed_results_broad INT, pubmed_results_narrow INT)"
cur.execute(c)
conn.commit()
REGENERATE_PUBMED_LINKS = False
count = 0
df['pubmed_results'] = False
for i, row in df_overdue.iterrows():
if count % 10000 == 0:
print count, row.nct_id
count += 1
# First, check for results stored in the local db.
c = "SELECT nct_id, pubmed_results, pubmed_results_broad, "
c += "pubmed_results_narrow FROM trials WHERE nct_id=?"
cur.execute(c, (row.nct_id,))
data = cur.fetchone()
has_results = False
if data and (not REGENERATE_PUBMED_LINKS):
has_results = bool(int(data[2]))
else:
# No local results, or we want to regenerate them: check PubMed.
broad_results = \
utils.get_pubmed_linked_articles(row.nct_id,
row.completion_date, 'broad')
# Used in the past (see note 3 above).
simple_results = \
utils.get_pubmed_linked_articles(row.nct_id,
row.completion_date, '')
narrow_results = \
utils.get_pubmed_linked_articles(row.nct_id,
row.completion_date, 'narrow')
c = "INSERT OR REPLACE INTO trials VALUES(?, ?, ?, ?)"
cur.execute(c, (row.nct_id, len(simple_results),
len(broad_results), len(narrow_results)))
conn.commit()
has_results = len(broad_results) > 0
df.set_value(i, 'pubmed_results', has_results)
cur.close()
conn.close()
print 'done'
In [22]:
# Reset dataframes now we have the results from PubMed.
df['is_overdue'] = (df.is_completed & df.results_date.isnull() & ~df.pubmed_results)
print 'How many of the unreported trials were found on PubMed:'
print df[df.is_completed & df.results_date.isnull()].pubmed_results.value_counts()
df_completed = df[df.is_completed]
df_overdue = df[df.is_overdue]
In [25]:
# Print summary stats for the entire dataset.
print len(df_completed), 'trials should have published results'
print len(df_overdue), 'trials have not published results'
percent_submitted = (1 - (len(df_overdue) / float(len(df_completed)))) * 100
print '%s%% of completed trials have published results' % \
'{:,.2f}'.format(percent_submitted)
print int(df_overdue.enrollment.sum()), 'total patients are enrolled in overdue trials'
In [26]:
# Print summary stats for major trial sponsors only.
NUM_TRIALS = 30
df_major = df_completed[
df_completed.groupby('lead_sponsor').nct_id.transform(len) >= NUM_TRIALS]
print len(df_major), 'trials by major sponsors should have published results'
print len(df_major[df_major.is_overdue]), 'trials by major sponsors have not published results'
percent_submitted = (1 - (len(df_major[df_major.is_overdue]) / float(len(df_major)))) * 100
print '%s%% of completed trials by major sponsors have published results' % \
'{:,.2f}'.format(percent_submitted)
print int(df_major[df_major.is_overdue].enrollment.sum()), 'total patients are enrolled in overdue trials'
In [27]:
df_completed.groupby('lead_sponsor_class').sum()[['is_overdue', 'is_completed']]
Out[27]:
In [28]:
# Calculate publication rates by sector (raw data)
df_by_sector = df_completed.groupby('lead_sponsor_class').sum()[['is_overdue', 'is_completed']]
df_by_sector['percent_overdue'] = df_by_sector.is_overdue / df_by_sector.is_completed * 100
df_by_sector
Out[28]:
In [30]:
# Calculate publication rates by sector (major sponsors only)
df_major_gp = df_major.groupby('lead_sponsor_class').sum()[['is_overdue', 'is_completed']]
df_major_gp['percent_overdue'] = df_major_gp.is_overdue / df_major_gp.is_completed * 100
df_major_gp
Out[30]:
In [31]:
df_completed['year_completed'] = df_completed['completion_date'].dt.year.dropna().astype(int)
# Drop all sponsors with fewer than N completed trials.
df_final = df_completed[
df_completed.groupby('lead_sponsor').nct_id.transform(len) >= NUM_TRIALS]
# Now reshape the data: a row for each sponsor, columns by year:
# lead_sponsor,2008_overdue,2008_total,2009_overdue,2009_total...
df_temp = df_final.set_index(['lead_sponsor', 'lead_sponsor_class', 'year_completed'])
gb = df_temp.groupby(level=[0, 1, 2]).is_overdue
df2 = gb.agg({'overdue': 'sum', 'total': 'count'}) \
.unstack().swaplevel(0, 1, 1).sort_index(1)
df2.columns = df2.columns.to_series().apply(lambda x: '{}_{}'.format(*x))
df3 = df2.reset_index()
df3['lead_sponsor_slug'] = df3.lead_sponsor.apply(slugify)
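The reshape above is fiddly, so here it is on a toy dataframe (written with named aggregation, which newer pandas versions require in place of the dict form used above):

```python
import pandas as pd

toy = pd.DataFrame({
    'lead_sponsor': ['A', 'A', 'A', 'B'],
    'lead_sponsor_class': ['Industry', 'Industry', 'Industry', 'Other'],
    'year_completed': [2008, 2008, 2009, 2008],
    'is_overdue': [True, False, True, False],
})
gb = toy.groupby(
    ['lead_sponsor', 'lead_sponsor_class', 'year_completed']).is_overdue
# One row per sponsor, an (overdue, total) column pair per year.
wide = (gb.agg(overdue='sum', total='count')
          .unstack()
          .swaplevel(0, 1, axis=1)
          .sort_index(axis=1))
wide.columns = ['{}_{}'.format(*c) for c in wide.columns]
print(wide.columns.tolist())
# ['2008_overdue', '2008_total', '2009_overdue', '2009_total']
```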
In [33]:
df3.to_csv('./data/completed.csv', index=None)
print len(df3), 'sponsors found with cutoff point at %s trials' % NUM_TRIALS
# Write the raw output to a full spreadsheet.
df.to_csv('./data/all.csv', index=None)
TODO: Make this a separate notebook?
A 2016 BMJ paper found that around two-thirds of trials reported results: "Overall, 2892 of the 4347 clinical trials (66.5%) had been published or reported results as of July 2014."
Excellently, the BMJ authors publish their raw data on DataDryad so we can compare our results with theirs, to get an idea of the difference between our automated strategy and their partially manual strategy. (However, in their reported data it looks to me like the matched PMID rate is 59.9% of all NCT IDs.)
The BMJ authors were looking at a much smaller set of trials than ours, because they focussed on academic medical centres. Their inclusion criteria also differ slightly from ours: they include pre-Phase-2 trials, and 'Terminated' as well as 'Completed' trials. And they used a manual search strategy, which involved searching Scopus and comparing results by hand.
In [15]:
from openpyxl import load_workbook
import sys
bmj_results = load_workbook(filename = './data/chen-bmj.xlsx')
In [16]:
nct_ids = {}
count = 0
has_pmid = 0
# The Excel data has multiple worksheets, sigh.
# And NCT IDs can occur more than once with different results, sigh.
# We only care about where there's at least one result.
# Fiddle about and reshape the data so that we know whether
# each NCT ID has a result.
for sheet in bmj_results.worksheets:
for i, row in enumerate(sheet.rows):
if i == 0:
continue
if row[0].value:
count += 1
if isinstance(row[6].value, long):
val = str(row[6].value)
else:
val = row[6].value
if val:
has_pmid += 1
# Always set val if it exists.
# Otherwise, only set val if there's no current value
# for this NCT ID.
if val:
nct_ids[row[0].value] = val
else:
if not row[0].value in nct_ids:
nct_ids[row[0].value] = val
print count, 'rows found in total'
print has_pmid, 'of those rows have a PMID'
print has_pmid / float(count) * 100, 'per cent of their NCT IDs have a PMID, including duplicates'
print
unique_nct_ids = len(nct_ids.keys())
print unique_nct_ids, '*unique* NCT IDs found in all rows'
pmids_found = sum(1 for x in nct_ids.values() if x)
print pmids_found, 'of these have PMIDs'
print pmids_found / float(unique_nct_ids) * 100, 'per cent of unique NCT IDs have a PMID'
In [17]:
df_bmj = pd.Series(nct_ids).to_frame(name='pmid')
df_bmj['pubmed_results'] = ~df_bmj.pmid.isnull()
df_bmj.index.name = 'nct_id'
df_bmj.reset_index(inplace=True)
print len(df_bmj), 'NCT IDs in the full BMJ dataset'
# df_bmj.head(20)
merged_results = \
pd.merge(df_bmj, df_completed, #[['nct_id', 'pubmed_results']],
on='nct_id', how='inner', suffixes=('_bmj', '_ours'))
# NB I tried this first with a left join: but 1521 out of the 4500 papers
# don't appear in our dataset, because the BMJ authors' inclusion criteria are
# different from ours. To get a sample after a left join...
# merged_results[merged_results.we_have_results.isnull()].head()
merged_results['we_have_results'] = ~merged_results.is_overdue
merged_results.we_have_results.value_counts(dropna=False)
# merged_results.head()
print len(merged_results), 'NCT IDs are in both the BMJ dataset and ours'
papers_both_find_pm_results = \
merged_results[merged_results.pubmed_results_bmj & merged_results.we_have_results]
papers_both_find_pm_results.head()
print len(papers_both_find_pm_results), 'we both find results for'
papers_only_they_find_results = \
merged_results[merged_results.pubmed_results_bmj & ~merged_results.we_have_results]
print len(papers_only_they_find_results), 'only they find results for'
papers_only_we_find_results = \
merged_results[~merged_results.pubmed_results_bmj & merged_results.we_have_results]
print len(papers_only_we_find_results), 'only we find results for'
noone_finds_results = \
merged_results[~merged_results.pubmed_results_bmj & ~merged_results.we_have_results]
print len(noone_finds_results), 'neither of us find results for'
In [18]:
# Examine a sample of the papers only they find results for.
cols = ['nct_id', 'title', 'pubmed_results_bmj', 'pmid', 'we_have_results']
papers_only_they_find_results.sample(10)[cols]
Out[18]:
In [19]:
# Papers only we find results for. If the `results_date` field exists, it
# means that the results are published on ClinicalTrials.gov. Otherwise
# we found results on PubMed but they did not - perhaps because
# it's been a couple of years since they did their search.
# We find 43 papers on PubMed that the BMJ authors don't:
print len(papers_only_we_find_results), 'papers for which only we find results'
print len(papers_only_we_find_results[papers_only_we_find_results.results_date.isnull()]),\
'of those we find on PubMed, the rest on ClinicalTrials.gov'
cols = ['nct_id', 'title', 'completion_date', 'pubmed_results_bmj',
'pmid', 'we_have_results', 'results_date']
# papers_only_we_find_results.sample(20)[cols]
papers_only_we_find_results[papers_only_we_find_results.results_date.isnull()].sample(10)[cols]
Out[19]:
In [20]: