The medications table reflects the active medication orders for patients. These are orders but do not necessarily reflect administration to the patient. For example, while existence of data in the infusionDrug table confirms a patient received a continuous infusion, existence of the same data in this table only indicates that the infusion was ordered for the patient. Most orders are fulfilled, but not all. Furthermore, many orders are done pro re nata, or PRN, which means "when needed". Administration of these orders is difficult to quantify.
In the US, all orders must be reviewed by a pharmacist. The majority of hospitals have an HL7 medication interface system in place which automatically synchronizes the orders with eCareManager (the source of this database) as they are verified by the pharmacist in the source pharmacy system. For hospitals without a medication interface, the eICU staff may enter a selection of medications to facilitate population management and completeness for reporting purposes.
In [1]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import psycopg2
import getpass
import pdvega
# for configuring connection
from configobj import ConfigObj
import os
%matplotlib inline
In [2]:
# Create a database connection using settings from config file
config='../db/config.ini'
# connection info
conn_info = dict()
if os.path.isfile(config):
config = ConfigObj(config)
conn_info["sqluser"] = config['username']
conn_info["sqlpass"] = config['password']
conn_info["sqlhost"] = config['host']
conn_info["sqlport"] = config['port']
conn_info["dbname"] = config['dbname']
conn_info["schema_name"] = config['schema_name']
else:
conn_info["sqluser"] = 'postgres'
conn_info["sqlpass"] = ''
conn_info["sqlhost"] = 'localhost'
conn_info["sqlport"] = 5432
conn_info["dbname"] = 'eicu'
conn_info["schema_name"] = 'public,eicu_crd'
# Connect to the eICU database
print('Database: {}'.format(conn_info['dbname']))
print('Username: {}'.format(conn_info["sqluser"]))
if conn_info["sqlpass"] == '':
# try connecting without password, i.e. peer or OS authentication
try:
if (conn_info["sqlhost"] == 'localhost') & (conn_info["sqlport"]=='5432'):
con = psycopg2.connect(dbname=conn_info["dbname"],
user=conn_info["sqluser"])
else:
con = psycopg2.connect(dbname=conn_info["dbname"],
host=conn_info["sqlhost"],
port=conn_info["sqlport"],
user=conn_info["sqluser"])
except:
conn_info["sqlpass"] = getpass.getpass('Password: ')
con = psycopg2.connect(dbname=conn_info["dbname"],
host=conn_info["sqlhost"],
port=conn_info["sqlport"],
user=conn_info["sqluser"],
password=conn_info["sqlpass"])
query_schema = 'set search_path to ' + conn_info['schema_name'] + ';'
In [3]:
patientunitstayid = 237395
In [4]:
query = query_schema + """
select *
from medication
where patientunitstayid = {}
order by drugorderoffset
""".format(patientunitstayid)
df = pd.read_sql_query(query, con)
df.head()
Out[4]:
In [5]:
df.columns
Out[5]:
In [6]:
# Look at a subset of columns
cols = ['medicationid','patientunitstayid',
'drugorderoffset','drugorderoffset', 'drugstopoffset',
'drugivadmixture', 'drugordercancelled', 'drugname','drughiclseqno', 'gtc',
'dosage','routeadmin','loadingdose', 'prn']
df[cols].head().T
Out[6]:
Here we can see that, roughly on ICU admission, the patient had an order for vancomycin, aztreonam, and tobramycin.
Let's look for patients who have an order for vancomycin using exact text matching.
In [7]:
drug = 'VANCOMYCIN'
query = query_schema + """
select
distinct patientunitstayid
from medication
where drugname like '%{}%'
""".format(drug)
df_drug = pd.read_sql_query(query, con)
print('{} unit stays with {}.'.format(df_drug.shape[0], drug))
Exact text matching is fairly weak, as there's no systematic reason to prefer upper case or lower case. Let's relax the case matching.
In [8]:
drug = 'VANCOMYCIN'
query = query_schema + """
select
distinct patientunitstayid
from medication
where drugname ilike '%{}%'
""".format(drug)
df_drug = pd.read_sql_query(query, con)
print('{} unit stays with {}.'.format(df_drug.shape[0], drug))
HICL codes are used to group together drugs which have the same underlying ingredient (i.e. most frequently this is used to group brand name drugs with the generic name drugs). We can see above the HICL for vancomycin is 10093, so let's try grabbing that.
In [9]:
hicl = 10093
query = query_schema + """
select
distinct patientunitstayid
from medication
where drughiclseqno = {}
""".format(hicl)
df_hicl = pd.read_sql_query(query, con)
print('{} unit stays with HICL = {}.'.format(df_hicl.shape[0], hicl))
No luck! I wonder what we missed? Let's go back to the original query, this time retaining HICL and the name of the drug.
In [10]:
drug = 'VANCOMYCIN'
query = query_schema + """
select
drugname, drughiclseqno, count(*) as n
from medication
where drugname ilike '%{}%'
group by drugname, drughiclseqno
order by n desc
""".format(drug)
df_drug = pd.read_sql_query(query, con)
df_drug.head()
Out[10]:
It appears there are more than one HICL - we can group by HICL in this query to get an idea.
In [11]:
df_drug['drughiclseqno'].value_counts()
Out[11]:
Unfortunately, we can't be sure that these HICLs always identify only vancomycin. For example, let's look at drugnames for HICL = 1403.
In [12]:
hicl = 1403
query = query_schema + """
select
drugname, count(*) as n
from medication
where drughiclseqno = {}
group by drugname
order by n desc
""".format(hicl)
df_hicl = pd.read_sql_query(query, con)
df_hicl.head()
Out[12]:
This HICL seems more focused on the use of creams than on vancomycin. Let's instead inspect the top 3.
In [13]:
for hicl in [4042, 10093, 37442]:
query = query_schema + """
select
drugname, count(*) as n
from medication
where drughiclseqno = {}
group by drugname
order by n desc
""".format(hicl)
df_hicl = pd.read_sql_query(query, con)
print('HICL {}'.format(hicl))
print('Number of rows: {}'.format(df_hicl['n'].sum()))
print('Top 5 rows by frequency:')
print(df_hicl.head())
print()
This is fairly convincing that these only refer to vancomycin. An alternative approach is to acquire the code book for HICL codes and look up vancomycin there.
In [14]:
query = query_schema + """
with t as
(
select distinct patientunitstayid
from medication
)
select
pt.hospitalid
, count(distinct pt.patientunitstayid) as number_of_patients
, count(distinct t.patientunitstayid) as number_of_patients_with_tbl
from patient pt
left join t
on pt.patientunitstayid = t.patientunitstayid
group by pt.hospitalid
""".format(patientunitstayid)
df = pd.read_sql_query(query, con)
df['data completion'] = df['number_of_patients_with_tbl'] / df['number_of_patients'] * 100.0
df.sort_values('number_of_patients_with_tbl', ascending=False, inplace=True)
df.head(n=10)
Out[14]:
In [15]:
df[['data completion']].vgplot.hist(bins=10,
var_name='Number of hospitals',
value_name='Percent of patients with data')
Here we can see there are a few hospitals with no interface, and thus 0 patients, though the majority have >90% data completion.