eICU Collaborative Research Database

Notebook 1: Exploring the patient table

The aim of this notebook is to introduce the patient table, a key table in the eICU Collaborative Research Database.

The patient table contains patient demographics and admission and discharge details for hospital and ICU stays. For more detail, see: http://eicu-crd.mit.edu/eicutables/patient/

Before starting, you will need to copy the eicu demo database file ('eicu_demo.sqlite3') to the data directory.
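
One detail worth knowing before connecting: sqlite3.connect() silently creates a new, empty database when the file is missing, so a misplaced file only shows up later as a "no such table" error. The sketch below checks for the file first. The helper name connect_existing is hypothetical and is not part of the original notebook.

```python
import os
import sqlite3


def connect_existing(path):
    """Open a SQLite database only if the file already exists.

    Hypothetical helper for illustration: sqlite3.connect() on its own
    would create an empty database file at `path` if none is present.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Database file not found: {path}")
    return sqlite3.connect(path)
```

It could be used as con = connect_existing('eicu_demo.sqlite3') in place of a bare sqlite3.connect() call.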

Documentation on the eICU Collaborative Research Database can be found at: http://eicu-crd.mit.edu/.

1. Getting set up


In [ ]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import os
import sqlite3

In [ ]:
# Plot settings
%matplotlib inline
plt.style.use('ggplot')
fontsize = 20 # size for x and y ticks
plt.rcParams['legend.fontsize'] = fontsize
plt.rcParams.update({'font.size': fontsize})

In [ ]:
# Connect to the database - which is assumed to be in the current directory
fn = 'eicu_demo.sqlite3'
con = sqlite3.connect(fn)
cur = con.cursor()

2. Display list of tables


In [ ]:
query = \
"""
SELECT type, name
FROM sqlite_master 
WHERE type='table'
ORDER BY name;
"""

list_of_tables = pd.read_sql_query(query,con)

In [ ]:
list_of_tables

3. Reviewing the patient table


In [ ]:
# query to load data from the patient table
query = \
"""
SELECT *
FROM patient
"""

print(query)

In [ ]:
# run the query and assign the output to a variable
patient_tab = pd.read_sql_query(query,con)

In [ ]:
# display the first few rows of the dataframe
patient_tab.head()

In [ ]:
# list all of the columns in the table
patient_tab.columns

Questions


In [ ]:
# select a limited number of columns to view
columns = ['uniquepid', 'patientunitstayid','gender','age','unitdischargestatus']
patient_tab[columns].head()

In [ ]:
# what are the unique values for age?
age_col = 'age'
patient_tab[age_col].sort_values().unique()

Questions

  • Try plotting a histogram of ages using the commands in the cell below. Why does the plot fail?

In [ ]:
# try plotting a histogram of ages
figsize = (18,8)
patient_tab[age_col].plot(kind='hist',
                          figsize=figsize,
                          fontsize=fontsize,
                          bins=15)

In [ ]:
# create a column containing numerical ages
# with errors='coerce', values that cannot be parsed as numbers are set to NaN
agenum_col = 'age_num'
patient_tab[agenum_col] = pd.to_numeric(patient_tab[age_col], errors='coerce')
patient_tab[agenum_col].sort_values().unique()

In [ ]:
# plot a histogram of the numerical ages
figsize = (18,8)
patient_tab[agenum_col].plot(kind='hist',
                             figsize=figsize, 
                             fontsize=fontsize,
                             bins=15)

Questions

  • Use the mean() method to find the mean age (hint: patient_tab[agenum_col].mean()). What is the mean? Why might we expect this to be lower than the true mean?
  • In the same way that you use .mean(), you can use .describe(). Use the describe() method to explore the admissionweight of patients in kg. What issue do you see? What are some methods that you could use to deal with this issue?
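
As a hint for the first question, the following sketch uses made-up ages rather than values from the database. It assumes that ages above 89 are stored as the string '> 89', which is worth confirming against the unique values listed earlier. It compares the mean after coercion with the mean after treating '> 89' as 90, which is only one possible choice.

```python
import pandas as pd

# Toy ages, made up for illustration (not taken from the database).
# Assumption: ages above 89 are stored as the string '> 89'.
toy_age = pd.Series(['25', '60', '> 89', '> 89'])

# errors='coerce' turns the unparseable '> 89' entries into NaN
coerced = pd.to_numeric(toy_age, errors='coerce')

# mean() skips NaN by default, so the oldest patients drop out of the average
mean_coerced = coerced.mean()

# One possible alternative: treat '> 89' as 90 (a lower bound) before converting
capped = pd.to_numeric(toy_age.replace('> 89', '90'))
mean_capped = capped.mean()

print(mean_coerced, mean_capped)
```

Because every value dropped by the coercion lies at the top of the age range, the coerced mean can only underestimate the true mean.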

In [ ]:
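# set threshold based on 99th quantile
adweight_col = 'admissionweight'
quant = patient_tab[adweight_col].quantile(0.99)
# blank out only the outlying weights; assigning to the whole row would erase every column
patient_tab.loc[patient_tab[adweight_col] > quant, adweight_col] = None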

In [ ]:
# describe the admission weights
patient_tab[adweight_col].describe()
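
When masking outliers it matters whether the whole row or only one column is overwritten. The sketch below shows the difference on a toy table. The data and the 0.75 quantile are assumptions chosen only so that a four-row example has exactly one outlier.

```python
import pandas as pd

# Toy table, made up for illustration (not taken from the database)
toy = pd.DataFrame({'age_num': [54.0, 61.0, 47.0, 70.0],
                    'admissionweight': [70.0, 82.5, 65.0, 900.0]})

# 0.75 is used here only because the toy table is tiny
threshold = toy['admissionweight'].quantile(0.75)
is_outlier = toy['admissionweight'] > threshold

# Boolean row selection on the left-hand side overwrites every column of the row
rowwise = toy.copy()
rowwise[is_outlier] = float('nan')

# .loc with a column label overwrites only that column
colwise = toy.copy()
colwise.loc[is_outlier, 'admissionweight'] = float('nan')

print(rowwise)
print(colwise)
```

In rowwise the age of the fourth patient is lost along with the weight. In colwise only the implausible weight is removed.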

Questions

  • What is the average change in weight between admissionweight and dischargeweight?
  • Plot a distribution of the weight change
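
Two things are worth keeping in mind for these questions: the sign convention of the difference, and the fact that a stay missing either weight yields a missing change. A minimal sketch on made-up data, assuming the change is defined as admission weight minus discharge weight:

```python
import pandas as pd

# Toy weights in kg, made up for illustration (not taken from the database)
toy = pd.DataFrame({'admissionweight': [80.0, 65.0, float('nan'), 90.0],
                    'dischargeweight': [78.0, 66.5, 70.0, float('nan')]})

# Admission minus discharge: a positive value means weight was lost during the stay
toy['weight_change'] = toy['admissionweight'] - toy['dischargeweight']

# Rows missing either weight give NaN and are skipped by mean()
n_complete = int(toy['weight_change'].notna().sum())
mean_change = toy['weight_change'].mean()

print(n_complete, mean_change)
```

Reporting the number of complete pairs alongside the mean makes it clear how much of the cohort the average actually describes.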

In [ ]:
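# set threshold based on 99th quantile
disweight_col = 'dischargeweight'
quant = patient_tab[disweight_col].quantile(0.99)
# blank out only the outlying weights; assigning to the whole row would erase every column
patient_tab.loc[patient_tab[disweight_col] > quant, disweight_col] = None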

In [ ]:
# describe the discharge weights
patient_tab[disweight_col].describe()

In [ ]:
# weight change = admission weight minus discharge weight (positive values mean weight lost)
patient_tab['weight_change'] = patient_tab[adweight_col] - patient_tab[disweight_col]

In [ ]:
# plot the weight changes
figsize = (18,8)
patient_tab['weight_change'].plot(kind='hist',
                                  figsize=figsize,
                                  fontsize=fontsize,
                                  bins=50)