patient table
The aim of this notebook is to introduce the patient table, a key table in the eICU Collaborative Research Database. The patient table contains patient demographics and admission and discharge details for hospital and ICU stays. For more detail, see: http://eicu-crd.mit.edu/eicutables/patient/
Before starting, you will need to copy the eicu demo database file ('eicu_demo.sqlite3') to the data directory.
Documentation on the eICU Collaborative Research Database can be found at: http://eicu-crd.mit.edu/.
In [ ]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import os
import sqlite3
In [ ]:
# Plot settings
%matplotlib inline
plt.style.use('ggplot')
fontsize = 20 # size for x and y ticks
plt.rcParams['legend.fontsize'] = fontsize
plt.rcParams.update({'font.size': fontsize})
In [ ]:
# Connect to the database - which is assumed to be in the current directory
fn = 'eicu_demo.sqlite3'
con = sqlite3.connect(fn)
cur = con.cursor()
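One caveat with the connection cell: if the path is wrong, sqlite3.connect will create a new, empty database rather than raise an error, so a misplaced 'eicu_demo.sqlite3' only shows up later as a missing patient table. A minimal sketch of a guard follows; connect_existing is a hypothetical helper name, not part of the notebook.

```python
import os
import sqlite3


def connect_existing(path):
    """Open an SQLite database only if the file is already on disk."""
    # Without this check, sqlite3.connect() would start a new empty database
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Database file not found: {path}")
    return sqlite3.connect(path)
```

It could be used in place of the direct call, for example con = connect_existing(fn).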
In [ ]:
query = \
"""
SELECT type, name
FROM sqlite_master
WHERE type='table'
ORDER BY name;
"""
list_of_tables = pd.read_sql_query(query,con)
In [ ]:
list_of_tables
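Once the table names are known, SQLite can also report the columns of a table without loading any rows, via PRAGMA table_info. The sketch below uses an in-memory database with a made-up three-column patient table as a stand-in (demo_con and the schema are assumptions for illustration); with the real file you would pass con instead.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the demo database (illustrative schema only)
demo_con = sqlite3.connect(':memory:')
demo_con.execute(
    "CREATE TABLE patient (patientunitstayid INTEGER, gender TEXT, age TEXT)"
)

# PRAGMA table_info returns one row per column:
# cid, name, type, notnull, dflt_value, pk
table_info = pd.read_sql_query("PRAGMA table_info(patient);", demo_con)
print(table_info[['name', 'type']])
```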
In [ ]:
# query to load data from the patient table
query = \
"""
SELECT *
FROM patient
"""
print(query)
In [ ]:
# run the query and assign the output to a variable
patient_tab = pd.read_sql_query(query,con)
In [ ]:
# display the first few rows of the dataframe
patient_tab.head()
In [ ]:
# list all of the columns in the table
patient_tab.columns
What does patientunitstayid represent? (hint, see: http://eicu-crd.mit.edu/eicutables/patient/)
What does patienthealthsystemstayid represent?
What does uniquepid represent?
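As the eICU documentation describes them, the three identifiers are nested: uniquepid identifies a patient, patienthealthsystemstayid a hospital stay, and patientunitstayid an ICU stay. A minimal sketch on a made-up table (the toy DataFrame and its values are illustrative, not taken from the database) shows how counting distinct values at each level reflects that nesting.

```python
import pandas as pd

# Toy data: patient p1 has two hospital stays and three ICU stays in total
toy = pd.DataFrame({
    'uniquepid': ['p1', 'p1', 'p1', 'p2'],
    'patienthealthsystemstayid': [10, 10, 11, 20],
    'patientunitstayid': [100, 101, 102, 200],
})

id_cols = ['uniquepid', 'patienthealthsystemstayid', 'patientunitstayid']

# Number of distinct values per identifier: patients, hospital stays, ICU stays
id_counts = toy[id_cols].nunique()

# Number of ICU stays recorded for each patient
unit_stays_per_patient = toy.groupby('uniquepid')['patientunitstayid'].nunique()

print(id_counts)
print(unit_stays_per_patient)
```

The same two calls can be run on patient_tab to see how many patients in the demo have more than one ICU stay.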
In [ ]:
# select a limited number of columns to view
columns = ['uniquepid', 'patientunitstayid','gender','age','unitdischargestatus']
patient_tab[columns].head()
In [ ]:
# what are the unique values for age?
age_col = 'age'
patient_tab[age_col].sort_values().unique()
In [ ]:
# create a column containing numerical ages
# with errors='coerce', values that cannot be parsed as numbers are set to NaN
agenum_col = 'age_num'
patient_tab[agenum_col] = pd.to_numeric(patient_tab[age_col], errors='coerce')
patient_tab[agenum_col].sort_values().unique()
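Coercion silently drops any age that is not a plain number. The eICU documentation notes that ages above 89 are stored as the string '> 89', so those patients become NaN. The sketch below, on made-up values, compares coercion with one possible alternative, mapping '> 89' to 90; that mapping is an assumption for illustration, not the notebook's approach.

```python
import pandas as pd

# Toy ages in string form (values are made up)
ages = pd.Series(['45', '> 89', '72', None])

# Option 1: coerce, so '> 89' and missing values both become NaN
coerced = pd.to_numeric(ages, errors='coerce')

# Option 2 (an assumption): treat '> 89' as 90 before converting
capped = pd.to_numeric(ages.replace('> 89', '90'), errors='coerce')

print(coerced.mean())  # averages 45 and 72 only
print(capped.mean())   # averages 45, 90 and 72
```

Which option is appropriate depends on the analysis; either way it is worth recording how many values were affected.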
In [ ]:
# try plotting a histogram of ages
figsize = (18,8)
patient_tab[agenum_col].plot(kind='hist',
figsize=figsize,
fontsize=fontsize,
bins=15)
Use the mean() method to find the mean age (hint: patient_tab[agenum_col].mean()). What is the mean? Why might we expect this to be lower than the true mean?
As with .mean(), you can use .describe(). Use the describe() method to explore the admissionweight of patients in kg. What issue do you see? What are some methods that you could use to deal with this issue?
In [ ]:
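# set threshold based on 99th quantile
adweight_col = 'admissionweight'
quant = patient_tab[adweight_col].quantile(0.99)
# blank out only the weight column; masking whole rows would also erase ages and IDs
patient_tab.loc[patient_tab[adweight_col] > quant, adweight_col] = None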
In [ ]:
# describe the admission weights
patient_tab[adweight_col].describe()
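Blanking out values above a quantile is one way to handle implausible weights; capping them at the threshold is another. A minimal sketch on made-up data follows. The weights series is illustrative, and a 0.75 quantile is used instead of 0.99 only so that five values are enough to show an effect.

```python
import pandas as pd

# Made-up weights in kg; 900 stands in for a data-entry error
weights = pd.Series([60.0, 72.5, 81.0, 95.0, 900.0])
upper = weights.quantile(0.75)

# Option 1: replace values above the threshold with NaN
masked = weights.mask(weights > upper)

# Option 2: cap values at the threshold instead of discarding them
clipped = weights.clip(upper=upper)

print(masked.describe())
print(clipped.describe())
```

Masking reduces the count and leaves the remaining values untouched, while clipping keeps every row but pulls the extreme value down to the threshold.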
In [ ]:
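# set threshold based on 99th quantile
disweight_col = 'dischargeweight'
quant = patient_tab[disweight_col].quantile(0.99)
# blank out only the weight column; masking whole rows would also erase ages and IDs
patient_tab.loc[patient_tab[disweight_col] > quant, disweight_col] = None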
In [ ]:
# describe the discharge weights
patient_tab[disweight_col].describe()
In [ ]:
# admission minus discharge weight, so positive values mean weight lost during the stay
patient_tab['weight_change'] = patient_tab[adweight_col] - patient_tab[disweight_col]
In [ ]:
# plot the weight changes
figsize = (18,8)
patient_tab['weight_change'].plot(kind='hist',
figsize=figsize,
fontsize=fontsize,
bins=50)