Exploring the trajectory of a single patient

Import Python libraries

We first need to import some tools for working with data in Python.

  • NumPy is for working with numbers
  • Pandas is for analysing data
  • MatPlotLib is for making plots
  • Sqlite3 to connect to the database

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import psycopg2
%matplotlib inline

Connect to the database

  • We can use the sqlite3 library to connect to the MIMIC database
  • Once the connection is established, we'll run a simple SQL query.

In [ ]:
# Connect to the MIMIC database
try: 
    conn = psycopg2.connect("dbname='mimic' user='tompollard' host='localhost' password='postgres'")
except: 
    print('meh')

In [ ]:
# Create our test query
test_query = """
SELECT subject_id, hadm_id, admittime, dischtime, diagnosis, admission_type, deathtime, discharge_location
FROM mimiciii.admissions;
"""

In [ ]:
# Run the query and assign the results to a variable
test = pd.read_sql_query(test_query,conn)

In [ ]:
# Display the first few rows
test

In [ ]:

Load the chartevents data

  • The chartevents table contains data charted at the patient bedside. It includes variables such as heart rate, respiratory rate, temperature, and so on.
  • We'll begin by loading the chartevents data for a single patient.

In [ ]:
query = """
SELECT de.icustay_id
  , EXTRACT(EPOCH FROM de.charttime-ie.intime)/3600/24 as HOURS
  , di.label
  , de.value
  , de.valuenum
  , de.uom
FROM mimiciii.chartevents de
INNER join mimiciii.d_items di
ON de.itemid = di.itemid
INNER join mimiciii.icustays ie
ON de.icustay_id = ie.icustay_id
WHERE de.icustay_id = 236942
ORDER BY charttime;
"""

ce = pd.read_sql_query(query,conn)


# OPTION 2: load chartevents from a CSV file
# ce = pd.read_csv('data/example_chartevents.csv', index_col='HOURSSINCEADMISSION')

In [ ]:
# Preview the data
# Use 'head' to limit the number of rows returned
ce.head()

Review the patient's heart rate

  • We can select individual columns using the column name.
  • For example, if we want to select just the label column, we write ce.LABEL or alternatively ce['LABEL']

In [ ]:
# Select a single column
ce['label'].head()
  • In a similar way, we can select rows from data using indexes.
  • For example, to select rows where the label is equal to 'Heart Rate', we would create an index using [ce.LABEL=='Heart Rate']

In [ ]:
# Select just the heart rate rows using an index
ce.label.unique()

Plot 1: How did the patients heart rate change over time?

  • Using the methods described above to select our data of interest, we can create our x and y axis values to create a time series plot of heart rate.

In [ ]:
# Which time stamps have a corresponding heart rate measurement?
print ce.index[ce.label=='Heart Rate']

In [ ]:
# Set x equal to the times
x_hr = ce.hours[ce.label=='Heart Rate']

# Set y equal to the heart rates
y_hr = ce.valuenum[ce.label=='Heart Rate']

# Plot time against heart rate
plt.figure(figsize=(14, 6))
plt.plot(x_hr,y_hr)


plt.xlabel('Time since admission, hours',fontsize=16)
plt.ylabel('Heart rate',fontsize=16)
plt.title('Heart rate over time from admission to the intensive care unit',fontsize=16)

Task 1

  • What is happening to this patient's heart rate?
  • Plot respiratory rate over time for the patient.
  • Is there anything unusual about the patient's respiratory rate?

In [ ]:
# Exercise 1 here

Plot 2: Did the patient's vital signs breach any alarm thresholds?

  • Alarm systems in the intensive care unit are commonly based on high and low thresholds defined by the carer.
  • False alarms are often a problem and so thresholds may be set arbitrarily to reduce alarms.
  • As a result, alarm settings carry limited information.

In [ ]:
plt.figure(figsize=(14, 6))

plt.plot(ce.hours[ce.label=='Respiratory Rate'], 
         ce.valuenum[ce.label=='Respiratory Rate'],
         'k+-', markersize=2, linewidth=1)

plt.plot(ce.hours[ce.label=='Resp Alarm - High'], 
         ce.valuenum[ce.label=='Resp Alarm - High'],
         'm--')

plt.plot(ce.hours[ce.label=='Resp Alarm - Low'], 
         ce.valuenum[ce.label=='Resp Alarm - Low'],
         'm--')

plt.xlabel('Time',fontsize=16)
plt.ylabel('Respiratory rate',fontsize=16)
plt.title('Respiratory rate since admission, with upper and lower alarm thresholds',fontsize=16)
plt.ylim(0,55)

Task 2

  • Based on the data, does it look like the alarms would have triggered for this patient?

Plot 3: What is patient's level of consciousness?

  • Glasgow Coma Scale (GCS) is a measure of consciousness.
  • It is commonly used for monitoring patients in the intensive care unit.
  • It consists of three components: eye response; verbal response; motor response.

In [ ]:
# Display the first few rows of the GCS eye response data
ce[ce.label=='GCS - Eye Opening'].head()

In [ ]:
# Prepare the size of the figure
plt.figure(figsize=(18, 12))

# Set x equal to the times
x_hr = ce.hours[ce.label=='Heart Rate']

# Set y equal to the heart rates
y_hr = ce.valuenum[ce.label=='Heart Rate']

plt.plot(x_hr,y_hr,'r-',label='Heart rate')

plt.plot(ce.hours[ce.label=='Heart rate Alarm - High'], 
         ce.valuenum[ce.label=='Heart rate Alarm - High'],
         'm--')

plt.plot(ce.hours[ce.label=='Heart rate Alarm - Low'], 
         ce.valuenum[ce.label=='Heart rate Alarm - Low'],
         'm--', label='Alarm threshold')

plt.plot(ce.hours[ce.label=='Respiratory Rate'], 
         ce.valuenum[ce.label=='Respiratory Rate'],
         'b-', markersize=6,label='Respiratory rate')

plt.plot(ce.hours[ce.label=='Resp Alarm - High'], 
         ce.valuenum[ce.label=='Resp Alarm - High'],
         'm--')

plt.plot(ce.hours[ce.label=='Resp Alarm - Low'], 
         ce.valuenum[ce.label=='Resp Alarm - Low'],
         'm--')

# Add a text label to the y-axis
plt.text(-3,155,'GCS - Eye Opening',fontsize=14)
plt.text(-3,150,'GCS - Motor Response',fontsize=14)
plt.text(-3,145,'GCS - Verbal Response',fontsize=14)   

# Iterate over list of GCS labels, plotting around 1 in 10 to avoid overlap
for i, txt in enumerate(ce.value[ce.label=='GCS - Eye Opening'].values):
    if np.mod(i,14)==0 and i < 75:
        plt.annotate(txt, (ce.hours[ce.label=='GCS - Eye Opening'].values[i],155),fontsize=14)
        
for i, txt in enumerate(ce.value[ce.label=='GCS - Motor Response'].values):
    if np.mod(i,14)==0 and i < 75:
        plt.annotate(txt, (ce.hours[ce.label=='GCS - Motor Response'].values[i],150),fontsize=14)

for i, txt in enumerate(ce.value[ce.label=='GCS - Verbal Response'].values):
    if np.mod(i,14)==0 and i < 75:
        plt.annotate(txt, (ce.hours[ce.label=='GCS - Verbal Response'].values[i],145),fontsize=14)

plt.title('Vital signs and Glasgow Coma Scale since admission',fontsize=18)

plt.xlabel('Time (hours)',fontsize=18)
plt.ylabel('Vital signs',fontsize=18)
plt.legend(loc=1)
plt.ylim(10,180)

Task 3

  • How is the patient's consciousness changing over time?

Stop here...

Plot 4: What other data do we have on the patient?

  • Using Pandas 'read_csv function' again, we'll now load the outputevents data - this table contains all information about patient outputs (urine output, drains, dialysis).

In [ ]:
# OPTION 1: load outputs from the patient
query = """
select de.icustay_id
  , EXTRACT(EPOCH FROM de.charttime-ie.intime)/3600 as HOURS
  , di.label
  , de.value
  , de.valueuom
from mimiciii.outputevents de 
inner join mimiciii.icustays ie
  on de.icustay_id = ie.icustay_id
inner join mimiciii.d_items di
  on de.itemid = di.itemid
where de.subject_id = 49205
order by charttime;
"""

oe = pd.read_sql_query(query,conn)

In [ ]:
oe.head()

In [ ]:
plt.figure(figsize=(14, 10))

plt.figure(figsize=(14, 6))
plt.title('Fluid output over time')

plt.plot(oe.hours, 
         oe.value.cumsum()/1000, 
         'ro', markersize=8, label='Output volume, L')

# plt.xlim(0,72)
# plt.ylim(0,10)
plt.legend()

To provide necessary context to this plot, it would help to include patient input data. This provides the necessary context to determine a patient's fluid balance - a key indicator in patient health.


In [ ]:
# OPTION 1: load inputs given to the patient (usually intravenously) using the database connection
query = """
select de.icustay_id
  , EXTRACT(EPOCH FROM de.starttime-ie.intime)/3600 as HOURS_START
  , EXTRACT(EPOCH FROM de.endtime-ie.intime)/3600 as HOURS_END
  , de.linkorderid
  , di.label
  , de.amount
  , de.amountuom
  , de.rate
  , de.rateuom
from mimiciii.inputevents_mv de 
inner join mimiciii.icustays ie
  on de.icustay_id = ie.icustay_id
inner join mimiciii.d_items di
  on de.itemid = di.itemid
where de.subject_id = 49205
order by endtime;
"""

ie = pd.read_sql_query(query,conn)

# # OPTION 2: load ioevents using the CSV file with endtime as the index
# ioe = pd.read_csv('inputevents.csv'
#                   ,header=None
#                   ,names=['subject_id','itemid','label','starttime','endtime','amount','amountuom','rate','rateuom']
#                   ,parse_dates=True)

In [ ]:
ie.head()

Note that the column headers are different: we have "HOURS_START" and "HOURS_END". This is because inputs are administered over a fixed period of time.


In [ ]:
ie['label'].unique()

In [ ]:
plt.figure(figsize=(14, 10))

# Plot the cumulative input against the cumulative output
plt.plot(ie.hours_end[ie.amountuom=='mL'], 
         ie.amount[ie.amountuom=='mL'].cumsum()/1000, 
         'go', markersize=8, label='Intake volume, L')

plt.plot(oe.hours, 
         oe.value.cumsum()/1000, 
         'ro', markersize=8, label='Output volume, L')

plt.title('Fluid balance over time',fontsize=16)
plt.xlabel('Hours',fontsize=16)
plt.ylabel('Volume (litres)',fontsize=16)
# plt.ylim(0,38)
plt.legend()

As the plot shows, the patient's intake tends to be above their output (as one would expect!) - but there are periods where they are almost one to one. One of the biggest challenges of working with ICU data is that context is everything - let's look at a treatment (lasix) that we know will affect this graph.


In [ ]:
plt.figure(figsize=(14, 10))

# Plot the cumulative input against the cumulative output
plt.plot(ie.hours_end[ie.amountuom=='mL'], 
         ie.amount[ie.amountuom=='mL'].cumsum()/1000, 
         'go', markersize=8, label='Intake volume, L')

plt.plot(oe.hours, 
         oe.value.cumsum()/1000, 
         'ro', markersize=8, label='Output volume, L')

# example on getting two columns from a dataframe: ie[['HOURS_START','HOURS_END']].head()

for i, idx in enumerate(ie.index[ie.label=='Furosemide (Lasix)']):
    plt.plot([ie.hours_start[ie.label=='Furosemide (Lasix)'][idx],
             ie.hours_end[ie.label=='Furosemide (Lasix)'][idx]],
            [ie.rate[ie.label=='Furosemide (Lasix)'][idx],
             ie.rate[ie.label=='Furosemide (Lasix)'][idx]],
            'b-',linewidth=4)
    

plt.title('Fluid balance over time',fontsize=16)
plt.xlabel('Hours',fontsize=16)
plt.ylabel('Volume (litres)',fontsize=16)
# plt.ylim(0,38)
plt.legend()

In [ ]:
ie['label'].unique()

Exercise 2

  • Plot the alarms for the mean arterial pressure ('Arterial Blood Pressure mean')
  • HINT: you can use ce.LABEL.unique() to find a list of variable names
  • Were the alarm thresholds breached?

In [ ]:
# Exercise 2 here

Plot 3: Were the patient's other vital signs stable?


In [ ]:
plt.figure(figsize=(14, 10))

plt.plot(ce.index[ce.label=='Heart Rate'], 
         ce.valuenum[ce.label=='Heart Rate'],
         'rx', markersize=8, label='HR')

plt.plot(ce.index[ce.label=='O2 saturation pulseoxymetry'], 
         ce.valuenum[ce.label=='O2 saturation pulseoxymetry'], 
         'g.', markersize=8, label='O2')

plt.plot(ce.index[ce.label=='Arterial Blood Pressure mean'], 
         ce.valuenum[ce.label=='Arterial Blood Pressure mean'], 
         'bv', markersize=8, label='MAP')

plt.plot(ce.index[ce.label=='Respiratory Rate'], 
         ce.valuenum[ce.label=='Respiratory Rate'], 
         'k+', markersize=8, label='RR')

plt.title('Vital signs over time from admission')
plt.ylim(0,130)
plt.legend()

Plot 5: Laboratory measurements

Using Pandas 'read_csv function' again, we'll now load the labevents data. This data corresponds to measurements made in a laboratory - usually on a sample of patient blood.


In [ ]:
# OPTION 1: load labevents data using the database connection
query = """
SELECT de.subject_id
  , de.charttime
  , di.label, de.value, de.valuenum
  , de.uom
FROM mimiciii.labevents de
INNER JOIN mimiciii.d_labitems di
  ON de.itemid = di.itemid
where de.subject_id = 49205
"""

le = pd.read_sql_query(query,conn)

# OPTION 2: load labevents from the CSV file
# le = pd.read_csv('data/example_labevents.csv', index_col='HOURSSINCEADMISSION')

In [ ]:
# preview the labevents data
le.head()

In [ ]:
# preview the ioevents data
le[le.label=='HEMOGLOBIN']

In [ ]:
plt.figure(figsize=(14, 10))

plt.plot(le.index[le.label=='HEMATOCRIT'], 
         le.valuenum[le.label=='HEMATOCRIT'], 
         'go', markersize=6, label='Haematocrit')

plt.plot(le.index[le.label=='HEMOGLOBIN'], 
         le.valuenum[le.label=='HEMOGLOBIN'], 
         'bv', markersize=8, label='Hemoglobin')

plt.title('Laboratory measurements over time from admission')
plt.ylim(0,38)
plt.legend()