patient

The patinet table is a core part of the eICU-CRD and contains all information related to tracking patient unit stays. The table also contains patient demographics and hospital level information.


In [1]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import psycopg2
import getpass
import pdvega

# for configuring connection 
from configobj import ConfigObj
import os

%matplotlib inline


/home/alistairewj/.local/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)

In [2]:
# Create a database connection using settings from config file
config='../db/config.ini'

# connection info
conn_info = dict()
if os.path.isfile(config):
    config = ConfigObj(config)
    conn_info["sqluser"] = config['username']
    conn_info["sqlpass"] = config['password']
    conn_info["sqlhost"] = config['host']
    conn_info["sqlport"] = config['port']
    conn_info["dbname"] = config['dbname']
    conn_info["schema_name"] = config['schema_name']
else:
    conn_info["sqluser"] = 'postgres'
    conn_info["sqlpass"] = ''
    conn_info["sqlhost"] = 'localhost'
    conn_info["sqlport"] = 5432
    conn_info["dbname"] = 'eicu'
    conn_info["schema_name"] = 'public,eicu_crd'
    
# Connect to the eICU database
print('Database: {}'.format(conn_info['dbname']))
print('Username: {}'.format(conn_info["sqluser"]))
if conn_info["sqlpass"] == '':
    # try connecting without password, i.e. peer or OS authentication
    try:
        if (conn_info["sqlhost"] == 'localhost') & (conn_info["sqlport"]=='5432'):
            con = psycopg2.connect(dbname=conn_info["dbname"],
                                   user=conn_info["sqluser"])            
        else:
            con = psycopg2.connect(dbname=conn_info["dbname"],
                                   host=conn_info["sqlhost"],
                                   port=conn_info["sqlport"],
                                   user=conn_info["sqluser"])
    except:
        conn_info["sqlpass"] = getpass.getpass('Password: ')

        con = psycopg2.connect(dbname=conn_info["dbname"],
                               host=conn_info["sqlhost"],
                               port=conn_info["sqlport"],
                               user=conn_info["sqluser"],
                               password=conn_info["sqlpass"])
query_schema = 'set search_path to ' + conn_info['schema_name'] + ';'


Database: eicu
Username: alistairewj

uniquePid

The uniquePid column identifies a single patient across multiple stays. Let's look at a single uniquepid.


In [6]:
uniquepid = '002-33870'
query = query_schema + """
select *
from patient
where uniquepid = '{}'
""".format(uniquepid)

df = pd.read_sql_query(query, con)
df.head()


Out[6]:
patientunitstayid patienthealthsystemstayid gender age ethnicity hospitalid wardid apacheadmissiondx admissionheight hospitaladmityear ... unitstaytype admissionweight dischargeweight unitdischargeyear unitdischargetime24 unitdischargetime unitdischargeoffset unitdischargelocation unitdischargestatus uniquepid
0 141178 128927 Female 52 Caucasian 60 83 162.6 2015 ... admit 54.4 54.4 2015 09:18:00 midday 8 Step-Down Unit (SDU) Alive 002-33870
1 141179 128927 Female 52 Caucasian 60 83 162.6 2015 ... stepdown/other NaN 60.4 2015 19:20:00 night 2042 Home Alive 002-33870

2 rows × 36 columns

Here we see two unit stays for a single patient. Note also that both unit stays have the same patienthealthsystemstayid - this indicates that they occurred within the same hospitalization.

We can see the unitstaytype was 'admit' for one stay, and 'stepdown/other' for another. Other columns can give us more information.


In [7]:
df[['patientunitstayid', 'wardid', 'unittype', 'unitstaytype', 'hospitaladmitoffset', 'unitdischargeoffset']]


Out[7]:
patientunitstayid wardid unittype unitstaytype hospitaladmitoffset unitdischargeoffset
0 141178 83 Med-Surg ICU admit -14 8
1 141179 83 Med-Surg ICU stepdown/other -22 2042

Note that it's not explicitly obvious which stay occurred first. Earlier stays will be closer to hospital admission, and therefore have a higher hospitaladmitoffset. Above, the stay with a hospitaladmitoffset of -14 was first (occurring 14 minutes after hospital admission), followed by the next stay with a hospitaladmitoffset of 22 (which occurred 22 minutes after hospital admission). Practically, we wouldn't consider the first admission a "real" ICU stay, and it's likely an idiosyncrasy of the administration system at this particular hospital. Notice how both rows have the same wardid.

Age

As ages over 89 are required to be deidentified by HIPAA, the age column is actually a string field, with ages over 89 replaced with the string value '> 89'.


In [8]:
query = query_schema + """
select age, count(*) as n
from patient
group by age
order by n desc
"""

df = pd.read_sql_query(query, con)
df.head()


Out[8]:
age n
0 > 89 7081
1 67 5078
2 68 4826
3 72 4804
4 71 4764

As is common in eICU-CRD, there are a subset of hospitals who routinely utilize this portion of the medical record (and thus have 90-100% data completion), while there are other hospitals who rarely use this interface and thus have poor data completion (0-10%).