lab

Laboratory tests that have have been mapped to a standard set of measurements. Unmapped measurements are recorded in the customLab table. The lab table is fairly well populated by hospitals. It is possible some rarely obtained lab measurements are not interfaced into the system and therefore will not be available in the database. Absence of a rare lab measurement, such as serum lidocaine concentrations, would not indicate the lab was not drawn. However, absence of a platelet count would likely indicate the value was not obtained.



In [1]:

    
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import psycopg2
import getpass
import pdvega

# for configuring connection 
from configobj import ConfigObj
import os

%matplotlib inline









    



/home/alistairewj/.local/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)



In [2]:

    
# Create a database connection using settings from config file
config='../db/config.ini'

# connection info
conn_info = dict()
if os.path.isfile(config):
    config = ConfigObj(config)
    conn_info["sqluser"] = config['username']
    conn_info["sqlpass"] = config['password']
    conn_info["sqlhost"] = config['host']
    conn_info["sqlport"] = config['port']
    conn_info["dbname"] = config['dbname']
    conn_info["schema_name"] = config['schema_name']
else:
    conn_info["sqluser"] = 'postgres'
    conn_info["sqlpass"] = ''
    conn_info["sqlhost"] = 'localhost'
    conn_info["sqlport"] = 5432
    conn_info["dbname"] = 'eicu'
    conn_info["schema_name"] = 'public,eicu_crd'
    
# Connect to the eICU database
print('Database: {}'.format(conn_info['dbname']))
print('Username: {}'.format(conn_info["sqluser"]))
if conn_info["sqlpass"] == '':
    # try connecting without password, i.e. peer or OS authentication
    try:
        if (conn_info["sqlhost"] == 'localhost') & (conn_info["sqlport"]=='5432'):
            con = psycopg2.connect(dbname=conn_info["dbname"],
                                   user=conn_info["sqluser"])            
        else:
            con = psycopg2.connect(dbname=conn_info["dbname"],
                                   host=conn_info["sqlhost"],
                                   port=conn_info["sqlport"],
                                   user=conn_info["sqluser"])
    except:
        conn_info["sqlpass"] = getpass.getpass('Password: ')

        con = psycopg2.connect(dbname=conn_info["dbname"],
                               host=conn_info["sqlhost"],
                               port=conn_info["sqlport"],
                               user=conn_info["sqluser"],
                               password=conn_info["sqlpass"])
query_schema = 'set search_path to ' + conn_info['schema_name'] + ';'









    



Database: eicu
Username: alistairewj

Examine a single patient



In [3]:

    
patientunitstayid = 2704494



In [4]:

    
query = query_schema + """
select *
from lab
where patientunitstayid = {}
order by labresultoffset
""".format(patientunitstayid)

df = pd.read_sql_query(query, con)
df.head()









    Out[4]:







  
    
      
      patientunitstayid
      labid
      labresultyear
      labresulttime24
      labresulttime
      labresultoffset
      labtypeid
      labname
      labresult
      labresulttext
      labmeasurenamesystem
      labmeasurenameinterface
      labresultrevisedyear
      labresultrevisedtime24
      labresultrevisedtime
      labresultrevisedoffset
    
  
  
    
      0
      2704494
      635381067
      2014
      00:05:00
      midnight
      -99620
      1.0
      sodium
      136.00
      136
      mmol/L
      mmol/L
      2014
      00:42:48
      midnight
      -99583
    
    
      1
      2704494
      648998951
      2014
      00:05:00
      midnight
      -99620
      3.0
      -monos
      6.00
      6
      %
      %
      2014
      00:52:16
      midnight
      -99573
    
    
      2
      2704494
      648998947
      2014
      00:05:00
      midnight
      -99620
      3.0
      Hgb
      9.60
      9.6
      g/dL
      g/dL
      2014
      00:52:10
      midnight
      -99573
    
    
      3
      2704494
      648998952
      2014
      00:05:00
      midnight
      -99620
      3.0
      MCHC
      34.30
      34.3
      g/dL
      g/dL
      2014
      00:52:10
      midnight
      -99573
    
    
      4
      2704494
      635381063
      2014
      00:05:00
      midnight
      -99620
      1.0
      creatinine
      4.23
      4.23
      mg/dL
      mg/dL
      2014
      00:42:48
      midnight
      -99583



In [5]:

    
query = query_schema + """
select *
from patient
where patientunitstayid = {}
""".format(patientunitstayid)

pt = pd.read_sql_query(query, con)
pt[['patientunitstayid', 'apacheadmissiondx', 'hospitaladmitoffset']]









    Out[5]:







  
    
      
      patientunitstayid
      apacheadmissiondx
      hospitaladmitoffset
    
  
  
    
      0
      2704494
      Thrombocytopenia
      -99779

Immediately we can note the very large negative labresultoffset. This likely means we have some lab values pre-ICU. In some cases this will be a lab measured in another hospital location such as the emergency department or hospital floor. In this case, the large value (-99620 minutes, or ~70 days) is surprising, but we can see from the patient table that the patient was admitted to the hospital -99779 minutes before their ICU stay (hospitaladmitoffset). This patient was admitted to the ICU with thrombocytopenia (apacheadmissiondx), and inspection of the diagnosis table indicates they have a form of cancer, so likely this is a long hospital stay where labs were taken on hospital admission.

Available labs

We can group the lab table to summarize all available labs.



In [6]:

    
query = query_schema + """
select labname, count(*) as n
from lab
group by labname
order by n desc
""".format(patientunitstayid)

lab = pd.read_sql_query(query, con)
print('{} total vlues for {} distinct labs.'.format(lab['n'].sum(), lab.shape[0]))
print('\nTop 5 labs by frequency:')
lab.head()









    



39132531 total vlues for 158 distinct labs.

Top 5 labs by frequency:






    Out[6]:







  
    
      
      labname
      n
    
  
  
    
      0
      bedside glucose
      3175835
    
    
      1
      potassium
      1493261
    
    
      2
      sodium
      1393205
    
    
      3
      glucose
      1319496
    
    
      4
      Hgb
      1298708

The lab table is a large table with over 39 million observations. The most frequent observation is bedside glucose which accounts for almost 10% of the lab table, followed by potassium and sodium.

Hospitals with data available



In [7]:

    
query = query_schema + """
with t as
(
select distinct patientunitstayid
from lab
)
select 
  pt.hospitalid
  , count(distinct pt.patientunitstayid) as number_of_patients
  , count(distinct t.patientunitstayid) as number_of_patients_with_tbl
from patient pt
left join t
  on pt.patientunitstayid = t.patientunitstayid
group by pt.hospitalid
""".format(patientunitstayid)

df = pd.read_sql_query(query, con)
df['data completion'] = df['number_of_patients_with_tbl'] / df['number_of_patients'] * 100.0
df.sort_values('number_of_patients_with_tbl', ascending=False, inplace=True)
df.head(n=10)









    Out[7]:







  
    
      
      hospitalid
      number_of_patients
      number_of_patients_with_tbl
      data completion
    
  
  
    
      11
      73
      7059
      6861
      97.195070
    
    
      54
      167
      6092
      5897
      96.799081
    
    
      106
      264
      5237
      5075
      96.906626
    
    
      184
      420
      4679
      4603
      98.375721
    
    
      58
      176
      4328
      4254
      98.290203
    
    
      134
      338
      4277
      4237
      99.064765
    
    
      71
      199
      4240
      4122
      97.216981
    
    
      90
      243
      4243
      4104
      96.724016
    
    
      206
      458
      3701
      3643
      98.432856
    
    
      200
      443
      3656
      3614
      98.851204



In [8]:

    
df.tail(n=10)









    Out[8]:







  
    
      
      hospitalid
      number_of_patients
      number_of_patients_with_tbl
      data completion
    
  
  
    
      18
      91
      15
      14
      93.333333
    
    
      45
      151
      14
      13
      92.857143
    
    
      138
      351
      27
      13
      48.148148
    
    
      16
      86
      15
      12
      80.000000
    
    
      129
      323
      11
      11
      100.000000
    
    
      146
      361
      13
      10
      76.923077
    
    
      21
      94
      9
      9
      100.000000
    
    
      154
      385
      3
      3
      100.000000
    
    
      36
      136
      3
      2
      66.666667
    
    
      176
      409
      33
      0
      0.000000



In [9]:

    
df[['data completion']].vgplot.hist(bins=10,
                                    var_name='Number of hospitals',
                                    value_name='Percent of patients with data')

Above we can see that very few hospitals are missing lab data. Most of the data at < 90% data completion is driven by a few hospitals with very few patients. The majority of hospitals have 90-100% of patients with data in the lab table (right side of histogram, 0-90% bin).

	patientunitstayid	labid	labresultyear	labresulttime24	labresulttime	labresultoffset	labtypeid	labname	labresult	labresulttext	labmeasurenamesystem	labmeasurenameinterface	labresultrevisedyear	labresultrevisedtime24	labresultrevisedtime	labresultrevisedoffset
0	2704494	635381067	2014	00:05:00	midnight	-99620	1.0	sodium	136.00	136	mmol/L	mmol/L	2014	00:42:48	midnight	-99583
1	2704494	648998951	2014	00:05:00	midnight	-99620	3.0	-monos	6.00	6	%	%	2014	00:52:16	midnight	-99573
2	2704494	648998947	2014	00:05:00	midnight	-99620	3.0	Hgb	9.60	9.6	g/dL	g/dL	2014	00:52:10	midnight	-99573
3	2704494	648998952	2014	00:05:00	midnight	-99620	3.0	MCHC	34.30	34.3	g/dL	g/dL	2014	00:52:10	midnight	-99573
4	2704494	635381063	2014	00:05:00	midnight	-99620	1.0	creatinine	4.23	4.23	mg/dL	mg/dL	2014	00:42:48	midnight	-99583

	labname	n
0	bedside glucose	3175835
1	potassium	1493261
2	sodium	1393205
3	glucose	1319496
4	Hgb	1298708

	hospitalid	number_of_patients	number_of_patients_with_tbl	data completion
11	73	7059	6861	97.195070
54	167	6092	5897	96.799081
106	264	5237	5075	96.906626
184	420	4679	4603	98.375721
58	176	4328	4254	98.290203
134	338	4277	4237	99.064765
71	199	4240	4122	97.216981
90	243	4243	4104	96.724016
206	458	3701	3643	98.432856
200	443	3656	3614	98.851204

	hospitalid	number_of_patients	number_of_patients_with_tbl	data completion
18	91	15	14	93.333333
45	151	14	13	92.857143
138	351	27	13	48.148148
16	86	15	12	80.000000
129	323	11	11	100.000000
146	361	13	10	76.923077
21	94	9	9	100.000000
154	385	3	3	100.000000
36	136	3	2	66.666667
176	409	33	0	0.000000