Data Visualization with Python

Authors: Prof. med. Thomas Ganslandt Thomas.Ganslandt@medma.uni-heidelberg.de
and Kim Hee HeeEun.Kim@medma.uni-heidelberg.de

Heinrich-Lanz-Center for Digital Health (HLZ) of the Medical Faculty Mannheim
Heidelberg University

This is part of a tutorial prepared for the TMF summer school on 03.07.2019

Prerequisite: Access the MIMIC-III Dataset

MIMIC (Medical Information Mart for Intensive Care) is a freely accessible database containing data on Intensive Care Unit (ICU) patients. The demo dataset is limited to 100 patients and is publicly available as CSV files or as a single Postgres database backup file.

Instructions for accessing the MIMIC demo dataset:

Create an account on PhysioNet using the following link: https://physionet.org/register/

Navigate to the project page: https://physionet.org/content/mimiciii-demo/

Read the Data Use Agreement and click “I agree” to access the data

Prerequisite: MIMIC-III files available locally

You should place the following MIMIC-III data files in the data/ subfolder:

ADMISSIONS.csv
PATIENTS.csv
CPTEVENTS.csv

Database description: https://mimic.physionet.org/gettingstarted/overview/
Table description: https://mimic.physionet.org/mimictables/admissions/
ER-Diagram: https://mit-lcp.github.io/mimic-schema-spy/relationships.html
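
Before running the notebook, it can help to confirm that the three files listed above are in place. The following is a minimal sketch, assuming the notebook is started from the folder that contains data/; the helper name missing_files is hypothetical and not part of the tutorial.

```python
from pathlib import Path

# File names come from the list above; the data/ folder follows the tutorial's layout.
REQUIRED_FILES = ["ADMISSIONS.csv", "PATIENTS.csv", "CPTEVENTS.csv"]


def missing_files(data_dir="data", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir (hypothetical helper)."""
    folder = Path(data_dir)
    return [name for name in required if not (folder / name).is_file()]


missing = missing_files()
if missing:
    print("Missing from data/:", ", ".join(missing))
else:
    print("All MIMIC-III demo files found.")
```

An empty list means all three CSV files were found.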

Agenda

Pandas
Pandas-Profiling
Missingno
Wordcloud

Pandas

http://pandas.pydata.org/pandas-docs/stable/reference/
Pandas is a Python library for exploring, processing, and modeling data

Pandas supports charting a tabular dataset

DataFrame.plot([x, y], kind)

kind :

'line': line plot (default)

'bar': vertical bar plot

'barh': horizontal bar plot

'hist': histogram

'box': boxplot

'kde': Kernel Density Estimation plot

'density': same as 'kde'

'area': stacked area plot

'pie': pie plot

'scatter': scatter plot

'hexbin': Hexagonal binning plot
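
To illustrate how the kind argument selects the chart type, here is a small sketch on made-up data. The column names age and length_of_stay and all values are hypothetical and are not taken from MIMIC-III.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not taken from MIMIC-III
df = pd.DataFrame({
    "age": [34, 51, 67, 72, 45, 80],
    "length_of_stay": [2, 5, 9, 12, 3, 15],
})

# One axis per chart type; the same data is drawn three different ways
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df.plot(x="age", y="length_of_stay", kind="scatter", ax=axes[0], title="scatter")
df["length_of_stay"].plot(kind="hist", bins=5, ax=axes[1], title="hist")
df["length_of_stay"].plot(kind="box", ax=axes[2], title="box")
fig.tight_layout()
```

A scatter plot needs both x and y, while 'hist' and 'box' work on a single numeric column.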

Visualize the admissions table



In [ ]:

    
import pandas as pd
import matplotlib.pyplot as plt

# show all columns when displaying a DataFrame
pd.set_option('display.max_columns', 999)
# plot figures directly in the notebook
%matplotlib inline



In [ ]:

    
# load the admissions table and lower-case the column names
a = pd.read_csv("data/ADMISSIONS.csv")
a.columns = a.columns.str.lower()
# number of admissions per marital status
a.groupby(['marital_status']).count()['row_id'].plot(kind='pie')



In [ ]:

    
a.groupby(['religion']).count()['row_id'].plot(kind = 'barh')



In [ ]:

    
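# join admissions with patient demographics on the patient identifier
p = pd.read_csv("data/PATIENTS.csv")
p.columns = p.columns.str.lower()
ap = pd.merge(a, p, on='subject_id', how='inner')
# admissions per religion, stacked by gender
ap.groupby(['religion', 'gender']).size().unstack().plot(kind="barh", stacked=True)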



In [ ]:

    
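# join admissions with CPT (procedure billing) events on the admission identifier
c = pd.read_csv("data/CPTEVENTS.csv")
c.columns = c.columns.str.lower()
ac = pd.merge(a, c, on='hadm_id', how='inner')
# CPT events per discharge location, stacked by CPT section
ac.groupby(['discharge_location', 'sectionheader']).size().unstack().plot(kind="barh", stacked=True)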
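
The merge, groupby, size, and unstack chain used in the cells above can be easier to follow on a few rows. This is a sketch on hypothetical toy data; only the column names mirror the MIMIC-III tables.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror the MIMIC-III tables
admissions = pd.DataFrame({
    "subject_id": [1, 1, 2, 3],
    "hadm_id": [10, 11, 20, 30],
    "religion": ["CATHOLIC", "CATHOLIC", "JEWISH", "CATHOLIC"],
})
patients = pd.DataFrame({"subject_id": [1, 2, 3], "gender": ["F", "M", "M"]})

# An inner join keeps only admissions whose subject_id also appears in patients
merged = pd.merge(admissions, patients, on="subject_id", how="inner")

# size() counts rows per (religion, gender) pair; unstack() moves gender into columns
counts = merged.groupby(["religion", "gender"]).size().unstack(fill_value=0)
print(counts)
```

Because the join is made per admission, a patient with several admissions is counted once per admission, not once per person. The resulting table has one row per religion and one column per gender, which is the shape that a stacked bar plot expects.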

Pandas-Profiling

https://github.com/pandas-profiling/pandas-profiling
Pandas-Profiling is a Python library for exploratory data analysis

Import pandas-profiling (1/3)



In [ ]:

    
# !conda install -c conda-forge pandas-profiling -y
import pandas_profiling

Load the admissions table (2/3)



In [ ]:

    
a = pd.read_csv("data/ADMISSIONS.csv")
a.columns = a.columns.str.lower()

Profile the table (3/3)



In [ ]:

    
# skip the timestamp columns when profiling, since they are not informative here
cols = [col for col in a.columns if not col.endswith('time')]
pandas_profiling.ProfileReport(a[cols])
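
The column filter in the cell above can be checked without running the profiler. The sketch below uses an empty, hypothetical frame that carries a subset of the ADMISSIONS column names.

```python
import pandas as pd

# Hypothetical empty frame; the column names are a subset of the ADMISSIONS table
toy = pd.DataFrame(columns=["row_id", "admittime", "dischtime", "deathtime",
                            "edregtime", "edouttime", "diagnosis", "language"])

# Columns whose name ends in "time" are treated as timestamps and left out
kept = [col for col in toy.columns if not col.endswith("time")]
dropped = [col for col in toy.columns if col.endswith("time")]
print("kept:   ", kept)
print("dropped:", dropped)
```

The filter relies purely on the naming convention, so a timestamp column with a different suffix would not be excluded.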

Missingno

https://github.com/ResidentMario/missingno
Missingno offers a visual summary of the completeness of a dataset. For the ADMISSIONS table, this example suggests two observations:

Not every patient is admitted through the emergency department, as many values are missing in edregtime and edouttime.
The patient language field is mandatory now, but it was not in the past.



In [ ]:

    
# !conda install -c conda-forge missingno -y
import missingno as msno
msno.matrix(a)
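
As a numeric complement to the matrix plot, the fraction of missing values per column can be computed with pandas alone. This is a sketch on hypothetical toy rows; only the column names mirror the ADMISSIONS table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; only the column names mirror the ADMISSIONS table
toy = pd.DataFrame({
    "hadm_id": [10, 11, 20, 30],
    "edregtime": ["2130-01-01 10:00", np.nan, np.nan, "2150-05-02 08:30"],
    "language": [np.nan, np.nan, "ENGL", "ENGL"],
})

# isna() marks missing cells; mean() turns the boolean mask into a per-column fraction
missing_fraction = toy.isna().mean().sort_values(ascending=False)
print(missing_fraction)
```

Applied to the real table, the same expression would be a.isna().mean(). The matrix plot additionally shows where in the row order the gaps occur, which a single fraction cannot.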

Wordcloud

https://github.com/amueller/word_cloud
Wordcloud visualizes a given text as a word cloud, in which more frequent words appear larger.
This example suggests that sepsis is among the most frequent terms in the diagnosis column.

Import the Wordcloud package (1/4)



In [ ]:

    
# !conda install -c conda-forge wordcloud -y
from wordcloud import WordCloud

Prepare an input text in string (2/4)



In [ ]:

    
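# join all non-missing diagnoses into one string
# (str() on the array would add brackets and quotes, and abbreviates long arrays)
text = ' '.join(a['diagnosis'].dropna().astype(str))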

Generate a word-cloud from the input text (3/4)



In [ ]:

    
wordcloud = WordCloud().generate(text)

Plot the word-cloud (4/4)



In [ ]:

    
import matplotlib.pyplot as plt
plt.figure(figsize = (10,10))
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis("off")
plt.show()
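
The impression given by a word cloud can be cross-checked by counting words directly. The sketch below uses hypothetical toy diagnoses, not MIMIC-III data, and a simple letters-only tokenizer, which is an assumption; WordCloud applies its own tokenization and stop-word handling, so its counts may differ.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy diagnoses, not taken from MIMIC-III
diagnosis = pd.Series(["SEPSIS", "PNEUMONIA", "SEPSIS;PNEUMONIA", None,
                       "CONGESTIVE HEART FAILURE"])

# Drop missing entries, extract runs of letters as words, and flatten to one word per row
words = diagnosis.dropna().str.upper().str.findall(r"[A-Z]+").explode()
frequencies = Counter(words)
print(frequencies.most_common(3))
```

On the real table, replacing the toy series with a['diagnosis'] would give the raw counts behind the largest words in the plot.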

Questions?
