NOTE:
There is only 1 encounter with the value consultLoc=7 (Palliative Care Unit); it is the third visit of one particular individual. A category represented by a single individual will cause problems with modeling, so I needed to collapse this category into another level or remove it. After discussing with Don (7/26/16), we decided to delete this encounter.
There are only 7 encounters with the value consultLoc=6 (Emergency Department), which will also cause problems with modeling. From viewing the individuals with this value, the most appropriate level to collapse it into seemed to be consultLoc=1 (Hospital - ICU (includes MICU, SICU, TICU)). However, after discussing with Don (7/26/16), we decided to delete these encounters.
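A minimal pandas sketch of the check and the deletion described in the note above. It assumes an encounter-level DataFrame with a `consultLoc` column holding the integer codes; the column names, the `min_count` threshold, the toy data, and both helper functions are hypothetical, not taken from the original analysis code.

```python
import pandas as pd

# Codes dropped after the 7/26/16 discussion: 6 = Emergency Department, 7 = Palliative Care Unit.
DROPPED_CONSULT_LOCS = {6, 7}


def sparse_levels(encounters: pd.DataFrame, min_count: int, column: str = "consultLoc") -> list:
    """Return the care-setting codes that occur in fewer than `min_count` encounters."""
    counts = encounters[column].value_counts()
    return sorted(int(code) for code in counts[counts < min_count].index)


def drop_consult_locs(encounters: pd.DataFrame, codes=DROPPED_CONSULT_LOCS,
                      column: str = "consultLoc") -> pd.DataFrame:
    """Return a copy of `encounters` without the rows whose care-setting code is in `codes`."""
    keep = ~encounters[column].isin(codes)
    return encounters.loc[keep].reset_index(drop=True)


# Hypothetical toy data (not the QDACT encounters).
encounters = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 5],
    "consultLoc": [1, 1, 1, 2, 2, 2, 6, 7],
})
print(sparse_levels(encounters, min_count=2))  # [6, 7]
cleaned = drop_consult_locs(encounters)
```

Listing the sparse levels first keeps the decision (delete versus collapse) explicit; collapsing instead would be a `replace` on the same column rather than a filter.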
It is important to note that we are dealing with a dataset of 5,066 encounters, so a single patient can contribute several encounters. As such, a particular patient's care setting field (on QDACT) may change over time. Therefore, for the remainder of this notebook, we will only explore the first care setting assigned to each patient and how it correlates with their number of follow-up visits. It is also important to note that, because of the open-ended, exploratory design of this analysis, we are not adjusting for the multiple tests that follow. Many reviewers would likely raise this as a critique if this work is ever submitted.
Because this is only exploratory (not confirmatory or a clinical trial), I would recommend not adjusting (and have not done so below).
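The first-care-setting restriction described above can be sketched in pandas as follows. The column names `patient_id`, `visit_date`, and `consultLoc`, the helper `first_setting_and_followups`, and the convention that follow-ups are all visits after the first are assumptions for illustration; the notebook's actual munging code is not shown.

```python
import pandas as pd


def first_setting_and_followups(encounters: pd.DataFrame,
                                patient_col: str = "patient_id",
                                date_col: str = "visit_date",
                                setting_col: str = "consultLoc") -> pd.DataFrame:
    """One row per patient: the care setting of the earliest visit and the number of later visits."""
    # Sort so each patient's earliest visit comes first.
    ordered = encounters.sort_values([patient_col, date_col])
    first_setting = (ordered.drop_duplicates(patient_col, keep="first")
                            .set_index(patient_col)[setting_col])
    # Follow-ups are counted as every visit after the first one.
    n_followups = ordered.groupby(patient_col).size() - 1
    return pd.DataFrame({"first_setting": first_setting, "n_followups": n_followups})


# Hypothetical toy data, deliberately out of date order.
encounters = pd.DataFrame({
    "patient_id": [1, 2, 1, 1],
    "visit_date": pd.to_datetime(["2016-03-01", "2016-01-15", "2016-01-01", "2016-02-01"]),
    "consultLoc": [1, 3, 2, 1],
})
summary = first_setting_and_followups(encounters)
print(summary)
```

`drop_duplicates` is used instead of `groupby(...).first()` because the latter returns the first non-missing value per column, which could silently skip a patient's true first visit if its care setting were blank. From `summary`, `summary.groupby("first_setting")["n_followups"].describe()` gives the follow-up distribution for each first care setting.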
To explore the entire follow-up distribution of the CMMI population stratified by care setting, we will use an interactive graphic. Because it is interactive, you need to place your cursor in the first cell below (starting with 'from IPython.core.display...') and then press the play button in the toolbar above 5 times, once for each of the five cells that build the graphic. After the fifth press, the interactive graphic will appear. Instructions for interpreting the graphic are given below the figure.
In [11]:
from IPython.core.display import display, HTML
from string import Template
In [12]:
HTML('<script src="//d3js.org/d3.v3.min.js" charset="utf-8"></script>')
Out[12]:
In [13]:
css_text2 = '''
#main { float: left; width: 750px; }
#sidebar { float: right; width: 100px; }
#sequence { width: 600px; height: 70px; }
#legend { padding: 10px 0 0 3px; }
#sequence text, #legend text { font-weight: 400; fill: #000000; font-size: 0.75em; }
#graph-div2 { position: relative; stroke: #fff; }
#explanation { position: absolute; top: 330px; left: 405px; width: 140px; text-align: center; color: #666; z-index: -1; }
#percentage { font-size: 2.3em; }
'''
In [14]:
# Read the sunburst JavaScript and wrap it in a string.Template for placeholder substitution
with open('interactive_circle_cl.js', 'r') as myfile:
    data = myfile.read()
js_text_template2 = Template(data)
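The cell above wraps the JavaScript file in Python's `string.Template`, and a later cell fills the `$graphdiv` placeholder with `substitute()`. One pitfall worth knowing: `substitute()` raises `KeyError` for any other `$name` in the file and `ValueError` for a bare `$` such as jQuery's `$(`, while `safe_substitute()` leaves those untouched. The sketch below shows the difference on a hypothetical JavaScript snippet; the contents of `interactive_circle_cl.js` are not shown here, so whether it contains such `$` characters is unknown, and `render_js` is a hypothetical helper.

```python
from string import Template

# Hypothetical JavaScript snippet standing in for interactive_circle_cl.js.
JS_SOURCE = 'var chart = d3.select("#$graphdiv");\n$("#explanation").hide();\n'


def render_js(source: str, graphdiv: str) -> str:
    """Fill the $graphdiv placeholder, leaving any other '$' in the JavaScript alone."""
    template = Template(source)
    try:
        # Strict version: fails on unknown ($name) or malformed ($( ...) placeholders.
        return template.substitute(graphdiv=graphdiv)
    except (KeyError, ValueError):
        # Lenient version: fills $graphdiv and returns every other '$' unchanged.
        return template.safe_substitute(graphdiv=graphdiv)


print(render_js(JS_SOURCE, "graph-div2"))
```

The silent fallback is a trade-off: it also hides a misspelled placeholder. Writing literal dollar signs as `$$` in the JavaScript and keeping `substitute()` is the stricter alternative.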
In [15]:
html_template = Template('''
<style> $css_text </style>
<div id="sequence"></div>
<div id="graph-div2"></div>
<div id="explanation" style="visibility: hidden;">
<span id="percentage"></span><br/>
of patients meet this criteria
</div>
</div>
<script> $js_text </script>
''');
js_text2 = js_text_template2.substitute({'graphdiv': 'graph-div2'});
HTML(html_template.substitute({'css_text': css_text2, 'js_text': js_text2}))
Out[15]:
The graphic above illustrates the pattern of follow-ups in the CMMI data set for each of the 1,640 unique patients. Hover your cursor over a colored segment to see which care setting it represents. Each concentric ring moving out from the center represents one visit: the innermost ring is the first visit, and each ring beyond it is the next follow-up visit. For example, starting in the center, the first ring contains a red segment. Hovering over it shows 41.8%, meaning that 41.8% of the 1,640 patients reported 'Long Term Care' at their first visit. Hovering over the black segment in the next ring, just outside the red one, shows 7.26%, meaning that 7.26% of the population had a first visit labeled 'Long Term Care' and then had no additional visits.
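Read this way, every percentage in the figure is a share of all 1,640 patients, not of the parent segment. The sketch below computes such shares from per-patient visit sequences. It is consistent with the reading described above but is not taken from `interactive_circle_cl.js`, whose aggregation logic is not shown; `sunburst_shares`, the toy cohort, and the `"end"` label (standing in for the black segments) are hypothetical.

```python
from collections import Counter


def sunburst_shares(sequences: dict, end_label: str = "end") -> dict:
    """Map each prefix of a visit path to the share of *all* patients who follow it.

    `sequences` maps a patient id to that patient's care settings in visit order.
    An `end_label` step is appended to every path, so a prefix ending in
    `end_label` is the share of patients whose visits stop at that point.
    """
    n_patients = len(sequences)
    counts = Counter()
    for settings in sequences.values():
        path = tuple(settings) + (end_label,)
        # Count every prefix: ring 1 is the first visit, ring 2 the next step, and so on.
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return {prefix: count / n_patients for prefix, count in counts.items()}


# Hypothetical toy cohort of four patients (not the CMMI data).
visits = {
    "A": ["Long Term Care"],
    "B": ["Long Term Care", "Long Term Care"],
    "C": ["Hospital - ICU"],
    "D": ["Long Term Care", "Hospital - ICU"],
}
shares = sunburst_shares(visits)
print(shares[("Long Term Care",)])        # 0.75
print(shares[("Long Term Care", "end")])  # 0.25
```

The shares in the innermost ring sum to 1, and each segment's share is at most that of the segment it grows out of.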
In this aim, we will test the null hypothesis of no association between each row variable and the column variable (consultLoc). These data clearly have a time component, since patients are seen repeatedly, but for this aim we will restrict the analysis to each patient's first encounter only.
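The notebook does not state which test produced the tables below. For categorical row variables, one common choice is Pearson's chi-square test of independence, with Fisher's exact test as the usual fallback for sparse tables. The sketch below is an assumption, not the notebook's method: it computes the chi-square statistic by hand from a `pd.crosstab` of hypothetical first-encounter data, and the column names and counts are made up.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def chi_square_statistic(observed: pd.DataFrame) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    obs = observed.to_numpy(dtype=float)
    # Expected counts under no association: row total * column total / grand total.
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    statistic = float(((obs - expected) ** 2 / expected).sum())
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return statistic, dof


# Hypothetical first-encounter data: one row per patient (not the QDACT data).
first_visits = pd.DataFrame({
    "anxiety": ["none"] * 10 + ["any"] * 20 + ["none"] * 20 + ["any"] * 10,
    "consultLoc": [1] * 30 + [2] * 30,
})
observed = pd.crosstab(first_visits["anxiety"], first_visits["consultLoc"])
statistic, dof = chi_square_statistic(observed)
print(round(statistic, 3), dof)  # 6.667 1
```

In practice, `scipy.stats.chi2_contingency(observed, correction=False)` returns the same statistic along with the p-value, degrees of freedom, and expected counts.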
Here are the tables for this aim, read from a CSV file; each cell below displays the rows for one variable:
In [1]:
import pandas as pd
table = pd.read_csv('./python_scripts/11_primarydiagnosis_tables_catv2_consultLoc.csv')
#Anxiety
table[0:5]
Out[1]:
In [ ]:
#Appetite
table[5:10]
In [ ]:
#Constipation
table[10:15]
In [ ]:
#Depression
table[15:20]
In [ ]:
#Drowsiness
table[20:25]
In [ ]:
#Nausea
table[25:30]
In [ ]:
#Pain
table[30:35]
In [ ]:
#Shortness of Breath
table[35:40]
In [ ]:
#Tiredness
table[40:45]
In [ ]:
#Well Being
table[45:50]
In [ ]:
# PPSScore
table[50:51]
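The eleven cells above repeat the same pattern with hard-coded row ranges. A small loop can show every block at once. This sketch assumes the layout implied by those slices (5 rows per symptom, in the order shown, then 1 row for PPSScore) holds for the CSV; `block_slices` and `demo_table` are hypothetical names, and `demo_table` is a stand-in so the real `table` is not overwritten.

```python
import pandas as pd

# Row blocks as laid out in the cells above: ten symptoms of 5 rows each, then PPSScore as one row.
SYMPTOMS = ["Anxiety", "Appetite", "Constipation", "Depression", "Drowsiness",
            "Nausea", "Pain", "Shortness of Breath", "Tiredness", "Well Being"]


def block_slices(symptoms, rows_per_block=5, final_label="PPSScore"):
    """Map each variable name to the row slice of the results table that holds it."""
    slices = {name: slice(i * rows_per_block, (i + 1) * rows_per_block)
              for i, name in enumerate(symptoms)}
    start = len(symptoms) * rows_per_block
    slices[final_label] = slice(start, start + 1)
    return slices


# Stand-in for the CSV read above: 51 rows, matching the last slice table[50:51].
demo_table = pd.DataFrame({"row": range(51)})
for name, rows in block_slices(SYMPTOMS).items():
    print(name, len(demo_table.iloc[rows]))
```

In the notebook itself, replacing the `print` call with `display(table.iloc[rows])` (using the `display` imported earlier) would render each block as its own table.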