It is important to note that we are dealing with a dataset of 5,066 encounters. As such, it is possible that a particular patient's primary diagnosis field (on QDACT) will change (or be different) over time. Therefore for the remainer of this notebook, we will only explore the first primary diagnosis assigned to a patient and how that correlated to their number of follow-up visits. Also, it is important to note that due to the nebulous design of this exploration, we are not adjusting for the multiple tests that follow. This could be a critique that many reviewers would have if this work is ever submitted. Because this is only exploratory (not confirmatory or a clincal trial), I would recommend not adjusting (and have not done so below).
To explore the entire follow-up distribution of the CMMI population stratified by primary diagnosis type, we will use an interactive graphic. Because it is interactive, it requires you to place your cursor in the first cell below (starting with 'from IPython.core.display...') and then press the play button in the toolbar above. You will need to press play 5 times. After pressing play 5 times, the interactive graphic will appear. Instructions for interpreting the graphic are given below the figure.
In [30]:
from IPython.core.display import display, HTML;from string import Template;
In [31]:
HTML('<script src="//d3js.org/d3.v3.min.js" charset="utf-8"></script>')
Out[31]:
In [32]:
css_text2 = '''
#main { float: left; width: 750px;}#sidebar { float: right; width: 100px;}#sequence { width: 600px; height: 70px;}#legend { padding: 10px 0 0 3px;}#sequence text, #legend text { font-weight: 400; fill: #000000; font-size: 0.75em;}#graph-div2 { position: relative;}#graph-div2 { stroke: #fff;}#explanation { position: absolute; top: 330px; left: 405px; width: 140px; text-align: center; color: #666; z-index: -1;}#percentage { font-size: 2.3em;}
'''
In [33]:
with open('interactive_circle_pd.js', 'r') as myfile:
data=myfile.read()
js_text_template2 = Template(data)
In [34]:
html_template = Template('''
<style> $css_text </style>
<div id="sequence"></div>
<div id="graph-div2"></div>
<div id="explanation" style="visibility: hidden;">
<span id="percentage"></span><br/>
of patients meet this criteria
</div>
</div>
<script> $js_text </script>
''');
js_text2 = js_text_template2.substitute({'graphdiv': 'graph-div2'});
HTML(html_template.substitute({'css_text': css_text2, 'js_text': js_text2}))
Out[34]:
The graphic above illustrates the pattern of follow-ups in the CMMI data set for each of the 1,640 unique patients. Using your cursor, you can hover over a particular color to find out the specific primary diagnosis. Each concentric circle going out from the middle represent a new follow-up visit for a person. For example, in the figure above, starting in the center, there are two purple layers in the first concentric circle. If you hover over the purple layer without any additional layers, you will see '9.33%'. This means that 9.33% of the 1,640 patients only had 1 follow-up visit with a primary diagnosis of Neurologic. Hovering over the other purple layer (the one with additional layers, gives a value of 24.3%. This means that 24.3% of the population had a first visit labeled as neurologic and then had at least one additional follow-up visit.
I'm not sure if there is a hypothesis we want to test in relation to these two variables (i.e. Primary diagnosis and number of follow-up visits). If so, it would seem pertinent to only use the first primary diagnosis to test the continuous variable number of visits. Let me know your thoughts on proceeding down this road.
In [ ]:
import pandas as pd
prim_cont = pd.read_csv(open("./python_scripts/09_prim_diag_table.csv","r"))
prim_cont
Before delving into investigating whether symptoms are associated with primary diagnosis (or not), we'll need to first transform the data into meaningful variables and also examine patterns of missingness to determine how to appropriatly analyze questions of interest. So, we will create the following variables for analysis:
assessment date
we will code the first visit as 0. Every visit after this will be calculated as follows (j is an integer from 1 to m where m is the number of visits):
$$t_{j}=AssessmentDate_{j}-AssessmentDate_{1}$$Unable to Ascertain
=Unable to AscertainWe will build the following model (note, I'm leaving off phid as: 1) Was this provider name? 2)If so, too much missing data: $$ Y{missing-syptom-yes-no}=\beta{0}+x{PPSScore}\beta{1}+x{PrimaryDiagnosisMissing-yes-no}\beta{2}+x{ConsultLoc}\beta{3}$$ $$ Y{missing-syptom-yes-no}=\beta{0}+x{PPSScore-missing-yes-no}\beta{1}+x{PrimaryDiagnosisMissing-yes-no}\beta{2}+x{ConsultLoc}\beta_{3}$$
In [40]:
#Anxiety Missing Model
import pickle
missings = pickle.load(open("./python_scripts/13-missing_model_validPPSScore.p", "rb"))
missings_consultlocmissing = pickle.load(open("./python_scripts/13-missing_model_missingPPSScore.p", "rb"))
#Anxiety
print(missings.get("anxiety")[0])
In [41]:
print(missings_consultlocmissing.get('anxiety'))
Note, before looking at any of the models below, constipation and nausea models have convergence warnings, meaning the data was too sparse for convergence of estimates to occur (these models need more though before moving forward).We will test at $\alpha=0.05$. Let's talk about this output and how we can incorporate this model (or something like it) in the analysis below.
In [42]:
print(missings.get("appetite")[0])
In [43]:
#Caution, should not use - did not converge
print(missings.get("constipation")[0])
In [44]:
print(missings.get("depression")[0])
In [45]:
print(missings.get("drowsiness")[0])
In [46]:
#Caution, should not use - did not converge
print(missings.get("nausea")[0])
In [47]:
print(missings.get("pain")[0])
In [48]:
print(missings.get("shortness")[0])
In [49]:
print(missings.get("tiredness")[0])
In [50]:
print(missings.get("wellbeing")[0])
In this analysis, we create an indicator variable for moderate/severe for the following 5 symptoms:
- Pain
- Dyspnea
- Fatigue
- Anxiety
- Constipation
We start by first create a yes/no variable for each symptom. For instance, for pain, if the symptom score is [5,10], then a '1' is coded, else a '0' is coded (this includes things like <5, a 994 code which indicates unable to ascertain, as well as missing values). If any of this needs to be changed let me know. Once created, we then create 3 new variables which look to determin how many of the symptoms are moderate or severe. What follows is a summary of findings.
In [2]:
#first read in the (summary) data
import pandas as pd
summ = pd.read_csv(open("./python_scripts/14_number_of_modsev_symptoms.csv","r"))
#now print those with '1 or more moderate/severe symptoms'
summ[0:2]
Out[2]:
In [3]:
#now print those with '2 or more moderate/severe symptoms'
summ[2:4]
Out[3]:
In [4]:
#now print those with '3 or more moderate/severe symptoms'
summ[4:6]
Out[4]:
In [22]:
import pandas as pd
table = pd.read_csv(open('./python_scripts/11_primarydiagnosis_tables_catv2.csv','r'))
#Anxiety
table[0:5]
Out[22]:
In [23]:
#Appetite
table[5:10]
Out[23]:
In [24]:
#Constipation
table[10:15]
Out[24]:
In [25]:
#Depression
table[15:20]
Out[25]:
In [26]:
#Drowsiness
table[20:25]
Out[26]:
In [27]:
#Nausea
table[25:30]
Out[27]:
In [28]:
#Pain
table[30:35]
Out[28]:
In [29]:
#Shortness
table[35:40]
Out[29]:
In [ ]:
#Tiredness
table[40:45]
In [61]:
#Well Being
table[45:50]
Out[61]:
In [ ]:
# PPSScore
table[50:51]
We will use a discrete choice model to fit the following model:
$$log\frac{\pi_{ij}}{\pi_{ij}}=\alpha_{j}+x_{i_{PPSScore}}\beta_{j_{PPSScore}}+x_{i_{PrimayDiagnosis}}\beta_{j_{PrimaryDiagnosis}}$$
In [2]:
import pickle
models = pickle.load(open("./python_scripts/12-model_results.p", "rb"))
In [3]:
#Anxiety
print(models.get("anxiety")[0])
First, note the following important items about this model:
- Here we are modeling a 3-level Anxiety score (None, Unable to Respond, and Mild/Moderate/Severe) - For anxiety, this combination was chosen so that convergence of the model could be met - The reference group is Mild/Moderate/Severe - This model assumes a lack of interaction between Primary Diagnosis and PPSScore (should we test this?)
Now that this model has been built, there are several ways to extract/display information to readers and collaborators. For this first model, I will present all options, for all other models below, I will simply display a similar table as above. Please keep in mind that each model can be described in these additional ways too.
Log OddsUsing the information above, we could talk about the log odds of a Primary Diagnosis as compared to Cancer. For example consider a patient (at baseline) who has a PPSScore of 50 and a primary diagnosis of Cardiovascular, we could draw the following conclusion: $$log(\frac{\hat{\pi_{None}}}{\hat{\pi_{Mild/Moderate/Severe}}})=0.0029+0.2377*1+0.0098*50=0.7306$$
This equation says that the log odds that the Anxiey response is None (rather than Mild/Moderate/Severe) is 0.7306.Estimating Response Probabilities
We could also use these estimates to derive probabilities about responses for a given patient (or estimate functions for types of patients. For example, consider our same pretend patient above: $$\hat{\pi_{None}}=\frac{e^{0.0029+0.2377*1+0.0098*50}}{1+e^{0.0029+0.2377*1+0.0098*50}+e^{2.7168+0.4798*1-0.0773*50}}=0.579$$ $$\hat{\pi_{Unable to Respond}}=\frac{e^{2.7168+0.4798*1-0.0773*50}}{1+e^{0.0029+0.2377*1+0.0098*50}+e^{2.7168+0.4798*1-0.0773*50}}=0.143$$ $$\hat{\pi_{Mild/Moderate/Severe}}=\frac{1}{1+e^{0.0029+0.2377*1+0.0098*50}+e^{2.7168+0.4798*1-0.0773*50}}=0.279$$
Testing an Overall Primary Diagnosis Effect
We could test the following hypothesis: $$H_{0}:\beta_{Cardiovascular}=\beta_{Gastrointestinal}=...=\beta_{Renal}$$ This would allow us to first test to see if any of the effects is different that the others.
Marginal Effects
Marginal effects can be calculated in various ways (e.g. The average of the marginal effects at each observation, or The marginal effects at the mean of each regressor, etc.), below I will display the latter.
In [ ]:
#Anxiety Marginal Effects
print(models.get("anxiety")[1])
In [ ]:
#Appetite
print(models.get("appetite")[0])
In [ ]:
#Constipation
print(models.get("constipation")[0])
In [ ]:
#Depression
print(models.get("depression")[0])
In [ ]:
#Drowsiness
print(models.get("drowsiness")[0])
In [ ]:
#Nausea
print(models.get("nausea")[0])
In [ ]:
#Pain
print(models.get("pain")[0])
In [ ]:
#Shortness
print(models.get("shortness")[0])
In [ ]:
#Tiredness
print(models.get("tiredness")[0])
In [ ]:
#Wellbeing
print(models.get("wellbeing")[0])