Exploring Primary Diagnoses

It is important to note that we are dealing with a dataset of 5,066 encounters. As such, a particular patient's primary diagnosis field (on QDACT) may change (or differ) over time. Therefore, for the remainder of this notebook, we will only explore the first primary diagnosis assigned to a patient and how it relates to their number of follow-up visits. Also note that, because this exploration is open-ended, we are not adjusting for the multiple tests that follow. Many reviewers would likely raise this as a critique if this work is ever submitted. Because this is only exploratory (not confirmatory or a clinical trial), I would recommend not adjusting (and have not done so below).
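A minimal sketch of that reduction, assuming the encounter-level data sit in a pandas DataFrame with one row per visit. The column names `patient_id`, `AssessmentDate`, and `PrimaryDiagnosis` and the toy rows are illustrative assumptions, not the actual QDACT export, and counting follow-ups as "visits after the first" is one possible reading of the definition:

```python
import pandas as pd

# Toy encounter-level data (hypothetical values): one row per visit.
encounters = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 3, 3],
    "AssessmentDate": pd.to_datetime([
        "2016-01-05", "2016-02-01", "2016-03-10",
        "2016-01-20", "2016-02-14", "2016-02-28",
    ]),
    "PrimaryDiagnosis": ["Cancer", "Cancer", "Pulmonary",
                         "Neurologic", "Renal", "Renal"],
})

# Order visits chronologically within each patient.
ordered = encounters.sort_values(["patient_id", "AssessmentDate"])

per_patient = ordered.groupby("patient_id").agg(
    first_diagnosis=("PrimaryDiagnosis", "first"),  # diagnosis at the first visit
    n_visits=("AssessmentDate", "size"),            # total visits on record
)
# Follow-up visits are taken to be all visits after the first one (assumption).
per_patient["n_followups"] = per_patient["n_visits"] - 1
print(per_patient)
```

Patient 1 changes diagnosis at the third visit, but only the first recorded diagnosis is carried forward, which mirrors the restriction described above.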

Follow-up Visit Distribution by Primary Diagnosis (Interactive Graphic)


To explore the entire follow-up distribution of the CMMI population stratified by primary diagnosis type, we will use an interactive graphic. Because it is interactive, it requires you to place your cursor in the first cell below (starting with 'from IPython.core.display...') and then press the play button in the toolbar above. You will need to press play 5 times. After pressing play 5 times, the interactive graphic will appear. Instructions for interpreting the graphic are given below the figure.


In [30]:
from IPython.core.display import display, HTML
from string import Template

In [31]:
HTML('<script src="//d3js.org/d3.v3.min.js" charset="utf-8"></script>')


Out[31]:

In [32]:
css_text2 = '''
#main {  float: left;  width: 750px;}#sidebar {  float: right;  width: 100px;}#sequence {  width: 600px;  height: 70px;}#legend {  padding: 10px 0 0 3px;}#sequence text, #legend text {  font-weight: 400;  fill: #000000; font-size: 0.75em;}#graph-div2 {  position: relative;}#graph-div2 {  stroke: #fff;}#explanation {  position: absolute;  top: 330px;  left: 405px;  width: 140px;  text-align: center;  color: #666;  z-index: -1;}#percentage {  font-size: 2.3em;}
'''

In [33]:
with open('interactive_circle_pd.js', 'r') as myfile:
    data=myfile.read()

js_text_template2 = Template(data)

In [34]:
html_template = Template('''
<style> $css_text </style>
<div id="sequence"></div>
      <div id="graph-div2"></div>
        <div id="explanation" style="visibility: hidden;">
          <span id="percentage"></span><br/>
          of patients meet this criteria
        </div>
<script> $js_text </script>
''');
js_text2 = js_text_template2.substitute({'graphdiv': 'graph-div2'});
HTML(html_template.substitute({'css_text': css_text2, 'js_text': js_text2}))


Out[34]:

Graphic Interpretation

The graphic above illustrates the pattern of follow-ups in the CMMI data set for each of the 1,640 unique patients. Using your cursor, you can hover over a particular color to find out the specific primary diagnosis. Each concentric circle going out from the middle represents a new follow-up visit for a person. For example, in the figure above, starting in the center, there are two purple layers in the first concentric circle. If you hover over the purple layer without any additional layers, you will see '9.33%'. This means that 9.33% of the 1,640 patients had only 1 follow-up visit with a primary diagnosis of Neurologic. Hovering over the other purple layer (the one with additional layers) gives a value of 24.3%. This means that 24.3% of the population had a first visit labeled as Neurologic and then had at least one additional follow-up visit.

Statistical Inference

I'm not sure if there is a hypothesis we want to test in relation to these two variables (i.e., primary diagnosis and number of follow-up visits). If so, it would seem pertinent to use only the first primary diagnosis when testing against the continuous variable number of visits. Let me know your thoughts on proceeding down this road.

Primary Diagnosis by Type of Primary Diagnosis


In this aim, we try to illustrate (graphically or in tabular form) how primary diagnoses are broken out into types of primary diagnoses.


In [ ]:
import pandas as pd
prim_cont = pd.read_csv("./python_scripts/09_prim_diag_table.csv")
prim_cont

AIM: Association with Symptoms

Variable Definitions


Before investigating whether symptoms are associated with primary diagnosis, we first need to transform the data into meaningful variables and examine patterns of missingness to determine how to appropriately analyze the questions of interest. We will create the following variables for analysis:

  • Time from first visit. Using a patient's first visit assessment date we will code the first visit as 0. Every visit after this will be calculated as follows (j is an integer from 1 to m where m is the number of visits): $$t_{j}=AssessmentDate_{j}-AssessmentDate_{1}$$
  • ESAS Symptom Class. We will create a 5-level nominal categorical variable from each ESAS Symptom score as follows:
    • 0=None
    • [1,4]=Mild
    • [5,7]=Moderate
    • [8,10]=Severe
    • Unable to Ascertain=Unable to Ascertain
  • PPSScore. We will treat this variable as continuous even though it only takes values [0,10,20,...,100]. Values of 999='Unknown' (n=733) will be treated as missing.
  • PrimaryDiagnosis. We will treat this variable as nominal categorical. The two categories 998='Missing' (n=41) and 999='Unknown' (n=40) will be set to missing.
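The definitions above could be derived roughly as follows. This is a sketch under stated assumptions: the `patient_id` column, the toy values, the choice of days as the time unit, and the `PrimaryDiagnosis_nm` name are hypothetical. The 994 code for "unable to ascertain" is the one mentioned later in this notebook, and `PPSScore_nm` is the name that appears in the model output.

```python
import numpy as np
import pandas as pd

# Toy visits (hypothetical values). 994 = unable to ascertain, 998/999 = missing/unknown.
visits = pd.DataFrame({
    "patient_id": [1, 1, 2, 2],
    "AssessmentDate": pd.to_datetime(
        ["2016-01-05", "2016-01-19", "2016-03-01", "2016-03-31"]),
    "ESASPain": [0, 6, 994, 9],
    "PPSScore": [40, 999, 70, 60],
    "PrimaryDiagnosis": [3, 3, 998, 999],
})

# Time from first visit: t_j = AssessmentDate_j - AssessmentDate_1 (in days here).
first_date = visits.groupby("patient_id")["AssessmentDate"].transform("min")
visits["days_from_first"] = (visits["AssessmentDate"] - first_date).dt.days

def esas_class(score):
    """Map a raw ESAS score to the 5-level class defined above."""
    if pd.isna(score):
        return np.nan
    if score == 994:
        return "Unable to Ascertain"
    if score == 0:
        return "None"
    if 1 <= score <= 4:
        return "Mild"
    if 5 <= score <= 7:
        return "Moderate"
    if 8 <= score <= 10:
        return "Severe"
    return np.nan  # any other value is treated as missing (assumption)

visits["ESASPain_5level"] = visits["ESASPain"].map(esas_class)

# Recode the special codes to missing.
visits["PPSScore_nm"] = visits["PPSScore"].replace(999, np.nan)
visits["PrimaryDiagnosis_nm"] = visits["PrimaryDiagnosis"].replace([998, 999], np.nan)
print(visits)
```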

Missing Data Mechanisms


Analysis 1

We will build the following two models (note: I'm leaving off phid because (1) I'm not sure whether this was the provider name, and (2) if so, it has too much missing data). The first model uses the observed PPSScore; the second replaces it with an indicator for whether PPSScore is missing:
$$ Y_{missing-symptom-yes-no}=\beta_{0}+x_{PPSScore}\beta_{1}+x_{PrimaryDiagnosisMissing-yes-no}\beta_{2}+x_{ConsultLoc}\beta_{3}$$
$$ Y_{missing-symptom-yes-no}=\beta_{0}+x_{PPSScore-missing-yes-no}\beta_{1}+x_{PrimaryDiagnosisMissing-yes-no}\beta_{2}+x_{ConsultLoc}\beta_{3}$$
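The outputs below are labeled Logit, so the right-hand sides above are presumably linear predictors on the log-odds scale. Under that reading, and writing $p$ for the probability that the symptom score is missing (a symbol introduced here for illustration), the first model would be:

$$ \log\left(\frac{p}{1-p}\right)=\beta_{0}+x_{PPSScore}\beta_{1}+x_{PrimaryDiagnosisMissing-yes-no}\beta_{2}+x_{ConsultLoc}\beta_{3}$$

With this form, $e^{\beta_{k}}$ is the odds ratio for missingness associated with a one-unit increase in the corresponding covariate. ConsultLoc enters as a set of indicator variables, so $\beta_{3}$ stands for a vector of coefficients, one per non-reference location.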


In [40]:
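# Anxiety missing-data models
import pickle
# Models that use PPSScore itself as a covariate
with open("./python_scripts/13-missing_model_validPPSScore.p", "rb") as f:
    missings = pickle.load(f)
# Models that use the PPSScore missing indicator as a covariate
with open("./python_scripts/13-missing_model_missingPPSScore.p", "rb") as f:
    missings_consultlocmissing = pickle.load(f)
# Anxiety
print(missings.get("anxiety")[0])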


                               Logit Regression Results                               
======================================================================================
Dep. Variable:     ESASAnxiety_5level_missInd   No. Observations:                 1527
Model:                                  Logit   Df Residuals:                     1519
Method:                                   MLE   Df Model:                            7
Date:                        Wed, 24 Aug 2016   Pseudo R-squ.:                 0.04127
Time:                                10:16:47   Log-Likelihood:                -527.08
converged:                               True   LL-Null:                       -549.77
                                                LLR p-value:                 1.156e-07
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.2930      0.350     -6.549      0.000        -2.979    -1.607
ConsultLoc_fmt[T.Hospital - ICU]         0.7126      0.339      2.103      0.035         0.048     1.377
ConsultLoc_fmt[T.Hospital - general]     1.0367      0.258      4.025      0.000         0.532     1.542
ConsultLoc_fmt[T.Long term care]        -0.1483      0.286     -0.518      0.605        -0.710     0.413
ConsultLoc_fmt[T.Other]                  1.1222      0.604      1.857      0.063        -0.062     2.307
ConsultLoc_fmt[T.Outpatient]             0.3824      0.578      0.662      0.508        -0.751     1.516
PPSScore_nm                             -0.0045      0.005     -0.869      0.385        -0.015     0.006
PrimaryDiagnosis_missInd                -0.0835      0.755     -0.111      0.912        -1.563     1.396
========================================================================================================

In [41]:
print(missings_consultlocmissing.get('anxiety'))


                               Logit Regression Results                               
======================================================================================
Dep. Variable:     ESASAnxiety_5level_missInd   No. Observations:                 1632
Model:                                  Logit   Df Residuals:                     1624
Method:                                   MLE   Df Model:                            7
Date:                        Wed, 24 Aug 2016   Pseudo R-squ.:                  0.2504
Time:                                11:01:53   Log-Likelihood:                -542.67
converged:                               True   LL-Null:                       -723.94
                                                LLR p-value:                 2.541e-74
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.4986      0.208    -12.040      0.000        -2.905    -2.092
ConsultLoc_fmt[T.Hospital - ICU]         0.6046      0.304      1.992      0.046         0.010     1.200
ConsultLoc_fmt[T.Hospital - general]     0.8727      0.233      3.751      0.000         0.417     1.329
ConsultLoc_fmt[T.Long term care]        -0.5242      0.273     -1.918      0.055        -1.060     0.011
ConsultLoc_fmt[T.Other]                  0.9447      0.612      1.544      0.123        -0.254     2.144
ConsultLoc_fmt[T.Outpatient]             0.1261      0.590      0.214      0.831        -1.031     1.283
PPSScore_missInd                         3.0963      0.207     14.940      0.000         2.690     3.502
PrimaryDiagnosis_missInd                 0.2754      0.639      0.431      0.667        -0.978     1.528
========================================================================================================

Conclusions

Note, before looking at any of the models below: the constipation and nausea models have convergence warnings, meaning the data were too sparse for the estimates to converge (these models need more thought before moving forward). We will test at $\alpha=0.05$. Let's talk about this output and how we can incorporate this model (or something like it) in the analysis below.
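One way to think about the convergence warnings: in the constipation and nausea outputs below, PrimaryDiagnosis_missInd has a very large negative coefficient and an enormous standard error. That pattern is consistent with quasi-complete separation (for example, no encounter with a missing primary diagnosis also having a missing symptom score), although the fitted output alone does not prove it. A quick check is to cross-tabulate the outcome against each binary predictor and look for empty cells. The sketch below uses toy data; the DataFrame and its values are made up, the helper `empty_cells` is hypothetical, and only the column names are taken from the model output.

```python
import pandas as pd

# Toy data (hypothetical): the outcome is never 1 when the predictor is 1.
df = pd.DataFrame({
    "ESASNausea_5level_missInd": [0, 1, 0, 1, 0, 0, 0, 0],
    "PrimaryDiagnosis_missInd":  [0, 0, 0, 0, 0, 1, 1, 1],
})

def empty_cells(data, outcome, predictor):
    """Return (predictor, outcome) value pairs that never occur together."""
    counts = pd.crosstab(data[predictor], data[outcome])
    return [(row, col)
            for row in counts.index
            for col in counts.columns
            if counts.loc[row, col] == 0]

cells = empty_cells(df, "ESASNausea_5level_missInd", "PrimaryDiagnosis_missInd")
print(cells)
```

If a cell turns out to be empty in the real data, the maximum likelihood estimate for that coefficient does not exist, so options such as dropping the predictor for those symptoms or using a penalized fit would be worth discussing.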


In [42]:
print(missings.get("appetite")[0])


                                Logit Regression Results                               
=======================================================================================
Dep. Variable:     ESASAppetite_5level_missInd   No. Observations:                 1527
Model:                                   Logit   Df Residuals:                     1519
Method:                                    MLE   Df Model:                            7
Date:                         Wed, 24 Aug 2016   Pseudo R-squ.:                 0.05744
Time:                                 10:16:48   Log-Likelihood:                -508.57
converged:                                True   LL-Null:                       -539.56
                                                 LLR p-value:                 6.053e-11
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.2291      0.346     -6.439      0.000        -2.908    -1.551
ConsultLoc_fmt[T.Hospital - ICU]         0.8178      0.327      2.501      0.012         0.177     1.459
ConsultLoc_fmt[T.Hospital - general]     0.9995      0.253      3.947      0.000         0.503     1.496
ConsultLoc_fmt[T.Long term care]        -0.4957      0.298     -1.666      0.096        -1.079     0.088
ConsultLoc_fmt[T.Other]                  1.0722      0.603      1.779      0.075        -0.109     2.253
ConsultLoc_fmt[T.Outpatient]             0.0218      0.644      0.034      0.973        -1.240     1.284
PPSScore_nm                             -0.0048      0.005     -0.929      0.353        -0.015     0.005
PrimaryDiagnosis_missInd                -0.0521      0.758     -0.069      0.945        -1.538     1.434
========================================================================================================

In [43]:
#Caution, should not use - did not converge
print(missings.get("constipation")[0])


                                  Logit Regression Results                                 
===========================================================================================
Dep. Variable:     ESASConstipation_5level_missInd   No. Observations:                 1527
Model:                                       Logit   Df Residuals:                     1519
Method:                                        MLE   Df Model:                            7
Date:                             Wed, 24 Aug 2016   Pseudo R-squ.:                 0.06452
Time:                                     10:16:48   Log-Likelihood:                -429.07
converged:                                   False   LL-Null:                       -458.66
                                                     LLR p-value:                 2.199e-10
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.4229      0.424     -5.712      0.000        -3.254    -1.592
ConsultLoc_fmt[T.Hospital - ICU]         1.1354      0.406      2.799      0.005         0.340     1.931
ConsultLoc_fmt[T.Hospital - general]     1.3818      0.335      4.127      0.000         0.726     2.038
ConsultLoc_fmt[T.Long term care]         0.1841      0.364      0.506      0.613        -0.530     0.898
ConsultLoc_fmt[T.Other]                  0.1696      1.072      0.158      0.874        -1.932     2.271
ConsultLoc_fmt[T.Outpatient]            -0.3137      1.061     -0.296      0.768        -2.394     1.767
PPSScore_nm                             -0.0147      0.006     -2.499      0.012        -0.026    -0.003
PrimaryDiagnosis_missInd               -16.4045   2766.009     -0.006      0.995     -5437.683  5404.874
========================================================================================================

In [44]:
print(missings.get("depression")[0])


                                 Logit Regression Results                                
=========================================================================================
Dep. Variable:     ESASDepression_5level_missInd   No. Observations:                 1527
Model:                                     Logit   Df Residuals:                     1519
Method:                                      MLE   Df Model:                            7
Date:                           Wed, 24 Aug 2016   Pseudo R-squ.:                 0.05170
Time:                                   10:16:48   Log-Likelihood:                -643.97
converged:                                  True   LL-Null:                       -679.08
                                                   LLR p-value:                 1.333e-12
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.2431      0.307     -7.314      0.000        -2.844    -1.642
ConsultLoc_fmt[T.Hospital - ICU]         0.8523      0.295      2.889      0.004         0.274     1.431
ConsultLoc_fmt[T.Hospital - general]     1.1894      0.224      5.309      0.000         0.750     1.628
ConsultLoc_fmt[T.Long term care]        -0.0598      0.246     -0.243      0.808        -0.543     0.423
ConsultLoc_fmt[T.Other]                  1.0965      0.553      1.981      0.048         0.012     2.181
ConsultLoc_fmt[T.Outpatient]            -0.3330      0.635     -0.525      0.600        -1.577     0.911
PPSScore_nm                              0.0010      0.005      0.222      0.825        -0.008     0.010
PrimaryDiagnosis_missInd                 0.3623      0.567      0.639      0.523        -0.750     1.474
========================================================================================================

In [45]:
print(missings.get("drowsiness")[0])


                                 Logit Regression Results                                
=========================================================================================
Dep. Variable:     ESASDrowsiness_5level_missInd   No. Observations:                 1527
Model:                                     Logit   Df Residuals:                     1519
Method:                                      MLE   Df Model:                            7
Date:                           Wed, 24 Aug 2016   Pseudo R-squ.:                 0.05011
Time:                                   10:16:49   Log-Likelihood:                -563.13
converged:                                  True   LL-Null:                       -592.84
                                                   LLR p-value:                 1.971e-10
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.4554      0.351     -6.988      0.000        -3.144    -1.767
ConsultLoc_fmt[T.Hospital - ICU]         1.1325      0.336      3.367      0.001         0.473     1.792
ConsultLoc_fmt[T.Hospital - general]     1.3676      0.270      5.069      0.000         0.839     1.896
ConsultLoc_fmt[T.Long term care]         0.2270      0.291      0.779      0.436        -0.344     0.798
ConsultLoc_fmt[T.Other]                  1.2855      0.611      2.105      0.035         0.088     2.483
ConsultLoc_fmt[T.Outpatient]             0.2309      0.651      0.354      0.723        -1.046     1.507
PPSScore_nm                             -0.0045      0.005     -0.919      0.358        -0.014     0.005
PrimaryDiagnosis_missInd                -0.1408      0.758     -0.186      0.853        -1.626     1.344
========================================================================================================

In [46]:
#Caution, should not use - did not converge
print(missings.get("nausea")[0])


                               Logit Regression Results                              
=====================================================================================
Dep. Variable:     ESASNausea_5level_missInd   No. Observations:                 1527
Model:                                 Logit   Df Residuals:                     1519
Method:                                  MLE   Df Model:                            7
Date:                       Wed, 24 Aug 2016   Pseudo R-squ.:                 0.07903
Time:                               10:16:49   Log-Likelihood:                -354.57
converged:                             False   LL-Null:                       -385.00
                                               LLR p-value:                 1.020e-10
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.8895      0.527     -5.483      0.000        -3.922    -1.857
ConsultLoc_fmt[T.Hospital - ICU]         1.1209      0.528      2.124      0.034         0.087     2.155
ConsultLoc_fmt[T.Hospital - general]     1.7994      0.437      4.121      0.000         0.944     2.655
ConsultLoc_fmt[T.Long term care]         0.3478      0.476      0.730      0.465        -0.586     1.281
ConsultLoc_fmt[T.Other]                  1.9986      0.752      2.658      0.008         0.525     3.473
ConsultLoc_fmt[T.Outpatient]             1.5064      0.734      2.051      0.040         0.067     2.946
PPSScore_nm                             -0.0180      0.007     -2.688      0.007        -0.031    -0.005
PrimaryDiagnosis_missInd               -22.8372    8.4e+04     -0.000      1.000     -1.65e+05  1.65e+05
========================================================================================================

In [47]:
print(missings.get("pain")[0])


                              Logit Regression Results                             
===================================================================================
Dep. Variable:     ESASPain_5level_missInd   No. Observations:                 1527
Model:                               Logit   Df Residuals:                     1519
Method:                                MLE   Df Model:                            7
Date:                     Wed, 24 Aug 2016   Pseudo R-squ.:                 0.03518
Time:                             10:16:49   Log-Likelihood:                -288.62
converged:                            True   LL-Null:                       -299.15
                                             LLR p-value:                  0.003699
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.8326      0.530     -5.343      0.000        -3.872    -1.794
ConsultLoc_fmt[T.Hospital - ICU]         0.8412      0.500      1.683      0.092        -0.138     1.821
ConsultLoc_fmt[T.Hospital - general]     0.9427      0.404      2.335      0.020         0.151     1.734
ConsultLoc_fmt[T.Long term care]        -0.0695      0.443     -0.157      0.875        -0.938     0.799
ConsultLoc_fmt[T.Other]                  1.3002      0.830      1.567      0.117        -0.327     2.927
ConsultLoc_fmt[T.Outpatient]             0.7780      0.817      0.952      0.341        -0.824     2.380
PPSScore_nm                             -0.0141      0.008     -1.811      0.070        -0.029     0.001
PrimaryDiagnosis_missInd                 0.1196      1.041      0.115      0.909        -1.922     2.161
========================================================================================================

In [48]:
print(missings.get("shortness")[0])


                                    Logit Regression Results                                    
================================================================================================
Dep. Variable:     ESASShortnessOfBreath_5level_missInd   No. Observations:                 1527
Model:                                            Logit   Df Residuals:                     1519
Method:                                             MLE   Df Model:                            7
Date:                                  Wed, 24 Aug 2016   Pseudo R-squ.:                 0.03393
Time:                                          10:16:49   Log-Likelihood:                -256.64
converged:                                         True   LL-Null:                       -265.66
                                                          LLR p-value:                   0.01184
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -2.4835      0.580     -4.281      0.000        -3.621    -1.346
ConsultLoc_fmt[T.Hospital - ICU]        -0.0528      0.626     -0.084      0.933        -1.280     1.174
ConsultLoc_fmt[T.Hospital - general]     0.8871      0.432      2.052      0.040         0.040     1.735
ConsultLoc_fmt[T.Long term care]         0.1309      0.456      0.287      0.774        -0.764     1.025
ConsultLoc_fmt[T.Other]                  0.6575      1.099      0.598      0.550        -1.496     2.811
ConsultLoc_fmt[T.Outpatient]             0.3282      1.091      0.301      0.764        -1.810     2.466
PPSScore_nm                             -0.0242      0.009     -2.697      0.007        -0.042    -0.007
PrimaryDiagnosis_missInd                 0.2103      1.043      0.202      0.840        -1.833     2.254
========================================================================================================

In [49]:
print(missings.get("tiredness")[0])


                                Logit Regression Results                                
========================================================================================
Dep. Variable:     ESASTiredness_5level_missInd   No. Observations:                 1527
Model:                                    Logit   Df Residuals:                     1519
Method:                                     MLE   Df Model:                            7
Date:                          Wed, 24 Aug 2016   Pseudo R-squ.:                 0.04362
Time:                                  10:16:50   Log-Likelihood:                -525.78
converged:                                 True   LL-Null:                       -549.77
                                                  LLR p-value:                 3.615e-08
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -1.9580      0.335     -5.844      0.000        -2.615    -1.301
ConsultLoc_fmt[T.Hospital - ICU]         0.3003      0.331      0.907      0.364        -0.348     0.949
ConsultLoc_fmt[T.Hospital - general]     0.7799      0.236      3.309      0.001         0.318     1.242
ConsultLoc_fmt[T.Long term care]        -0.5233      0.272     -1.925      0.054        -1.056     0.009
ConsultLoc_fmt[T.Other]                  0.4913      0.659      0.745      0.456        -0.801     1.783
ConsultLoc_fmt[T.Outpatient]            -0.6298      0.756     -0.833      0.405        -2.112     0.853
PPSScore_nm                             -0.0056      0.005     -1.077      0.281        -0.016     0.005
PrimaryDiagnosis_missInd                -0.1920      0.754     -0.255      0.799        -1.669     1.285
========================================================================================================

In [50]:
print(missings.get("wellbeing")[0])


                                Logit Regression Results                                
========================================================================================
Dep. Variable:     ESASWellBeing_5level_missInd   No. Observations:                 1527
Model:                                    Logit   Df Residuals:                     1519
Method:                                     MLE   Df Model:                            7
Date:                          Wed, 24 Aug 2016   Pseudo R-squ.:                 0.02312
Time:                                  10:16:50   Log-Likelihood:                -971.44
converged:                                 True   LL-Null:                       -994.43
                                                  LLR p-value:                 8.782e-08
========================================================================================================
                                           coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------------------
Intercept                               -1.1649      0.231     -5.047      0.000        -1.617    -0.713
ConsultLoc_fmt[T.Hospital - ICU]         0.3324      0.224      1.487      0.137        -0.106     0.771
ConsultLoc_fmt[T.Hospital - general]     0.6406      0.159      4.037      0.000         0.330     0.952
ConsultLoc_fmt[T.Long term care]        -0.1535      0.161     -0.951      0.341        -0.470     0.163
ConsultLoc_fmt[T.Other]                  0.5828      0.468      1.245      0.213        -0.335     1.501
ConsultLoc_fmt[T.Outpatient]             0.0389      0.359      0.108      0.914        -0.664     0.742
PPSScore_nm                              0.0079      0.004      2.157      0.031         0.001     0.015
PrimaryDiagnosis_missInd                 0.0529      0.442      0.120      0.905        -0.814     0.920
========================================================================================================

Number of Moderate/Severe Symptoms


In this analysis, we create an indicator variable for moderate/severe for the following 5 symptoms:

- Pain
- Dyspnea
- Fatigue
- Anxiety
- Constipation

We start by creating a yes/no variable for each symptom. For instance, for pain, if the symptom score is in [5,10], then a '1' is coded; otherwise a '0' is coded (this includes scores <5, a 994 code, which indicates unable to ascertain, and missing values). If any of this needs to be changed, let me know. Once these are created, we create 3 new variables that capture how many of the symptoms are moderate or severe. What follows is a summary of findings.
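The coding rule above can be sketched as follows. The toy scores are made up, and mapping Dyspnea to `ESASShortnessOfBreath` and Fatigue to `ESASTiredness` is an assumption based on the variable names in the model output. The three summary column names are hypothetical.

```python
import pandas as pd

# Toy scores (hypothetical). 994 = unable to ascertain, None = missing.
scores = pd.DataFrame({
    "ESASPain":              [6, 2, 994, 8],
    "ESASShortnessOfBreath": [5, 0, 3, 9],
    "ESASTiredness":         [1, None, 7, 10],
    "ESASAnxiety":           [0, 4, 994, 2],
    "ESASConstipation":      [3, 1, 0, 5],
})

# 1 if the score is in [5, 10]; everything else (<5, 994, missing) is 0.
modsev = scores.apply(lambda col: col.between(5, 10)).astype(int)

# Number of moderate/severe symptoms per row, then the three threshold indicators.
n_modsev = modsev.sum(axis=1)
summary = pd.DataFrame({
    "one_or_more": (n_modsev >= 1).astype(int),
    "two_or_more": (n_modsev >= 2).astype(int),
    "three_or_more": (n_modsev >= 3).astype(int),
})
print(summary)
```

Note that under this rule a patient whose scores are all missing or 994 is counted as having zero moderate/severe symptoms, which is the behavior described in the paragraph above and may be worth revisiting.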


In [2]:
#first read in the (summary) data
import pandas as pd
summ = pd.read_csv("./python_scripts/14_number_of_modsev_symptoms.csv")
#now print those with '1 or more moderate/severe symptoms'
summ[0:2]


Out[2]:
|   | Unnamed: 0 | Other Diagnosis N=(87) | Cancer N=(206) | Cardiovascular N=(256) | Pulmonary N=(227) | Gastrointestinal N=(56) | Renal N=(65) | Neurologic N=(551) | Infectious N=(164) | pvalue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | One or More Moderate/Severe - 0 | 48/87 (55.17%) | 92/206 (44.66%) | 157/256 (61.33%) | 104/227 (45.81%) | 38/56 (67.86%) | 46/65 (70.77%) | 462/551 (83.85%) | 121/164 (73.78%) | 3.553260e-34 |
| 1 | One or More Moderate/Severe - 1 | 39/87 (44.83%) | 114/206 (55.34%) | 99/256 (38.67%) | 123/227 (54.19%) | 18/56 (32.14%) | 19/65 (29.23%) | 89/551 (16.15%) | 43/164 (26.22%) | 3.553260e-34 |

In [3]:
#now print those with '2 or more moderate/severe symptoms'
summ[2:4]


Out[3]:
|   | Unnamed: 0 | Other Diagnosis N=(87) | Cancer N=(206) | Cardiovascular N=(256) | Pulmonary N=(227) | Gastrointestinal N=(56) | Renal N=(65) | Neurologic N=(551) | Infectious N=(164) | pvalue |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Two or More Moderate/Severe - 0 | 72/87 (82.76%) | 156/206 (75.73%) | 225/256 (87.89%) | 162/227 (71.37%) | 47/56 (83.93%) | 59/65 (90.77%) | 523/551 (94.92%) | 150/164 (91.46%) | 1.244658e-19 |
| 3 | Two or More Moderate/Severe - 1 | 15/87 (17.24%) | 50/206 (24.27%) | 31/256 (12.11%) | 65/227 (28.63%) | 9/56 (16.07%) | 6/65 (9.23%) | 28/551 (5.08%) | 14/164 (8.54%) | 1.244658e-19 |

In [4]:
#now print those with '3 or more moderate/severe symptoms'
summ[4:6]


Out[4]:
|   | Unnamed: 0 | Other Diagnosis N=(87) | Cancer N=(206) | Cardiovascular N=(256) | Pulmonary N=(227) | Gastrointestinal N=(56) | Renal N=(65) | Neurologic N=(551) | Infectious N=(164) | pvalue |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Three or More Moderate/Severe - 0 | 84/87 (96.55%) | 189/206 (91.75%) | 244/256 (95.31%) | 207/227 (91.19%) | 52/56 (92.86%) | 62/65 (95.38%) | 542/551 (98.37%) | 163/164 (99.39%) | 0.000009 |
| 5 | Three or More Moderate/Severe - 1 | 3/87 (3.45%) | 17/206 (8.25%) | 12/256 (4.69%) | 20/227 (8.81%) | 4/56 (7.14%) | 3/65 (4.62%) | 9/551 (1.63%) | 1/164 (0.61%) | 0.000009 |

Unadjusted (PPSScore) Association: Symptoms (Categorical) and Primary Diagnosis


Proposed Analysis

For each symptom score (for instance, Nausea), I will create a 5x8 contingency table (5 response levels by 8 primary diagnosis groups). I will then run a chi-square test of independence (treating the N of the table as fixed), which tests the null hypothesis that symptom level and primary diagnosis are independent.
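As a rough sketch of what the test computes, the Pearson chi-square statistic and its degrees of freedom can be obtained directly from a table of counts. The counts below are the Pain numerators reported further down in this section; the function name `chi_square_independence` is hypothetical, and this is not necessarily how the p-values in the tables were produced.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Rows: Mild, Moderate, None, Severe, Unable to Respond
# Columns: Other, Cancer, Cardiovascular, Pulmonary, Gastrointestinal,
#          Renal, Neurologic, Infectious
pain = [
    [21, 62, 59, 37, 16, 10, 55, 35],
    [18, 39, 20, 26, 5, 3, 26, 10],
    [24, 74, 111, 95, 18, 29, 176, 50],
    [8, 12, 10, 10, 5, 3, 17, 6],
    [11, 8, 34, 34, 9, 11, 234, 52],
]
stat, dof = chi_square_independence(pain)
print(f"chi-square = {stat:.2f}, dof = {dof}")
```

The p-value is the upper tail of a chi-square distribution with `dof` degrees of freedom evaluated at `stat`; in practice `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom, and expected counts in one call.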


In [22]:
import pandas as pd
table = pd.read_csv('./python_scripts/11_primarydiagnosis_tables_catv2.csv')
#Anxiety
table[0:5]


Out[22]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
0 Anxiety - Mild 10/66 (15.15%) 42/174 (24.14%) 46/225 (20.44%) 38/184 (20.65%) 7/40 (17.50%) 8/52 (15.38%) 34/481 (7.07%) 26/129 (20.16%) 2.629163e-55
1 Anxiety - Moderate 6/66 (9.09%) 11/174 (6.32%) 8/225 (3.56%) 26/184 (14.13%) 4/40 (10.00%) 3/52 (5.77%) 10/481 (2.08%) 2/129 (1.55%) 2.629163e-55
2 Anxiety - None 32/66 (48.48%) 102/174 (58.62%) 125/225 (55.56%) 68/184 (36.96%) 18/40 (45.00%) 28/52 (53.85%) 117/481 (24.32%) 39/129 (30.23%) 2.629163e-55
3 Anxiety - Severe 0/66 (0.00%) 3/174 (1.72%) 1/225 (0.44%) 9/184 (4.89%) 1/40 (2.50%) 1/52 (1.92%) 6/481 (1.25%) 0/129 (0.00%) 2.629163e-55
4 Anxiety - Unable to Respond 18/66 (27.27%) 16/174 (9.20%) 45/225 (20.00%) 43/184 (23.37%) 10/40 (25.00%) 12/52 (23.08%) 314/481 (65.28%) 62/129 (48.06%) 2.629163e-55

In [23]:
#Appetite
table[5:10]


Out[23]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
5 Appetite - Mild 16/65 (24.62%) 58/177 (32.77%) 58/225 (25.78%) 38/179 (21.23%) 9/43 (20.93%) 13/53 (24.53%) 59/485 (12.16%) 24/129 (18.60%) 2.923193e-46
6 Appetite - Moderate 11/65 (16.92%) 51/177 (28.81%) 29/225 (12.89%) 30/179 (16.76%) 6/43 (13.95%) 9/53 (16.98%) 27/485 (5.57%) 13/129 (10.08%) 2.923193e-46
7 Appetite - None 16/65 (24.62%) 40/177 (22.60%) 84/225 (37.33%) 59/179 (32.96%) 16/43 (37.21%) 16/53 (30.19%) 84/485 (17.32%) 24/129 (18.60%) 2.923193e-46
8 Appetite - Severe 3/65 (4.62%) 16/177 (9.04%) 8/225 (3.56%) 4/179 (2.23%) 2/43 (4.65%) 1/53 (1.89%) 17/485 (3.51%) 3/129 (2.33%) 2.923193e-46
9 Appetite - Unable to Respond 19/65 (29.23%) 12/177 (6.78%) 46/225 (20.44%) 48/179 (26.82%) 10/43 (23.26%) 14/53 (26.42%) 298/485 (61.44%) 65/129 (50.39%) 2.923193e-46

In [24]:
#Constipation
table[10:15]


Out[24]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
10 Constipation - Mild 14/75 (18.67%) 35/185 (18.92%) 28/229 (12.23%) 31/191 (16.23%) 3/45 (6.67%) 7/53 (13.21%) 37/493 (7.51%) 14/136 (10.29%) 3.759994e-43
11 Constipation - Moderate 5/75 (6.67%) 18/185 (9.73%) 6/229 (2.62%) 12/191 (6.28%) 2/45 (4.44%) 2/53 (3.77%) 6/493 (1.22%) 3/136 (2.21%) 3.759994e-43
12 Constipation - None 41/75 (54.67%) 116/185 (62.70%) 154/229 (67.25%) 105/191 (54.97%) 29/45 (64.44%) 37/53 (69.81%) 175/493 (35.50%) 59/136 (43.38%) 3.759994e-43
13 Constipation - Severe 2/75 (2.67%) 7/185 (3.78%) 3/229 (1.31%) 4/191 (2.09%) 2/45 (4.44%) 0/53 (0.00%) 6/493 (1.22%) 1/136 (0.74%) 3.759994e-43
14 Constipation - Unable to Respond 13/75 (17.33%) 9/185 (4.86%) 38/229 (16.59%) 39/191 (20.42%) 9/45 (20.00%) 7/53 (13.21%) 269/493 (54.56%) 59/136 (43.38%) 3.759994e-43

In [25]:
#Depression
table[15:20]


Out[25]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
15 Depression - Mild 9/64 (14.06%) 35/164 (21.34%) 29/203 (14.29%) 25/170 (14.71%) 7/41 (17.07%) 8/48 (16.67%) 31/468 (6.62%) 16/123 (13.01%) 1.514900e-43
16 Depression - Moderate 4/64 (6.25%) 15/164 (9.15%) 14/203 (6.90%) 15/170 (8.82%) 2/41 (4.88%) 1/48 (2.08%) 11/468 (2.35%) 5/123 (4.07%) 1.514900e-43
17 Depression - None 31/64 (48.44%) 92/164 (56.10%) 107/203 (52.71%) 72/170 (42.35%) 17/41 (41.46%) 21/48 (43.75%) 99/468 (21.15%) 28/123 (22.76%) 1.514900e-43
18 Depression - Severe 1/64 (1.56%) 4/164 (2.44%) 2/203 (0.99%) 6/170 (3.53%) 3/41 (7.32%) 3/48 (6.25%) 4/468 (0.85%) 1/123 (0.81%) 1.514900e-43
19 Depression - Unable to Respond 19/64 (29.69%) 18/164 (10.98%) 51/203 (25.12%) 52/170 (30.59%) 12/41 (29.27%) 15/48 (31.25%) 323/468 (69.02%) 73/123 (59.35%) 1.514900e-43

In [26]:
#Drowsiness
table[20:25]


Out[26]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
20 Drowsiness - Mild 11/63 (17.46%) 28/170 (16.47%) 35/219 (15.98%) 25/176 (14.20%) 9/43 (20.93%) 9/53 (16.98%) 31/476 (6.51%) 21/127 (16.54%) 1.058811e-41
21 Drowsiness - Moderate 2/63 (3.17%) 22/170 (12.94%) 14/219 (6.39%) 7/176 (3.98%) 2/43 (4.65%) 4/53 (7.55%) 15/476 (3.15%) 2/127 (1.57%) 1.058811e-41
22 Drowsiness - None 29/63 (46.03%) 102/170 (60.00%) 119/219 (54.34%) 91/176 (51.70%) 20/43 (46.51%) 25/53 (47.17%) 125/476 (26.26%) 34/127 (26.77%) 1.058811e-41
23 Drowsiness - Severe 3/63 (4.76%) 5/170 (2.94%) 3/219 (1.37%) 5/176 (2.84%) 2/43 (4.65%) 2/53 (3.77%) 7/476 (1.47%) 1/127 (0.79%) 1.058811e-41
24 Drowsiness - Unable to Respond 18/63 (28.57%) 13/170 (7.65%) 48/219 (21.92%) 48/176 (27.27%) 10/43 (23.26%) 13/53 (24.53%) 298/476 (62.61%) 69/127 (54.33%) 1.058811e-41

In [27]:
#Nausea
table[25:30]


Out[27]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
25 Nausea - Mild 6/75 (8.00%) 30/182 (16.48%) 12/235 (5.11%) 7/193 (3.63%) 12/47 (25.53%) 4/56 (7.14%) 10/498 (2.01%) 8/138 (5.80%) 2.719071e-47
26 Nausea - Moderate 2/75 (2.67%) 9/182 (4.95%) 2/235 (0.85%) 2/193 (1.04%) 2/47 (4.26%) 2/56 (3.57%) 4/498 (0.80%) 1/138 (0.72%) 2.719071e-47
27 Nausea - None 52/75 (69.33%) 131/182 (71.98%) 183/235 (77.87%) 141/193 (73.06%) 23/47 (48.94%) 38/56 (67.86%) 223/498 (44.78%) 67/138 (48.55%) 2.719071e-47
28 Nausea - Severe 1/75 (1.33%) 2/182 (1.10%) 1/235 (0.43%) 0/193 (0.00%) 1/47 (2.13%) 1/56 (1.79%) 1/498 (0.20%) 0/138 (0.00%) 2.719071e-47
29 Nausea - Unable to Respond 14/75 (18.67%) 10/182 (5.49%) 37/235 (15.74%) 43/193 (22.28%) 9/47 (19.15%) 11/56 (19.64%) 260/498 (52.21%) 62/138 (44.93%) 2.719071e-47

In [28]:
#Pain
table[30:35]


Out[28]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
30 Pain - Mild 21/82 (25.61%) 62/195 (31.79%) 59/234 (25.21%) 37/202 (18.32%) 16/53 (30.19%) 10/56 (17.86%) 55/508 (10.83%) 35/153 (22.88%) 1.714499e-39
31 Pain - Moderate 18/82 (21.95%) 39/195 (20.00%) 20/234 (8.55%) 26/202 (12.87%) 5/53 (9.43%) 3/56 (5.36%) 26/508 (5.12%) 10/153 (6.54%) 1.714499e-39
32 Pain - None 24/82 (29.27%) 74/195 (37.95%) 111/234 (47.44%) 95/202 (47.03%) 18/53 (33.96%) 29/56 (51.79%) 176/508 (34.65%) 50/153 (32.68%) 1.714499e-39
33 Pain - Severe 8/82 (9.76%) 12/195 (6.15%) 10/234 (4.27%) 10/202 (4.95%) 5/53 (9.43%) 3/56 (5.36%) 17/508 (3.35%) 6/153 (3.92%) 1.714499e-39
34 Pain - Unable to Respond 11/82 (13.41%) 8/195 (4.10%) 34/234 (14.53%) 34/202 (16.83%) 9/53 (16.98%) 11/56 (19.64%) 234/508 (46.06%) 52/153 (33.99%) 1.714499e-39

In [29]:
#Shortness
table[35:40]


Out[29]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
35 Shortness - Mild 12/80 (15.00%) 51/191 (26.70%) 81/237 (34.18%) 59/210 (28.10%) 12/52 (23.08%) 13/58 (22.41%) 34/509 (6.68%) 32/156 (20.51%) 1.801612e-85
36 Shortness - Moderate 5/80 (6.25%) 19/191 (9.95%) 26/237 (10.97%) 46/210 (21.90%) 0/52 (0.00%) 2/58 (3.45%) 9/509 (1.77%) 11/156 (7.05%) 1.801612e-85
37 Shortness - None 51/80 (63.75%) 106/191 (55.50%) 79/237 (33.33%) 38/210 (18.10%) 29/52 (55.77%) 33/58 (56.90%) 214/509 (42.04%) 49/156 (31.41%) 1.801612e-85
38 Shortness - Severe 0/80 (0.00%) 6/191 (3.14%) 15/237 (6.33%) 35/210 (16.67%) 2/52 (3.85%) 1/58 (1.72%) 2/509 (0.39%) 5/156 (3.21%) 1.801612e-85
39 Shortness - Unable to Respond 12/80 (15.00%) 9/191 (4.71%) 36/237 (15.19%) 32/210 (15.24%) 9/52 (17.31%) 9/58 (15.52%) 250/509 (49.12%) 59/156 (37.82%) 1.801612e-85

In [ ]:
#Tiredness
table[40:45]

In [61]:
#Well Being
table[45:50]


Out[61]:
Unnamed: 0 Other Diagnosis N=(87) Cancer N=(206) Cardiovascular N=(256) Pulmonary N=(227) Gastrointestinal N=(56) Renal N=(65) Neurologic N=(551) Infectious N=(164) pvalue
45 Well Being - Mild 13/46 (28.26%) 36/111 (32.43%) 50/165 (30.30%) 33/102 (32.35%) 6/29 (20.69%) 11/38 (28.95%) 26/388 (6.70%) 11/95 (11.58%) 1.434081e-39
46 Well Being - Moderate 8/46 (17.39%) 35/111 (31.53%) 26/165 (15.76%) 18/102 (17.65%) 5/29 (17.24%) 6/38 (15.79%) 22/388 (5.67%) 9/95 (9.47%) 1.434081e-39
47 Well Being - None 4/46 (8.70%) 14/111 (12.61%) 16/165 (9.70%) 3/102 (2.94%) 6/29 (20.69%) 2/38 (5.26%) 8/388 (2.06%) 8/95 (8.42%) 1.434081e-39
48 Well Being - Severe 3/46 (6.52%) 7/111 (6.31%) 6/165 (3.64%) 7/102 (6.86%) 1/29 (3.45%) 1/38 (2.63%) 7/388 (1.80%) 2/95 (2.11%) 1.434081e-39
49 Well Being - Unable to Respond 18/46 (39.13%) 19/111 (17.12%) 67/165 (40.61%) 41/102 (40.20%) 11/29 (37.93%) 18/38 (47.37%) 325/388 (83.76%) 65/95 (68.42%) 1.434081e-39

In [ ]:
# PPSScore
table[50:51]

Potential Conclusions:

In this table we present the symptoms (rows as continuous).

First Visit Association: Symptoms (Categorical) and Primary Diagnosis Adjusted for PPSScore


Analysis

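We will fit the following discrete choice (multinomial logit) model, where $\pi_{ij}$ is the probability that patient $i$ gives response level $j$ and $J$ is the reference response level:

$$\log\frac{\pi_{ij}}{\pi_{iJ}}=\alpha_{j}+x_{i_{PPSScore}}\beta_{j_{PPSScore}}+x_{i_{PrimaryDiagnosis}}\beta_{j_{PrimaryDiagnosis}}$$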

In [2]:
import pickle
# Load the fitted model results, closing the file afterwards
with open("./python_scripts/12-model_results.p", "rb") as f:
    models = pickle.load(f)

In [3]:
#Anxiety
print(models.get("anxiety")[0])


                          MNLogit Regression Results                          
==============================================================================
Dep. Variable:                      y   No. Observations:                 1310
Model:                        MNLogit   Df Residuals:                     1283
Method:                           MLE   Df Model:                           24
Date:                Wed, 24 Aug 2016   Pseudo R-squ.:                  0.1918
Time:                        11:09:30   Log-Likelihood:                -1276.4
converged:                       True   LL-Null:                       -1579.4
                                        LLR p-value:                1.377e-112
=============================================================================================================
    y=ESASAnxiety_4level[Moderate/Severe]       coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------------------------------------
Intercept                                    -0.6945      0.617     -1.125      0.260        -1.904     0.515
PrimaryDiagnosis_miss[T.Cardiovascular]      -0.5002      0.487     -1.028      0.304        -1.454     0.454
PrimaryDiagnosis_miss[T.Gastrointestinal]     0.8179      0.667      1.226      0.220        -0.490     2.125
PrimaryDiagnosis_miss[T.Infectious]          -1.4384      0.801     -1.797      0.072        -3.008     0.131
PrimaryDiagnosis_miss[T.Neurologic]           0.3117      0.449      0.694      0.487        -0.568     1.191
PrimaryDiagnosis_miss[T.Other Diagnosis]      0.4107      0.636      0.646      0.518        -0.836     1.657
PrimaryDiagnosis_miss[T.Pulmonary]            1.0535      0.401      2.630      0.009         0.268     1.839
PrimaryDiagnosis_miss[T.Renal]                0.4034      0.693      0.582      0.560        -0.955     1.762
PPSScore_nm                                  -0.0083      0.010     -0.853      0.394        -0.027     0.011
-------------------------------------------------------------------------------------------------------------
               y=ESASAnxiety_4level[None]       coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------------------------------------
Intercept                                     0.6483      0.385      1.682      0.093        -0.107     1.404
PrimaryDiagnosis_miss[T.Cardiovascular]       0.1810      0.257      0.704      0.481        -0.323     0.685
PrimaryDiagnosis_miss[T.Gastrointestinal]     0.0308      0.487      0.063      0.950        -0.923     0.984
PrimaryDiagnosis_miss[T.Infectious]          -0.4385      0.315     -1.393      0.164        -1.055     0.179
PrimaryDiagnosis_miss[T.Neurologic]           0.3847      0.275      1.400      0.162        -0.154     0.923
PrimaryDiagnosis_miss[T.Other Diagnosis]      0.3320      0.409      0.812      0.417        -0.469     1.133
PrimaryDiagnosis_miss[T.Pulmonary]           -0.2125      0.280     -0.759      0.448        -0.761     0.336
PrimaryDiagnosis_miss[T.Renal]                0.4256      0.444      0.958      0.338        -0.445     1.296
PPSScore_nm                                   0.0036      0.006      0.599      0.549        -0.008     0.015
-------------------------------------------------------------------------------------------------------------
  y=ESASAnxiety_4level[Unable to Respond]       coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------------------------------------
Intercept                                     3.0441      0.487      6.252      0.000         2.090     3.998
PrimaryDiagnosis_miss[T.Cardiovascular]       0.5516      0.396      1.393      0.164        -0.224     1.328
PrimaryDiagnosis_miss[T.Gastrointestinal]     0.9567      0.643      1.487      0.137        -0.304     2.218
PrimaryDiagnosis_miss[T.Infectious]           1.3064      0.416      3.141      0.002         0.491     2.121
PrimaryDiagnosis_miss[T.Neurologic]           2.9276      0.371      7.886      0.000         2.200     3.655
PrimaryDiagnosis_miss[T.Other Diagnosis]      1.2495      0.522      2.393      0.017         0.226     2.273
PrimaryDiagnosis_miss[T.Pulmonary]            0.6685      0.408      1.639      0.101        -0.131     1.468
PrimaryDiagnosis_miss[T.Renal]                1.0910      0.569      1.917      0.055        -0.025     2.207
PPSScore_nm                                  -0.0841      0.008    -10.893      0.000        -0.099    -0.069
=============================================================================================================

Conclusions

First, note the following important items about this model:

- Here we are modeling a 3-level Anxiety score (None, Unable to Respond, and Mild/Moderate/Severe)
- For anxiety, this combination of levels was chosen so that the model would converge
- The reference group is Mild/Moderate/Severe
- This model assumes a lack of interaction between Primary Diagnosis and PPSScore (should we test this?)

Now that this model has been built, there are several ways to extract and display information for readers and collaborators. For this first model, I will present all of the options; for all other models below, I will simply display a table similar to the one above. Please keep in mind that each model can be described in these additional ways too.

Log Odds

Using the information above, we could talk about the log odds of a response for a given Primary Diagnosis as compared to Cancer. For example, consider a patient (at baseline) who has a PPSScore of 50 and a primary diagnosis of Cardiovascular. We could draw the following conclusion: $$\log\left(\frac{\hat{\pi}_{None}}{\hat{\pi}_{Mild/Moderate/Severe}}\right)=0.0029+0.2377 \times 1+0.0098 \times 50=0.7306$$
This equation says that the log odds that the Anxiety response is None (rather than Mild/Moderate/Severe) are 0.7306.
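The same arithmetic can be checked in a few lines of Python. The coefficients are copied from the worked example above; the variable names are hypothetical.

```python
import math

# Coefficients of the None-vs-reference equation, copied from the worked example
intercept = 0.0029
beta_cardiovascular = 0.2377
beta_pps = 0.0098

pps_score = 50          # example patient's PPSScore
is_cardiovascular = 1   # 1 if the primary diagnosis is Cardiovascular (Cancer is the reference)

log_odds = intercept + beta_cardiovascular * is_cardiovascular + beta_pps * pps_score
odds = math.exp(log_odds)
print(f"log odds = {log_odds:.4f}, odds = {odds:.2f}")
# log odds = 0.7306, odds = 2.08
```

Exponentiating gives the odds themselves: for this patient, a response of None is estimated to be roughly twice as likely as a response in the reference group.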

Estimating Response Probabilities

We could also use these estimates to derive response probabilities for a given patient (or to estimate functions for types of patients). For example, consider the same hypothetical patient as above: $$\hat{\pi}_{None}=\frac{e^{0.0029+0.2377*1+0.0098*50}}{1+e^{0.0029+0.2377*1+0.0098*50}+e^{2.7168+0.4798*1-0.0773*50}}=0.579$$ $$\hat{\pi}_{Unable\ to\ Respond}=\frac{e^{2.7168+0.4798*1-0.0773*50}}{1+e^{0.0029+0.2377*1+0.0098*50}+e^{2.7168+0.4798*1-0.0773*50}}=0.143$$ $$\hat{\pi}_{Mild/Moderate/Severe}=\frac{1}{1+e^{0.0029+0.2377*1+0.0098*50}+e^{2.7168+0.4798*1-0.0773*50}}=0.279$$
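A minimal sketch of this calculation follows, assuming the baseline-category form in which the reference level has a linear predictor of 0. The helper name `response_probabilities` is hypothetical; the coefficients are those quoted in the equations above.

```python
import math

def response_probabilities(linear_predictors):
    """Probabilities for a baseline-category logit model.

    linear_predictors maps each non-reference level to its linear
    predictor; the reference level contributes exp(0) = 1.
    """
    exp_terms = {level: math.exp(eta) for level, eta in linear_predictors.items()}
    denom = 1.0 + sum(exp_terms.values())
    probs = {level: value / denom for level, value in exp_terms.items()}
    probs["Mild/Moderate/Severe"] = 1.0 / denom  # reference level
    return probs

pps_score, cardiovascular = 50, 1
eta = {
    "None": 0.0029 + 0.2377 * cardiovascular + 0.0098 * pps_score,
    "Unable to Respond": 2.7168 + 0.4798 * cardiovascular - 0.0773 * pps_score,
}
probs = response_probabilities(eta)
for level, p in probs.items():
    print(f"{level}: {p:.3f}")
# None: 0.579
# Unable to Respond: 0.143
# Mild/Moderate/Severe: 0.279
```

Changing `pps_score` or `cardiovascular` shows how the estimated response probabilities shift for other types of patients.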

Testing an Overall Primary Diagnosis Effect

We could test the following hypothesis: $$H_{0}:\beta_{Cardiovascular}=\beta_{Gastrointestinal}=...=\beta_{Renal}$$ This would allow us to first test whether any of the effects differs from the others.

Marginal Effects

Marginal effects can be calculated in various ways (e.g., the average of the marginal effects at each observation, or the marginal effects at the mean of each regressor). Below I will display the latter.
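To make the idea concrete: for a continuous regressor in a multinomial logit model, the marginal effect on response level $j$ is $\pi_j(\beta_j-\sum_k \pi_k\beta_k)$. The sketch below evaluates this for PPSScore using the Anxiety coefficients from the worked example. Because the sample means are not shown here, it is evaluated at the example patient (PPSScore 50, Cardiovascular) rather than at the mean of each regressor, so the numbers will not match the stored marginal effects output.

```python
import math

# PPSScore coefficients and linear predictors from the worked Anxiety example.
# The reference level has coefficient 0 and linear predictor 0.
beta_pps = {"None": 0.0098, "Unable to Respond": -0.0773, "Mild/Moderate/Severe": 0.0}
eta = {
    "None": 0.0029 + 0.2377 * 1 + 0.0098 * 50,
    "Unable to Respond": 2.7168 + 0.4798 * 1 - 0.0773 * 50,
    "Mild/Moderate/Severe": 0.0,
}

# Response probabilities at the chosen covariate values
denom = sum(math.exp(v) for v in eta.values())
probs = {level: math.exp(v) / denom for level, v in eta.items()}

# d(pi_j)/d(PPSScore) = pi_j * (beta_j - sum_k pi_k * beta_k)
weighted_beta = sum(probs[k] * beta_pps[k] for k in probs)
marginal_effects = {j: probs[j] * (beta_pps[j] - weighted_beta) for j in probs}

for level, me in marginal_effects.items():
    print(f"{level}: {me:+.4f}")
```

The effects sum to zero across response levels, since the probabilities must always sum to one. For a fitted statsmodels result, `get_margeff(at="mean")` reports marginal effects at the regressor means.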


In [ ]:
#Anxiety Marginal Effects
print(models.get("anxiety")[1])

In [ ]:
#Appetite
print(models.get("appetite")[0])

In [ ]:
#Constipation
print(models.get("constipation")[0])

In [ ]:
#Depression
print(models.get("depression")[0])

In [ ]:
#Drowsiness
print(models.get("drowsiness")[0])

In [ ]:
#Nausea
print(models.get("nausea")[0])

In [ ]:
#Pain
print(models.get("pain")[0])

In [ ]:
#Shortness
print(models.get("shortness")[0])

In [ ]:
#Tiredness
print(models.get("tiredness")[0])

In [ ]:
#Wellbeing
print(models.get("wellbeing")[0])