Death Investigation and the Reporting of Mortality due to Drugs and HIV

By: Timothy Hambidge

Data Bootcamp

Final Project: Spring 2017

Goal

Death investigation regulations exist on a state-by-state basis. Through regulation, states choose whether to investigate cause of death using a coroner or a medical examiner. Medical examiners, by practice, almost always perform an autopsy as part of death investigation, while coroners are usually not required to do so. Additionally, medical examiners are appointed while coroners may be elected, which could lead to different reporting norms if political motivations exist. The goal of this analysis is to determine if the type of death investigation has a significantly significant impact on the autopsy rate for Drug use and HIV—two potentially politically charged causes of death.

Data

The data from this project is drawn from two sources: (1) the Centers for Disease Control and Prevention mortality data, and (2) an article by National Public Radio detailing death examination practices by state.

To extract the CDC data:

Access the CDC Current Multiple causes of Death Data using the CDC Wonder data request tool
Prepare text extractions of total HIV and drug-related deaths by state from 1999-2015, including autopsy status
Copy and paste text files into excel and upload to Github for use with pandas
See further data manipulation in the code below



In [5]:

    
# Import necessary packages
import pandas as pd             # data package
import matplotlib.pyplot as plt # graphics 
import numpy as np              # numpy package
import sys                      # system module, used to get Python version 
import geopy as geo             # geographical package
import seaborn as sns           # seaborn package
import datetime as dt           # date tools, used to note current date

import plotly.plotly as py   
from plotly.graph_objs import * 
from plotly.offline import iplot, iplot_mpl     # plotting functions
import plotly.graph_objs as go                  # ditto  

import plotly.tools as tls                      # To use plotly features
tls.set_credentials_file(username='thambidge', api_key='qibq1p0ZHIpH0Ynk3QP7')

import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

%matplotlib inline 


print('\nPython version: ', sys.version) 
print('Pandas version: ', pd.__version__)
print('Numpy version: ', np.__version__)
print('Geopy version: ', geo.__version__)
print('Seaborn version: ', sns.__version__)
print("Today's date:", dt.date.today())
# print('Plotly version: ', plotly.__version__)









    











    



Python version:  3.6.0 |Anaconda custom (x86_64)| (default, Dec 23 2016, 13:19:00) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
Pandas version:  0.19.2
Numpy version:  1.11.3
Geopy version:  1.11.0
Seaborn version:  0.7.1
Today's date: 2017-05-10



In [6]:

    
#CDC HIV mortality data pulled, cleaned, and indexed
url1 = 'https://github.com/thambidge/Data-Bootcamp/files/990443/CDC-.HIV.xlsx' 
HIV = pd.read_excel(url1,
                    usecols=(0,2,6))
HIV.rename(columns={'Deaths': 'HIV Deaths'}, inplace=True)
TotalHIV = HIV.groupby(['State','Autopsy']).sum()



In [7]:

    
# #CDC Drug-related mortality data pulled, cleaned, and indexed
url2 = 'https://github.com/thambidge/Data-Bootcamp/files/990472/CDC.-.Drugandalcohol.xlsx' 
Drugs = pd.read_excel(url2,
                      usecols=(0,2,4,6))
Drugs = Drugs[Drugs['Drug/Alcohol Induced Causes'].str.contains('Drug-Induced Causes')]
Drugs = Drugs.set_index(['State','Autopsy'])
Drugs.rename(columns={'Deaths': 'Drug-related Deaths'}, inplace=True)



In [8]:

    
# Merge of the HIV and Drug dataframes - both multi-indexed
combo = pd.merge(TotalHIV, Drugs, how='outer', left_index=True, right_index=True)
combo.drop('Drug/Alcohol Induced Causes', axis=1, inplace=True)



In [17]:

    
#Convert total deaths to percents, by State and Autopsy
percent = combo.apply(lambda c: c / c.sum(level='State') * 100, axis=0)

Map of death examination practices by state

The map below in intended to show death examination practices by state before looking at autopsy rates later.



In [10]:

    
trace1 = Choropleth(
    z=['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1'],
    autocolorscale=False,
    colorscale=[[0, 'rgb(255,255,255)'], [1, 'rgb(186,58,51)']],
    hoverinfo='text',
    locationmode='USA-states',
    locations=['NV', 'ID', 'WY', 'CO', 'ND', 'SD', 'NE', 'KS','IL','LA','SC','IN'],
    name='Coroner',
    showscale=False,
    text=['Nevada', 'Idaho', 'Wyoming', 'Colorado', 'North Dakota', 'South Dakota', 
          'Nebraska','Kansas','Illinois','Louisiana','South Carolina','Indiana'],
    zauto=False,
    zmax=1,
    zmin=0,
)
trace2 = Choropleth(
    z=['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1'],
    autocolorscale=False,
    colorscale=[[0, 'rgb(255,255,255)'], [1, 'rgb(68,94,150)']],
    hoverinfo='text',
    locationmode='USA-states',
    locations=['OR', 'UT', 'AZ', 'NM', 'TX', 'OK', 'TN', 'FL', 'NC',
               'WV', 'VA','DOC','DE','MD','ME','NH','VT', 'MA','RI','CT','AK','MI'],
    name='Medical Examiner',
    showscale=False,
    text=['Oregon', 'Utah', 'Arizona', 'New Mexico','Texas','Oklahoma', 
          'Tennessee', 'Florida', 'North Carolina', 'West Virginia', 'Virginia',
          'DC', 'Deleware','Maryland','Maine','New Hampshire','Vermont','Massachusetts',
          'Rhode Island','Connecticut','Alaska','Michigan'],
    zauto=False,
    zmax=1,
    zmin=0,
)
trace3 = Choropleth(
    z=['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1', '1', '1', '1', '1', '1', '1'],
    autocolorscale=False,
    colorscale=[[0, 'rgb(255, 255, 255)'], [1, 'rgb(187, 170, 144)']],
    hoverinfo='text',
    locationmode='USA-states',
    locations=['WA', 'CA', 'MT', 'MN', 'IA', 'MO', 'AR', 'HI','WI','IL','AL','MS',
               'GA', 'KY','OH','PA','NJ','NY'],
    name='Mixed',
    showscale=False,
    text=['Washington', 'California', 'Montana', 'Minnesota','Iowa','Missouri','Arkansas', 
          'Hawaii', 'Wisconsin', 'Illinois', 'Alabama','Mississippi','Georgia','Kentucky',
          'Ohio','Pennsylvania','New Jersey','New York'],
    zauto=False,
    zmax=1,
    zmin=0,
)

data = Data([trace1, trace2, trace3])
layout = Layout(
    autosize=False,
    geo=dict(
        countrycolor='rgb(102, 102, 102)',
        countrywidth=0.1,
        lakecolor='rgb(255, 255, 255)',
        landcolor='rgba(237, 247, 138, 0.28)',
        lonaxis=dict(
            gridwidth=1.5999999999999999,
            range=[-180, -50],
            showgrid=False
        ),
        projection=dict(
            type='albers usa'
        ),
        scope='usa',
        showland=True,
        showrivers=False,
        showsubunits=True,
        subunitcolor='rgb(102, 102, 102)',
        subunitwidth=0.5
    ),
    hovermode='closest',
    images=list([
        dict(
            x=1,
            y=0.6,
            sizex=0.155,
            sizey=0.4,
            source='http://i.imgur.com/hlorTcz.png',
            xanchor='right',
            xref='paper',
            yanchor='bottom',
            yref='paper'
        )
    ]),
    showlegend=True,
    title='<b>Type of Death Investigation by State</b>',
    width= 800,
    margin = dict(
        l=0,
        r=50,
        b=100,
        t=100,
        pad=4)
)
fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='Map')









    Out[10]:

Horizontal Bar Charts

The bar charts below look at autopsy rates for deaths attributed to either HIV or drug use. Both charts use the autopsy rate instead of total number of autopsies to normalize for differences in death rate across states with different populations.

You'll notice that there are very few autopsies performed for deaths attributed to HIV. This is likely because the cause of death is known before the patient dies as a result of medical care received. The variation is autopsy rate for drug use was much greater, leading to the potential for additional future research.



In [11]:

    
HIVchart = percent.unstack(level=-1).plot(y='HIV Deaths', kind='barh', stacked=True,figsize=(15,10), fontsize = (12))
HIVchart.set_title("Percent HIV Deaths with Autopsy Performed", fontsize = (16))
HIVchart.set_ylabel("State", fontsize = (14))
HIVchart.legend(["No","Unknown","Yes"],bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize = (12))









    Out[11]:





<matplotlib.legend.Legend at 0x10dfc69b0>



In [12]:

    
Drugschart = percent.unstack(level=-1).plot(y='Drug-related Deaths', kind='barh', stacked=True,figsize=(15,10), fontsize = (12))
Drugschart.set_title("Percent Drug-related Deaths with Autopsy Performed", fontsize = (16))
Drugschart.set_ylabel("State", fontsize = (14))
Drugschart.legend(["No","Unknown","Yes"],bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize = (12))









    Out[12]:





<matplotlib.legend.Legend at 0x111ffe630>

Catergorical scatterplot

In the scatterplot below looking at autopsy rate for Drug-related deaths, there appears to be a clearer effect from the type of examination: a denser collection of data points at high autopsy rate for those states that require a medical examiner. This prompted the OLS regression that comes next.



In [18]:

    
# Scatterplot with a categorical variable
url3 = 'https://github.com/thambidge/Data-Bootcamp/files/991580/scatter.xlsx' 
scatter = pd.read_excel(url3)

sns.stripplot(x="Type of Medical Examination", y="% Autopsy Performed", data=scatter, size=8)









    Out[18]:





<matplotlib.axes._subplots.AxesSubplot at 0x110559400>

OLS Regression

I completed an off-the-cuff OLS regression to try to quantify the association between type of death examination and autopsy rate for drug-related deaths. This was done using dummy variables in place of the categorical "type of examination" variable. While the results appear to be significant with the association between Medical Examiner and autospy rate the strongest (as expected) there is no strong theory for why states with mixed reporting standards had a lower association with autopsy rate than the states with coroners.



In [16]:

    
#OLS regression on autopsy rate
scatter_type = pd.get_dummies(scatter['Type of Medical Examination'])
scatter_new = pd.concat([scatter, scatter_type], axis=1)

import statsmodels.api as sm
    
y = scatter_new['% Autopsy Performed']
X = scatter_new[['Coroner', 'Medical Examiner','Mix']]
X = sm.add_constant(X)
model11 = sm.OLS(y, X).fit()
model11.summary()









    Out[16]:





OLS Regression Results

  Dep. Variable:     % Autopsy Performed    R-squared:             0.016


  Model:                     OLS            Adj. R-squared:       -0.025


  Method:               Least Squares       F-statistic:          0.3979


  Date:               Wed, 10 May 2017      Prob (F-statistic):    0.674 


  Time:                   23:54:54          Log-Likelihood:      -201.00


  No. Observations:            51           AIC:                   408.0


  Df Residuals:                48           BIC:                   413.8


  Df Model:                     2                                       


  Covariance Type:        nonrobust                                     




                      coef      std err       t       P>|t|   [0.025     0.975]  


  const                46.3823      1.411     32.871   0.000     43.545     49.219


  Coroner              16.0989      3.080      5.227   0.000      9.906     22.291


  Medical Examiner     16.9449      2.361      7.176   0.000     12.197     21.692


  Mix                  13.3385      2.615      5.100   0.000      8.080     18.597




  Omnibus:         8.988    Durbin-Watson:         1.434


  Prob(Omnibus):   0.011    Jarque-Bera (JB):      8.412


  Skew:           -0.957    Prob(JB):             0.0149


  Kurtosis:        3.547    Cond. No.           1.17e+16

Conclusions and Further Research

After looking at the results, it is clear that more detailed research is needed to tell the true effect of death examination type of autopsy rates and cause of death reporting. Data needs to be obtained at a more granular level and other variables that cause variation across states, such as democraphic and socioeconomic data, needs to be taken in to account. A policy implication of this work is that all states should push for accurate reporting to better understand how to improve the health of the population.

Sources

Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2015 on CDC WONDER Online Database, released December, 2016. Data are from the Multiple Cause of Death Files, 1999-2015, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/ucd-icd10.html on May 9, 2017 11:48:39 AM

NPR: How Is Death Investigated in Your State?

Dep. Variable:	% Autopsy Performed	R-squared:	0.016
Model:	OLS	Adj. R-squared:	-0.025
Method:	Least Squares	F-statistic:	0.3979
Date:	Wed, 10 May 2017	Prob (F-statistic):	0.674
Time:	23:54:54	Log-Likelihood:	-201.00
No. Observations:	51	AIC:	408.0
Df Residuals:	48	BIC:	413.8
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[0.025	0.975]
const	46.3823	1.411	32.871	0.000	43.545	49.219
Coroner	16.0989	3.080	5.227	0.000	9.906	22.291
Medical Examiner	16.9449	2.361	7.176	0.000	12.197	21.692
Mix	13.3385	2.615	5.100	0.000	8.080	18.597

Omnibus:	8.988	Durbin-Watson:	1.434
Prob(Omnibus):	0.011	Jarque-Bera (JB):	8.412
Skew:	-0.957	Prob(JB):	0.0149
Kurtosis:	3.547	Cond. No.	1.17e+16