Death investigation regulations exist on a state-by-state basis. Through regulation, states choose whether to investigate cause of death using a coroner or a medical examiner. Medical examiners, by practice, almost always perform an autopsy as part of death investigation, while coroners are usually not required to do so. Additionally, medical examiners are appointed while coroners may be elected, which could lead to different reporting norms if political motivations exist. The goal of this analysis is to determine if the type of death investigation has a significantly significant impact on the autopsy rate for Drug use and HIV—two potentially politically charged causes of death.
The data from this project is drawn from two sources: (1) the Centers for Disease Control and Prevention mortality data, and (2) an article by National Public Radio detailing death examination practices by state.
To extract the CDC data:
In [5]:
# Import necessary packages
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics
import numpy as np # numpy package
import sys # system module, used to get Python version
import geopy as geo # geographical package
import seaborn as sns # seaborn package
import datetime as dt # date tools, used to note current date
import plotly.plotly as py
from plotly.graph_objs import *
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go # ditto
import plotly.tools as tls # To use plotly features
tls.set_credentials_file(username='thambidge', api_key='qibq1p0ZHIpH0Ynk3QP7')
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
%matplotlib inline
print('\nPython version: ', sys.version)
print('Pandas version: ', pd.__version__)
print('Numpy version: ', np.__version__)
print('Geopy version: ', geo.__version__)
print('Seaborn version: ', sns.__version__)
print("Today's date:", dt.date.today())
# print('Plotly version: ', plotly.__version__)
In [6]:
#CDC HIV mortality data pulled, cleaned, and indexed
url1 = 'https://github.com/thambidge/Data-Bootcamp/files/990443/CDC-.HIV.xlsx'
HIV = pd.read_excel(url1,
usecols=(0,2,6))
HIV.rename(columns={'Deaths': 'HIV Deaths'}, inplace=True)
TotalHIV = HIV.groupby(['State','Autopsy']).sum()
In [7]:
# #CDC Drug-related mortality data pulled, cleaned, and indexed
url2 = 'https://github.com/thambidge/Data-Bootcamp/files/990472/CDC.-.Drugandalcohol.xlsx'
Drugs = pd.read_excel(url2,
usecols=(0,2,4,6))
Drugs = Drugs[Drugs['Drug/Alcohol Induced Causes'].str.contains('Drug-Induced Causes')]
Drugs = Drugs.set_index(['State','Autopsy'])
Drugs.rename(columns={'Deaths': 'Drug-related Deaths'}, inplace=True)
In [8]:
# Merge of the HIV and Drug dataframes - both multi-indexed
combo = pd.merge(TotalHIV, Drugs, how='outer', left_index=True, right_index=True)
combo.drop('Drug/Alcohol Induced Causes', axis=1, inplace=True)
In [17]:
#Convert total deaths to percents, by State and Autopsy
percent = combo.apply(lambda c: c / c.sum(level='State') * 100, axis=0)
In [10]:
trace1 = Choropleth(
z=['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1'],
autocolorscale=False,
colorscale=[[0, 'rgb(255,255,255)'], [1, 'rgb(186,58,51)']],
hoverinfo='text',
locationmode='USA-states',
locations=['NV', 'ID', 'WY', 'CO', 'ND', 'SD', 'NE', 'KS','IL','LA','SC','IN'],
name='Coroner',
showscale=False,
text=['Nevada', 'Idaho', 'Wyoming', 'Colorado', 'North Dakota', 'South Dakota',
'Nebraska','Kansas','Illinois','Louisiana','South Carolina','Indiana'],
zauto=False,
zmax=1,
zmin=0,
)
trace2 = Choropleth(
z=['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1'],
autocolorscale=False,
colorscale=[[0, 'rgb(255,255,255)'], [1, 'rgb(68,94,150)']],
hoverinfo='text',
locationmode='USA-states',
locations=['OR', 'UT', 'AZ', 'NM', 'TX', 'OK', 'TN', 'FL', 'NC',
'WV', 'VA','DOC','DE','MD','ME','NH','VT', 'MA','RI','CT','AK','MI'],
name='Medical Examiner',
showscale=False,
text=['Oregon', 'Utah', 'Arizona', 'New Mexico','Texas','Oklahoma',
'Tennessee', 'Florida', 'North Carolina', 'West Virginia', 'Virginia',
'DC', 'Deleware','Maryland','Maine','New Hampshire','Vermont','Massachusetts',
'Rhode Island','Connecticut','Alaska','Michigan'],
zauto=False,
zmax=1,
zmin=0,
)
trace3 = Choropleth(
z=['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1','1', '1', '1', '1', '1', '1', '1'],
autocolorscale=False,
colorscale=[[0, 'rgb(255, 255, 255)'], [1, 'rgb(187, 170, 144)']],
hoverinfo='text',
locationmode='USA-states',
locations=['WA', 'CA', 'MT', 'MN', 'IA', 'MO', 'AR', 'HI','WI','IL','AL','MS',
'GA', 'KY','OH','PA','NJ','NY'],
name='Mixed',
showscale=False,
text=['Washington', 'California', 'Montana', 'Minnesota','Iowa','Missouri','Arkansas',
'Hawaii', 'Wisconsin', 'Illinois', 'Alabama','Mississippi','Georgia','Kentucky',
'Ohio','Pennsylvania','New Jersey','New York'],
zauto=False,
zmax=1,
zmin=0,
)
data = Data([trace1, trace2, trace3])
layout = Layout(
autosize=False,
geo=dict(
countrycolor='rgb(102, 102, 102)',
countrywidth=0.1,
lakecolor='rgb(255, 255, 255)',
landcolor='rgba(237, 247, 138, 0.28)',
lonaxis=dict(
gridwidth=1.5999999999999999,
range=[-180, -50],
showgrid=False
),
projection=dict(
type='albers usa'
),
scope='usa',
showland=True,
showrivers=False,
showsubunits=True,
subunitcolor='rgb(102, 102, 102)',
subunitwidth=0.5
),
hovermode='closest',
images=list([
dict(
x=1,
y=0.6,
sizex=0.155,
sizey=0.4,
source='http://i.imgur.com/hlorTcz.png',
xanchor='right',
xref='paper',
yanchor='bottom',
yref='paper'
)
]),
showlegend=True,
title='<b>Type of Death Investigation by State</b>',
width= 800,
margin = dict(
l=0,
r=50,
b=100,
t=100,
pad=4)
)
fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='Map')
Out[10]:
The bar charts below look at autopsy rates for deaths attributed to either HIV or drug use. Both charts use the autopsy rate instead of total number of autopsies to normalize for differences in death rate across states with different populations.
You'll notice that there are very few autopsies performed for deaths attributed to HIV. This is likely because the cause of death is known before the patient dies as a result of medical care received. The variation is autopsy rate for drug use was much greater, leading to the potential for additional future research.
In [11]:
HIVchart = percent.unstack(level=-1).plot(y='HIV Deaths', kind='barh', stacked=True,figsize=(15,10), fontsize = (12))
HIVchart.set_title("Percent HIV Deaths with Autopsy Performed", fontsize = (16))
HIVchart.set_ylabel("State", fontsize = (14))
HIVchart.legend(["No","Unknown","Yes"],bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize = (12))
Out[11]:
In [12]:
Drugschart = percent.unstack(level=-1).plot(y='Drug-related Deaths', kind='barh', stacked=True,figsize=(15,10), fontsize = (12))
Drugschart.set_title("Percent Drug-related Deaths with Autopsy Performed", fontsize = (16))
Drugschart.set_ylabel("State", fontsize = (14))
Drugschart.legend(["No","Unknown","Yes"],bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize = (12))
Out[12]:
In the scatterplot below looking at autopsy rate for Drug-related deaths, there appears to be a clearer effect from the type of examination: a denser collection of data points at high autopsy rate for those states that require a medical examiner. This prompted the OLS regression that comes next.
In [18]:
# Scatterplot with a categorical variable
url3 = 'https://github.com/thambidge/Data-Bootcamp/files/991580/scatter.xlsx'
scatter = pd.read_excel(url3)
sns.stripplot(x="Type of Medical Examination", y="% Autopsy Performed", data=scatter, size=8)
Out[18]:
I completed an off-the-cuff OLS regression to try to quantify the association between type of death examination and autopsy rate for drug-related deaths. This was done using dummy variables in place of the categorical "type of examination" variable. While the results appear to be significant with the association between Medical Examiner and autospy rate the strongest (as expected) there is no strong theory for why states with mixed reporting standards had a lower association with autopsy rate than the states with coroners.
In [16]:
#OLS regression on autopsy rate
scatter_type = pd.get_dummies(scatter['Type of Medical Examination'])
scatter_new = pd.concat([scatter, scatter_type], axis=1)
import statsmodels.api as sm
y = scatter_new['% Autopsy Performed']
X = scatter_new[['Coroner', 'Medical Examiner','Mix']]
X = sm.add_constant(X)
model11 = sm.OLS(y, X).fit()
model11.summary()
Out[16]:
After looking at the results, it is clear that more detailed research is needed to tell the true effect of death examination type of autopsy rates and cause of death reporting. Data needs to be obtained at a more granular level and other variables that cause variation across states, such as democraphic and socioeconomic data, needs to be taken in to account. A policy implication of this work is that all states should push for accurate reporting to better understand how to improve the health of the population.
Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2015 on CDC WONDER Online Database, released December, 2016. Data are from the Multiple Cause of Death Files, 1999-2015, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/ucd-icd10.html on May 9, 2017 11:48:39 AM