May 2016
Written by Kara Frantzich at NYU Stern
Contact: kara.frantzich@stern.nyu.edu
Suicide is the 10th most common cause of death in the United States. In 2014, 42,704 people in the US committed suicide (CDC), compared to about 16,000 homicides in the same year.
This project's goal is to investigate the relationship between urban and rural populations and suicide rates. Are cities healthier for people because they provide a social network for those who may consider suicide? Or does rural life make it less likely for people to take their own lives?
In [158]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
The data used are from the Centers for Disease Control and Prevention (CDC) for 2014.
The CDC has defined different levels of urbanization, outlined below:
Definitions of urbanization levels, by population of area:
Large Metro Center: 1,000,000 or more people
Large Metro Suburban: 1,000,000 or more people, outside of city center
Medium Metro: 250,000-999,999 people
Small Metro: Less than 250,000 people
Non-metro Center: Less than 49,999 people
Non-metro Non-Core: Less than 49,999 people, outside of town center
I first looked at suicide rates through these pre-defined concepts of urban and rural.
In [132]:
data_1 = '/Users/karaf/Documents/Data_Bootcamp/Suicide_Rates_by_Urbanization.csv' # file location
df1 = pd.read_csv(data_1, index_col=0)
In [133]:
df1
Out[133]:
In [134]:
# horizontal bar chart of suicide rate by urbanization level
fig, ax = plt.subplots()
df1['Suicide Rate'].plot(ax=ax, kind='barh', alpha=0.5)
ax.set_title('Suicide rates by urbanization levels', loc='left', fontsize=16)
ax.set_xlabel('Suicides per 100,000 people')
ax.set_ylabel('')
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
In [159]:
# bubble scatterplot of suicide rate vs. total population
fig, ax = plt.subplots()
ax.scatter(df1['Total Population'], df1['Suicide Rate'],  # x, y variables
           s=df1['Total Suicides']/5,                     # size of bubbles
           alpha=0.5)
ax.set_title('Suicide rate by population size', loc='left', fontsize=16)
ax.set_xlabel('Population in 100 Millions')
ax.set_ylabel('Suicide Rate')
ax.text(20,20, 'Bubble size represents suicide totals', horizontalalignment='right')
Out[159]:
These visualizations show that suicide rates are higher in more rural areas.
Next, I looked at suicide rates by county. Of the 42,704 suicides in 2014, the CDC is able to track 35,580 at the county level (the remaining 7,124 were part of the data that was suppressed in the CDC's datasets).
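The split between county-level and suppressed deaths can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the totals quoted in the text; the variable names are illustrative and not part of the original notebook.

```python
# Totals quoted in the text (CDC, 2014)
total_suicides = 42704   # all suicides reported for 2014
county_level = 35580     # suicides that can be tracked at the county level

# Deaths missing from the county-level data because of suppression
suppressed = total_suicides - county_level

# Share of all suicides that cannot be placed in a county
suppressed_share = suppressed / total_suicides

print(suppressed)                        # 7124
print(round(suppressed_share * 100, 1))  # 16.7 (percent)
```

Roughly one in six suicides is therefore absent from the county-level charts that follow.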
In [171]:
data_2 = '/Users/karaf/Documents/Data_Bootcamp/Suicide_Rates_by_County.csv' # file location
df2 = pd.read_csv(data_2, index_col=0)
In [172]:
df2
Out[172]:
In [181]:
# scatterplot of suicide rate vs. county population
fig, ax = plt.subplots()
ax.scatter(df2['Population'], df2['Suicide Rate per 100,000'])
ax.set_title('Suicide rate by population size', loc='left', fontsize=16)
ax.set_xlabel('Population in 10 Millions')
ax.set_ylabel('Suicide Rate')
Out[181]:
This shows that there is a much higher suicide rate in counties with lower populations.
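One way to put a number on the pattern in the scatterplot is a rank correlation between population and suicide rate. The sketch below is not part of the original analysis: the helper `rank_correlation` is a hypothetical name, and the rows are made-up values for illustration only, not CDC figures. It computes a Spearman-style correlation as the Pearson correlation of the ranked columns.

```python
import pandas as pd

# Made-up rows for illustration only; the notebook's df2 holds the real CDC county data
counties = pd.DataFrame({
    'Population': [12000, 45000, 150000, 800000, 2500000],
    'Suicide Rate per 100,000': [31.0, 24.5, 18.2, 13.1, 10.4],
})

def rank_correlation(df, x_col, y_col):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return df[x_col].rank().corr(df[y_col].rank())

rho = rank_correlation(counties, 'Population', 'Suicide Rate per 100,000')
print(rho)  # close to -1 here, because the toy rates fall steadily as population rises
```

Applied to `df2`, a value near -1 would support the visual impression, while a value near 0 would suggest the pattern is driven by a handful of counties.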
Next, I graphed only the counties with populations of less than 300K:
In [182]:
data_3 = '/Users/karaf/Documents/Data_Bootcamp/Suicide_Rates_by_County_Below 300K.csv' # file location
df3 = pd.read_csv(data_3, index_col=0)
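The cell above loads a separately prepared CSV for the smaller counties. A possible alternative, sketched here, is to filter the full county table directly in pandas. This assumes the separate file contains the same rows as the full table restricted to populations under 300,000, which the notebook does not confirm. The helper `counties_below` and the example rows are hypothetical.

```python
import pandas as pd

def counties_below(df, max_population=300_000, column='Population'):
    """Return only the rows whose population is strictly below max_population."""
    return df[df[column] < max_population]

# Made-up rows for illustration only, not CDC figures
example = pd.DataFrame(
    {'Population': [25000, 299999, 300000, 1200000],
     'Suicide Rate per 100,000': [28.0, 15.0, 14.0, 11.0]},
    index=['County A', 'County B', 'County C', 'County D'],
)

small = counties_below(example)
print(small.index.tolist())  # ['County A', 'County B']
```

With the notebook's data this would be called as `counties_below(df2)`, which keeps the filtering step visible in the notebook rather than in a hand-edited file.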
In [185]:
# scatterplot of suicide rate vs. population for counties below 300K
fig, ax = plt.subplots()
ax.scatter(df3['Population'], df3['Suicide Rate per 100,000'])
ax.set_title('Suicide rate by population size', loc='left', fontsize=16)
ax.set_xlabel('Population')
ax.set_ylabel('Suicide Rate')
Out[185]:
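This correlation seems too perfect to be true. The effect is an artifact of data suppression by the CDC. According to the CDC's website, "Data are Suppressed when the data meet the criteria for confidentiality constraints". Counties with very few deaths are left out of the dataset, so the small counties that remain are only those with relatively many deaths for their size.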
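The suppression effect can be illustrated with a short calculation. If counts below some minimum are withheld, every county that appears in the data has at least that many deaths, which sets a floor on its rate that rises steeply as population shrinks. The sketch below assumes a minimum visible count of 10; that cutoff is an assumption for illustration, since the CDC quote above does not state it, and `minimum_visible_rate` is a hypothetical helper.

```python
def minimum_visible_rate(population, min_count=10):
    """Lowest rate per 100,000 a county can show if counts below min_count are suppressed.

    min_count=10 is an assumed cutoff, not a value taken from the notebook.
    """
    return min_count / population * 100_000

for population in (10_000, 50_000, 100_000, 300_000):
    print(population, round(minimum_visible_rate(population), 1))
# 10000 100.0
# 50000 20.0
# 100000 10.0
# 300000 3.3
```

Under this assumption a county of 10,000 people only appears in the data if its rate is at least 100 per 100,000, which would trace out the smooth lower edge seen in the scatterplot.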
Even so, there is still a clear connection between high suicide rates and low county populations.
One reason for this may be that mental health issues are not treated as well in rural settings as they are in urban areas. There may also be greater economic disparity in the areas with higher suicide rates. More research would need to be completed to understand this correlation more thoroughly.
http://www.cdc.gov/nchs/data/series/sr_02/sr02_166.pdf
http://afsp.org/about-suicide/suicide-statistics/
Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2014 on CDC WONDER Online Database, released 2015. Data are from the Multiple Cause of Death Files, 1999-2014, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/ucd-icd10.html on May 1, 2016 9:10:22 AM