Urban vs. rural living and suicide rates

May 2016

Written by Kara Frantzich at NYU Stern

Contact: kara.frantzich@stern.nyu.edu

Suicide in the United States

Suicide is the 10th most common cause of death in the United States. In 2014, 42,704 people in the US committed suicide (CDC), compared to about 16,000 homicides in the same year.

This project's goal is to investigate the relationship between urban and rural populations and suicide rates. Are cities healthier because they provide a social network for people who may consider suicide? Or does rural life make people less likely to take their own lives?

Packages Imported

I use matplotlib.pyplot to plot scatter plots. I use pandas, a Python package that allows for fast data manipulation and analysis, to organize my dataset.


In [158]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

The Data

The data come from the Centers for Disease Control and Prevention (CDC) and cover 2014.

The CDC has defined different levels of urbanization, outlined below:

Definitions of Urbanization Levels, by population of area

  Large Metro Center     1,000,000 or more people
  Large Metro Suburban   1,000,000 or more, outside of city center
  Medium Metro           250,000-999,999
  Small Metro            Less than 250,000
  Non-metro Center       Less than 49,999
  Non-metro Non-Core     Less than 49,999, outside of town center

I first looked at suicide rates through these pre-defined concepts of urban and rural.


In [132]:
data_1 = '/Users/karaf/Documents/Data_Bootcamp/Suicide_Rates_by_Urbanization.csv' # file location
df1 = pd.read_csv(data_1, index_col=0)

In [133]:
df1


Out[133]:
                      Total Suicides  Total Population  Suicide Rate
Large Metro Center             10546          97933810          10.8
Large Metro Suburban            9860          79063551          12.5
Medium Metro                    9675          66466520          14.6
Small Metro                     4638          29204061          15.9
Non-metro Center                4545          27242864          16.7
Non-metro Non-Core              3440          18946250          18.2
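
The Suicide Rate column can be reproduced from the other two columns as deaths per 100,000 residents. The sketch below rebuilds the table inline so that it does not depend on the CSV file. It assumes the published rate is rounded to one decimal place, and the names `levels` and `check` are illustrative rather than part of the original notebook.

```python
import pandas as pd

# Values copied from the urbanization table; `levels` and `check` are illustrative names.
levels = ['Large Metro Center', 'Large Metro Suburban', 'Medium Metro',
          'Small Metro', 'Non-metro Center', 'Non-metro Non-Core']
check = pd.DataFrame(
    {'Total Suicides': [10546, 9860, 9675, 4638, 4545, 3440],
     'Total Population': [97933810, 79063551, 66466520,
                          29204061, 27242864, 18946250]},
    index=levels)

# Rate per 100,000 people, rounded to one decimal place (assumed rounding).
check['Computed Rate'] = (
    check['Total Suicides'] / check['Total Population'] * 100_000
).round(1)

print(check['Computed Rate'])
print('National total:', check['Total Suicides'].sum())
```

The six category totals add up to the national figure of 42,704 quoted in the introduction.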

In [134]:
# horizontal bar chart of suicide rate by urbanization level
fig, ax = plt.subplots()
df1['Suicide Rate'].plot(ax=ax, kind='barh', alpha=0.5)
ax.set_title('Suicide rates by urbanization levels', loc='left', fontsize=16)
ax.set_xlabel('Suicides per 100,000 people')
ax.set_ylabel('')

ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')



In [159]:
# bubble chart of suicide rate vs. total population, one bubble per urbanization level
fig, ax = plt.subplots()
ax.scatter(df1['Total Population'], df1['Suicide Rate'],     # x,y variables
            s=df1['Total Suicides']/5,          # size of bubbles
            alpha=0.5)   
ax.set_title('Suicide rate by population size', loc='left', fontsize=16)
ax.set_xlabel('Population in 100 Millions')
ax.set_ylabel('Suicide Rate')
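# place the note in axes coordinates (0-1) so it stays inside the plot area
ax.text(0.95, 0.95, 'Bubble size represents suicide totals',
        transform=ax.transAxes,
        horizontalalignment='right', verticalalignment='top')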


Out[159]:
<matplotlib.text.Text at 0x23aefee3ba8>

These visualizations show that suicide rates rise as areas become more rural.

Next, I looked at suicide rates by county. Of the 42,704 suicides in 2014, the CDC data attribute 35,580 to a specific county (the remaining 7,124 fall in counties whose figures are suppressed in the CDC's datasets).
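
As a quick check of how much of the national total is missing from the county-level view, the arithmetic can be written out. This is only a sketch using the two figures quoted above; the variable names are illustrative.

```python
total_suicides = 42704    # national total for 2014, from the text above
county_reported = 35580   # deaths that appear in the county-level data

suppressed = total_suicides - county_reported
share = suppressed / total_suicides

print(suppressed)         # 7124
print(f"{share:.1%}")     # 16.7%
```

Roughly one in six suicides is therefore not visible in the county table.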


In [171]:
data_2 = '/Users/karaf/Documents/Data_Bootcamp/Suicide_Rates_by_County.csv' # file location
df2 = pd.read_csv(data_2, index_col=0)

In [172]:
df2


Out[172]:
County Code Deaths Population Suicide Rate per 100,000
County
Baldwin County, AL 1003 38 200111 18.989461
Calhoun County, AL 1015 15 115916 12.940405
Coffee County, AL 1031 12 50909 23.571471
Colbert County, AL 1033 11 54543 20.167574
Cullman County, AL 1043 21 81289 25.833754
DeKalb County, AL 1049 10 71065 14.071625
Elmore County, AL 1051 20 80977 24.698371
Etowah County, AL 1055 15 103531 14.488414
Houston County, AL 1069 12 104193 11.517088
Jackson County, AL 1071 14 52665 26.583120
Jefferson County, AL 1073 73 660793 11.047333
Lauderdale County, AL 1077 12 93096 12.889920
Lee County, AL 1081 20 154255 12.965544
Limestone County, AL 1083 15 90787 16.522189
Madison County, AL 1089 46 350299 13.131639
Marshall County, AL 1095 15 94636 15.850205
Mobile County, AL 1097 61 415123 14.694440
Montgomery County, AL 1101 24 226189 10.610596
Morgan County, AL 1103 24 119607 20.065715
Russell County, AL 1113 11 59608 18.453899
St. Clair County, AL 1115 18 86697 20.761964
Shelby County, AL 1117 26 206655 12.581355
Talladega County, AL 1121 15 81322 18.445193
Tuscaloosa County, AL 1125 21 202212 10.385140
Walker County, AL 1127 17 65471 25.965695
Anchorage Borough, AK 2020 58 301010 19.268463
Fairbanks North Star Borough, AK 2090 22 99357 22.142375
Kenai Peninsula Borough, AK 2122 21 57477 36.536354
Matanuska-Susitna Borough, AK 2170 22 97882 22.476043
Apache County, AZ 4001 25 71828 34.805368
... ... ... ... ...
Wood County, WV 54107 22 86237 25.511092
Brown County, WI 55009 34 256670 13.246581
Calumet County, WI 55015 10 49491 20.205694
Chippewa County, WI 55017 10 63460 15.757958
Columbia County, WI 55021 10 56615 17.663163
Dane County, WI 55025 64 516284 12.396278
Dodge County, WI 55027 13 88574 14.676993
Eau Claire County, WI 55035 14 101564 13.784412
Fond du Lac County, WI 55039 16 101759 15.723425
Jefferson County, WI 55055 12 84395 14.218852
Kenosha County, WI 55059 24 168068 14.279934
La Crosse County, WI 55063 26 118011 22.031844
Manitowoc County, WI 55071 13 80160 16.217565
Marathon County, WI 55073 26 135780 19.148623
Milwaukee County, WI 55079 94 956406 9.828462
Outagamie County, WI 55087 26 182006 14.285243
Portage County, WI 55097 12 70482 17.025624
Racine County, WI 55101 30 195163 15.371766
Rock County, WI 55105 25 161188 15.509839
St. Croix County, WI 55109 11 86759 12.678800
Sheboygan County, WI 55117 20 115290 17.347558
Walworth County, WI 55127 16 103527 15.454905
Washington County, WI 55131 12 133251 9.005561
Waukesha County, WI 55133 47 395118 11.895181
Waupaca County, WI 55135 10 52066 19.206392
Winnebago County, WI 55139 21 169511 12.388577
Fremont County, WY 56013 10 40703 24.568214
Laramie County, WY 56021 22 96389 22.824181
Natrona County, WY 56025 21 81624 25.727727
Sheridan County, WY 56033 10 30032 33.297816

898 rows × 4 columns


In [181]:
# scatterplot of suicide rate vs. county population
fig, ax = plt.subplots()
ax.scatter(df2['Population'], df2['Suicide Rate per 100,000'])   
ax.set_title('Suicide rate by population size', loc='left', fontsize=16)
ax.set_xlabel('Population in 10 Millions')
ax.set_ylabel('Suicide Rate')


Out[181]:
<matplotlib.text.Text at 0x23af142bf60>

This shows that counties with smaller populations tend to have much higher suicide rates.

Next, I graphed only the counties with populations of less than 300K:


In [182]:
data_3 = '/Users/karaf/Documents/Data_Bootcamp/Suicide_Rates_by_County_Below 300K.csv' # file location
df3 = pd.read_csv(data_3, index_col=0)
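
If the below-300K file is simply the county table with the larger counties removed (an assumption; the notebook does not say how that file was produced), the same subset could be derived with a boolean mask instead of a second CSV. The sketch below uses a handful of rows copied from the county table, and `sample` and `below_300k` are illustrative names.

```python
import pandas as pd

# A few rows copied from the county table; `sample` is an illustrative name.
sample = pd.DataFrame(
    {'Population': [200111, 660793, 301010, 956406, 30032],
     'Suicide Rate per 100,000': [18.989461, 11.047333, 19.268463,
                                  9.828462, 33.297816]},
    index=['Baldwin County, AL', 'Jefferson County, AL',
           'Anchorage Borough, AK', 'Milwaukee County, WI',
           'Sheridan County, WY'])
sample.index.name = 'County'

# Keep only counties with fewer than 300,000 residents.
below_300k = sample[sample['Population'] < 300_000]
print(below_300k)
```

Applied to the full table, the same mask would be `df2[df2['Population'] < 300_000]`.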

In [185]:
# scatterplot of suicide rate vs. population for counties below 300K
fig, ax = plt.subplots()
ax.scatter(df3['Population'], df3['Suicide Rate per 100,000'])   
ax.set_title('Suicide rate by population size', loc='left', fontsize=16)
ax.set_xlabel('Population')
ax.set_ylabel('Suicide Rate')


Out[185]:
<matplotlib.text.Text at 0x23af1584eb8>

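This correlation seems too perfect to be true. The effect is an artifact of data suppression by the CDC. According to the CDC's website, "Data are Suppressed when the data meet the criteria for confidentiality constraints". No county in the rows shown above reports fewer than 10 deaths, which suggests that counties with very small counts are missing from the dataset. A small county therefore appears only if its rate is high, and that produces the sharp lower edge in the plot.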
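
One way to see where that edge comes from is to compute the lowest rate a county of a given size could show and still be listed. The sketch below assumes the cutoff is 10 deaths, which matches the smallest death count in the rows displayed earlier but is not confirmed by the quoted CDC text. `MIN_REPORTED_DEATHS` and `min_observable_rate` are hypothetical names.

```python
MIN_REPORTED_DEATHS = 10  # assumed suppression cutoff, not confirmed by the source


def min_observable_rate(population, min_deaths=MIN_REPORTED_DEATHS):
    """Lowest rate per 100,000 a county of this size can show if it is listed at all."""
    return min_deaths / population * 100_000


# Populations copied from counties in the table that report exactly 10 deaths.
for name, population in [('Sheridan County, WY', 30032),
                         ('Fremont County, WY', 40703),
                         ('Calumet County, WI', 49491)]:
    print(f"{name}: {min_observable_rate(population):.2f}")
```

Under this assumption the floor falls as 1/population, so the smallest counties in the dataset sit exactly on a smooth curve, which is why the scatter looks like a near-perfect relationship at the low end.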

Conclusion

Even allowing for the suppressed data, there is still a clear connection between high suicide rates and low county populations.

One reason for this may be that mental health issues are not treated as well in rural settings as they are in urban areas. There may also be greater economic disparity in the areas with higher suicide rates. More research would need to be completed to understand this correlation more thoroughly.

Resources

http://www.cdc.gov/nchs/data/series/sr_02/sr02_166.pdf

http://afsp.org/about-suicide/suicide-statistics/

Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2014 on CDC WONDER Online Database, released 2015. Data are from the Multiple Cause of Death Files, 1999-2014, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/ucd-icd10.html on May 1, 2016 9:10:22 AM