Authors: Ryszard Madej & Katherine Kustas
Summary:
This project investigates three questions about the nature of crime in the United States over the 20 years for which data are available (1994 – 2013):
1) What crimes have been most prevalent in the past twenty years?
2) Which years saw the largest drop in crime in the US?
3) What factors contributed to the decline in crime rates?
Data Sources:
The data for this project are sourced from the Federal Bureau of Investigation (FBI), the domestic intelligence and security service of the United States, which also serves as the nation's prime federal law enforcement agency. The data consist of tables providing the estimated number of offenses and the rate (per 100,000 inhabitants) of crime in the United States for 1994 through 2013, as well as the 2-, 5-, and 10-year trends for 2013 based on these estimates.
The data used in creating these tables were from all law enforcement agencies participating in the UCR Program (including those submitting less than 12 months of data).
The crime statistics for the nation include estimated offense totals (except arson) for agencies submitting less than 12 months of offense reports for each year.
Note that only data provided under the legacy definition of rape are shown in this table. (Calculating rape trends with the data provided under the revised definition of rape would not be possible, as there is currently only one year of data available.)
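The rate per 100,000 inhabitants mentioned above is a simple normalization of an offense count by population. A minimal sketch of the arithmetic follows; the function name `rate_per_100k` and the example figures are hypothetical and are not taken from the FBI tables.

```python
def rate_per_100k(offenses, population):
    """Return the number of offenses per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # multiply first so that whole-number inputs divide cleanly
    return offenses * 100_000 / population

# Hypothetical figures, chosen only to illustrate the arithmetic
example_rate = rate_per_100k(offenses=5_000, population=2_000_000)
print(example_rate)  # 250.0
```

Normalizing by population is what allows years with different population sizes to be compared on the same axis.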
In addition, data from the Centers for Disease Control and Prevention (CDC) are compared against our data from the FBI. The survey we used, the Youth Risk Behavior Surveillance System (YRBSS), monitors six types of health-risk behaviors that contribute to the leading causes of death and disability among youth and adults.
The survey was conducted by the CDC in conjunction with state, territorial, and local education and health agencies and tribal governments.
In [395]:
import sys # system module
import pandas as pd # data package
import matplotlib as mpl # graphics package
import matplotlib.pyplot as plt # pyplot module
import datetime as dt # date and time module
import numpy as np
# make plots show up in notebook
%matplotlib inline
In [407]:
#import data and then display each data frame
path1 = 'data/fbi_table_20years.xlsx'
df_20yr = pd.read_excel(path1,
index_col=0)
path2 = 'data/fbi_table_20years_edited.xlsx'
df_20yr_real = pd.read_excel(path2,
index_col=0)
path3 = 'data/fbi_table_20years_rates.xlsx'
df_20yr_rates = pd.read_excel(path3,
index_col=0)
path4 = 'data/CDS_Data.xlsx'
df_CDC = pd.read_excel(path4,
index_col=0)
In [19]:
df_20yr
Out[19]:
In [428]:
df_20yr_real
Out[428]:
In [112]:
df_20yr_rates
Out[112]:
In [408]:
df_CDC
Out[408]:
In [435]:
#create a line plot from crime rates data frame
fig, ax = plt.subplots()
df_20yr_rates.plot(ax=ax,
                   kind='line', # line plot
                   grid = True,
                   marker = 'o',
                   use_index = True)
plt.legend(loc = 'upper right')
ax.set_title('Crime rates over time\n',fontsize = 16) #format title and axis labels
ax.set_xlabel('Year', fontsize = 14)
ax.set_ylabel('Crime Rate (per 100,000 inhabitants)', fontsize = 14)
ax.set_xlim(1994, 2013) #set limits for x and y axis
ax.set_ylim(-50,3100)
fig.set_size_inches(15, 13)
Analysis:
In the graph above, crime rates decline steadily from 1994 to 2013 across the different categories of crime, despite a few isolated increases. A number of explanations have been proposed for this trend. One is the adoption of increasingly rigorous policing tactics; another, suggested by historian Neil Howe, is the entrance of millennials into the potential criminal demographic. Both will be explored in further detail later in this project.
In [ ]:
#find totals of each column in order to find which crime was most prevalent over the course of the past 20 years
crime_columns = ['Murder and\nnonnegligent \nmanslaughter',
                 'Rape\n(legacy\ndefinition)2',
                 'Robbery',
                 'Aggravated \nassault',
                 'Burglary',
                 'Larceny-\ntheft',
                 'Motor \nvehicle \ntheft']
#sum each crime column over all years, in the same order as the pie chart labels
totals_list = [df_20yr_real[col].sum() for col in crime_columns]
#grand total across all seven crime categories
list_total = sum(totals_list)
In [431]:
#plot pie chart using above data
k = ['Murder and nonnegligent manslaughter', 'Rape', 'Robbery', 'Aggravated assault', 'Burglary',
     'Larceny theft', 'Motor vehicle theft']
#convert the totals to percentages of all recorded crime
arr = np.array(totals_list, dtype=float)
percent = 100.*arr/arr.sum()
labels = ['{0} : {1:1.2f}%'.format(x,y) for x,y in zip(k, percent)]
colours = ['red','black', 'green', 'lightskyblue', 'yellow', 'purple', 'darkblue'] #style the pie chart
patches, texts = plt.pie(totals_list, colors=colours, startangle=90)
fig = plt.gcf()
fig.set_size_inches(7.5, 7.5)
plt.legend(patches, labels, loc="best", bbox_to_anchor=(1.02, 0.94), borderaxespad=0)
plt.axis('equal')
plt.title('Prevalence of Various Crimes: 1994-2013 (as percentage of total crime)\n', fontsize = 16)
plt.tight_layout()
plt.show()
Analysis:
Here we can see the relative prevalence of various types of crime in the United States. Larceny theft accounts for over 50% of the crime committed in the US over the 20-year period, followed by burglary at about 19% and motor vehicle theft at about 10%. Aggravated assault and robbery contribute about 8% and about 4%, respectively, while rape accounts for about 1% and murder for 0.14%.
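The shares quoted above are each category's 20-year total divided by the sum over all categories. A minimal sketch of that calculation with pandas is shown here; the category totals are hypothetical round numbers used for illustration, not the FBI figures.

```python
import pandas as pd

# Hypothetical 20-year totals (NOT the FBI figures), one entry per crime category
totals = pd.Series({'Larceny-theft': 600,
                    'Burglary': 200,
                    'Motor vehicle theft': 100,
                    'Aggravated assault': 100})

# each category's share of all recorded crime, in percent, largest first
shares = (100 * totals / totals.sum()).sort_values(ascending=False)
print(shares.round(2))
```

Sorting the shares makes the most prevalent category the first entry, which is the same ranking the pie chart conveys visually.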
In [427]:
#calculate total number of crimes per year (skip the derived column if this cell is re-run)
crime_counts = df_20yr_real.drop(columns=['Percent Change'], errors='ignore')
row_total_list = crime_counts.sum(axis=1).tolist()
#calculate the percent decrease in crime from the previous year;
#the first year has no previous year, and year-over-year increases are recorded as 0
percent_change_list = [0.0]
for k in range(1, len(row_total_list)):
    percent_change = (1 - row_total_list[k]/row_total_list[k-1]) * 100
    percent_change_list.append(max(percent_change, 0.0))
# add the percent change column to our data frame
df_20yr_real['Percent Change'] = percent_change_list
In [430]:
#plot bar graph using above percent change data
fig, ax = plt.subplots()
fig.set_size_inches(16, 6.5)
df_20yr_real['Percent Change'].plot(kind='bar',
ax=ax,
legend = False,
color = ['blue','purple'],
alpha = 0.65,
rot = 0,
width = 0.9,
align = 'center')
plt.style.use('bmh')
ax.set_xlabel('Year', fontsize = 14)
ax.set_ylabel('Percent Change', fontsize = 14) #style bar graph
ax.set_title('Yearly change in total crime\n', fontsize = 16)
ax.set_ylim(0,7)
Out[430]:
Analysis:
We can see from the above bar chart that there was a substantial decrease in crime during 1997 and 1998. This could be attributed to a number of increasingly rigorous policing tactics around the country, such as Bratton’s Zero Tolerance policing in New York City.
Stricter policing was, according to some sources, controversial and led to an increase in dissent and crime. Alongside it, there was a large influx of millennials into the criminal age demographic (approximately 12-24 years of age), the ages at which people are most likely to commit or be victims of violent crime.
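The bar chart measures the year-over-year percent decrease in total crime, with years in which crime rose shown as zero. As a rough sketch, the same quantity can be computed with pandas `pct_change`; the yearly totals below are hypothetical and serve only to show the mechanics.

```python
import pandas as pd

# Hypothetical yearly crime totals (NOT the FBI figures), indexed by year
totals = pd.Series([1000, 950, 969, 900], index=[1994, 1995, 1996, 1997])

# pct_change gives (current / previous - 1); negate it to express a decrease in percent,
# set the first year (no previous year) to 0, and floor increases at 0
pct_decrease = (-100 * totals.pct_change()).fillna(0.0).clip(lower=0.0)
print(pct_decrease.round(2))

# year with the largest drop in this toy series
largest_drop_year = pct_decrease.idxmax()
```

Flooring at zero mirrors the notebook's choice to display only decreases, at the cost of hiding how large any increases were.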
In [434]:
#create a line plot from CDC data frame
fig, ax = plt.subplots()
df_CDC.plot(ax=ax,
kind='line', # line plot
grid = True,
marker = 'o',
use_index = True)
plt.legend(loc = 'upper right') #format legend
ax.set_title('High schoolers partaking in risky behaviors',fontsize = 16) #format title and axis labels
ax.set_xlabel('Year', fontsize = 14)
ax.set_ylabel('Percent of Students', fontsize = 14)
fig.set_size_inches(15, 8)
Analysis:
The line graphs above allow total crime in the United States to be compared against some key indicators in the CDC Youth Risk Behavior Survey. In the graph above, it can be seen that high-school-age youths have been partaking in “risky” behaviors at steadily lower rates over the last twenty years. We have plotted the percentage of high school youths who have ever drunk a beer, rarely wore a helmet when biking, ever tried smoking a cigarette, and ever had sexual intercourse, all indicators of risky behavior among teenagers.
As the percentage of US high schoolers partaking in these risky activities decreases, we see a corresponding decline in crime in the United States. The entrance of millennials (increasingly looked after, sheltered, and advised not to take risks) at least partially helps to explain the sharp decline in crime rates during the late 1990s that has persisted to the present day.
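One way to put a number on the relationship described above is a Pearson correlation between a risky-behavior indicator and the crime rate. The sketch below assumes the two series have been aligned on the same years; the column names, years, and values are all hypothetical and are not drawn from the FBI or CDC data.

```python
import pandas as pd

# Hypothetical values (NOT the FBI or CDC figures), aligned on the same years
df = pd.DataFrame({'risky_behavior_pct': [50.0, 48.0, 45.0, 43.0, 40.0],
                   'crime_rate': [5000.0, 4600.0, 4200.0, 4000.0, 3500.0]},
                  index=[1995, 1997, 1999, 2001, 2003])

# Pearson correlation coefficient between the two series
r = df['risky_behavior_pct'].corr(df['crime_rate'])
print(round(r, 3))
```

A coefficient close to 1 indicates that the two series fall together, though two series that both trend downward over time will correlate strongly whether or not one drives the other.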
The influx of a less risky, milder generation has defined the criminal scene for the last two decades; encouraging more responsible behavior among our youth has to some degree resulted in a decline in crime. This suggests that building a safer country depends on cultivating today’s youth, ensuring that they are given opportunities and support to pursue more constructive and less risky behaviors. This will help us create a productive generation of citizens that propels the United States into a safer future.