The debate over what makes the best criminal justice system has always been riddled with controversy. One particularly interesting comparison is that between punitive justice and rehabilitative justice. While some argue that inmates should be treated in inhumane ways, as a sort of revenge for their wrongdoings, others believe that the purpose of imprisonment is to somehow "heal" inmates so that they do not engage in criminal activity once they are released back into society. One way to judge whether a justice system is successful is to consider recidivism rates, where recidivism is the tendency of a convicted offender to reoffend.
Norway is known for its uniquely rehabilitation-focused justice system, which has seen one of the lowest rates of recidivism worldwide. Since the late 20th century, Kriminalomsorgen (the Norwegian Correctional Service) has invested heavily in improving prison conditions in an effort to improve inmates' lives and hence reduce recidivism. For instance, approximately $1 million was spent on artwork at the maximum-security prison Halden to "ease the psychological burdens of imprisonment," according to Time Magazine. But does it work?
In 2008, the Norwegian Storting (supreme legislature) released a white paper to reduce crime and enforce "punishment that works." The goal was that prisoners would live crime-free lives after serving their sentence. The purpose of this project is to evaluate whether this has indeed had an effect on recidivism rates.
The data used for this project was retrieved entirely from Statistics Norway, from 3 different data sets:
To load the data into this Jupyter Notebook, the 3 data sets were imported into Excel and placed on 3 separate sheets of a single Excel workbook. This workbook was then uploaded to a GitHub repository.
In [1]:
# Importing packages
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import sys
import datetime as dt
from pandas_datareader import wb, data as web
# plotly imports
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go # ditto
import plotly # just to print version and init notebook
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
# for graphics
%matplotlib inline
This data will reveal whether or not the 2008 paper had a direct effect on the spending, which will be important for any potential conclusions I will be able to draw. Presumably, we will see an increase in spending in 2008 that either stays high or increases as time passes.
In [2]:
# importing from github
url = 'https://raw.githubusercontent.com/asteckmest/data-bootcamp-project/master/SSB_recidivism.xlsx'
# reading the "Expenses" sheet into a dataframe, setting the header row and index column
# (sheet_name replaces the older sheetname keyword in current pandas)
expenses = pd.read_excel(url, header=[2], index_col=[0], sheet_name='Expenses')
expenses
Out[2]:
In [3]:
# renaming the index
expenses.index.name = 'Years'
expenses
Out[3]:
In [4]:
# creating a layout to be used for the plotly plot
layout = dict(width=950, height=600, # plot width/height
yaxis={"title": "Spending (million NOK)"}, # yaxis label
title="Norwegian Correctional Service Expenditure", # title
xaxis={"title": "Years"} # xaxis label
)
In [5]:
# looking only at the 3 most significant variables for the purpose of this research topic
expenses.T[['Operating expenditure (NOK million)',
'Own production (NOK million)',
'Total expenditure (NOK million)']].iplot(layout=layout)
From this data we see that expenditure has increased since 2005, with a sudden sharp increase in total spending, operating expenses and own production in 2008. The timing suggests that the white paper had a direct effect on spending, so we can expect there to be effects on recidivism as well.
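One way to put a number on a "sudden sharp increase" is the year-over-year percentage change. The sketch below uses made-up spending figures purely for illustration; they are not the Statistics Norway values, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical total expenditure in NOK million (NOT the real data)
spending = pd.Series(
    [2000.0, 2100.0, 2200.0, 2750.0, 2900.0],
    index=['2005', '2006', '2007', '2008', '2009'],
    name='Total expenditure (NOK million)',
)

# Percentage change relative to the previous year; the first year is NaN
yoy = spending.pct_change().mul(100)

# Year with the largest jump (idxmax skips the NaN in the first row)
largest_jump_year = yoy.idxmax()

print(yoy.round(1))
print(largest_jump_year)
```

Assuming the years end up as the row index after transposing, the same call could be applied to a column of `expenses.T`.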
In [6]:
# shading the period 2008-2009 with a vertical band (vspan takes x0/x1; hspan would expect y0/y1)
expenses.T[['Operating expenditure (NOK million)',
'Own production (NOK million)',
'Total expenditure (NOK million)']].iplot(vspan={'x0':'2008','x1':'2009','color':'rgba(30,30,30,0.3)','fill':True,'opacity':.4})
This dataframe is larger and slightly more complex because there are 3 variables, namely the number of years that have passed since the "base" year (year that prisoner was released from first sentence), the sex of the offender and their age. It will be interesting to see how these variables changed over time, especially after 2008.
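To make the three-level index concrete, here is a minimal sketch on a toy frame with made-up numbers. The level names match the ones used in this notebook, but the labels (in particular `'1 year'`) and the values are assumptions for illustration only.

```python
import pandas as pd

# Toy three-level index: years since base year, sex, age group
index = pd.MultiIndex.from_product(
    [['All years (0-5)', '1 year'],
     ['Both sexes', 'Males', 'Females'],
     ['Total', '30-39 years']],
    names=['Base Year', 'Sex', 'Age'],
)

# Made-up counts, one column per release year
toy = pd.DataFrame({'2002': range(12), '2003': range(12, 24)}, index=index)

# Option 1: boolean mask on one level
mask = toy.index.get_level_values('Age') == 'Total'
age_total = toy.loc[mask]

# Option 2: cross-section on two levels at once; the selected levels are dropped
totals = toy.xs(('Both sexes', 'Total'), level=['Sex', 'Age'])

print(age_total)
print(totals)
```

The mask keeps all three index levels, which is convenient when a later step still needs to drop labels by level; `xs` is shorter when only the remaining level matters.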
In [7]:
# retrieving the data set, setting a multi-index for Base Year, Sex and Age, choosing header as the years
years = pd.read_excel(url, header=[2], index_col=[0,1,2], sheet_name='By_Years_Sex_Age')
years
Out[7]:
In [8]:
# renaming the index
years.index.names = ['Base Year', 'Sex', 'Age']
# show the first 5 rows
years.head()
Out[8]:
In [9]:
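# dropping the rows with NaN in the first index level (code -1 marks a missing label)
# and the row holding disclaimer information; MultiIndex.labels was renamed to .codes
years = years.loc[~(years.index.codes[0] == -1)]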
years = years.drop('Persons charged only covers charged persons with a valid Norwegian personal identification number, and at least one stipulated time of crime in the base year for the principal offence. A charged person is considered to have recidivism if they have a new charge brought against them in the subsequent five-year period, and if at least one of the principal offences is committed after the principal offence in the base year.', level='Base Year')
years = years.drop('Unknown sex', level = 'Sex')
years = years.drop('Unknown age', level = 'Age')
years
Out[9]:
Let's first consider, across all ages and sexes, how recidivism changes with years after release of the first sentence. For example, it will be interesting to see whether recidivism is highest after 1 year, implying that the released criminals were not "rehabilitated" during their prison time, or whether it is highest after 3 or 4 years, which would then imply that the issue lies with integrating back into society.
In [10]:
# retrieving desired data from 'years' by selecting the total values for all sexes and ages
years_total = years.iloc[years.index.get_level_values('Age') == 'Total']
years_total = years_total.iloc[years_total.index.get_level_values('Sex') == 'Total']
years_total
Out[10]:
Comment: This dataframe shows that there were 69,617 people who were released in 2002, and of those people 34,545 were arrested again within 5 years and 35,072 were not. This gives a 5-year recidivism rate of approximately 50% for the year 2002. It will be more interesting to convert this table into percent values before plotting it, so let's do that.
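As a quick check of this arithmetic, the three 2002 figures quoted above can be converted to percentages by hand and with the same column-wise division used in the next cell. The toy frame holds only those three numbers; its row labels are illustrative and are not the labels in the Statistics Norway sheet.

```python
import pandas as pd

# 2002 figures quoted in the comment above
released = 69617
reoffended = 34545
not_reoffended = 35072

# The two groups should add up to the total
assert reoffended + not_reoffended == released

# 5-year recidivism rate in percent
rate = reoffended / released * 100
print(round(rate, 1))

# Same idea on a one-column frame: every row as a percentage of the first row
toy = pd.DataFrame(
    {'2002': [released, reoffended, not_reoffended]},
    index=['Total', 'With recidivism', 'Without recidivism'],  # assumed labels
)
toy_pct = toy.apply(lambda col: col.div(col.iloc[0]).mul(100))
print(toy_pct.round(1))
```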
In [11]:
# for every column (release year), express each row as a percentage
# of the first row, which holds the total number of persons
years_total = years_total.apply(lambda x: x.div(x.iloc[0]).mul(100))
years_total
Out[11]:
In [12]:
# dropping "all years" for the plot
years_total = years_total.drop('All years (0-5)', level = 'Base Year')
years_total
Out[12]:
In [13]:
layout_total = dict(width=950, height=600, # plot width/height
yaxis={"title": "Recidivism (as % of released offenders)"}, # yaxis label
title="Recidivism Rates Across Years 0-5", # title
xaxis={"title": "Years"} # xaxis label
)
In [14]:
years_total.T.iplot(layout=layout_total)
First of all, it is not surprising that the recidivism rates decrease as years from the release go by. This shows that it is the period immediately after being released from prison (especially the first year, which has a significantly greater rate than the rest) that is the most challenging for past offenders. This, in turn, implies that efforts should be spent on integrating offenders back into society.
There are a couple of interesting things to note here. We see that the 5-year recidivism rate has steadily decreased from 49.6% in 2002 to 44.5% in 2010, which in turn means that the percentage of released offenders who return to society and do not reoffend within 5 years has increased from 50.4% to 55.5%. That said, although there is a slight decrease in recidivism rates between 2008 and 2010, it does not appear that the 2008 white paper has had a significant effect yet. Of course, 2 years is a short period in which to expect great results.
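The size of this decrease can be expressed either in percentage points or relative to the starting rate. The short sketch below works both out from the two rates quoted above; the variable names are only for illustration.

```python
# 5-year recidivism rates quoted in the text (percent)
rate_2002 = 49.6
rate_2010 = 44.5

# Absolute change, in percentage points
pp_change = rate_2010 - rate_2002

# Relative change, as a percent of the 2002 rate
rel_change = pp_change / rate_2002 * 100

# Share of offenders without recidivism is the complement of the rate
no_recidivism_2002 = 100 - rate_2002
no_recidivism_2010 = 100 - rate_2010

print(round(pp_change, 1), round(rel_change, 1))
print(round(no_recidivism_2002, 1), round(no_recidivism_2010, 1))
```

In other words, the rate fell by about 5.1 percentage points, which is roughly a 10% reduction relative to the 2002 level.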
In [15]:
years_total.T.iplot(vline=['2008'], hspan={'y0':'44.473','y1':'55.53','color':'rgba(30,30,30,0.3)','fill':True,'opacity':.4})
In [16]:
years_women = years.iloc[years.index.get_level_values('Sex') == 'Females']
years_women
Out[16]:
In [17]:
# looking at recidivism across years for females
years_women = years_women.iloc[years_women.index.get_level_values('Age') == 'Total']
years_women
Out[17]:
In [18]:
# for every column (release year), express each row as a percentage
# of the first row, which holds the total number of persons
years_women = years_women.apply(lambda x: x.div(x.iloc[0]).mul(100))
years_women
Out[18]:
In [19]:
years_women = years_women.drop('All years (0-5)', level = 'Base Year')
years_women
Out[19]:
In [20]:
# And now to the men
years_men = years.iloc[years.index.get_level_values('Sex') == 'Males']
years_men
Out[20]:
In [21]:
# looking at recidivism across years for males
years_men = years_men.iloc[years_men.index.get_level_values('Age') == 'Total']
years_men
Out[21]:
In [22]:
years_men = years_men.apply(lambda x: x.div(x.iloc[0]).mul(100))
years_men
Out[22]:
In [23]:
# repeating cleaning
years_men = years_men.drop('All years (0-5)', level='Base Year')
years_men
Out[23]:
In [38]:
# Plotting the data together to compare
fig, ax = plt.subplots(1, 2, figsize=(12, 5), sharex=True)
years_women.T.plot(ax=ax[0], kind='line', legend=False)
years_men.T.plot(ax=ax[1], kind='line')
ax[0].set_title('Women with Recidivism')
ax[0].set_xlabel('Years')
ax[0].set_ylabel('Recidivism Rates (% of total released)')
ax[0].set_xticklabels(range(2002, 2011, 1), rotation='horizontal')
ax[1].set_title('Men with Recidivism')
ax[1].set_xlabel('Years')
ax[1].set_ylabel('Recidivism Rates (% of total released)')
ax[1].legend(bbox_to_anchor=(1, 1), loc='best', ncol=1)
Out[38]:
From this data we note some striking things. First of all, in both cases total recidivism rates have decreased: from 34% to 30% for women and from 53% to 48% for men. The vast difference between female and male recidivism rates is also interesting, a gap of almost 20 percentage points.
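The gap between the two groups is a difference in percentage points. A small sketch using the rounded rates quoted above makes this explicit; `'start'` and `'end'` are placeholder column names for the first and last year plotted, not labels from the data.

```python
import pandas as pd

# Rounded total recidivism rates quoted in the text (percent)
rates = pd.DataFrame(
    {'start': [34, 53], 'end': [30, 48]},
    index=['Women', 'Men'],
)

# Gap between men and women, in percentage points
gap = rates.loc['Men'] - rates.loc['Women']

# Change within each group over the period, in percentage points
change = rates['end'] - rates['start']

print(gap)
print(change)
```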
In [86]:
years_30 = years.iloc[years.index.get_level_values('Age') == '30-39 years']
years_30
Out[86]:
In [65]:
years_18 = years_18.apply(lambda x: x.div(x.iloc[0]).mul(100))
years_18
Out[65]:
In [87]:
# dropping both sexes, because it is equal to total
years_30 = years_30.drop('Both sexes', level = 'Sex')
years_30 = years_30.drop('Total', level = 'Sex')
years_30
Out[87]:
In [92]:
# plotting the data
layout_30 = dict(width=950, height=600, # plot width/height
yaxis={"title": "Recidivism for 30-39"}, # yaxis label
title="Recidivism Rates for Ages 30-39", # title
xaxis={"title": "Years"} # xaxis label
)
years_30.T.iplot(layout=layout_30)
From this data we actually see a decrease across all years after 2007, though it seems unlikely that this is due to the white paper.
This data set is interesting because it gives a view into which groups of offenders have higher recidivism, and also provides some insight into whether people who were arrested for one type of offence are more likely to be arrested again for the same type or for a more or less severe crime. One would perhaps assume that an investment in rehabilitation would decrease recidivism for violent crimes more than for economic crimes, for instance.
In [40]:
# retrieving the data set, setting a multi-index for principal offence and recidivism offence
offences = pd.read_excel(url, header=[2], index_col=[0,1], sheet_name='By_Principal_Offense')
offences
Out[40]:
In [43]:
# renaming the index
offences.index.names = ['Principal Offence', 'Recidivism Offence']
# show the first 5 rows
offences.head()
Out[43]:
In [48]:
# dropping columns that have raw numbers, keeping those with percentages
offences = offences.drop(offences.columns[[0, 2, 4, 6, 8, 10, 12, 14, 16, 17]], axis=1)
offences
Out[48]:
In [51]:
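# dropping rows with NaN in the first index level (code -1 marks a missing label);
# MultiIndex.labels was renamed to .codes
offences = offences.loc[~(offences.index.codes[0] == -1)]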
offences = offences.drop('Persons charged only covers charged persons with a valid Norwegian personal identification number, and at least one stipulated time of crime in the base year for the principal offence. A charged person is considered to have recidivism if they have a new charge brought against them in the subsequent five-year period, and if at least one of the principal offences is committed after the principal offence in the base year.', level='Principal Offence')
offences
Out[51]:
In [52]:
# renaming the columns
offences = offences.rename(columns={'Unnamed: 1': '2002',
'Unnamed: 3': '2003',
'Unnamed: 5': '2004',
'Unnamed: 7': '2005',
'Unnamed: 9': '2006',
'Unnamed: 11': '2007',
'Unnamed: 13': '2008',
'Unnamed: 15': '2009'})
offences
Out[52]:
In [53]:
offences_total = offences.iloc[offences.index.get_level_values('Principal Offence') == 'All groups of offences']
offences_total
Out[53]:
In [54]:
# dropping total
offences_total = offences_total.drop('Total', level='Recidivism Offence')
offences_total
Out[54]:
In [85]:
# plotting the data
layout_offences = dict(width=850, height=600, # plot width/height
yaxis={"title": "Recidivism for groups"}, # yaxis label
title="Recidivism Rates for Types of Offences", # title
xaxis={"title": "Years"} # xaxis label
)
offences_total.T.iplot(layout=layout_offences)
In [89]:
offences_total.T.iplot(legend=False)
From the three datasets explored above, we can note the following key things: