Can we attribute a fall in violent crime in Chicago to its new concealed carry law? This was the argument many conservative blogs and news outlets were making in early April after statistics were released showing a marked drop in Chicago's murder rate. These articles attributed "Chicago's first-quarter murder total [hitting] its lowest number since 1958" to the deterrent effect of Chicago's concealed carry permits being issued in late February (RedState). Other conservative outlets latched onto the news to argue the policy "is partly responsible for Chicago's across-the-board drop in the crime" (TheBlaze) or that it contributed to the "murder rate promptly [falling] to 1958 levels" (TownHall).
Several articles hedged about the causal direction of any relationship and pointed out that this change is hard to separate from generally falling crime rates as well as the atrocious winter weather this season (PJMedia, Wonkette, HuffPo). The central question here is whether the adoption of the concealed carry policy in March 2014 contributed to significant changes in crime rates, rather than other social, historical, or environmental factors.
However, an April 7 feature story by David Bernstein and Noah Isackson in Chicago magazine found substantial evidence of violent crimes like homicides, robberies, burglaries, and assaults being reclassified, downgraded to lesser offenses, and even closed as noncriminal incidents. They argue that since Police Superintendent Garry McCarthy arrived in May 2011, crime has improbably plummeted in spite of high unemployment and a significant contraction in the Chicago Police Department's ranks of beat cops. An audit by Chicago's inspector general into these crime numbers suggests assaults and batteries may have been underreported by more than 24%. This raises a second question: can we attribute the fall in violent crime in Chicago to systematic underreporting of crime statistics?
In this post, I do four things:
In [381]:
Image(filename='homicides_month_hour_heatmap.png')
Out[381]:
In [382]:
Image(filename='personal_model_comparison.png')
Out[382]:
In [383]:
Image(filename='2014_homicide_predictions.png')
Out[383]:
In [384]:
Image(filename='personal_crime_rates_down.png')
Out[384]:
In [271]:
import urllib2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import seaborn as sb
from collections import Counter
from IPython.core.display import Image
import scipy.stats as stats
# Note: later cells also use figsize() and arange(), which assume the notebook was started with %pylab inline
This will scrape historical weather data for Chicago O'Hare (ORD) from Weather Underground for each year from 2001 through 2014 and save each year's data as a CSV file.
In [ ]:
for i in range(2001,2015):
    # Download a full year of daily ORD weather observations in CSV format
    with open('{0}.csv'.format(str(i)),'wb') as f:
        data = urllib2.urlopen('http://www.wunderground.com/history/airport/ORD/{0}/1/1/CustomHistory.html?dayend=1&monthend=1&yearend={1}&format=1'.format(str(i),str(i+1))).read()
        f.write(data)
Next we want to combine each year's CSV file into a master CSV file for all years. We do this by loading each year's CSV, appending it to the weather list of DataFrames, and then using pd.concat to combine them all together. Then we save the resulting data as weather.csv to load up later.
In [ ]:
weather = list()
for i in range(2001,2015):
    weather.append(pd.read_csv('{0}.csv'.format(str(i)),skiprows=1))
weather_df = pd.concat(weather)
weather_df.set_index('CST',inplace=True)
weather_df.index = pd.to_datetime(weather_df.index,unit='D')
weather_df.to_csv('weather.csv')
Read in weather.csv and clean up the data so that it's indexed by a proper DateTimeIndex and so that junk in the rainfall data is removed (in case we want to use it later): the 'T' entries (trace precipitation) in PrecipitationIn are replaced with NaN and the remaining values are converted to floats.
In [2]:
weather_df = pd.read_csv('weather.csv',low_memory=False,index_col=0)
weather_df.index = pd.to_datetime(weather_df.index,unit='D')
weather_df['PrecipitationIn'] = weather_df['PrecipitationIn'].replace('T',np.nan)
weather_df['PrecipitationIn'] = weather_df['PrecipitationIn'].dropna().astype(float)
weather_df.tail()
Out[2]:
List out all the different variables in the weather data.
In [3]:
weather_df.columns
Out[3]:
In [4]:
crime_df = pd.read_csv('all_crime.csv')
crime_df['Datetime'] = pd.to_datetime(crime_df['Date'],format="%m/%d/%Y %I:%M:%S %p")
crime_df['Date'] = crime_df['Datetime'].apply(lambda x:x.date())
crime_df['Weekday'] = crime_df['Datetime'].apply(lambda x:x.weekday())
crime_df['Hour'] = crime_df['Datetime'].apply(lambda x:x.hour)
crime_df['Day'] = crime_df['Datetime'].apply(lambda x:x.day)
crime_df['Week'] = crime_df['Datetime'].apply(lambda x:x.week)
crime_df['Month'] = crime_df['Datetime'].apply(lambda x:x.month)
crime_df.head()
Out[4]:
In [5]:
dict(Counter(crime_df['Primary Type']))
Out[5]:
In [10]:
personal_crimes = ['ASSAULT','BATTERY','CRIM SEXUAL ASSAULT','HOMICIDE']
property_crimes = ['ARSON','BURGLARY','MOTOR VEHICLE THEFT','ROBBERY','THEFT']
Join the temperature and crime data together based on their sharing a common DateTimeIndex.
In [19]:
# Count the number of reported incidents per day for each crime type (and the personal/property aggregates)
arson_gb = crime_df[crime_df['Primary Type'] == 'ARSON'].groupby('Date')['ID'].agg(len)
assault_gb = crime_df[crime_df['Primary Type'] == 'ASSAULT'].groupby('Date')['ID'].agg(len)
battery_gb = crime_df[crime_df['Primary Type'] == 'BATTERY'].groupby('Date')['ID'].agg(len)
burglary_gb = crime_df[crime_df['Primary Type'] == 'BURGLARY'].groupby('Date')['ID'].agg(len)
homicide_gb = crime_df[crime_df['Primary Type'] == 'HOMICIDE'].groupby('Date')['ID'].agg(len)
sexual_assault_gb = crime_df[crime_df['Primary Type'] == 'CRIM SEXUAL ASSAULT'].groupby('Date')['ID'].agg(len)
robbery_gb = crime_df[crime_df['Primary Type'] == 'ROBBERY'].groupby('Date')['ID'].agg(len)
theft_gb = crime_df[crime_df['Primary Type'] == 'THEFT'].groupby('Date')['ID'].agg(len)
vehicle_theft_gb = crime_df[crime_df['Primary Type'] == 'MOTOR VEHICLE THEFT'].groupby('Date')['ID'].agg(len)
personal_gb = crime_df[crime_df['Primary Type'].isin(personal_crimes)].groupby('Date')['ID'].agg(len)
property_gb = crime_df[crime_df['Primary Type'].isin(property_crimes)].groupby('Date')['ID'].agg(len)
# Convert each daily series' index from dates to a DatetimeIndex so it aligns with the weather data
arson_gb.index = pd.to_datetime(arson_gb.index,unit='D')
assault_gb.index = pd.to_datetime(assault_gb.index,unit='D')
battery_gb.index = pd.to_datetime(battery_gb.index,unit='D')
burglary_gb.index = pd.to_datetime(burglary_gb.index,unit='D')
homicide_gb.index = pd.to_datetime(homicide_gb.index,unit='D')
sexual_assault_gb.index = pd.to_datetime(sexual_assault_gb.index,unit='D')
robbery_gb.index = pd.to_datetime(robbery_gb.index,unit='D')
theft_gb.index = pd.to_datetime(theft_gb.index,unit='D')
vehicle_theft_gb.index = pd.to_datetime(vehicle_theft_gb.index,unit='D')
personal_gb.index = pd.to_datetime(personal_gb.index,unit='D')
property_gb.index = pd.to_datetime(property_gb.index,unit='D')
# Assemble a single daily time series of crime counts and weather, truncated at March 31, 2014
ts = pd.DataFrame({'Arson':arson_gb.ix[:'2014-3-31'],
'Assault':assault_gb.ix[:'2014-3-31'],
'Battery':battery_gb.ix[:'2014-3-31'],
'Burglary':burglary_gb.ix[:'2014-3-31'],
'Homicide':homicide_gb.ix[:'2014-3-31'],
'Sexual_assault':sexual_assault_gb.ix[:'2014-3-31'],
'Robbery':robbery_gb.ix[:'2014-3-31'],
'Vehicle_theft':vehicle_theft_gb.ix[:'2014-3-31'],
'Theft':theft_gb.ix[:'2014-3-31'],
'Personal':personal_gb.ix[:'2014-3-31'],
'Property':property_gb.ix[:'2014-3-31'],
'Temperature':weather_df['Mean TemperatureF'].ix[:'2014-3-31'],
'Binned temperature':weather_df['Mean TemperatureF'].ix[:'2014-3-31']//10.*10,
'Humidity':weather_df[' Mean Humidity'].ix[:'2014-3-31'],
'Precipitation':weather_df['PrecipitationIn'].ix[:'2014-3-31']
})
# Add a linear time trend and calendar features (weekday, hour, week, month, year, weekend flag)
ts['Time'] = range((max(ts.index)-min(ts.index)).days+1)
ts.reset_index(inplace=True)
ts.set_index('index',drop=False,inplace=True)
ts['Weekday'] = ts['index'].apply(lambda x:x.weekday())
ts['Hour'] = ts['index'].apply(lambda x:x.hour)
ts['Week'] = ts['index'].apply(lambda x:x.week)
ts['Month'] = ts['index'].apply(lambda x:x.month)
ts['Year'] = ts['index'].apply(lambda x:x.year)
ts['Weekend'] = ts['Weekday'].isin([5,6]).astype(int)
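With the daily ts DataFrame assembled, it's worth sketching the kind of model that could speak to the policy question. The snippet below is a minimal, purely illustrative OLS specification, assuming a hypothetical Policy indicator for dates on or after March 1, 2014, alongside temperature, weekend, and linear-trend controls; it is not the specification developed later in this post.
ts_sketch = ts.copy()
# Hypothetical post-policy indicator: 1 on or after March 1, 2014, else 0 (illustration only)
ts_sketch['Policy'] = (ts_sketch['index'] >= pd.Timestamp('2014-3-1')).astype(int)
# Minimal OLS sketch: daily personal crimes on temperature, weekend, trend, and the policy dummy
policy_sketch = smf.ols('Personal ~ Temperature + Weekend + Time + Policy', data=ts_sketch).fit()
print(policy_sketch.summary())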
Define a helper function that we'll use later on to annotate bar charts with the value of each bar.
In [20]:
# adapted from http://matplotlib.org/examples/api/barchart_demo.html
def autolabel(rects):
    # Label each bar with its (integer) height, placed just inside the top of the bar.
    # Note: relies on an `ax` variable existing in the enclosing scope.
    max_height = max([rect.get_height() for rect in rects if hasattr(rect,'get_height') and not np.isnan(rect.get_height())])
    for rect in rects:
        if hasattr(rect,'get_height'):
            height = rect.get_height()
            if not np.isnan(height):
                ax.text(rect.get_x()+rect.get_width()/2., height-.05*max_height, '%d'%int(height),
                        ha='center', va='bottom',color='w')
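As a quick illustration of how autolabel is meant to be used (a hypothetical example with made-up bar heights, not data from this analysis):
fig, ax = plt.subplots()
# Made-up bar heights, purely to demonstrate autolabel
rects = ax.bar(range(3), [10, 20, 15])
autolabel(rects)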
Plot the occurrence of crimes over time. There are several apparent features:
Many crimes have very strong annual seasonality: rates are higher in the summer months and lower in the winter months (a quick numeric check of this follows the plot below).
There's a decreasing trend in many types of crimes over time.
I've marked March 1, 2014 with a vertical black dotted line to indicate when the gun policy went into effect.
In [25]:
figsize(12,6)
ts2 = pd.DataFrame({'Arson':arson_gb/float(arson_gb.max()),
'Assault':assault_gb/float(assault_gb.max()),
'Battery':battery_gb/float(battery_gb.max()),
'Burglary':burglary_gb/float(burglary_gb.max()),
'Homicide':homicide_gb/float(homicide_gb.max()),
'Sexual assault':sexual_assault_gb/float(sexual_assault_gb.max()),
'Robbery':robbery_gb/float(robbery_gb.max()),
'Theft':theft_gb/float(theft_gb.max()),})
ax = ts2.resample('M').plot(lw=4,alpha=.75,colormap='hsv')
ax.set_ylabel('Incidents (normalized)',fontsize=18)
#ax.right_ax.set_ylabel('Temperature (F)',fontsize=18)
ax.set_xlabel('Time',fontsize=18)
ax.set_ylim((0,1))
#ax.set_yscale('log')
ax.grid(False,which='minor')
plt.axvline('2014-3-1',c='k',ls='--')
ax.legend(loc='upper center',ncol=4)
Out[25]:
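As a quick numeric complement to the plot, average daily counts by calendar month make the summer/winter contrast explicit (a sketch using the ts DataFrame built above; these averages aren't quoted elsewhere in the post).
# Average daily incident counts by calendar month, as a seasonality check
monthly_means = ts.groupby('Month')[['Personal','Property']].mean()
print(monthly_means)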
We can plot the strength of the correlations between these crime statistics and some other variables as well. We add in Temperature, Humidity, and Precipitation. Redder colors are stronger correlations, bluer colors are weaker or negative correlations. The strongest correlation we find (darkest red) is between Battery and Assault: when batteries are high, assaults are also high. The weakest correlation we find is between Humidity and Temperature: when temperature is high, humidity is low. Note that I've set the diagonal to zero (Arson is perfectly correlated with Arson).
In [28]:
figsize(8,6)
#ts.corr().columns
a = np.array(ts[['Arson','Burglary','Robbery','Theft','Assault','Battery','Homicide','Sexual_assault','Temperature','Humidity','Precipitation']].resample('M').corr())
np.fill_diagonal(a,0)
plt.pcolor(a,cmap='RdBu_r',edgecolors='k')
plt.xlim((0,11))
plt.ylim((0,11))
plt.xticks(arange(.5,11.5),['Arson','Burglary','Robbery','Theft','Assault','Battery','Homicide','Sexual assault','Temperature','Humidity','Precipitation'],rotation=90,fontsize=15)
plt.yticks(arange(.5,11.5),['Arson','Burglary','Robbery','Theft','Assault','Battery','Homicide','Sexual assault','Temperature','Humidity','Precipitation'],fontsize=15)
plt.title('Correlation between crime occurrences',fontsize=20)
plt.colorbar()
plt.grid(b=True,which='major',alpha=.5)
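For reference, the two specific pairs called out above can be read off numerically rather than from the color scale (assuming the same monthly resampling; these values aren't reported in the original text):
corr = ts[['Assault','Battery','Temperature','Humidity']].resample('M').corr()
print(corr.loc['Battery','Assault'])       # strongest positive pair noted above
print(corr.loc['Temperature','Humidity'])  # most negative pair noted above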
In [29]:
figsize(12,6)
ts3 = pd.DataFrame({'Personal':personal_gb/float(personal_gb.max()),
'Property':property_gb/float(property_gb.max())
})
ax = ts3.resample('M').plot(lw=4,alpha=.75,colormap='jet')
ax.set_ylabel('Incidents (normalized)',fontsize=18)
#ax.right_ax.set_ylabel('Temperature (F)',fontsize=18)
ax.set_xlabel('Time',fontsize=18)
ax.set_ylim((0,1))
#ax.set_yscale('log')
ax.grid(False,which='minor')
ax.legend(loc='upper center',ncol=4,fontsize=18)
Out[29]:
Next we explore the correlation between temperature and crime observed above. The figure below plots the distribution of daily personal and property crime counts within each 10-degree temperature bin.
In [393]:
ax = ts.boxplot(['Personal','Property'],by='Binned temperature')
ax[0].set_ylabel('Number of crimes',fontsize=18)
ax[0].set_xlabel('Temperature (F)',fontsize=18)
ax[0].set_title('Personal crimes',fontsize=15)
ax[1].set_xlabel('Temperature (F)',fontsize=18)
ax[1].set_title('Property crimes',fontsize=15)
plt.suptitle('Crime increases with temperature',fontsize=20)
#plt.xticks(plt.xticks()[0],arange(-10,100,10),fontsize=15)
Out[393]:
We can also examine the relationship between Temperature, Humidity, and the number of crimes. First, we extract the total number of crimes for each observed combination of Temperature and Humidity and store this in array1. Then we extract the total number of observations of each Temperature and Humidity combination and store this in array2. The latter is used to normalize the former, in case some combinations of temperature and humidity occur more frequently than others and would otherwise cause us to overcount the crime statistics.
Moving from bottom to top, we can see the effect we saw above: temperature increases the frequency of crimes (bluer is less crime, redder is more crime). Moving from left to right, we see crime doesn't vary substantially as a function of humidity, with the exception that very high levels of humidity (above 60%) might have higher rates of crime.
In [31]:
var = 'Personal'
ct1 = ts.groupby(['Temperature','Humidity'])[var].agg(np.sum).reset_index()
array1 = np.array(pd.pivot_table(ct1,values=var,rows='Temperature',cols='Humidity').fillna(0))
ct2 = ts.groupby(['Temperature','Humidity'])[var].agg(len).reset_index()
array2 = np.array(pd.pivot_table(ct2,values=var,rows='Temperature',cols='Humidity').fillna(0))
normalized_crime = array1/array2
plt.imshow(normalized_crime,cmap='RdBu_r',origin='lower',label='Count')
#plt.legend(loc='upper right',bbox_to_anchor=(1,.5))
plt.xlabel('Humidity',fontsize=18)
plt.ylabel('Temperature',fontsize=18)
plt.title('{0} crimes by temperature and humidity'.format(var),fontsize=24)
plt.colorbar()
Out[31]:
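As a single-number complement to the heatmap, the overall temperature-crime association can be summarized with a Pearson correlation (a sketch using the scipy.stats import above; the value itself isn't reported in the post).
# Pearson correlation between daily mean temperature and daily personal crimes
subset = ts[['Temperature','Personal']].dropna()
r, p = stats.pearsonr(subset['Temperature'], subset['Personal'])
print('Pearson r = {0:.2f} (p = {1:.2g})'.format(r, p))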