Analysis of crimes in Phoenix, AZ

The analysis of crime datasets has become a standard practice among people learning data science. Not only these datasets are rich in terms of their features, but they also offer an opportunity to study a region with much more information when combined with other datasets. And finally, these studies can be used to make a safer community using the tools of data science.

The city of Phoenix started to publish their crime dataset from November 2015 (other datasets are also available). The dataset is a CSV file (under Neighborhood and Safetey category) which is updated daily by 11 am and includes incidents from November 1st, 2015 forward through 7 days before the posting date. The dataset used for this analysis is downloaded on 6 Feb 2017. In this analysis, I try to break down the crimes into different categroies and study their daily, monthly and weekly trends.

Cleaning

I use the following packages in Python:

numpy
pandas
matplotlib
seaborn

I use seaborn only once to create a heatmap. If you don't have seaborn installed, the code still works without producing the heatmap.



In [1]:

    
import numpy as np
import pandas as pd

try:
    # module exists
    import seaborn as sns
    seaborn_exists = True
except ImportError:
    # module doesn't exist
    seaborn_exists = True

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
%matplotlib inline

# custom features of plots
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.serif'] = 'Helvetica Neue'
plt.rcParams['font.monospace'] = 'Helvetica Neue'
plt.rcParams['font.size'] = 12
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['axes.titlesize'] = 12
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['figure.titlesize'] = 12



In [2]:

    
df = pd.read_csv('./data/cleaneddataset.csv')
print (df['crime'].unique())
df.head(5)









    



['MOTOR VEHICLE THEFT' 'DRUG OFFENSE' 'LARCENY-THEFT' 'RAPE' 'BURGLARY'
 'AGGRAVATED ASSAULT' 'MURDER AND NON-NEGLIGENT MANSLAUGHTER' 'ROBBERY'
 'ARSON']






    Out[2]:







  
    
      
      datetime
      date
      hour
      year
      month
      day
      weekday
      zip
      crime
      place
    
  
  
    
      0
      11/01/2015  00:00
      2015-11-01
      00:00:00
      2015
      11
      1
      Sunday
      85029.0
      MOTOR VEHICLE THEFT
      SINGLE FAMILY HOUSE
    
    
      1
      11/01/2015  00:00
      2015-11-01
      00:00:00
      2015
      11
      1
      Sunday
      85031.0
      DRUG OFFENSE
      APARTMENT
    
    
      2
      11/01/2015  00:00
      2015-11-01
      00:00:00
      2015
      11
      1
      Sunday
      85014.0
      LARCENY-THEFT
      APARTMENT
    
    
      3
      11/01/2015  00:00
      2015-11-01
      00:00:00
      2015
      11
      1
      Sunday
      85307.0
      MOTOR VEHICLE THEFT
      SINGLE FAMILY HOUSE
    
    
      4
      11/01/2015  00:00
      2015-11-01
      00:00:00
      2015
      11
      1
      Sunday
      85043.0
      MOTOR VEHICLE THEFT
      SINGLE FAMILY HOUSE



In [3]:

    
# replace long names with short names
crimemap = {
    'MOTOR VEHICLE THEFT': 'VEHICLE THEFT',
    'LARCENY-THEFT': 'LARCENY THEFT',
    'MURDER AND NON-NEGLIGENT MANSLAUGHTER' : 'MURDER',
    'AGGRAVATED ASSAULT': 'ASSAULT'
    }
df['crime'].replace(crimemap, inplace=True)

Less safe zipcodes

Let's see how many crimes have happend in each zipcode during the last 15 months. Only zipcodes with more than 50 crimes are plotted.



In [4]:

    
cutoff = 50
plt.figure(figsize=(15,8))
sd = df['zip'].value_counts(sort=True,ascending=True)
sd.index = sd.index.astype(int)
sd = sd[~(sd<cutoff)]
ax = sd.plot.bar()
ax.set_ylabel('Number of Incidents')
ax.set_xlabel('Zipcodes with more than '+str(cutoff)+' crimes')
plt.show()

Crime monthly



In [5]:

    
crime_year = pd.crosstab([df['year'],df['month']],df['crime'])

"""fig, ax = plt.subplots(nrows=1, ncols=1,figsize=(12,6))
crime_year.plot(kind='bar', stacked=False, grid=False,ax=ax)
ax.set_ylabel("number of incidents")
ax.set_xlabel("year")
ax.legend(loc = (1,0.1))
ax.set_ylim(0,3000)
plt.tight_layout()
plt.show()"""

"""ax = crime_year.plot()
ax.set_ylabel("number of incidents")
ax.set_xlabel("year")
ax.legend(loc = (1,0.1))
ax.set_ylim(0,3000)
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.tight_layout()
plt.show()"""

#sns.heatmap(crime_year.T)
#plt.show()

# a set of colors to plot the bars
color_sequence = ['#1f77b4', '#ff7f0e', '#2ca02c','#d62728','#8c564b',
                  '#377eb8','#4daf4a','#984ea3','#f781bf']

# create the figure
fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(12,12), sharex=True)
k=0
for i in range(0,3):
    for j in range(0,3):
        ax = axes[i,j]
        # selec kth columns
        crime_year_col = crime_year.ix[:,k]
        #plot the data with a selected color
        crime_year_col.plot(kind='bar', ax=ax, color=color_sequence[k])
        ax.legend(loc = (0,1))
        # rotate the x-axis ticks
        ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
        ax.set_xlabel('')
        k+=1
plt.tight_layout()
plt.show(fig)









    



/usr/local/lib/python3.5/site-packages/ipykernel_launcher.py:35: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated



In [6]:

    
#df.time = pd.to_datetime(df['datetime'], format='%m/%d/%Y %H:%M')



In [7]:

    
#df.head(5)



In [8]:

    
df.groupby(['year','month'])['crime'].count().plot(kind='bar')
plt.show()

Weekly trends

To see weekly trends

Crime	Highest	Lowest
ARSON	Saturday (59)	Tuesday (27)
ASSAULT	Sunday (801)	Wednesday (636)
BURGLARY	Friday (2274)	Sunday (1383)
DRUG OFFENSE	Wednesday (1029)	Sunday (411)
LARCENY THEFT	Friday (5424)	Sunday (4655)
MURDER	Sunday (28)	Wednesday (15)
RAPE	Saturday (155)	Thursday (118)
ROBBERY	Wednesday (465)	Thursday (394)
VEHICLE THEFT	Friday (1221)	Thursday (1115)

While assault increase going towards the weekend, while drug offense decreases. In fact, drug offense has its peak on wednesdays.

Heatmap



In [9]:

    
crime_weekday = pd.crosstab(df['weekday'],df['crime'])
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,8), sharex=True)
if seaborn_exists:
    daysOfWeekList = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    #daysOfWeekList = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    crime_weekday=crime_weekday.reindex(daysOfWeekList)
    ax=sns.heatmap(crime_weekday.T,annot=True, fmt="d",linewidths=0.5,cmap='RdBu_r')
    ax.set_xticklabels(ax.get_xticklabels(),rotation=30)
    plt.tight_layout()
    plt.savefig('heatmap.png')
    plt.show()



In [10]:

    
fig, axes = plt.subplots(nrows=3, ncols=3,figsize=(12,12),sharex=True)

print ('| Crime | Highest | Lowest |')
print ('| --- | --- | --- |')
k=0
for i in range(0,3):
    for j in range(0,3):
        ax = axes[i,j]
        # selec kth columns
        crime_weakday_col = crime_weekday.ix[:,k]
        
        crime_name = crime_weakday_col.name
        crime_max_label,crime_max_val = crime_weakday_col.idxmax(), crime_weakday_col.max()
        crime_min_label,crime_min_val = crime_weakday_col.idxmin(), crime_weakday_col.min()
        print ('| {} | {} ({}) | {} ({}) |'.format(crime_name,crime_max_label,crime_max_val,crime_min_label,crime_min_val))
        
        crime_weakday_col.plot(kind='line',ax=ax,color='r',marker='o')
        #crime_weakday_col.plot(kind='bar',ax=ax,color='r')
        ax.legend(loc = (0,1))
        ax.set_xticklabels(ax.get_xticklabels(),rotation=60)
        ax.set_xlabel('')
        k+=1
plt.tight_layout()
plt.show(fig)









    



| Crime | Highest | Lowest |
| --- | --- | --- |
| ARSON | Saturday (128) | Wednesday (92) |
| ASSAULT | Saturday (1660) | Thursday (1366) |
| BURGLARY | Friday (4297) | Sunday (2686) |
| DRUG OFFENSE | Thursday (1868) | Sunday (963) |
| LARCENY THEFT | Friday (10747) | Sunday (9360) |
| MURDER | Saturday (62) | Wednesday (31) |






    



/usr/local/lib/python3.5/site-packages/ipykernel_launcher.py:10: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  # Remove the CWD from sys.path while we load stuff.






    



| RAPE | Friday (333) | Wednesday (237) |
| ROBBERY | Wednesday (925) | Friday (828) |
| VEHICLE THEFT | Friday (2333) | Thursday (2101) |

Month Days trend



In [11]:

    
crime_monthday = pd.crosstab(df['day'],df['crime'])

fig, axes = plt.subplots(nrows=3, ncols=3,figsize=(12,12),sharex=True)

#print ('| Crime | Highest | Lowest |')
#print ('| --- | --- | --- |')
k=0
for i in range(0,3):
    for j in range(0,3):
        ax = axes[i,j]
        # selec kth columns
        crime_monthday_col = crime_monthday.ix[:,k]
        
        '''crime_name = crime_weakday_col.name
        crime_max_label,crime_max_val = crime_weakday_col.idxmax(), crime_weakday_col.max()
        crime_min_label,crime_min_val = crime_weakday_col.idxmin(), crime_weakday_col.min()
        print ('| {} | {} ({}) | {} ({}) |'.format(crime_name,crime_max_label,crime_max_val,crime_min_label,crime_min_val))'''
        
        crime_monthday_col.plot(kind='line',ax=ax,color='r',marker='o')
        ax.legend(loc = (0,1))
        ax.set_xticklabels(ax.get_xticklabels(),rotation=0)
        ax.set_xlabel('')
        k+=1
plt.tight_layout()
plt.show(fig)









    



/usr/local/lib/python3.5/site-packages/ipykernel_launcher.py:12: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  if sys.path[0] == '':



In [12]:

    
dg = pd.crosstab(df['date'],df['crime'])
for col in dg.columns:
    print (col)
    print (dg.sort_values(by=col,ascending=False).index[0:3])









    



ARSON
Index(['2017-12-27', '2018-01-16', '2016-11-14'], dtype='object', name='date')
ASSAULT
Index(['2018-01-01', '2017-08-10', '2017-04-16'], dtype='object', name='date')
BURGLARY
Index(['2017-05-01', '2016-12-13', '2017-01-20'], dtype='object', name='date')
DRUG OFFENSE
Index(['2017-01-19', '2016-03-03', '2017-02-02'], dtype='object', name='date')
LARCENY THEFT
Index(['2017-10-01', '2015-12-19', '2016-12-14'], dtype='object', name='date')
MURDER
Index(['2017-06-29', '2016-02-23', '2017-10-06'], dtype='object', name='date')
RAPE
Index(['2016-01-01', '2016-03-01', '2017-04-01'], dtype='object', name='date')
ROBBERY
Index(['2017-04-25', '2017-07-13', '2017-10-28'], dtype='object', name='date')
VEHICLE THEFT
Index(['2017-10-13', '2016-12-26', '2016-05-06'], dtype='object', name='date')

check zipcodes , which crime more, local buisessnes. For example, does the location of bars have any correlation with car theft or rape?



In [13]:

    
daysOfWeekList = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
"""wdf = pd.crosstab(df['crime'],df['weekday'])[daysOfWeekList]
wdf.to_json('crime_weekly.json')
wdf.to_csv('crime_weekly.csv')"""









    Out[13]:





"wdf = pd.crosstab(df['crime'],df['weekday'])[daysOfWeekList]\nwdf.to_json('crime_weekly.json')\nwdf.to_csv('crime_weekly.csv')"



In [14]:

    
def save_crime(names):
    #make sure there is no white space in the filename
    for name in names:
        wdf = pd.crosstab(df['weekday'],df['crime'])[name]
        wdf = pd.DataFrame(wdf).reindex([daysOfWeekList])
        wdf.columns = ['count']
        wdf.to_csv('./crime_weekly/'+name.replace(" ", "_")+'.csv',sep=',')



In [15]:

    
save_crime(sorted(df.crime.unique())) # for all types of crimes, rem



In [16]:

    
sorted(df.crime.unique())









    Out[16]:





['ARSON',
 'ASSAULT',
 'BURGLARY',
 'DRUG OFFENSE',
 'LARCENY THEFT',
 'MURDER',
 'RAPE',
 'ROBBERY',
 'VEHICLE THEFT']



In [ ]:

	datetime	date	hour	year	month	day	weekday	zip	crime	place
0	11/01/2015 00:00	2015-11-01	00:00:00	2015	11	1	Sunday	85029.0	MOTOR VEHICLE THEFT	SINGLE FAMILY HOUSE
1	11/01/2015 00:00	2015-11-01	00:00:00	2015	11	1	Sunday	85031.0	DRUG OFFENSE	APARTMENT
2	11/01/2015 00:00	2015-11-01	00:00:00	2015	11	1	Sunday	85014.0	LARCENY-THEFT	APARTMENT
3	11/01/2015 00:00	2015-11-01	00:00:00	2015	11	1	Sunday	85307.0	MOTOR VEHICLE THEFT	SINGLE FAMILY HOUSE
4	11/01/2015 00:00	2015-11-01	00:00:00	2015	11	1	Sunday	85043.0	MOTOR VEHICLE THEFT	SINGLE FAMILY HOUSE