Map crime in San Francisco

The dataset is the Kaggle San Francisco crime data. Each record has a timestamp (Dates), Category, Descript, DayOfWeek, PdDistrict, Resolution, Address, and longitude/latitude (X and Y). For this project, I focus on the longitude/latitude and the category.


In [3]:
library(ggplot2)
library(ggmap)
library(sp)
library(maptools)
library(rgdal)
library(rgeos)
library(RColorBrewer)
library(dplyr)
options(jupyter.plot_mimetypes = 'image/png')

In [2]:
crime = read.csv('train.csv')
str(crime)


'data.frame':	878049 obs. of  9 variables:
 $ Dates     : Factor w/ 389257 levels "2003-01-06 00:01:00",..: 389257 389257 389256 389255 389255 389255 389255 389255 389254 389254 ...
 $ Category  : Factor w/ 39 levels "ARSON","ASSAULT",..: 38 22 22 17 17 17 37 37 17 17 ...
 $ Descript  : Factor w/ 879 levels "ABANDONMENT OF CHILD",..: 867 811 811 405 405 407 740 740 405 405 ...
 $ DayOfWeek : Factor w/ 7 levels "Friday","Monday",..: 7 7 7 7 7 7 7 7 7 7 ...
 $ PdDistrict: Factor w/ 10 levels "BAYVIEW","CENTRAL",..: 5 5 5 5 6 3 3 1 7 2 ...
 $ Resolution: Factor w/ 17 levels "ARREST, BOOKED",..: 1 1 1 12 12 12 12 12 12 12 ...
 $ Address   : Factor w/ 23228 levels "0 Block of  HARRISON ST",..: 19791 19791 22698 4267 1844 1506 13323 18055 11385 17659 ...
 $ X         : num  -122 -122 -122 -122 -122 ...
 $ Y         : num  37.8 37.8 37.8 37.8 37.8 ...

In [4]:
summary(crime$Category)


Out[4]:
ARSON                            1513
ASSAULT                         76876
BAD CHECKS                        406
BRIBERY                           289
BURGLARY                        36755
DISORDERLY CONDUCT               4320
DRIVING UNDER THE INFLUENCE      2268
DRUG/NARCOTIC                   53971
DRUNKENNESS                      4280
EMBEZZLEMENT                     1166
EXTORTION                         256
FAMILY OFFENSES                   491
FORGERY/COUNTERFEITING          10609
FRAUD                           16679
GAMBLING                          146
KIDNAPPING                       2341
LARCENY/THEFT                  174900
LIQUOR LAWS                      1903
LOITERING                        1225
MISSING PERSON                  25989
NON-CRIMINAL                    92304
OTHER OFFENSES                 126182
PORNOGRAPHY/OBSCENE MAT            22
PROSTITUTION                     7484
RECOVERED VEHICLE                3138
ROBBERY                         23000
RUNAWAY                          1946
SECONDARY CODES                  9985
SEX OFFENSES FORCIBLE            4388
SEX OFFENSES NON FORCIBLE         148
STOLEN PROPERTY                  4540
SUICIDE                           508
SUSPICIOUS OCC                  31414
TREA                                6
TRESPASS                         7326
VANDALISM                       44725
VEHICLE THEFT                   53781
WARRANTS                        42214
WEAPON LAWS                      8555

In [5]:
# Drop rows with Y = 90: a latitude of 90 is the North Pole, so these coordinates are invalid
crime = crime[crime$Y!=90.0,]
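
A more general way to catch bad coordinates is to test every row against a bounding box around the city. The sketch below is only an illustration: the `toy` data frame and the box limits are hypothetical stand-ins, not values taken from the dataset.

```r
# Hypothetical rows standing in for the crime data frame; X is longitude, Y is latitude
toy <- data.frame(
  Category = c("ASSAULT", "BURGLARY", "FRAUD", "ROBBERY"),
  X = c(-122.42, -122.40, -120.50, -122.45),
  Y = c(37.77, 37.79, 90.00, 37.75)
)

# Bounding box around San Francisco (limits are an assumption made for this sketch)
in_box <- toy$X >= -122.53 & toy$X <= -122.34 &
          toy$Y >= 37.70  & toy$Y <= 37.84

bad_rows <- toy[!in_box, ]  # rows that fall outside the box
clean    <- toy[in_box, ]   # rows kept for mapping
```

Inspecting `bad_rows` before dropping them shows whether the outliers are a single placeholder value or something that needs a closer look.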

Use ggmap to plot crime locations

The R package ggmap makes mapping much easier. The function get_map() downloads map tiles from a source such as Google Maps or OpenStreetMap for a specified location, zoom level, and style. ggmap() then draws the map, and ggplot2 layers such as geom_point() add data on top of it.


In [7]:
locations = c(left = -122.5222, 
                bottom = 37.7073, 
                right = -122.3481,
                top = 37.8381)
map_data = get_map(location=locations, zoom=12, source='osm',color='bw')

In [8]:
ggmap(map_data,extent='device') + 
geom_point(aes(x=X,y=Y),data=crime,alpha=0.1,color='red',size=0.1)


Map selected categories of crime

The aggregate plot of all crimes is not very informative. The function map_crime() plots one or more selected categories, which makes it easier to see where a particular type of crime occurs.


In [17]:
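# Plot the locations of one or more crime categories on the base map
map_crime = function(df, categories){
	# keep only the rows whose Category is in the requested set
	filtered = filter(df, Category %in% categories)
	plot = ggmap(map_data, extent='device') +
		geom_point(data=filtered, aes(x=X,y=Y,color=Category),alpha=0.1,size=0.1)
	return(plot)
}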

In [18]:
map_crime(crime, 'ASSAULT')



In [19]:
map_crime(crime, c('ASSAULT','DRUG/NARCOTIC','BURGLARY'))
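
The filtering step inside map_crime() can also be run on its own to check how many points each category contributes before drawing anything. This is a minimal sketch using base R subsetting as a stand-in for dplyr's filter(); the `toy` data frame and its values are hypothetical.

```r
# Hypothetical rows standing in for the crime data frame
toy <- data.frame(
  Category = c("ASSAULT", "BURGLARY", "ASSAULT", "FRAUD", "DRUG/NARCOTIC"),
  X = c(-122.42, -122.40, -122.41, -122.43, -122.41),
  Y = c(37.77, 37.79, 37.78, 37.76, 37.78)
)

categories <- c("ASSAULT", "DRUG/NARCOTIC")

# Same row selection as filter(df, Category %in% categories)
selected <- toy[toy$Category %in% categories, ]

# Number of points per selected category
counts <- table(as.character(selected$Category))
print(counts)
```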


Plot density of crime

The density plot makes it clear that the Tenderloin is the hotspot for crime. Comparing the three categories (assault, burglary, and drug/narcotic), assault and burglary are more spread out, whereas drug/narcotic is heavily concentrated in the Tenderloin district.


In [20]:
crime_subset = filter(crime, Category %in% c('ASSAULT','DRUG/NARCOTIC','BURGLARY'))
dim(crime_subset)


Out[20]:
  1. 167597
  2. 9

In [21]:
contours <- stat_density2d(
aes(x = X, y = Y, fill = ..level.., alpha=..level..),
size = 0.1, data = crime_subset, n=200,
geom = "polygon")

ggmap(map_data, extent='device') + contours +
scale_alpha_continuous(range=c(0.1,0.5), guide='none') +
scale_fill_gradient('Crime\nDensity',low="green",high="red")
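
stat_density2d() draws contours of a two-dimensional kernel density estimate, which ggplot2 computes with MASS::kde2d(). The sketch below runs that estimate directly on simulated points to show what the `n = 200` grid contains and how to locate its peak. The point cloud is hypothetical (a tight cluster plus scattered background points) and is not the crime data.

```r
library(MASS)  # provides kde2d(); MASS ships with standard R installations

set.seed(1)
# Hypothetical points: a tight cluster of 500 plus 200 scattered points
pts <- data.frame(
  X = c(rnorm(500, mean = -122.41, sd = 0.003), runif(200, -122.50, -122.36)),
  Y = c(rnorm(500, mean = 37.78, sd = 0.003), runif(200, 37.71, 37.83))
)

# Estimate the density on a 200 x 200 grid (the notebook also passes n = 200)
dens <- kde2d(pts$X, pts$Y, n = 200)

# Locate the grid cell with the highest estimated density
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_lon <- dens$x[peak["row"]]
peak_lat <- dens$y[peak["col"]]
```

The contour levels mapped to fill and alpha in the plot are values of this `dens$z` surface, so the reddest region corresponds to the cells around `peak_lon` and `peak_lat`.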



In [22]:
ggmap(map_data, extent='device') + contours +
scale_alpha_continuous(range=c(0.1,0.5), guide='none') +
scale_fill_gradient('Crime\nDensity',low="green",high="red") +
facet_wrap(~Category)
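
One way to put a number on how concentrated each category is would be to tabulate the share of incidents per police district. This is a rough sketch on hypothetical rows; it assumes PdDistrict has a level for the Tenderloin, and the counts are made up for illustration only.

```r
# Hypothetical rows; the district labels and counts are invented for this sketch
toy <- data.frame(
  Category = c(rep("DRUG/NARCOTIC", 4), rep("ASSAULT", 4)),
  PdDistrict = c("TENDERLOIN", "TENDERLOIN", "TENDERLOIN", "MISSION",
                 "TENDERLOIN", "MISSION", "BAYVIEW", "CENTRAL")
)

# Row-wise proportions: for each category, the share of incidents in each district
shares <- prop.table(table(toy$Category, toy$PdDistrict), margin = 1)

# Largest single-district share per category; higher means more concentrated
top_share <- apply(shares, 1, max)
print(round(shares, 2))
```

Applied to `crime_subset`, the same two calls would give a per-district breakdown to set beside the faceted density maps.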


