Map crime in San Francisco

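The dataset comes from the Kaggle San Francisco Crime Classification data (train.csv). Each record has the date and time (Dates), day of week (DayOfWeek), category, description, police district (PdDistrict), resolution, address, and longitude/latitude (X/Y). For this project, I focus on the location (X/Y) and the crime category.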
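Because only the category and the coordinates are used, the other columns can be skipped when the file is loaded. The sketch below shows one way to do that in base R. `read_crime_columns()` is a hypothetical helper, and it assumes train.csv has a header row with the column names shown in the str() output below. A `"NULL"` entry in `colClasses` tells `read.csv()` to skip that column.

```r
# Hypothetical helper (not part of the original notebook): read only the
# columns this analysis needs and keep text columns as factors.
read_crime_columns <- function(path, keep = c("Category", "X", "Y")) {
  # read just the first row to learn the column names
  header <- names(read.csv(path, nrows = 1))
  # "NULL" skips a column; NA lets read.csv guess its type
  classes <- ifelse(header %in% keep, NA, "NULL")
  names(classes) <- header
  read.csv(path, colClasses = classes, stringsAsFactors = TRUE)
}
```

For example, `read_crime_columns('train.csv')` would return a data frame with only Category, X and Y, which is all the plotting code below relies on.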



In [3]:

    
library(ggplot2)   # plotting layers and scales
library(ggmap)     # get_map() and ggmap() for base maps
library(dplyr)     # filter()
options(jupyter.plot_mimetypes = 'image/png')



In [2]:

    
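# stringsAsFactors = TRUE keeps text columns as factors (the default is FALSE since R 4.0),
# which the str() and summary() output below relies on
crime = read.csv('train.csv', stringsAsFactors = TRUE)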
str(crime)


'data.frame':	878049 obs. of  9 variables:
 $ Dates     : Factor w/ 389257 levels "2003-01-06 00:01:00",..: 389257 389257 389256 389255 389255 389255 389255 389255 389254 389254 ...
 $ Category  : Factor w/ 39 levels "ARSON","ASSAULT",..: 38 22 22 17 17 17 37 37 17 17 ...
 $ Descript  : Factor w/ 879 levels "ABANDONMENT OF CHILD",..: 867 811 811 405 405 407 740 740 405 405 ...
 $ DayOfWeek : Factor w/ 7 levels "Friday","Monday",..: 7 7 7 7 7 7 7 7 7 7 ...
 $ PdDistrict: Factor w/ 10 levels "BAYVIEW","CENTRAL",..: 5 5 5 5 6 3 3 1 7 2 ...
 $ Resolution: Factor w/ 17 levels "ARREST, BOOKED",..: 1 1 1 12 12 12 12 12 12 12 ...
 $ Address   : Factor w/ 23228 levels "0 Block of  HARRISON ST",..: 19791 19791 22698 4267 1844 1506 13323 18055 11385 17659 ...
 $ X         : num  -122 -122 -122 -122 -122 ...
 $ Y         : num  37.8 37.8 37.8 37.8 37.8 ...



In [4]:

    
summary(crime$Category)


    Out[4]:

ARSON                         1513
ASSAULT                      76876
BAD CHECKS                     406
BRIBERY                        289
BURGLARY                     36755
DISORDERLY CONDUCT            4320
DRIVING UNDER THE INFLUENCE   2268
DRUG/NARCOTIC                53971
DRUNKENNESS                   4280
EMBEZZLEMENT                  1166
EXTORTION                      256
FAMILY OFFENSES                491
FORGERY/COUNTERFEITING       10609
FRAUD                        16679
GAMBLING                       146
KIDNAPPING                    2341
LARCENY/THEFT               174900
LIQUOR LAWS                   1903
LOITERING                     1225
MISSING PERSON               25989
NON-CRIMINAL                 92304
OTHER OFFENSES              126182
PORNOGRAPHY/OBSCENE MAT         22
PROSTITUTION                  7484
RECOVERED VEHICLE             3138
ROBBERY                      23000
RUNAWAY                       1946
SECONDARY CODES               9985
SEX OFFENSES FORCIBLE         4388
SEX OFFENSES NON FORCIBLE      148
STOLEN PROPERTY               4540
SUICIDE                        508
SUSPICIOUS OCC               31414
TREA                             6
TRESPASS                      7326
VANDALISM                    44725
VEHICLE THEFT                53781
WARRANTS                     42214
WEAPON LAWS                   8555
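summary() lists the categories alphabetically, which makes the largest ones hard to spot. A small base-R sketch that sorts the counts instead; `top_categories()` is a hypothetical helper, not something used elsewhere in this notebook.

```r
# Hypothetical helper: the n most frequent values of a category vector,
# largest count first
top_categories <- function(category, n = 5) {
  head(sort(table(category), decreasing = TRUE), n)
}
```

Applied to `crime$Category`, the counts above would put LARCENY/THEFT, OTHER OFFENSES, NON-CRIMINAL, ASSAULT and DRUG/NARCOTIC at the top.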



In [5]:

    
# Drop records with an invalid latitude: Y = 90 (the North Pole) is not a San Francisco location
crime = crime[crime$Y!=90.0,]
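A more general way to drop bad coordinates is to keep only points inside a bounding box around the city, which catches the Y = 90 records as well as any other off-map values. The sketch below assumes the same left/bottom/right/top values that are passed to get_map() in the next section; `in_bbox()` and `sf_bbox` are hypothetical names. Points just outside the box are dropped too, but ggplot2 would discard them when drawing the map anyway.

```r
# Hypothetical helper: keep only rows whose X/Y fall inside a bounding box.
# bbox is a named vector c(left, bottom, right, top), the same form get_map() accepts.
in_bbox <- function(df, bbox) {
  inside <- df$X >= bbox[["left"]] & df$X <= bbox[["right"]] &
            df$Y >= bbox[["bottom"]] & df$Y <= bbox[["top"]]
  # rows with missing coordinates give NA, which is treated as "outside"
  df[inside & !is.na(inside), , drop = FALSE]
}

# Assumed box around San Francisco (same values as the get_map() call below)
sf_bbox <- c(left = -122.5222, bottom = 37.7073, right = -122.3481, top = 37.8381)
# crime <- in_bbox(crime, sf_bbox)
```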

Use ggmap to plot crime locations

The ggmap package makes mapping in R much easier. Its get_map() function downloads a base map from a provider such as Google Maps or OpenStreetMap for a specified location, zoom level, map type, and color scheme. ggmap() then draws that map as a ggplot object, so ordinary ggplot2 layers such as geom_point() can be added on top of it with +.



In [7]:

    
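# Bounding box around San Francisco (longitude/latitude)
locations = c(left = -122.5222,
              bottom = 37.7073,
              right = -122.3481,
              top = 37.8381)
# Download a black-and-white OpenStreetMap base map at zoom level 12
map_data = get_map(location=locations, zoom=12, source='osm', color='bw')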
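The bounding box above is hard-coded. As a sketch, it could instead be derived from the data, padded by a fraction of its range on each side; ggmap's own make_bbox() serves a similar purpose. `data_bbox()` is a hypothetical base-R helper, and it assumes invalid points such as Y = 90 have already been removed, since a single outlier would stretch the box.

```r
# Hypothetical helper: bounding box of the points, padded by a fraction f of
# the longitude and latitude ranges, named the way get_map() expects
data_bbox <- function(lon, lat, f = 0.05) {
  lon_pad <- diff(range(lon, na.rm = TRUE)) * f
  lat_pad <- diff(range(lat, na.rm = TRUE)) * f
  c(left   = min(lon, na.rm = TRUE) - lon_pad,
    bottom = min(lat, na.rm = TRUE) - lat_pad,
    right  = max(lon, na.rm = TRUE) + lon_pad,
    top    = max(lat, na.rm = TRUE) + lat_pad)
}
```

For example, `data_bbox(crime$X, crime$Y)` would give a box that can be passed as `location` to get_map().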



In [8]:

    
# Plot every incident as a small, mostly transparent red dot
ggmap(map_data, extent='device') +
  geom_point(aes(x=X, y=Y), data=crime, alpha=0.1, color='red', size=0.1)

Map selected categories of crime

The aggregate plot of all ~878,000 incidents is not very informative. The function map_crime() plots only the selected category or categories, which makes it easier to see where a particular type of crime occurs.
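Because `%in%` simply matches nothing for a misspelled or lowercase category such as 'assault', a call like that would quietly produce a map with no points. A minimal guard, sketched in base R; `check_categories()` is a hypothetical helper that could be called at the start of map_crime().

```r
# Hypothetical helper: stop early if any requested category is absent from the data
check_categories <- function(df, categories) {
  known <- unique(as.character(df$Category))
  unknown <- setdiff(categories, known)
  if (length(unknown) > 0) {
    stop("Unknown categories: ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  invisible(categories)
}
```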



In [17]:

    
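# Plot the locations of one or more crime categories on top of the base map
map_crime = function(df, categories, base_map = map_data){
	# keep only the rows whose Category is one of the requested categories
	filtered = filter(df, Category %in% categories)
	ggmap(base_map, extent='device') +
		geom_point(data=filtered, aes(x=X, y=Y, color=Category), alpha=0.1, size=0.1) +
		# draw legend keys opaque and larger so the category colors are readable
		guides(color = guide_legend(override.aes = list(alpha = 1, size = 3)))
}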



In [18]:

    
map_crime(crime, 'ASSAULT')



In [19]:

    
map_crime(crime, c('ASSAULT','DRUG/NARCOTIC','BURGLARY'))

Plot density of crime

The density plot makes it clear that the Tenderloin is the hotspot for crime. Comparing the three categories (assault, burglary, and drug/narcotic), assault and burglary are more spread out, whereas drug/narcotic incidents are heavily concentrated in the Tenderloin district.
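To locate the hotspot numerically rather than by eye, one option is to compute the 2D kernel density directly and find its peak. ggplot2's stat_density_2d() builds its contours with MASS::kde2d(), but its bandwidth settings may differ, so the peak below is an approximation of what the plot shows. The sketch assumes the MASS package (a recommended package shipped with standard R installs) is available; `density_peak()` is a hypothetical helper.

```r
library(MASS)  # kde2d(): 2D kernel density estimate on a regular grid

# Hypothetical helper: longitude/latitude of the densest grid cell
density_peak <- function(x, y, n = 200) {
  dens <- kde2d(x, y, n = n)
  # rows of dens$z follow dens$x, columns follow dens$y
  idx <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
  c(X = dens$x[idx[1]], Y = dens$y[idx[2]])
}
```

For example, passing the X and Y values of the DRUG/NARCOTIC rows of crime_subset would return the coordinates of that category's densest cell, which can then be compared with the Tenderloin's location on the map.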



In [20]:

    
crime_subset = filter(crime, Category %in% c('ASSAULT','DRUG/NARCOTIC','BURGLARY'))
dim(crime_subset)


    Out[20]:

167597  9



In [21]:

    
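# 2D kernel density contours of the three categories, drawn as filled polygons;
# fill color and transparency both scale with the density level
contours <- stat_density2d(
  aes(x = X, y = Y, fill = after_stat(level), alpha = after_stat(level)),
  size = 0.1, data = crime_subset, n = 200,
  geom = "polygon")

ggmap(map_data, extent='device') + contours +
  scale_alpha_continuous(range=c(0.1, 0.5), guide='none') +
  scale_fill_gradient('Crime\nDensity', low="green", high="red")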



In [22]:

    
# Same plot with one panel per category; the density is estimated separately in each panel
ggmap(map_data, extent='device') + contours +
  scale_alpha_continuous(range=c(0.1, 0.5), guide='none') +
  scale_fill_gradient('Crime\nDensity', low="green", high="red") +
  facet_wrap(~Category)
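The faceted plot compares the three categories visually. One way to put numbers on the claim that drug/narcotic incidents are more concentrated than assault and burglary is to tabulate each category's share of incidents by police district (PdDistrict). `district_share()` is a hypothetical base-R helper; it assumes the data frame still has the PdDistrict column.

```r
# Hypothetical helper: for each selected category, the fraction of its
# incidents that fall in each police district (each row sums to 1)
district_share <- function(df, categories) {
  sub <- df[df$Category %in% categories, , drop = FALSE]
  # as.character() drops categories that were not selected
  tab <- table(as.character(sub$Category), sub$PdDistrict)
  round(prop.table(tab, margin = 1), 3)
}
```

For example, `district_share(crime, c('ASSAULT','DRUG/NARCOTIC','BURGLARY'))` gives one row per category; comparing the TENDERLOIN column across rows shows how concentrated each category is there.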


