The data come from the Kaggle San Francisco Crime dataset, which records dates, day of week, category, district, resolution, address, and longitude/latitude for each incident. For this project, I focus on longitude/latitude (the X and Y columns) and Category.
In [3]:
library(ggplot2)
library(ggmap)
library(sp)
library(maptools)
library(rgdal)
library(rgeos)
library(RColorBrewer)
library(dplyr)
options(jupyter.plot_mimetypes = 'image/png')
In [2]:
crime = read.csv('train.csv')
str(crime)
In [4]:
summary(crime$Category)
Out[4]:
In [5]:
# Remove records with Y (latitude) equal to 90.0: that is the North Pole, so these coordinates are invalid
crime = crime[crime$Y!=90.0,]
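The filter above drops one specific bad value. A slightly more general variant keeps only points that fall inside the map's bounding box. The sketch below runs on a made-up toy data frame (not the Kaggle file), and the names `toy`, `in_sf` and `toy_clean` are only illustrative; the box limits are the same ones used for the map in this notebook.

```r
# Toy stand-in for the crime data (values are invented for illustration)
toy <- data.frame(
  Category = c("ASSAULT", "BURGLARY", "ASSAULT"),
  X = c(-122.41, -120.50, -122.43),   # longitude
  Y = c(37.78, 90.00, 37.76)          # latitude; 90.00 is the invalid record
)

# TRUE for rows whose coordinates fall inside the San Francisco bounding box
in_sf <- toy$X > -122.5222 & toy$X < -122.3481 &
         toy$Y > 37.7073   & toy$Y < 37.8381

# Keep only the rows inside the box
toy_clean <- toy[in_sf, ]
print(toy_clean)
```

This removes the Y = 90.0 record as a side effect, and would also catch any other point that lies outside the mapped area.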
The ggmap package makes mapping in R much easier. The function get_map() downloads map data from sources such as Google Maps or OpenStreetMap for a specified location, zoom level, and style. ggplot2 layers can then be added on top of the map to show the data.
In [7]:
locations = c(left = -122.5222,
              bottom = 37.7073,
              right = -122.3481,
              top = 37.8381)
map_data = get_map(location = locations, zoom = 12, source = 'osm', color = 'bw')
In [8]:
ggmap(map_data, extent = 'device') +
  geom_point(aes(x = X, y = Y), data = crime, alpha = 0.1, color = 'red', size = 0.1)
The aggregate plot of all crimes is not very informative because the points overlap heavily. The function map_crime() plots only the selected category or categories, which makes it easier to see where a particular type of crime occurs.
In [17]:
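map_crime = function(df, categories){
  # Keep only the rows whose Category is one of the requested categories
  filtered = filter(df, Category %in% categories)
  # Draw the points on the base map, colored by category
  plot = ggmap(map_data, extent = 'device') +
    geom_point(data = filtered, aes(x = X, y = Y, color = Category), alpha = 0.1, size = 0.1)
  return(plot)
}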
In [18]:
map_crime(crime, 'ASSAULT')
In [19]:
map_crime(crime, c('ASSAULT','DRUG/NARCOTIC','BURGLARY'))
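Because map_crime() filters with `%in%`, it accepts either a single category or a vector of categories. One pitfall is that a misspelled category silently matches nothing. The sketch below shows both points on a made-up toy data frame, using base R subsetting in place of dplyr's filter(); `missing_categories()` is a hypothetical helper for illustration and is not part of the notebook.

```r
# Toy stand-in for the crime data (values are invented for illustration)
toy <- data.frame(
  Category = c("ASSAULT", "BURGLARY", "DRUG/NARCOTIC", "ASSAULT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which requested categories never occur in the data?
missing_categories <- function(df, categories) {
  setdiff(categories, unique(as.character(df$Category)))
}

# %in% works the same way for one category or for several
one  <- toy[toy$Category %in% "ASSAULT", , drop = FALSE]
many <- toy[toy$Category %in% c("ASSAULT", "BURGLARY"), , drop = FALSE]

# A misspelled name is reported instead of silently producing an empty plot
typo <- missing_categories(toy, c("ASSAULT", "BURGLERY"))
print(typo)
```

Checking the requested names against the data before plotting is cheap and avoids staring at an empty map.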
A density plot makes it clear that the Tenderloin is the main crime hotspot. Comparing the three categories, assault and burglary are more spread out, whereas drug/narcotic offenses are heavily concentrated in the Tenderloin district.
In [20]:
crime_subset = filter(crime, Category %in% c('ASSAULT','DRUG/NARCOTIC','BURGLARY'))
dim(crime_subset)
Out[20]:
In [21]:
contours <- stat_density2d(
  aes(x = X, y = Y, fill = ..level.., alpha = ..level..),
  size = 0.1, data = crime_subset, n = 200,
  geom = "polygon")
ggmap(map_data, extent = 'device') + contours +
  scale_alpha_continuous(range = c(0.1, 0.5), guide = 'none') +
  scale_fill_gradient('Crime\nDensity', low = "green", high = "red")
In [22]:
ggmap(map_data, extent = 'device') + contours +
  scale_alpha_continuous(range = c(0.1, 0.5), guide = 'none') +
  scale_fill_gradient('Crime\nDensity', low = "green", high = "red") +
  facet_wrap(~Category)
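To see what the density layer computes, it can help to run the kernel density estimate directly. stat_density2d() relies on MASS::kde2d(), and its `n` argument sets the number of grid points in each direction. The sketch below uses synthetic points (a tight cluster plus diffuse background, not the Kaggle data) and assumes the MASS package is installed, as it is in standard R distributions; the cluster center and all variable names are invented for illustration.

```r
library(MASS)  # provides kde2d(); a recommended package bundled with standard R

set.seed(1)
# Synthetic points: a tight cluster plus uniformly scattered background
cluster    <- data.frame(X = rnorm(500, -122.41, 0.005),
                         Y = rnorm(500, 37.78, 0.005))
background <- data.frame(X = runif(200, -122.52, -122.35),
                         Y = runif(200, 37.71, 37.84))
pts <- rbind(cluster, background)

# Estimate the 2D density on a 50 x 50 grid; z[i, j] is the density at (x[i], y[j])
dens <- kde2d(pts$X, pts$Y, n = 50)

# Locate the grid cell with the highest estimated density
peak     <- arrayInd(which.max(dens$z), dim(dens$z))
peak_lon <- dens$x[peak[1, 1]]
peak_lat <- dens$y[peak[1, 2]]
cat("Density peak near lon", peak_lon, "lat", peak_lat, "\n")
```

The peak lands close to the center of the synthetic cluster, which is the same effect that makes a concentrated hotspot stand out in the contour plots above.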