Analyzing When and Where San Francisco Criminal Arrests Occur Using R and ggplot2

by Max Woolf

This notebook is the complement to my blog posts Analyzing San Francisco Crime Data to Determine When Arrests Frequently Occur and Mapping Where Arrests Frequently Occur in San Francisco Using Crime Data.

This notebook is licensed under the MIT License. If you use the code or data visualization designs contained within this notebook, it would be greatly appreciated if proper attribution is given back to this notebook and/or myself. Thanks! :)



In [1]:

    
options(warn = -1)

# IMPORTANT: This assumes that all packages in "Rstart.R" are installed,
# and the fonts "Source Sans Pro" and "Open Sans Condensed Bold" are installed
# via extrafont. If ggplot2 charts fail to render, you may need to change/remove the theme call.

source("Rstart.R")
library(ggmap)

options(repr.plot.mimetypes = 'image/png', repr.plot.width = 4, repr.plot.height = 3, repr.plot.res = 300)

sessionInfo()









    



Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Registering fonts with R

Attaching package: ‘scales’

The following objects are masked from ‘package:readr’:

    col_factor, col_numeric

Note: the specification for S3 class “AsIs” in package ‘RJSONIO’ seems equivalent to one from package ‘jsonlite’: not turning on duplicate class definitions for this class.






    Out[1]:





R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] ggmap_2.5.2        stringr_1.0.0      digest_0.6.8       RColorBrewer_1.1-2
[5] scales_0.3.0       extrafont_0.17     ggplot2_1.0.1      dplyr_0.4.3       
[9] readr_0.1.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1         plyr_1.8.3          base64enc_0.1-3    
 [4] tools_3.2.2         uuid_0.1-2          jsonlite_0.9.17    
 [7] evaluate_0.8        gtable_0.1.2        lattice_0.20-33    
[10] png_0.1-7           IRdisplay_0.3       DBI_0.3.1          
[13] mapproj_1.2-4       IRkernel_0.5        parallel_3.2.2     
[16] proto_0.3-10        rzmq_0.7.7          Rttf2pt1_1.3.3     
[19] repr_0.4            maps_3.0.0-2        RgoogleMaps_1.2.0.7
[22] R6_2.1.1            jpeg_0.1-8          RJSONIO_1.3-0      
[25] sp_1.2-0            reshape2_1.4.1      extrafontdb_1.0    
[28] magrittr_1.5        MASS_7.3-43         assertthat_0.1     
[31] colorspace_1.2-6    geosphere_1.4-3     stringi_0.5-5      
[34] munsell_0.4.2       rjson_0.2.15

Processing the Data

We load the data with readr's read_csv(), which is faster than base R's read.csv(). The file contains a lot of redundant data (e.g. the address, and the Location string that repeats the X/Y coordinates), so after loading we keep only the columns we need.



In [2]:

    
path <- "~/Downloads/SFPD_Incidents_-_from_1_January_2003.csv"

df <- read_csv(path)









    



|================================================================================| 100%  360 MB



In [3]:

    
df %>% head(10)
sprintf("# of Rows in Dataframe: %s", nrow(df))
sprintf("Dataframe Size: %s", format(object.size(df), units = "MB"))









    Out[3]:





IncidntNum Category Descript DayOfWeek Date Time PdDistrict Resolution Address X Y Location PdId

	1 150996567 BURGLARY BURGLARY OF APARTMENT HOUSE, UNLAWFUL ENTRY Sunday 11/15/2015 23:58 INGLESIDE NONE 3200 Block of HARRISON ST -122.4115 37.74605 (37.7460485796086, -122.411460219918) 1.509966e+13
	2 156283485 LARCENY/THEFT PETTY THEFT OF PROPERTY Sunday 11/15/2015 23:30 BAYVIEW NONE 17TH ST / DEHARO ST -122.4016 37.76484 (37.7648403636386, -122.401600659931) 1.562835e+13
	3 150997195 VANDALISM MALICIOUS MISCHIEF, GRAFFITI Sunday 11/15/2015 23:30 PARK NONE 2500 Block of 15TH ST -122.4378 37.76603 (37.7660311509137, -122.437848713459) 1.509972e+13
	4 150996501 VANDALISM MALICIOUS MISCHIEF, VANDALISM OF VEHICLES Sunday 11/15/2015 23:15 SOUTHERN NONE 1ST ST / FOLSOM ST -122.3945 37.7873 (37.7872982355244, -122.394484874311) 1.509965e+13
	5 150996501 SUSPICIOUS OCC SUSPICIOUS OCCURRENCE Sunday 11/15/2015 23:15 SOUTHERN NONE 1ST ST / FOLSOM ST -122.3945 37.7873 (37.7872982355244, -122.394484874311) 1.509965e+13
	6 156280936 LARCENY/THEFT PETTY THEFT OF PROPERTY Sunday 11/15/2015 23:00 NORTHERN NONE 1800 Block of GEARY BL -122.432 37.78425 (37.7842501079896, -122.432035315509) 1.562809e+13
	7 150998171 VEHICLE THEFT STOLEN AUTOMOBILE Sunday 11/15/2015 23:00 TARAVAL NONE 2300 Block of 30TH AV -122.4875 37.74345 (37.7434503392393, -122.487471191928) 1.509982e+13
	8 150996777 VANDALISM MALICIOUS MISCHIEF, BREAKING WINDOWS Sunday 11/15/2015 23:00 CENTRAL ARREST, BOOKED 400 Block of STOCKTON ST -122.407 37.78992 (37.789918101686, -122.406977563692) 1.509968e+13
	9 151001038 VEHICLE THEFT STOLEN AUTOMOBILE Sunday 11/15/2015 23:00 SOUTHERN NONE HOWARD ST / 9TH ST -122.4132 37.77499 (37.7749926445385, -122.413163134276) 1.51001e+13
	10 150996498 ASSAULT AGGRAVATED ASSAULT WITH BODILY FORCE Sunday 11/15/2015 22:59 MISSION NONE 3100 Block of 16TH ST -122.4236 37.76487 (37.7648666651043, -122.423637302048) 1.509965e+13









    Out[3]:




'# of Rows in Dataframe: 1842050'






    Out[3]:




'Dataframe Size: 180.9 Mb'



In [4]:

    
columns <- c("Category", "Descript", "DayOfWeek", "Date", "Time", "PdDistrict", "Resolution", "X", "Y")

# Convert the wanted column names to column indices with which(), then keep only those columns
df <- df %>% select(which(names(df) %in% columns))

df %>% head(10)
sprintf("# of Rows in Dataframe: %s", nrow(df))
sprintf("Dataframe Size: %s", format(object.size(df), units = "MB"))









    Out[4]:





Category Descript DayOfWeek Date Time PdDistrict Resolution X Y

	1 BURGLARY BURGLARY OF APARTMENT HOUSE, UNLAWFUL ENTRY Sunday 11/15/2015 23:58 INGLESIDE NONE -122.4115 37.74605
	2 LARCENY/THEFT PETTY THEFT OF PROPERTY Sunday 11/15/2015 23:30 BAYVIEW NONE -122.4016 37.76484
	3 VANDALISM MALICIOUS MISCHIEF, GRAFFITI Sunday 11/15/2015 23:30 PARK NONE -122.4378 37.76603
	4 VANDALISM MALICIOUS MISCHIEF, VANDALISM OF VEHICLES Sunday 11/15/2015 23:15 SOUTHERN NONE -122.3945 37.7873
	5 SUSPICIOUS OCC SUSPICIOUS OCCURRENCE Sunday 11/15/2015 23:15 SOUTHERN NONE -122.3945 37.7873
	6 LARCENY/THEFT PETTY THEFT OF PROPERTY Sunday 11/15/2015 23:00 NORTHERN NONE -122.432 37.78425
	7 VEHICLE THEFT STOLEN AUTOMOBILE Sunday 11/15/2015 23:00 TARAVAL NONE -122.4875 37.74345
	8 VANDALISM MALICIOUS MISCHIEF, BREAKING WINDOWS Sunday 11/15/2015 23:00 CENTRAL ARREST, BOOKED -122.407 37.78992
	9 VEHICLE THEFT STOLEN AUTOMOBILE Sunday 11/15/2015 23:00 SOUTHERN NONE -122.4132 37.77499
	10 ASSAULT AGGRAVATED ASSAULT WITH BODILY FORCE Sunday 11/15/2015 22:59 MISSION NONE -122.4236 37.76487









    Out[4]:




'# of Rows in Dataframe: 1842050'






    Out[4]:




'Dataframe Size: 126.9 Mb'

The all-caps text is ugly, so let's convert the text in the relevant columns to proper case (see this Stack Overflow question).



In [5]:

    
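# Capitalize the first letter of each all-caps word and lower-case the rest,
# e.g. "PETTY THEFT" -> "Petty Theft" (the \U and \L case conversions require perl = TRUE)
proper_case <- function(x) {
    return (gsub("\\b([A-Z])([A-Z]+)", "\\U\\1\\L\\2" , x, perl = TRUE))
}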

df <- df %>% mutate(Category = proper_case(Category),
                 Descript = proper_case(Descript),
                 PdDistrict = proper_case(PdDistrict),
                 Resolution = proper_case(Resolution))

df %>% head(10)









    Out[5]:





Category Descript DayOfWeek Date Time PdDistrict Resolution X Y

	1 Burglary Burglary Of Apartment House, Unlawful Entry Sunday 11/15/2015 23:58 Ingleside None -122.4115 37.74605
	2 Larceny/Theft Petty Theft Of Property Sunday 11/15/2015 23:30 Bayview None -122.4016 37.76484
	3 Vandalism Malicious Mischief, Graffiti Sunday 11/15/2015 23:30 Park None -122.4378 37.76603
	4 Vandalism Malicious Mischief, Vandalism Of Vehicles Sunday 11/15/2015 23:15 Southern None -122.3945 37.7873
	5 Suspicious Occ Suspicious Occurrence Sunday 11/15/2015 23:15 Southern None -122.3945 37.7873
	6 Larceny/Theft Petty Theft Of Property Sunday 11/15/2015 23:00 Northern None -122.432 37.78425
	7 Vehicle Theft Stolen Automobile Sunday 11/15/2015 23:00 Taraval None -122.4875 37.74345
	8 Vandalism Malicious Mischief, Breaking Windows Sunday 11/15/2015 23:00 Central Arrest, Booked -122.407 37.78992
	9 Vehicle Theft Stolen Automobile Sunday 11/15/2015 23:00 Southern None -122.4132 37.77499
	10 Assault Aggravated Assault With Bodily Force Sunday 11/15/2015 22:59 Mission None -122.4236 37.76487
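Here is a small self-contained check of what the regular expression in proper_case() does, run on strings taken from the output above. The pattern matches a capital letter followed by one or more capitals at a word boundary. With perl = TRUE, the replacement's \U and \L upper-case the first letter and lower-case the rest. One side effect: a single-letter word such as "A" has nothing for ([A-Z]+) to match, so it stays upper-case. That is why a description like "Tampering With A Vehicle" keeps its capital A.

```r
# Same definition as in the cell above
proper_case <- function(x) {
    gsub("\\b([A-Z])([A-Z]+)", "\\U\\1\\L\\2", x, perl = TRUE)
}

examples <- c("PETTY THEFT OF PROPERTY", "LARCENY/THEFT", "ARREST, BOOKED", "TAMPERING WITH A VEHICLE")
proper_case(examples)
# returns "Petty Theft Of Property", "Larceny/Theft", "Arrest, Booked", "Tampering With A Vehicle"
```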

Filtering the Data

Let's filter df down to incidents that resulted in an arrest, so we can aggregate some interesting statistics.



In [6]:

    
# grepl() returns TRUE for each Resolution containing "Arrest" (e.g. "Arrest, Booked")
df_arrest <- df %>% filter(grepl("Arrest", Resolution))

df_arrest %>% head(10)
sprintf("# of Rows in Dataframe: %s", nrow(df_arrest))
sprintf("Dataframe Size: %s", format(object.size(df_arrest), units = "MB"))









    Out[6]:





Category Descript DayOfWeek Date Time PdDistrict Resolution X Y

	1 Vandalism Malicious Mischief, Breaking Windows Sunday 11/15/2015 23:00 Central Arrest, Booked -122.407 37.78992
	2 Assault Battery Sunday 11/15/2015 22:53 Northern Arrest, Booked -122.4187 37.78501
	3 Assault Child Abuse (Physical) Sunday 11/15/2015 22:53 Northern Arrest, Booked -122.4187 37.78501
	4 Other Offenses Drivers License, Suspended Or Revoked Sunday 11/15/2015 22:35 Southern Arrest, Booked -122.412 37.7809
	5 Stolen Property Stolen Property, Possession With Knowledge, Receiving Sunday 11/15/2015 22:20 Central Arrest, Booked -122.4185 37.80615
	6 Other Offenses Tampering With A Vehicle Sunday 11/15/2015 22:20 Central Arrest, Booked -122.4185 37.80615
	7 Warrants Enroute To Department Of Corrections Sunday 11/15/2015 22:20 Central Arrest, Booked -122.4185 37.80615
	8 Secondary Codes Domestic Violence Sunday 11/15/2015 22:00 Northern Arrest, Booked -122.4385 37.79941
	9 Assault Threats Against Life Sunday 11/15/2015 22:00 Northern Arrest, Booked -122.4385 37.79941
	10 Larceny/Theft Lost Property, Petty Theft Sunday 11/15/2015 21:40 Central Arrest, Booked -122.4185 37.80615









    Out[6]:




'# of Rows in Dataframe: 587499'






    Out[6]:




'Dataframe Size: 40.7 Mb'
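grepl() returns one TRUE or FALSE per element, which is the logical vector filter() needs. Below is a minimal base-R check on a hypothetical vector of Resolution values. "Arrest, Booked" and "None" appear in the output above; "Arrest, Cited" is an assumed extra value added for illustration. Because the pattern is a plain substring, fixed = TRUE skips regular-expression matching, which may be slightly faster on 1.8 million rows. Either way the match is case-sensitive, so it depends on proper_case() having run first.

```r
# Hypothetical sample of Resolution values (the last one was not case-converted)
resolution <- c("Arrest, Booked", "None", "Arrest, Cited", "ARREST, BOOKED")

# Plain substring match, no regex; case-sensitive
is_arrest <- grepl("Arrest", resolution, fixed = TRUE)
is_arrest
# returns TRUE FALSE TRUE FALSE

# Base-R equivalent of filter(): keep only the matching elements
resolution[is_arrest]
```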

Crime Over Time

Create a chart of daily arrests over time.



In [7]:

    
df_arrest_daily <- df_arrest %>%
                    mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
                    group_by(Date) %>% 
                    summarize(count = n()) %>%
                    arrange(Date)

df_arrest_daily %>% head(10)









    Out[7]:





Date count

	1 2003-01-01 172
	2 2003-01-02 144
	3 2003-01-03 191
	4 2003-01-04 123
	5 2003-01-05 161
	6 2003-01-06 184
	7 2003-01-07 181
	8 2003-01-08 233
	9 2003-01-09 183
	10 2003-01-10 135
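To check the date parsing outside dplyr, here is a base-R sketch of the same daily count, run on a few made-up Date strings in the dataset's "%m/%d/%Y" format. In this version, as in the dplyr chain, a day with no arrests simply does not appear as a row. With well over 100 arrests a day in the output above, that is unlikely to matter here.

```r
# Made-up Date strings in the SFPD "%m/%d/%Y" format
date_strings <- c("01/01/2003", "01/01/2003", "01/02/2003", "01/01/2003")

# Parse with the same format string used in the dplyr chain
dates <- as.Date(date_strings, format = "%m/%d/%Y")

# Count rows per date; table() sorts the dates, like arrange(Date)
daily <- as.data.frame(table(Date = dates), stringsAsFactors = FALSE)
daily$Date <- as.Date(daily$Date)
daily
# one row per date: 2003-01-01 has 3 rows, 2003-01-02 has 1
```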



In [8]:

    
plot <- ggplot(df_arrest_daily, aes(x = Date, y = count)) +
    geom_line(color = "#F2CA27", size = 0.1) +
    geom_smooth(color = "#1A1A1A") +
    fte_theme() +
    scale_x_date(breaks = date_breaks("2 years"), labels = date_format("%Y")) +
    labs(x = "Date of Arrest", y = "# of Police Arrests", title = "Daily Police Arrests in San Francisco from 2003 – 2015")

max_save(plot, "sf-arrest-when-1", "SF OpenData")









    



geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.

Crime Time Heatmap

Aggregate arrest counts by day of week and hour to create a heat map. Fortunately, the day of week is already provided in the DayOfWeek column, but the hour has to be extracted from the Time string.



In [9]:

    
# Returns the numeric hour component of a string formatted "HH:MM", e.g. "09:40" input returns 9
get_hour <- function(x) {
    return (as.numeric(strsplit(x,":")[[1]][1]))
}

df_arrest_time <- df_arrest %>%
                    mutate(Hour = sapply(Time, get_hour)) %>%
                    group_by(DayOfWeek, Hour) %>% 
                    summarize(count = n())

df_arrest_time %>% head(10)









    Out[9]:





DayOfWeek Hour count

	1 Friday 0 3670
	2 Friday 1 2627
	3 Friday 2 2277
	4 Friday 3 1399
	5 Friday 4 986
	6 Friday 5 879
	7 Friday 6 1294
	8 Friday 7 2283
	9 Friday 8 2873
	10 Friday 9 3227
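get_hour() processes one string at a time through sapply(). That works, but it can be slow on the 587,499 arrest rows. The vectorized alternative below is a swapped-in technique, not what the notebook uses. It strips everything from the colon onward with sub() and converts the remainder in a single call, and it also tolerates hours without a leading zero.

```r
# Original per-element helper, as defined above
get_hour <- function(x) {
    return (as.numeric(strsplit(x, ":")[[1]][1]))
}

# Vectorized alternative: drop ":MM" from every string at once, then convert
get_hour_vec <- function(x) {
    as.numeric(sub(":.*$", "", x))
}

times <- c("09:40", "23:58", "00:01", "7:15")
get_hour_vec(times)
# returns 9 23 0 7

# Both approaches agree on these inputs (sapply() adds names, so drop them)
all.equal(unname(sapply(times, get_hour)), get_hour_vec(times))
```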

Reorder and format the factors.



In [10]:

    
dow_format <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
hour_format <- c(paste(c(12,1:11),"AM"), paste(c(12,1:11),"PM"))

df_arrest_time$DayOfWeek <- factor(df_arrest_time$DayOfWeek, level = rev(dow_format))
df_arrest_time$Hour <- factor(df_arrest_time$Hour, level = 0:23, label = hour_format)

df_arrest_time %>% head(10)









    Out[10]:





DayOfWeek Hour count

	1 Friday 12 AM 3670
	2 Friday 1 AM 2627
	3 Friday 2 AM 2277
	4 Friday 3 AM 1399
	5 Friday 4 AM 986
	6 Friday 5 AM 879
	7 Friday 6 AM 1294
	8 Friday 7 AM 2283
	9 Friday 8 AM 2873
	10 Friday 9 AM 3227
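Two details of the factor cell are easy to miss:

* The hour labels map 0 to "12 AM" and 12 to "12 PM".
* The day-of-week levels are reversed. ggplot2 draws the first level of a discrete y-axis at the bottom, so rev(dow_format) puts Saturday at the bottom of the heat map and Sunday at the top.

A small self-contained check:

```r
dow_format <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
hour_format <- c(paste(c(12, 1:11), "AM"), paste(c(12, 1:11), "PM"))

# Numeric hours 0-23 become 12-hour labels
hours <- factor(c(0, 12, 13, 23), levels = 0:23, labels = hour_format)
as.character(hours)
# returns "12 AM" "12 PM" "1 PM" "11 PM"

# Reversed levels: the first level is drawn at the bottom of the y-axis
dow_example <- factor(c("Monday", "Sunday"), levels = rev(dow_format))
levels(dow_example)[1]   # "Saturday" (bottom row)
levels(dow_example)[7]   # "Sunday" (top row)
```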



In [11]:

    
plot <- ggplot(df_arrest_time, aes(x = Hour, y = DayOfWeek, fill = count)) +
    geom_tile() +
    fte_theme() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.6), legend.title = element_blank(), legend.position="top", legend.direction="horizontal", legend.key.width=unit(2, "cm"), legend.key.height=unit(0.25, "cm"), legend.margin=unit(-0.5,"cm"), panel.margin=element_blank()) +
    labs(x = "Hour of Arrest (Local Time)", y = "Day of Week of Arrest", title = "# of Police Arrests in San Francisco from 2003 – 2015, by Time of Arrest") +
    scale_fill_gradient(low = "white", high = "#27AE60", labels = comma)

max_save(plot, "sf-arrest-when-2", "SF OpenData", w=6)

Hmm, why is there a surge on Wednesday afternoon, and at 4-5PM on all days? Let's look at subgroups to verify there isn't a latent factor.

Factor by Crime Category

Certain types of crime may be more time-dependent (e.g. more traffic violations when people leave work).



In [12]:

    
df_top_crimes <- df_arrest %>%
                    group_by(Category) %>% 
                    summarize(count = n()) %>%
                    arrange(desc(count))

df_top_crimes %>% head(20)









    Out[12]:





Category count

	1 Other Offenses 183156
	2 Drug/Narcotic 98400
	3 Warrants 81426
	4 Assault 56934
	5 Larceny/Theft 31369
	6 Prostitution 14429
	7 Weapon Laws 11674
	8 Burglary 10449
	9 Trespass 10308
	10 Non-Criminal 10046
	11 Vandalism 9280
	12 Robbery 8168
	13 Stolen Property 8042
	14 Drunkenness 7202
	15 Secondary Codes 6960
	16 Disorderly Conduct 5769
	17 Fraud 4849
	18 Driving Under The Influence 4549
	19 Vehicle Theft 4376
	20 Forgery/Counterfeiting 4210



In [13]:

    
# Skip the top category ("Other Offenses", a catch-all) and keep the next 18 (ranks 2-19)
df_arrest_time_crime <- df_arrest %>%
                    filter(Category %in% df_top_crimes$Category[2:19]) %>%
                    mutate(Hour = sapply(Time, get_hour)) %>%
                    group_by(Category, DayOfWeek, Hour) %>% 
                    summarize(count = n())

df_arrest_time_crime$DayOfWeek <- factor(df_arrest_time_crime$DayOfWeek, level = rev(dow_format))
df_arrest_time_crime$Hour <- factor(df_arrest_time_crime$Hour, level = 0:23, label = hour_format)

df_arrest_time_crime %>% head(10)









    Out[13]:





Category DayOfWeek Hour count

	1 Assault Friday 12 AM 408
	2 Assault Friday 1 AM 341
	3 Assault Friday 2 AM 326
	4 Assault Friday 3 AM 149
	5 Assault Friday 4 AM 105
	6 Assault Friday 5 AM 88
	7 Assault Friday 6 AM 113
	8 Assault Friday 7 AM 193
	9 Assault Friday 8 AM 238
	10 Assault Friday 9 AM 254



In [14]:

    
plot <- ggplot(df_arrest_time_crime, aes(x = Hour, y = DayOfWeek, fill = count)) +
    geom_tile() +
    fte_theme() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.6, size = 4)) +
    labs(x = "Hour of Arrest (Local Time)", y = "Day of Week of Arrest", title = "# of Police Arrests in San Francisco from 2003 – 2015, by Category and Time of Arrest") +
    scale_fill_gradient(low = "white", high = "#2980B9") +
    facet_wrap(~ Category, nrow = 6)

max_save(plot, "sf-arrest-when-3", "SF OpenData", w = 6, h = 8, tall = T)

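Good, but the gradients aren't helpful because they are not normalized. High-volume categories such as Drug/Narcotic dominate the shared color scale and wash out the others. We need to normalize the values within each facet. (Unfortunately, this makes the values on the gradient itself less meaningful.)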



In [15]:

    
df_arrest_time_crime <- df_arrest_time_crime %>%
                            group_by(Category) %>%
                            mutate(norm = count/sum(count))

df_arrest_time_crime %>% head(10)









    Out[15]:





Category DayOfWeek Hour count norm

	1 Assault Friday 12 AM 408 0.007166192
	2 Assault Friday 1 AM 341 0.005989391
	3 Assault Friday 2 AM 326 0.005725928
	4 Assault Friday 3 AM 149 0.002617065
	5 Assault Friday 4 AM 105 0.001844241
	6 Assault Friday 5 AM 88 0.001545649
	7 Assault Friday 6 AM 113 0.001984754
	8 Assault Friday 7 AM 193 0.00338989
	9 Assault Friday 8 AM 238 0.004180279
	10 Assault Friday 9 AM 254 0.004461306
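The normalization divides each cell's count by its category's total. Every facet's values therefore sum to 1, and the color scale compares each category's timing pattern rather than its volume. Below is a base-R sketch of the same grouped division using ave(). The counts are made up, not taken from the data.

```r
# Made-up counts for two categories (not real data)
cells <- data.frame(
    Category = c("Assault", "Assault", "Assault", "Fraud", "Fraud"),
    count    = c(408, 341, 251, 30, 10)
)

# ave() applies FUN within each Category and returns a vector aligned with the rows
cells$norm <- cells$count / ave(cells$count, cells$Category, FUN = sum)
cells$norm
# returns 0.408 0.341 0.251 0.750 0.250

# Each category's normalized values sum to 1
tapply(cells$norm, cells$Category, sum)
```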



In [16]:

    
plot <- ggplot(df_arrest_time_crime, aes(x = Hour, y = DayOfWeek, fill = norm)) +
    geom_tile() +
    fte_theme() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.6, size = 4)) +
    labs(x = "Hour of Arrest (Local Time)", y = "Day of Week of Arrest", title = "Police Arrests in San Francisco from 2003 – 2015 by Time of Arrest, Normalized by Type of Crime") +
    scale_fill_gradient(low = "white", high = "#2980B9") +
    facet_wrap(~ Category, nrow = 6)

max_save(plot, "sf-arrest-when-4", "SF OpenData", w = 6, h = 8, tall = T)

Much more helpful.

Factor by Police District

Same as above, but faceted by police district.



In [17]:

    
df_arrest_time_district <- df_arrest %>%
                    mutate(Hour = sapply(Time, get_hour)) %>%
                    group_by(PdDistrict, DayOfWeek, Hour) %>% 
                    summarize(count = n()) %>%
                    group_by(PdDistrict) %>%
                    mutate(norm = count/sum(count))

df_arrest_time_district$DayOfWeek <- factor(df_arrest_time_district$DayOfWeek, level = rev(dow_format))
df_arrest_time_district$Hour <- factor(df_arrest_time_district$Hour, level = 0:23, label = hour_format)

df_arrest_time_district %>% head(10)









    Out[17]:





PdDistrict DayOfWeek Hour count norm

	1 Bayview Friday 12 AM 347 0.005714474
	2 Bayview Friday 1 AM 195 0.003211304
	3 Bayview Friday 2 AM 151 0.002486702
	4 Bayview Friday 3 AM 90 0.00148214
	5 Bayview Friday 4 AM 101 0.001663291
	6 Bayview Friday 5 AM 81 0.001333926
	7 Bayview Friday 6 AM 100 0.001646822
	8 Bayview Friday 7 AM 226 0.003721819
	9 Bayview Friday 8 AM 313 0.005154554
	10 Bayview Friday 9 AM 397 0.006537885



In [18]:

    
plot <- ggplot(df_arrest_time_district, aes(x = Hour, y = DayOfWeek, fill = norm)) +
    geom_tile() +
    fte_theme() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.6, size = 4)) +
    labs(x = "Hour of Arrest (Local Time)", y = "Day of Week of Arrest", title = "Police Arrests in San Francisco from 2003 – 2015 by Time of Arrest, Normalized by Station") +
    scale_fill_gradient(low = "white", high = "#8E44AD") +
    facet_wrap(~ PdDistrict, nrow = 5)

max_save(plot, "sf-arrest-when-5", "SF OpenData", w = 6, h = 8, tall = T)

Not helpful either. Meh.

Factor by Month

If crime is tied to activities, the time of year at which those activities end may have an impact.



In [19]:

    
df_arrest_time_month <- df_arrest %>%
                    mutate(Month = format(as.Date(Date, "%m/%d/%Y"), "%B"), Hour = sapply(Time, get_hour)) %>%
                    group_by(Month, DayOfWeek, Hour) %>% 
                    summarize(count = n()) %>%
                    group_by(Month) %>%
                    mutate(norm = count/sum(count))

df_arrest_time_month$DayOfWeek <- factor(df_arrest_time_month$DayOfWeek, level = rev(dow_format))
df_arrest_time_month$Hour <- factor(df_arrest_time_month$Hour, level = 0:23, label = hour_format)

df_arrest_time_month %>% head(10)









    Out[19]:





Month DayOfWeek Hour count norm

	1 April Friday 12 AM 292 0.005988884
	2 April Friday 1 AM 187 0.003835347
	3 April Friday 2 AM 209 0.004286564
	4 April Friday 3 AM 98 0.002009968
	5 April Friday 4 AM 103 0.002112517
	6 April Friday 5 AM 53 0.001087023
	7 April Friday 6 AM 107 0.002194557
	8 April Friday 7 AM 190 0.003896876
	9 April Friday 8 AM 216 0.004430133
	10 April Friday 9 AM 284 0.005824805



In [20]:

    
# Set order of month facets by chronological order instead of alphabetical
df_arrest_time_month$Month <- factor(df_arrest_time_month$Month,
                                     level = c("January","February","March","April","May","June","July","August","September","October","November","December"))

plot <- ggplot(df_arrest_time_month, aes(x = Hour, y = DayOfWeek, fill = norm)) +
    geom_tile() +
    fte_theme() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.6, size = 4)) +
    labs(x = "Hour of Arrest (Local Time)", y = "Day of Week of Arrest", title = "Police Arrests in San Francisco from 2003 – 2015 by Time of Arrest, Normalized by Month") +
    scale_fill_gradient(low = "white", high = "#E74C3C") +
    facet_wrap(~ Month, nrow = 4)

max_save(plot, "sf-arrest-when-6", "SF OpenData", w = 6, h = 6, tall = T)

That is not helpful either!
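The chronological month ordering used in the cell above can also be written with R's built-in month.name constant. This assumes an English locale. format(..., "%B") returns locale-dependent month names, while month.name is always English; the session above runs under en_US.UTF-8.

```r
month_sample <- c("April", "January", "December", "January")

# Default factor levels are alphabetical: "April" "December" "January"
levels(factor(month_sample))

# Chronological levels from the built-in constant (unused months are kept as levels)
chrono <- factor(month_sample, levels = month.name)
as.character(sort(chrono))
# returns "January" "January" "April" "December"
```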

Factor By Year

Perhaps things changed over time?



In [21]:

    
df_arrest_time_year <- df_arrest %>%
                    mutate(Year = format(as.Date(Date, "%m/%d/%Y"), "%Y"), Hour = sapply(Time, get_hour)) %>%
                    group_by(Year, DayOfWeek, Hour) %>% 
                    summarize(count = n()) %>%
                    group_by(Year) %>%
                    mutate(norm = count/sum(count))

df_arrest_time_year$DayOfWeek <- factor(df_arrest_time_year$DayOfWeek, level = rev(dow_format))
df_arrest_time_year$Hour <- factor(df_arrest_time_year$Hour, level = 0:23, label = hour_format)

df_arrest_time_year %>% head(10)









    Out[21]:





Year DayOfWeek Hour count norm

	1 2003 Friday 12 AM 295 0.005575084
	2 2003 Friday 1 AM 195 0.003685225
	3 2003 Friday 2 AM 181 0.003420645
	4 2003 Friday 3 AM 155 0.002929281
	5 2003 Friday 4 AM 61 0.001152814
	6 2003 Friday 5 AM 95 0.001795366
	7 2003 Friday 6 AM 152 0.002872586
	8 2003 Friday 7 AM 240 0.004535662
	9 2003 Friday 8 AM 296 0.005593983
	10 2003 Friday 9 AM 395 0.007464943



In [22]:

    
plot <- ggplot(df_arrest_time_year, aes(x = Hour, y = DayOfWeek, fill = norm)) +
    geom_tile() +
    fte_theme() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.6, size = 4)) +
    labs(x = "Hour of Arrest (Local Time)", y = "Day of Week of Arrest", title = "Police Arrests in San Francisco from 2003 – 2015 by Time of Arrest, Normalized by Year") +
    scale_fill_gradient(low = "white", high = "#E67E22") +
    facet_wrap(~ Year, nrow = 6)

max_save(plot, "sf-arrest-when-7", "SF OpenData", w = 6, h = 6, tall = T)

Ack, not really.

Plot with ggmap

Let's try working with maps. (Ed. Note: Due to their size, the maps will not be embedded directly into the notebook, but they will be available in the repository.)

We can use the CSV output of the Bounding Box Tool to easily choose explicit bounds.



In [23]:

    
bbox = c(-122.516441,37.702072,-122.37276,37.811818)

# credit to /u/all_genes_considered for map setting suggestion
map <- get_map(location = bbox, source = "stamen", maptype = "toner-lite")









    



Map from URL : http://tile.stamen.com/toner-lite/13/1308/3165.png
Map from URL : http://tile.stamen.com/toner-lite/13/1309/3165.png
Map from URL : http://tile.stamen.com/toner-lite/13/1310/3165.png
Map from URL : http://tile.stamen.com/toner-lite/13/1311/3165.png
Map from URL : http://tile.stamen.com/toner-lite/13/1308/3166.png
Map from URL : http://tile.stamen.com/toner-lite/13/1309/3166.png
Map from URL : http://tile.stamen.com/toner-lite/13/1310/3166.png
Map from URL : http://tile.stamen.com/toner-lite/13/1311/3166.png
Map from URL : http://tile.stamen.com/toner-lite/13/1308/3167.png
Map from URL : http://tile.stamen.com/toner-lite/13/1309/3167.png
Map from URL : http://tile.stamen.com/toner-lite/13/1310/3167.png
Map from URL : http://tile.stamen.com/toner-lite/13/1311/3167.png
Map from URL : http://tile.stamen.com/toner-lite/13/1308/3168.png
Map from URL : http://tile.stamen.com/toner-lite/13/1309/3168.png
Map from URL : http://tile.stamen.com/toner-lite/13/1310/3168.png
Map from URL : http://tile.stamen.com/toner-lite/13/1311/3168.png
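The bbox vector follows ggmap's left/bottom/right/top order: minimum longitude, minimum latitude, maximum longitude, maximum latitude. Points outside the box cannot appear on the map, so it may be worth checking how many arrests fall inside it before plotting.

The sketch below is base R only. in_bbox() and sf_box are hypothetical names, not part of ggmap; sf_box simply reuses the bounds above with names attached. The sample coordinates are made up, and the last one mimics an obviously invalid latitude.

```r
# Same bounds as bbox above, with names for readability (hypothetical variable)
sf_box <- c(left = -122.516441, bottom = 37.702072, right = -122.37276, top = 37.811818)

# Hypothetical helper: TRUE when a (lon, lat) point lies inside the box
in_bbox <- function(lon, lat, box) {
    lon >= box[["left"]] & lon <= box[["right"]] &
        lat >= box[["bottom"]] & lat <= box[["top"]]
}

lon <- c(-122.4115, -122.3945, -120.5)
lat <- c(37.74605, 37.7873, 90)
in_bbox(lon, lat, sf_box)
# returns TRUE TRUE FALSE
```

On the real data, `sum(!in_bbox(df_arrest$X, df_arrest$Y, sf_box))` would count the arrests that fall outside the map.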

Test map download.



In [64]:

    
png("sf-arrest-where-0.png", width = 900, height = 900, res = 300)
ggmap(map)
dev.off()









    Out[64]:




pdf: 2

The "white space" issue noted in the bootstrap article is still present because ggmap uses a fixed aspect ratio, so you will need to tweak the chart dimensions accordingly.
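You can estimate the map panel's height-to-width ratio from the bounding box and use it as a rough guide for choosing chart dimensions. This back-of-the-envelope sketch assumes a Mercator-style projection, in which a degree of longitude shrinks by cos(latitude) relative to a degree of latitude. It is an approximation, not ggmap's own calculation. For this bounding box it comes out just under 1, so the panel is close to square.

```r
bbox <- c(-122.516441, 37.702072, -122.37276, 37.811818)  # left, bottom, right, top

# Middle latitude of the box, in radians
mid_lat <- mean(bbox[c(2, 4)]) * pi / 180

# Latitude span divided by the cos-corrected longitude span
aspect <- (bbox[4] - bbox[2]) / ((bbox[3] - bbox[1]) * cos(mid_lat))
round(aspect, 2)
```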



In [24]:

    
plot <- ggmap(map) +
            geom_point(data = df_arrest, aes(x=X, y=Y), color = "#27AE60", size = 0.5, alpha = 0.01) +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            theme(plot.margin = unit(c(0.3, 0.3, -0.25, 0), "cm")) +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015")

max_save(plot, "sf-arrest-where-1", "SF OpenData", w = 3.8, h = 4)

We can facet the map by type of crime using facet_wrap(). (Contrary to the notes in the documentation, passing a ggplot as ggmap()'s base_layer argument is apparently not necessary, and doing so imposes a performance penalty.)



In [25]:

    
plot <- ggmap(map) +
            geom_point(data = df_arrest %>% filter(Category %in% df_top_crimes$Category[2:19]), aes(x=X, y=Y, color=Category), size=0.75, alpha=0.05) +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, by Type of Crime") +
            facet_wrap(~ Category, nrow = 3)

max_save(plot, "sf-arrest-where-2", "SF OpenData", w = 14.2, h = 8, tall = T)

Now let's normalize the above plot within each facet, using hex aggregation.



In [38]:

    
# Summary function for stat_summary_hex(): return NA instead of the summed
# weight when it is below threshold, to hide hexes with very few arrests
sum_thresh <- function(x, threshold = 10^-3) {
    s <- sum(x)
    if (s < threshold) NA else s
}

plot <- ggmap(map) +
            stat_summary_hex(data = df_arrest %>% filter(Category %in% df_top_crimes$Category[2:19]) %>% group_by(Category) %>% mutate(w=1/n()), aes(x=X, y=Y, z=w), fun=sum_thresh, alpha = 0.8, color="#CCCCCC") +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            scale_fill_gradient(low = "#DDDDDD", high = "#2980B9") +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, Normalized by Type of Crime") +
            facet_wrap(~ Category, nrow = 3)

max_save(plot, "sf-arrest-where-3", "SF OpenData", w = 14.2, h = 8, tall = T)
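The normalization here gives each arrest a weight of 1/n, where n is the number of arrests in its facet, so the weights in each facet sum to 1. stat_summary_hex() then applies sum_thresh() to the weights that fall in each hexagon. Any hexagon holding less than 0.1% of its facet's arrests gets NA, which the author uses to hide sparse hexes. Below is a base-R check of both pieces, using made-up facet sizes.

```r
# Same behavior as sum_thresh() above
sum_thresh <- function(x, threshold = 10^-3) {
    s <- sum(x)
    if (s < threshold) NA else s
}

# Made-up facets: 4 arrests in one category, 2,000 in another
category <- c(rep("Small", 4), rep("Large", 2000))

# ave(..., FUN = length) gives each row its group size n, so w = 1/n
w <- 1 / ave(rep(1, length(category)), category, FUN = length)

tapply(w, category, sum)                  # both facets sum to 1
sum_thresh(w[category == "Large"][1])     # one Large arrest: 5e-04 < 0.001, so NA
sum_thresh(w[category == "Small"][1])     # one Small arrest: 0.25
```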

Facet by police districts.



In [56]:

    
plot <- ggmap(map) +
            stat_summary_hex(data = df_arrest %>% group_by(PdDistrict) %>% mutate(w=1/n()), aes(x=X, y=Y, z=w), fun=sum_thresh, alpha = 0.8, color="#CCCCCC") +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            scale_fill_gradient(low = "#DDDDDD", high = "#8E44AD") +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, Normalized by Police District") +
            facet_wrap(~ PdDistrict, nrow = 2)

max_save(plot, "sf-arrest-where-4", "SF OpenData", w = 13, h = 6, tall = T)

Facet by month. (The month must now be added as a column of the original df_arrest data frame, since the hex layer is drawn from the individual arrests.)



In [55]:

    
df_arrest <- df_arrest %>% mutate(Month=format(as.Date(Date, "%m/%d/%Y"), "%B"))
df_arrest$Month <- factor(df_arrest$Month,
                                     level = c("January","February","March","April","May","June","July","August","September","October","November","December"))

plot <- ggmap(map) +
            stat_summary_hex(data = df_arrest %>% group_by(Month) %>% mutate(w=1/n()), aes(x=X, y=Y, z=w), fun=sum_thresh, alpha = 0.8, color="#CCCCCC") +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            scale_fill_gradient(low = "#DDDDDD", high = "#E74C3C") +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, Normalized by Month") +
            facet_wrap(~ Month, nrow=2)

max_save(plot, "sf-arrest-where-5", "SF OpenData", w=13, h=5, tall=T)

Facet by year.



In [54]:

    
df_arrest <- df_arrest %>% mutate(Year=format(as.Date(Date, "%m/%d/%Y"), "%Y"))

plot <- ggmap(map) +
            stat_summary_hex(data=df_arrest %>% group_by(Year) %>% mutate(w=1/n()), aes(x=X, y=Y, z=w), fun=sum_thresh, alpha = 0.8, color="#CCCCCC") +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            scale_fill_gradient(low = "#DDDDDD", high = "#E67E22") +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, Normalized by Year") +
            facet_wrap(~ Year, nrow=2)

max_save(plot, "sf-arrest-where-6", "SF OpenData", w=10.5, h=4)

Facet by hour of day.



In [53]:

    
df_arrest <- df_arrest %>% mutate(Hour = sapply(Time, get_hour))
df_arrest$Hour <- factor(df_arrest$Hour, level = 0:23, label = hour_format)

plot <- ggmap(map) +
            stat_summary_hex(data=df_arrest %>% group_by(Hour) %>% mutate(w=1/n()), aes(x=X, y=Y, z=w), fun=sum_thresh, alpha = 0.8, color="#CCCCCC") +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            scale_fill_gradient(low = "#DDDDDD", high = "#1ABC9C") +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, Normalized by Hour") +
            facet_wrap(~ Hour, nrow=4)

max_save(plot, "sf-arrest-where-7", "SF OpenData", w=10.5, h=8, tall=T)

Facet by day of week.



In [40]:

    
df_arrest$DayOfWeek <- factor(df_arrest$DayOfWeek, level = dow_format)

plot <- ggmap(map) +
            stat_summary_hex(data=df_arrest %>% group_by(DayOfWeek) %>% mutate(w=1/n()), aes(x=X, y=Y, z=w), fun=sum_thresh, alpha = 0.8, color="#CCCCCC") +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            scale_fill_gradient(low = "#DDDDDD", high = "#16A085") +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, Normalized by Day of Week") +
            facet_wrap(~ DayOfWeek, nrow=2)

max_save(plot, "sf-arrest-where-8", "SF OpenData", w=10.5, h=6, tall=T)

Follow-up analysis of /u/NowProveIt's comment on Reddit, which suggested that SSI payments lead to higher activity on Wednesdays. Here's a code fragment that creates a data frame of Wednesdays and their ordinal position within each month (first, second, and so on).



In [31]:

    
start_date <- "2003-01-01"
end_date <- "2015-11-15"

# Create a vector of all days between start and end date
days <- seq(as.Date(start_date), as.Date(end_date), "days")

df_dates <- tbl_df(data.frame(Date = days)) %>%
                mutate(weekday = format(Date, "%A"),
                       month = format(Date, "%B"),
                       year = format(Date, "%Y"))

df_dates %>% head(10)

    Out[31]:

   Date        weekday    month    year
1  2003-01-01  Wednesday  January  2003
2  2003-01-02  Thursday   January  2003
3  2003-01-03  Friday     January  2003
4  2003-01-04  Saturday   January  2003
5  2003-01-05  Sunday     January  2003
6  2003-01-06  Monday     January  2003
7  2003-01-07  Tuesday    January  2003
8  2003-01-08  Wednesday  January  2003
9  2003-01-09  Thursday   January  2003
10 2003-01-10  Friday     January  2003

Use window-function shenanigans to get the ordinal ranks: within each year and month, rank(Date) numbers that month's Wednesdays 1, 2, 3, and so on.



In [32]:

    
# Text values to replace numeric ordinals
ordinals <- c("First", "Second", "Third", "Fourth")

df_dates <- df_dates %>%
                filter(weekday == "Wednesday") %>%
                group_by(year, month) %>%
                mutate(rank = rank(Date)) %>%
                filter(rank <= 4) %>% # drops 5th Wednesdays, which occur in 4 or 5 months of every year
                mutate(Date = format(Date, format = "%m/%d/%Y"),  # must match df_arrest's mm/dd/yyyy Date strings for the join
                       rank = factor(rank, levels = 1:4, labels = ordinals),
                       ordinal = paste(rank, weekday))

df_dates %>% head(10)

    Out[32]:

   Date        weekday    month     year  rank    ordinal
1  01/01/2003  Wednesday  January   2003  First   First Wednesday
2  01/08/2003  Wednesday  January   2003  Second  Second Wednesday
3  01/15/2003  Wednesday  January   2003  Third   Third Wednesday
4  01/22/2003  Wednesday  January   2003  Fourth  Fourth Wednesday
5  02/05/2003  Wednesday  February  2003  First   First Wednesday
6  02/12/2003  Wednesday  February  2003  Second  Second Wednesday
7  02/19/2003  Wednesday  February  2003  Third   Third Wednesday
8  02/26/2003  Wednesday  February  2003  Fourth  Fourth Wednesday
9  03/05/2003  Wednesday  March     2003  First   First Wednesday
10 03/12/2003  Wednesday  March     2003  Second  Second Wednesday
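As a cross-check on the window-function ranks, the ordinal of a weekday within its month can also be computed directly. Days 1 to 7 of any month contain the first occurrence of every weekday, days 8 to 14 the second, and so on, so the ordinal is (day_of_month - 1) %/% 7 + 1. A base-R sketch of that alternative; wednesday_ordinals is a hypothetical helper, not part of this notebook:

```r
# Hypothetical helper: label each Wednesday between two dates with its
# within-month ordinal using day-of-month arithmetic instead of rank()
wednesday_ordinals <- function(start_date, end_date) {
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  # POSIXlt wday is 0 = Sunday ... 6 = Saturday, independent of locale
  weds <- days[as.POSIXlt(days)$wday == 3]
  # Days 1-7 -> 1st, 8-14 -> 2nd, 15-21 -> 3rd, 22-28 -> 4th, 29-31 -> 5th
  n <- (as.POSIXlt(weds)$mday - 1) %/% 7 + 1
  keep <- n <= 4  # drop fifth Wednesdays, as the notebook does
  ordinals <- c("First", "Second", "Third", "Fourth")
  data.frame(Date = format(weds[keep], "%m/%d/%Y"),
             ordinal = paste(ordinals[n[keep]], "Wednesday"),
             stringsAsFactors = FALSE)
}

head(wednesday_ordinals("2003-01-01", "2015-11-15"), 5)
```

Because the ordinal depends only on the day of the month, this version needs no grouping; its Date and ordinal columns should line up with the corresponding columns of the Out[32] table.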

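Combine with the arrest data frame. Since this is an inner join, arrests that fall on a fifth Wednesday (filtered out above) are dropped.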



In [49]:

    
df_arrest_wed <- df_arrest %>%
                filter(DayOfWeek == "Wednesday") %>%
                inner_join(df_dates) %>%
                select(Date, Time, X, Y, ordinal)

sprintf("NA values present from Merge: %s", sum(is.na(df_arrest_wed %>% select(ordinal))) > 0)
set.seed(42)
df_arrest_wed %>% sample_n(10)

Joining by: "Date"

    Out[49]:

'NA values present from Merge: FALSE'

    Out[49]:

   Date        Time   X          Y         ordinal
1  11/26/2003  11:15  -122.4106  37.7825   Fourth Wednesday
2  09/03/2003  22:17  -122.3891  37.71968  First Wednesday
3  09/14/2011  23:24  -122.4108  37.78321  Second Wednesday
4  11/10/2004  16:00  -122.4162  37.76363  Second Wednesday
5  06/06/2007  20:09  -122.4164  37.7816   First Wednesday
6  12/10/2008  15:52  -122.411   37.78414  Second Wednesday
7  02/22/2006  10:55  -122.4085  37.77376  Fourth Wednesday
8  10/23/2013  23:30  -122.4296  37.76775  Fourth Wednesday
9  03/28/2007  10:25  -122.4466  37.78225  Fourth Wednesday
10 08/02/2006  17:52  -122.4124  37.783    First Wednesday
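The NA check above can only print FALSE: an inner join keeps matching rows only, so Wednesday arrests with no partner in df_dates vanish instead of showing up as NA. Comparing row counts before and after the join (or using dplyr's anti_join()) reveals how many rows were lost. A self-contained base-R sketch of the idea; the toy arrests and dates data frames are made up for illustration:

```r
# Hypothetical toy data: three Wednesday arrests, one on a fifth Wednesday (01/29/2003)
arrests <- data.frame(Date = c("01/01/2003", "01/29/2003", "02/05/2003"),
                      Time = c("11:15", "22:17", "16:00"),
                      stringsAsFactors = FALSE)
dates <- data.frame(Date = c("01/01/2003", "02/05/2003"),
                    ordinal = c("First Wednesday", "First Wednesday"),
                    stringsAsFactors = FALSE)

# merge() with its default all = FALSE behaves like an inner join
joined <- merge(arrests, dates, by = "Date")

# Rows of arrests with no match in dates (what anti_join() would return)
dropped <- arrests[!(arrests$Date %in% dates$Date), ]

cat(sprintf("Kept %d of %d rows; dropped %d\n",
            nrow(joined), nrow(arrests), nrow(dropped)))
```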



In [50]:

    
df_arrests_ord <- df_arrest_wed %>%
                    mutate(Hour = sapply(Time, get_hour)) %>%
                    group_by(ordinal, Hour) %>%
                    summarize(count = n())

df_arrests_ord %>% head(10)

    Out[50]:

   ordinal          Hour  count
1  First Wednesday  0     783
2  First Wednesday  1     581
3  First Wednesday  2     460
4  First Wednesday  3     363
5  First Wednesday  4     247
6  First Wednesday  5     282
7  First Wednesday  6     356
8  First Wednesday  7     621
9  First Wednesday  8     795
10 First Wednesday  9     831
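get_hour() is defined earlier in the notebook and is applied one row at a time with sapply(). Assuming Time is always a zero-padded "HH:MM" string, as in every sample shown here, the hour can instead be extracted for the whole column at once. A base-R sketch of the same ordinal-by-hour count; hour_counts and the toy_wed data are hypothetical:

```r
# Hypothetical helper: count arrests per (ordinal, hour), assuming Time is "HH:MM"
hour_counts <- function(df) {
  # Vectorized hour extraction: "23:58" -> 23
  df$Hour <- as.integer(substr(df$Time, 1, 2))
  # aggregate() with FUN = length counts the rows in each ordinal/hour group
  out <- aggregate(list(count = df$Hour),
                   by = list(ordinal = df$ordinal, Hour = df$Hour),
                   FUN = length)
  out[order(out$ordinal, out$Hour), ]
}

toy_wed <- data.frame(Time = c("00:15", "00:45", "13:05", "00:30"),
                      ordinal = c("First Wednesday", "First Wednesday",
                                  "First Wednesday", "Second Wednesday"),
                      stringsAsFactors = FALSE)
hour_counts(toy_wed)
```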



In [65]:

    
df_arrests_ord$ordinal <- factor(df_arrests_ord$ordinal, levels = c("First Wednesday", "Second Wednesday", "Third Wednesday", "Fourth Wednesday"))

plot <- ggplot(df_arrests_ord, aes(x = Hour, y = count, color = ordinal)) +
    geom_line() +
    fte_theme() +
    scale_x_continuous(breaks = c(0,4,8,12,16,20), labels = c("12 AM", "4 AM", "8 AM", "12 PM", "4 PM", "8 PM")) +
    scale_y_continuous(labels = comma) +
    theme(legend.title = element_blank(), legend.position="top", legend.direction="horizontal", legend.key.height=unit(0.25, "cm"), legend.margin=unit(-0.5,"cm")) +
    labs(x = "Hour of Arrest (Local Time)", y = "Total # of Arrests by Hour", title = "# of Police Arrests in San Francisco from 2003 – 2015 on Wednesdays, by Hour")

max_save(plot, "ssi-crime-1", "SF OpenData", w = 5)
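The chart plots raw totals, which compares the ordinals fairly only if each one occurs on about the same number of days. Each ordinal occurs once a month, but the date range ends mid-month on 2015-11-15 (the end_date used above), so it is worth counting occurrences before comparing. A base-R sketch; count_ordinal_days is a hypothetical helper that uses the day-of-month rule (day - 1) %/% 7 + 1 to find each Wednesday's ordinal:

```r
# Hypothetical helper: count how many first, second, ... Wednesdays fall in a date range
count_ordinal_days <- function(start_date, end_date) {
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  lt <- as.POSIXlt(days)
  # wday == 3 selects Wednesdays (0 = Sunday); ordinal = (day of month - 1) %/% 7 + 1
  n <- (lt$mday[lt$wday == 3] - 1L) %/% 7L + 1L
  table(factor(n, levels = 1:5,
               labels = c("First", "Second", "Third", "Fourth", "Fifth")))
}

wed_days <- count_ordinal_days("2003-01-01", "2015-11-15")
wed_days
```

Dividing each ordinal's hourly counts by its number of days would turn the y-axis into average arrests per day; the Fifth level is counted only for completeness, since the notebook drops those dates.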

Create a map of Wednesday arrests, faceted by ordinal and normalized within each facet, because why not?



In [52]:

    
df_arrest_wed$ordinal <- factor(df_arrest_wed$ordinal, levels = c("First Wednesday", "Second Wednesday", "Third Wednesday", "Fourth Wednesday"))

plot <- ggmap(map) +
            stat_summary_hex(data=df_arrest_wed %>% group_by(ordinal) %>% mutate(w=1/n()), aes(x=X, y=Y, z=w), fun=sum_thresh, alpha = 0.8, color="#CCCCCC") +
            fte_theme() +
            theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) +
            scale_fill_gradient(low = "#DDDDDD", high = "#E74C3C") +
            labs(title = "Locations of Police Arrests Made in San Francisco from 2003 – 2015, Normalized by # Wednesday") +
            facet_wrap(~ ordinal, nrow=2)

max_save(plot, "ssi-crime-2", "SF OpenData", w = 5.5, h = 6, tall=T)

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
