Atlanta Police Department

The Atlanta Police Department provides Part 1 crime data at http://www.atlantapd.org/i-want-to/crime-data-downloads

A recent copy of the data files is stored on the cluster. Please do not copy these data files into your home directory!

Introduction

  • This notebook leads into an exploration of public crime data provided by the Atlanta Police Department.
  • The original data set and supplemental information can be found at http://www.atlantapd.org/i-want-to/crime-data-downloads
  • The data set is available on ARC; please don't download it into your home directory on ARC!

In [1]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

In [7]:
ls -l /home/data/APD/COBRA-YTD*.xlsx


-rw-r--r-- 1 pmolnar data 6113160 Oct  4 17:15 /home/data/APD/COBRA-YTD2009.xlsx
-rw-r--r-- 1 pmolnar data 5528521 Oct  4 17:15 /home/data/APD/COBRA-YTD2010.xlsx
-rw-r--r-- 1 pmolnar data 5432924 Oct  4 17:15 /home/data/APD/COBRA-YTD2011.xlsx
-rw-r--r-- 1 pmolnar data 5128046 Oct  4 17:15 /home/data/APD/COBRA-YTD2012.xlsx
-rw-r--r-- 1 pmolnar data 4972005 Oct  4 17:15 /home/data/APD/COBRA-YTD2013.xlsx
-rw-r--r-- 1 pmolnar data 4804222 Oct  4 17:15 /home/data/APD/COBRA-YTD2014.xlsx
-rw-r--r-- 1 pmolnar data 4640184 Oct  4 17:15 /home/data/APD/COBRA-YTD2015.xlsx
-rw-r--r-- 1 pmolnar data 3931468 Oct  4 17:15 /home/data/APD/COBRA-YTD2016.xlsx
-rw-r--r-- 1 pmolnar data 2962620 Oct  4 17:15 /home/data/APD/COBRA-YTD2017.xlsx
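Rather than hard-coding the list of years, you can derive the year keys from the file names in the listing. The sketch below assumes every yearly file follows the `COBRA-YTD<YYYY>.xlsx` pattern shown above. `two_digit_year` and `find_yearly_files` are hypothetical helper names and are not part of the original notebook. The workbooks are only located, never copied, so nothing lands in the home directory.

```python
import re
from pathlib import Path

# Assumed naming pattern, taken from the listing above: COBRA-YTD<four-digit year>.xlsx
YEARLY_FILE = re.compile(r"COBRA-YTD(\d{4})\.xlsx")

def two_digit_year(filename):
    """Return the two-digit year encoded in a yearly file name, or None if it does not match."""
    match = YEARLY_FILE.fullmatch(filename)
    return int(match.group(1)) % 100 if match else None

def find_yearly_files(data_dir="/home/data/APD"):
    """Map two-digit year -> Path for every yearly COBRA workbook in data_dir (files stay in place)."""
    files = {}
    for path in sorted(Path(data_dir).glob("COBRA-YTD*.xlsx")):
        year = two_digit_year(path.name)
        if year is not None:  # skip files that do not follow the yearly naming pattern
            files[year] = path
    return files
```

On the cluster, `find_yearly_files()` would return keys 9 through 17 for the listing above. You could then feed those keys to the reading loop instead of a literal list.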

In [15]:
dfd = dict()
# Read each yearly workbook directly from the shared data directory (never copy it to $HOME).
for YY in range(9, 18):  # two-digit years: 2009 through 2017
    dfd[YY] = pd.read_excel('/home/data/APD/COBRA-YTD20%02d.xlsx' % YY, sheet_name='Query')
    print(YY)
    print("Shape of table: ", dfd[YY].shape)
    print(dfd[YY].offense_id.min(), dfd[YY].offense_id.max())
    print("-----")


9
Shape of table:  (39626, 23)
82040835 93611412081
-----
10
Shape of table:  (35770, 23)
72692336 103601083093
-----
11
Shape of table:  (35174, 23)
100020717 113642157089
-----
12
Shape of table:  (33394, 23)
110171050 123610998085
-----
13
Shape of table:  (32303, 23)
111290429 133641638086
-----
14
Shape of table:  (31166, 23)
120140510 143590367094
-----
15
Shape of table:  (30115, 23)
140041432 153600498099
-----
16
Shape of table:  (29112, 23)
150102493 163660563113
-----
17
Shape of table:  (19073, 23)
160821939 172640598057
-----
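The printed numbers are easier to compare in a single table. The sketch below assumes `dfd` maps two-digit years to DataFrames, as built above. `summarize_years` is a hypothetical helper, and the toy frames are made up for illustration only. On the real data, the `rows` column should sum to 285,733, which is the total of the nine row counts printed above.

```python
import pandas as pd

def summarize_years(frames, id_col="offense_id"):
    """Collect per-year shape and id range into one table.

    `frames` maps a two-digit year to a DataFrame, like `dfd` in this notebook.
    """
    rows = []
    for year, frame in sorted(frames.items()):
        rows.append({
            "year": 2000 + year,
            "rows": frame.shape[0],
            "columns": frame.shape[1],
            "min_id": frame[id_col].min(),
            "max_id": frame[id_col].max(),
        })
    return pd.DataFrame(rows).set_index("year")

# Toy stand-ins for dfd (made-up values, not APD data).
toy = {
    9: pd.DataFrame({"offense_id": [10, 20], "beat": [101, 102]}),
    10: pd.DataFrame({"offense_id": [15, 30, 40], "beat": [101, 103, 104]}),
}
summary = summarize_years(toy)
print(summary)
print("Total rows:", summary["rows"].sum())
```

With the real data, the call would be `summarize_years(dfd)`.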

Let's look at the structure of these tables. Every yearly table has 23 columns, and a text listing of those columns can be used to create a data dictionary.
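One way to produce that text output is a per-column summary. The sketch below uses `data_dictionary` and `same_columns`, which are hypothetical helpers, and a made-up toy table. On the real data, you might first call `same_columns(dfd)` to confirm that all years share the same column names in the same order. After that, `print(data_dictionary(dfd[17]).to_string())` would give a listing to start the dictionary from.

```python
import pandas as pd

def data_dictionary(frame):
    """One row per column: dtype, non-null count, distinct values and an example value."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "non_null": frame.notna().sum(),
        "distinct": frame.nunique(),
        # first non-missing value in each column, or None if the column is empty
        "example": frame.apply(lambda col: col.dropna().iloc[0] if col.notna().any() else None),
    })

def same_columns(frames):
    """True if every table in the dict has the same column names in the same order."""
    column_lists = [list(f.columns) for f in frames.values()]
    return all(cols == column_lists[0] for cols in column_lists)

# Toy table (made-up values, not APD data).
toy = pd.DataFrame({"offense_id": [1, 2, 3], "beat": [101.0, None, 101.0]})
dd = data_dictionary(toy)
print(dd.to_string())
```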


In [19]:
pd.concat([dfd[17], dfd[16]]).shape


Out[19]:
(48185, 23)
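This shape agrees with the per-year counts: 19073 rows for 2017 plus 29112 for 2016 gives 48185. The column count also stays at 23. Because `pd.concat` takes the union of column names by default, this means the two tables lined up column for column. You can automate the same check. The sketch below uses a hypothetical helper, `concat_checked`, and made-up toy frames.

```python
import pandas as pd

def concat_checked(frames):
    """Concatenate DataFrames and verify that no rows were lost or duplicated."""
    combined = pd.concat(frames, ignore_index=True)
    expected = sum(len(f) for f in frames)
    if len(combined) != expected:
        raise ValueError(f"expected {expected} rows, got {len(combined)}")
    return combined

# Toy frames (made-up values, not APD data).
a = pd.DataFrame({"offense_id": [1, 2, 3]})
b = pd.DataFrame({"offense_id": [4, 5]})
both = concat_checked([a, b])
print(both.shape)  # (5, 1)
```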

In [25]:
# Stack all years into one table; ignore_index renumbers the rows from 0
df = pd.concat(list(dfd.values()), ignore_index=True)
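The `offense_id` ranges printed earlier overlap between years. For example, the 2010 file's minimum, 72692336, is smaller than the 2009 file's minimum, 82040835. This means the id alone does not tell you which file a row came from. It may also be worth checking whether any `offense_id` occurs more than once.

The sketch below assumes `dfd` maps two-digit years to DataFrames. `combine_years` and `duplicated_ids` are hypothetical helpers, and the toy frames are made up. When you pass a dict to `pd.concat`, its keys become the outer index level, and `reset_index` turns that level into an ordinary column.

```python
import pandas as pd

def combine_years(frames):
    """Stack yearly tables and record the source year in a 'source_year' column."""
    combined = pd.concat(frames, names=["source_year", "row"])
    # move the dict-key level into a column, then renumber rows 0..n-1
    combined = combined.reset_index(level="source_year").reset_index(drop=True)
    combined["source_year"] = 2000 + combined["source_year"]
    return combined

def duplicated_ids(combined, id_col="offense_id"):
    """All rows whose id appears more than once (keep=False marks every copy)."""
    return combined[combined.duplicated(subset=id_col, keep=False)]

# Toy frames (made-up values, not APD data).
toy = {
    16: pd.DataFrame({"offense_id": [1, 2]}),
    17: pd.DataFrame({"offense_id": [2, 3, 4]}),
}
combined = combine_years(toy)
dups = duplicated_ids(combined)
print(dups)
```

On the real data, `duplicated_ids(combine_years(dfd))` would list any ids shared across files. An empty result would mean each `offense_id` is unique in the combined table.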

In [26]:
df.to_csv('/home/data/APD/COBRA-YTD-multiyear.csv.gz', index=False, compression='gzip')
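Later sessions can read the combined file back directly from the shared directory instead of rebuilding it from the workbooks. The sketch below uses a hypothetical helper, `load_combined`. `pd.read_csv` infers gzip compression from the `.gz` suffix by default. Any date or time columns will come back as plain strings unless you pass `parse_dates`.

```python
import os
import tempfile

import pandas as pd

def load_combined(path="/home/data/APD/COBRA-YTD-multiyear.csv.gz"):
    """Read the combined multi-year table; gzip is inferred from the .gz suffix."""
    return pd.read_csv(path)  # compression='infer' is the default

# Round trip with a toy table in a temporary directory (made-up values, not APD data).
toy = pd.DataFrame({"offense_id": [1, 2], "beat": [101, 102]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv.gz")
    toy.to_csv(path, index=False, compression="gzip")
    back = load_combined(path)
print(back.equals(toy))  # True
```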
