The Atlanta Police Department provides Part 1 crime data at http://www.atlantapd.org/i-want-to/crime-data-downloads
A recent copy of the data file is stored on the cluster. Please do not copy this data file into your home directory!
In [1]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
In [7]:
ls -l /home/data/APD/COBRA-YTD*.xlsx
In [15]:
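dfd = dict()
# Load one workbook per year (2009-2017), keyed by the two-digit year.
for YY in [9, 10, 11, 12, 13, 14, 15, 16, 17]:
    # sheet_name replaces the older sheetname keyword (pandas >= 0.21)
    dfd[YY] = pd.read_excel('/home/data/APD/COBRA-YTD20%02d.xlsx' % YY, sheet_name='Query')
    print(YY)
    print("Shape of table: ", dfd[YY].shape)
    # Range of offense IDs in this year's file
    print(dfd[YY].offense_id.min(), dfd[YY].offense_id.max())
    print("-----")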
Let's look at the structure of this table. We're actually creating some text output that can be used to create a data dictionary.
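As a rough sketch of what such text output could look like, the helper below prints one line per column with its name, dtype, non-null count, and an example value. The function name data_dictionary_lines, the toy table, the beat column, and all values in it are hypothetical stand-ins chosen for illustration; only offense_id is a column name taken from this notebook. In practice it could be called on any of the yearly tables, e.g. dfd[17].

```python
import pandas as pd

def data_dictionary_lines(frame):
    """Return one text line per column: name | dtype | non-null count | example value."""
    lines = []
    for col in frame.columns:
        non_null = frame[col].dropna()
        # Use the first non-missing value as an example, or an empty string if none exists
        example = non_null.iloc[0] if len(non_null) else ""
        lines.append("%s | %s | %d | %s" % (col, frame[col].dtype, len(non_null), example))
    return lines

# Toy stand-in for a yearly table (hypothetical values, not real APD records)
toy = pd.DataFrame({"offense_id": [90010930, 90011083], "beat": ["408", None]})
for line in data_dictionary_lines(toy):
    print(line)
```

The printed lines can be pasted into a text file and annotated by hand with a description of each field.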
In [19]:
pd.concat([dfd[17], dfd[16]]).shape
Out[19]:
In [25]:
df = pd.concat([dfd[k] for k in dfd.keys()])
In [26]:
df.to_csv('/home/data/APD/COBRA-YTD-multiyear.csv.gz', index=False, compression='gzip')