Atlanta Police Department

The Atlanta Police Department provides Part 1 crime data at http://www.atlantapd.org/i-want-to/crime-data-downloads

A recent copy of the data file is stored in the cluster. Please, do not copy this data file into your home directory!

Introduction

  • This notebooks leads into an exploration of public crime data provided by the Atlanta Police Department.
  • The original data set and supplemental information can be found at http://www.atlantapd.org/i-want-to/crime-data-downloads
  • The data set is available on ARC, please, don't download into your home directory on ARC!

In [2]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

In [3]:
# load data set
df = pd.read_csv('/home/data/APD/COBRA-YTD2017.csv.gz')
print "Shape of table: ", df.shape


Shape of table:  (17425, 23)

Let's look at the structure of this table. We're actually creating some text output that can be used to create a data dictionary.


In [ ]:
dataDict = pd.DataFrame({'DataType': df.dtypes.values, 'Description': '', }, index=df.columns.values)

We need to enter the descriptions for each entry in our dictionary manually. However, why not just create a the Python code automatically...

Run the code below only if you haven't edited the datadict.py file in a different way, since it will overwrite what you have so far. (That's why the code is commented-out.)


In [ ]:
# with open("datadict.py", "w") as io:
#     for i in dataDict.index:
#         io.write("dataDict.loc['%s'].Description = '' # type: %s\n" % (i, str(dataDict.loc[i].DataType)))

In [ ]:
# %load datadict.py
dataDict.loc['MI_PRINX'].Description = '' # type: int64
dataDict.loc['offense_id'].Description = 'Unique ID in the format YYDDDNNNN with the year YY, the day of the year DDD and a counter NNNN' # type: int64
dataDict.loc['rpt_date'].Description = 'Date the crime was reported' # type: object
dataDict.loc['occur_date'].Description = 'Estimated date when the crime occured' # type: object
dataDict.loc['occur_time'].Description = 'Estimated time when the crime occured' # type: object
dataDict.loc['poss_date'].Description = '' # type: object
dataDict.loc['poss_time'].Description = '' # type: object
dataDict.loc['beat'].Description = '' # type: int64
dataDict.loc['apt_office_prefix'].Description = '' # type: object
dataDict.loc['apt_office_num'].Description = '' # type: object
dataDict.loc['location'].Description = '' # type: object
dataDict.loc['MinOfucr'].Description = '' # type: int64
dataDict.loc['MinOfibr_code'].Description = '' # type: object
dataDict.loc['dispo_code'].Description = '' # type: object
dataDict.loc['MaxOfnum_victims'].Description = '' # type: float64
dataDict.loc['Shift'].Description = 'Zones have 8 or 10 hour shifts' # type: object
dataDict.loc['Avg Day'].Description = '' # type: object
dataDict.loc['loc_type'].Description = '' # type: float64
dataDict.loc['UC2 Literal'].Description = '' # type: object
dataDict.loc['neighborhood'].Description = '' # type: object
dataDict.loc['npu'].Description = '' # type: object
dataDict.loc['x'].Description = '' # type: float64
dataDict.loc['y'].Description = '' # type: float64
dataDict.to_csv("COBRA_Data_Dictionary.csv")

Fixing Data Types


In [ ]:
print df.groupby("Shift").count().index

Date and Time

Create a proper text field

In order to use the text to date/time converter our text columns need to have the appropriate format.


In [4]:
# function currying
def fixdatetime(fld):
    def _fix(s):
        date_col = '%s_date' % fld
        time_col = '%s_time' % fld
        if time_col in s.index:
            return str(s[date_col])+' '+str(s[time_col])
        else:
            return str(s[date_col])+' 00:00:00'
    return _fix

df.apply(fixdatetime('rpt'), axis=1)[:10]


Out[4]:
0    09/06/2017 00:00:00
1    09/06/2017 00:00:00
2    09/06/2017 00:00:00
3    09/06/2017 00:00:00
4    09/06/2017 00:00:00
5    09/06/2017 00:00:00
6    09/06/2017 00:00:00
7    09/06/2017 00:00:00
8    09/06/2017 00:00:00
9    09/06/2017 00:00:00
dtype: object

Convert Columns


In [5]:
for col in ['rpt', 'occur', 'poss']:
    datser = df.apply(fixdatetime(col), axis=1)
    df['%s_dt'%col] = pd.to_datetime(datser, format="%m/%d/%Y %H:%M:%S", errors='coerce')

In [6]:
df.head()


Out[6]:
MI_PRINX offense_id rpt_date occur_date occur_time poss_date poss_time beat apt_office_prefix apt_office_num ... Avg Day loc_type UC2 Literal neighborhood npu x y rpt_dt occur_dt poss_dt
0 7693210 172490115 09/06/2017 09/06/2017 00:30:00 09/06/2017 00:35:00 607 NaN NaN ... Wed NaN AGG ASSAULT Custer/McDonough/Guice W -84.35850 33.70839 2017-09-06 2017-09-06 00:30:00 2017-09-06 00:35:00
1 7693211 172490265 09/06/2017 09/05/2017 11:15:00 09/06/2017 02:30:00 512 NaN NaN ... Tue 99.0 LARCENY-FROM VEHICLE Downtown M -84.39736 33.74958 2017-09-06 2017-09-05 11:15:00 2017-09-06 02:30:00
2 7693212 172490322 09/06/2017 09/06/2017 03:15:00 09/06/2017 03:45:00 501 NaN NaN ... Wed 18.0 LARCENY-FROM VEHICLE Atlantic Station E -84.39776 33.79072 2017-09-06 2017-09-06 03:15:00 2017-09-06 03:45:00
3 7693213 172490390 09/06/2017 09/05/2017 17:45:00 09/06/2017 04:57:00 207 NaN NaN ... Tue 18.0 LARCENY-FROM VEHICLE Brookwood Hills E -84.39361 33.80774 2017-09-06 2017-09-05 17:45:00 2017-09-06 04:57:00
4 7693214 172490401 09/06/2017 09/05/2017 17:00:00 09/06/2017 05:00:00 203 NaN NaN ... Tue 18.0 LARCENY-FROM VEHICLE Hills Park D -84.43337 33.79848 2017-09-06 2017-09-05 17:00:00 2017-09-06 05:00:00

5 rows × 26 columns

Beats and Zones

The City of Atlanta is divided into 6 zones. Each with 12 to 14 beats.

Let's create a separate column for the zones:


In [41]:
df['Zone'] = df['beat']//100

In [ ]:


In [ ]:
df['UC2 Literal'].unique()

In [42]:
df[df['UC2 Literal']=='LARCENY-FROM VEHICLE']


Out[42]:
MI_PRINX offense_id rpt_date occur_date occur_time poss_date poss_time beat apt_office_prefix apt_office_num ... UC2 Literal neighborhood npu x y rpt_dt occur_dt poss_dt Year Zone
1 7693211 172490265 09/06/2017 09/05/2017 11:15:00 09/06/2017 02:30:00 512 NaN NaN ... LARCENY-FROM VEHICLE Downtown M -84.39736 33.74958 2017-09-06 2017-09-05 11:15:00 2017-09-06 02:30:00 2017.0 5
2 7693212 172490322 09/06/2017 09/06/2017 03:15:00 09/06/2017 03:45:00 501 NaN NaN ... LARCENY-FROM VEHICLE Atlantic Station E -84.39776 33.79072 2017-09-06 2017-09-06 03:15:00 2017-09-06 03:45:00 2017.0 5
3 7693213 172490390 09/06/2017 09/05/2017 17:45:00 09/06/2017 04:57:00 207 NaN NaN ... LARCENY-FROM VEHICLE Brookwood Hills E -84.39361 33.80774 2017-09-06 2017-09-05 17:45:00 2017-09-06 04:57:00 2017.0 2
4 7693214 172490401 09/06/2017 09/05/2017 17:00:00 09/06/2017 05:00:00 203 NaN NaN ... LARCENY-FROM VEHICLE Hills Park D -84.43337 33.79848 2017-09-06 2017-09-05 17:00:00 2017-09-06 05:00:00 2017.0 2
11 7693221 172490557 09/06/2017 09/05/2017 21:00:00 09/06/2017 07:20:00 207 NaN 323 ... LARCENY-FROM VEHICLE Loring Heights E -84.40671 33.79742 2017-09-06 2017-09-05 21:00:00 2017-09-06 07:20:00 2017.0 2
12 7693222 172490741 09/06/2017 09/06/2017 07:30:00 09/06/2017 08:20:00 506 NaN NaN ... LARCENY-FROM VEHICLE Midtown E -84.37102 33.77686 2017-09-06 2017-09-06 07:30:00 2017-09-06 08:20:00 2017.0 5
13 7693223 172490781 09/06/2017 09/05/2017 22:00:00 09/06/2017 05:00:00 212 NaN NaN ... LARCENY-FROM VEHICLE Piedmont Heights F -84.37157 33.80697 2017-09-06 2017-09-05 22:00:00 2017-09-06 05:00:00 2017.0 2
17 7693227 172490914 09/06/2017 09/05/2017 19:30:00 09/06/2017 10:10:00 408 NaN NaN ... LARCENY-FROM VEHICLE Venetian Hills S -84.44116 33.72205 2017-09-06 2017-09-05 19:30:00 2017-09-06 10:10:00 2017.0 4
20 7693230 172491015 09/06/2017 09/06/2017 07:00:00 09/06/2017 10:45:00 208 NaN NaN ... LARCENY-FROM VEHICLE Peachtree Park B -84.36815 33.84642 2017-09-06 2017-09-06 07:00:00 2017-09-06 10:45:00 2017.0 2
21 7693231 172491018 09/06/2017 09/06/2017 10:30:00 09/06/2017 11:00:00 413 NaN NaN ... LARCENY-FROM VEHICLE Ben Hill Pines P -84.50755 33.68212 2017-09-06 2017-09-06 10:30:00 2017-09-06 11:00:00 2017.0 4
30 7693240 172491351 09/06/2017 09/06/2017 14:10:00 09/06/2017 14:10:00 401 NaN NaN ... LARCENY-FROM VEHICLE West End T -84.41680 33.74057 2017-09-06 2017-09-06 14:10:00 2017-09-06 14:10:00 2017.0 4
36 7693246 172491776 09/06/2017 09/06/2017 17:00:00 09/06/2017 17:13:00 109 NaN NaN ... LARCENY-FROM VEHICLE Center Hill J -84.47168 33.77587 2017-09-06 2017-09-06 17:00:00 2017-09-06 17:13:00 2017.0 1
40 7693250 172491865 09/06/2017 09/06/2017 14:00:00 09/06/2017 17:45:00 605 NaN NaN ... LARCENY-FROM VEHICLE Grant Park W -84.37925 33.74322 2017-09-06 2017-09-06 14:00:00 2017-09-06 17:45:00 2017.0 6
43 7693253 172492033 09/06/2017 09/03/2017 08:30:00 09/03/2017 22:30:00 206 NaN NaN ... LARCENY-FROM VEHICLE South Tuxedo Park B -84.38104 33.84316 2017-09-06 2017-09-03 08:30:00 2017-09-03 22:30:00 2017.0 2
50 7693260 172492215 09/06/2017 09/06/2017 19:30:00 09/06/2017 21:15:00 206 NaN NaN ... LARCENY-FROM VEHICLE Garden Hills B -84.37759 33.83533 2017-09-06 2017-09-06 19:30:00 2017-09-06 21:15:00 2017.0 2
51 7693261 172492350 09/06/2017 09/06/2017 19:30:00 09/06/2017 22:06:00 505 NaN NaN ... LARCENY-FROM VEHICLE Midtown E -84.38533 33.77831 2017-09-06 2017-09-06 19:30:00 2017-09-06 22:06:00 2017.0 5
55 7693138 172480028 09/05/2017 09/04/2017 17:30:00 09/04/2017 23:30:00 511 NaN NaN ... LARCENY-FROM VEHICLE Downtown M -84.39166 33.75190 2017-09-05 2017-09-04 17:30:00 2017-09-04 23:30:00 2017.0 5
57 7693140 172480065 09/05/2017 09/04/2017 17:00:00 09/05/2017 00:20:00 508 NaN NaN ... LARCENY-FROM VEHICLE Downtown M -84.39178 33.76002 2017-09-05 2017-09-04 17:00:00 2017-09-05 00:20:00 2017.0 5
59 7693142 172480144 09/05/2017 09/05/2017 00:00:00 09/05/2017 01:00:00 213 NaN NaN ... LARCENY-FROM VEHICLE Piedmont Heights F -84.37025 33.79592 2017-09-05 2017-09-05 00:00:00 2017-09-05 01:00:00 2017.0 2
60 7693143 172480162 09/05/2017 09/05/2017 00:00:00 09/05/2017 01:00:00 213 NaN NaN ... LARCENY-FROM VEHICLE Lindridge/Martin Manor F -84.35601 33.81123 2017-09-05 2017-09-05 00:00:00 2017-09-05 01:00:00 2017.0 2
62 7693145 172480270 09/05/2017 09/04/2017 21:00:00 09/05/2017 02:26:00 612 NaN NaN ... LARCENY-FROM VEHICLE East Atlanta W -84.32549 33.72743 2017-09-05 2017-09-04 21:00:00 2017-09-05 02:26:00 2017.0 6
64 7693147 172480383 09/05/2017 09/04/2017 18:00:00 09/05/2017 03:42:00 313 NaN 6104 ... LARCENY-FROM VEHICLE South River Gardens Z -84.35549 33.66061 2017-09-05 2017-09-04 18:00:00 2017-09-05 03:42:00 2017.0 3
67 7693150 172480528 09/05/2017 09/05/2017 04:00:00 09/05/2017 06:05:00 310 NaN 2601 ... LARCENY-FROM VEHICLE Browns Mill Park Z -84.37006 33.68838 2017-09-05 2017-09-05 04:00:00 2017-09-05 06:05:00 2017.0 3
69 7693152 172480682 09/05/2017 09/04/2017 17:00:00 09/05/2017 08:00:00 605 NaN 210 ... LARCENY-FROM VEHICLE Capitol Gateway V -84.38612 33.74513 2017-09-05 2017-09-04 17:00:00 2017-09-05 08:00:00 2017.0 6
70 7693153 172480709 09/05/2017 09/05/2017 07:45:00 09/05/2017 07:50:00 201 NaN NaN ... LARCENY-FROM VEHICLE NaN C -84.45274 33.82065 2017-09-05 2017-09-05 07:45:00 2017-09-05 07:50:00 2017.0 2
71 7693154 172480772 09/05/2017 09/03/2017 23:00:00 09/05/2017 08:35:00 310 NaN NaN ... LARCENY-FROM VEHICLE Browns Mill Park Z -84.36507 33.68797 2017-09-05 2017-09-03 23:00:00 2017-09-05 08:35:00 2017.0 3
72 7693155 172480790 09/05/2017 09/05/2017 05:45:00 09/05/2017 08:00:00 301 NaN NaN ... LARCENY-FROM VEHICLE Adair Park V -84.40824 33.73657 2017-09-05 2017-09-05 05:45:00 2017-09-05 08:00:00 2017.0 3
74 7693157 172480865 09/05/2017 09/04/2017 20:00:00 09/05/2017 07:00:00 612 NaN NaN ... LARCENY-FROM VEHICLE East Atlanta W -84.34393 33.73856 2017-09-05 2017-09-04 20:00:00 2017-09-05 07:00:00 2017.0 6
75 7693158 172480868 09/05/2017 09/05/2017 21:13:00 09/05/2017 21:13:00 210 NaN NaN ... LARCENY-FROM VEHICLE Lenox B -84.35947 33.84800 2017-09-05 2017-09-05 21:13:00 2017-09-05 21:13:00 2017.0 2
76 7693159 172480883 09/05/2017 09/04/2017 18:00:00 09/05/2017 09:19:00 207 NaN NaN ... LARCENY-FROM VEHICLE Loring Heights E -84.39947 33.79455 2017-09-05 2017-09-04 18:00:00 2017-09-05 09:19:00 2017.0 2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17336 7676041 170021967 01/02/2017 01/02/2017 19:00:00 01/02/2017 20:15:00 602 NaN NaN ... LARCENY-FROM VEHICLE Inman Park N -84.34916 33.76634 2017-01-02 2017-01-02 19:00:00 2017-01-02 20:15:00 2017.0 6
17340 7676045 170022145 01/02/2017 01/02/2017 22:06:00 01/02/2017 22:30:00 302 NaN NaN ... LARCENY-FROM VEHICLE Pittsburgh V -84.40221 33.73365 2017-01-02 2017-01-02 22:06:00 2017-01-02 22:30:00 2017.0 3
17343 7674777 163522791 01/02/2017 12/18/2016 01:00:00 12/18/2016 01:30:00 303 NaN NaN ... LARCENY-FROM VEHICLE Mechanicsville V -84.39438 33.73699 2017-01-02 2016-12-18 01:00:00 2016-12-18 01:30:00 2016.0 3
17345 7675895 170010079 01/01/2017 01/01/2017 00:00:00 01/01/2017 00:10:00 413 NaN NaN ... LARCENY-FROM VEHICLE NaN P -84.50584 33.68990 2017-01-01 2017-01-01 00:00:00 2017-01-01 00:10:00 2017.0 4
17346 7675896 170010151 01/01/2017 01/01/2017 00:26:00 01/31/2017 12:17:00 503 NaN NaN ... LARCENY-FROM VEHICLE Midtown E -84.39087 33.78153 2017-01-01 2017-01-01 00:26:00 2017-01-31 12:17:00 2017.0 5
17348 7675898 170010193 01/01/2017 12/31/2016 22:30:00 01/01/2017 00:45:00 402 NaN NaN ... LARCENY-FROM VEHICLE Oakland City S -84.42916 33.73214 2017-01-01 2016-12-31 22:30:00 2017-01-01 00:45:00 2016.0 4
17350 7675900 170010232 01/01/2017 12/30/2016 23:30:00 12/31/2016 23:50:00 605 NaN NaN ... LARCENY-FROM VEHICLE Oakland W -84.37875 33.74803 2017-01-01 2016-12-30 23:30:00 2016-12-31 23:50:00 2016.0 6
17351 7675901 170010294 01/01/2017 12/31/2016 19:00:00 01/01/2017 00:30:00 507 NaN NaN ... LARCENY-FROM VEHICLE Downtown M -84.39475 33.74875 2017-01-01 2016-12-31 19:00:00 2017-01-01 00:30:00 2016.0 5
17358 7675908 170010593 01/01/2017 12/31/2016 01:00:00 01/01/2017 03:03:00 309 NaN NaN ... LARCENY-FROM VEHICLE Perkerson X -84.40744 33.68263 2017-01-01 2016-12-31 01:00:00 2017-01-01 03:03:00 2016.0 3
17361 7675911 170010712 01/01/2017 01/01/2017 01:30:00 01/01/2017 03:00:00 212 NaN NaN ... LARCENY-FROM VEHICLE Morningside/Lenox Park F -84.36122 33.81038 2017-01-01 2017-01-01 01:30:00 2017-01-01 03:00:00 2017.0 2
17366 7675916 170010894 01/01/2017 12/15/2016 08:00:00 12/15/2016 12:00:00 405 NaN NaN ... LARCENY-FROM VEHICLE Harland Terrace I -84.48570 33.75177 2017-01-01 2016-12-15 08:00:00 2016-12-15 12:00:00 2016.0 4
17372 7675922 170011007 01/01/2017 12/31/2016 16:00:00 01/01/2017 09:00:00 210 NaN NaN ... LARCENY-FROM VEHICLE Lenox B -84.35947 33.84800 2017-01-01 2016-12-31 16:00:00 2017-01-01 09:00:00 2016.0 2
17373 7675923 170011015 01/01/2017 12/31/2016 23:00:00 01/01/2017 09:00:00 612 NaN NaN ... LARCENY-FROM VEHICLE East Atlanta W -84.33704 33.74340 2017-01-01 2016-12-31 23:00:00 2017-01-01 09:00:00 2016.0 6
17376 7675926 170011070 01/01/2017 01/01/2017 10:25:00 01/01/2017 10:25:00 101 NaN NaN ... LARCENY-FROM VEHICLE The Villages at Castleberry H T -84.40523 33.74505 2017-01-01 2017-01-01 10:25:00 2017-01-01 10:25:00 2017.0 1
17379 7675929 170011129 01/01/2017 12/31/2016 12:00:00 01/01/2017 11:00:00 206 NaN 103 ... LARCENY-FROM VEHICLE Buckhead Village B -84.37533 33.83888 2017-01-01 2016-12-31 12:00:00 2017-01-01 11:00:00 2016.0 2
17381 7675931 170011171 01/01/2017 12/31/2016 21:00:00 01/01/2017 11:00:00 504 NaN NaN ... LARCENY-FROM VEHICLE Marietta Street Artery E -84.41050 33.77957 2017-01-01 2016-12-31 21:00:00 2017-01-01 11:00:00 2016.0 5
17386 7675936 170011255 01/01/2017 01/01/2017 12:00:00 01/01/2017 13:00:00 612 NaN NaN ... LARCENY-FROM VEHICLE East Atlanta W -84.32324 33.72602 2017-01-01 2017-01-01 12:00:00 2017-01-01 13:00:00 2017.0 6
17394 7675958 170011751 01/01/2017 01/01/2017 15:30:00 01/01/2017 17:45:00 508 NaN NaN ... LARCENY-FROM VEHICLE Downtown M -84.38905 33.75832 2017-01-01 2017-01-01 15:30:00 2017-01-01 17:45:00 2017.0 5
17395 7675959 170011863 01/01/2017 12/31/2016 10:30:00 01/01/2017 12:00:00 112 NaN NaN ... LARCENY-FROM VEHICLE Collier Heights I -84.49110 33.78133 2017-01-01 2016-12-31 10:30:00 2017-01-01 12:00:00 2016.0 1
17399 7675963 170012028 01/01/2017 01/01/2017 14:30:00 01/01/2017 19:30:00 102 NaN NaN ... LARCENY-FROM VEHICLE Vine City L -84.40771 33.75916 2017-01-01 2017-01-01 14:30:00 2017-01-01 19:30:00 2017.0 1
17402 7675966 170012145 01/01/2017 01/01/2017 20:30:00 01/01/2017 20:45:00 209 NaN NaN ... LARCENY-FROM VEHICLE North Buckhead B -84.38135 33.85547 2017-01-01 2017-01-01 20:30:00 2017-01-01 20:45:00 2017.0 2
17403 7675967 170012182 01/01/2017 01/01/2017 21:05:00 01/01/2017 21:07:00 203 NaN NaN ... LARCENY-FROM VEHICLE Blandtown D -84.42561 33.80058 2017-01-01 2017-01-01 21:05:00 2017-01-01 21:07:00 2017.0 2
17406 7675970 170012253 01/01/2017 01/01/2017 21:00:00 01/01/2017 22:00:00 212 NaN NaN ... LARCENY-FROM VEHICLE Piedmont Heights F -84.36831 33.80965 2017-01-01 2017-01-01 21:00:00 2017-01-01 22:00:00 2017.0 2
17407 7675971 170012261 01/01/2017 01/01/2017 22:00:00 01/01/2017 22:21:00 414 NaN NaN ... LARCENY-FROM VEHICLE Princeton Lakes P -84.50908 33.65571 2017-01-01 2017-01-01 22:00:00 2017-01-01 22:21:00 2017.0 4
17414 7675944 170011358 01/01/2017 12/31/2016 19:00:00 01/01/2017 11:00:00 505 NaN 1105 ... LARCENY-FROM VEHICLE Midtown E -84.38424 33.77800 2017-01-01 2016-12-31 19:00:00 2017-01-01 11:00:00 2016.0 5
17415 7675945 170011435 01/01/2017 12/31/2016 19:05:00 12/31/2016 22:30:00 504 NaN NaN ... LARCENY-FROM VEHICLE Marietta Street Artery E -84.40612 33.77425 2017-01-01 2016-12-31 19:05:00 2016-12-31 22:30:00 2016.0 5
17416 7675946 170011504 01/01/2017 01/01/2017 15:00:00 01/01/2017 15:30:00 602 NaN NaN ... LARCENY-FROM VEHICLE Inman Park N -84.35289 33.76341 2017-01-01 2017-01-01 15:00:00 2017-01-01 15:30:00 2017.0 6
17418 7675948 170011525 01/01/2017 01/01/2017 15:37:00 01/01/2017 15:37:00 303 NaN NaN ... LARCENY-FROM VEHICLE Mechanicsville V -84.39486 33.73760 2017-01-01 2017-01-01 15:37:00 2017-01-01 15:37:00 2017.0 3
17419 7675949 170011531 01/01/2017 01/01/2017 14:15:00 01/01/2017 15:45:00 610 NaN NaN ... LARCENY-FROM VEHICLE Kirkwood O -84.31645 33.76408 2017-01-01 2017-01-01 14:15:00 2017-01-01 15:45:00 2017.0 6
17422 7675952 170011645 01/01/2017 12/31/2016 20:00:00 01/01/2017 14:00:00 302 NaN NaN ... LARCENY-FROM VEHICLE Pittsburgh V -84.40465 33.73026 2017-01-01 2016-12-31 20:00:00 2017-01-01 14:00:00 2016.0 3

6171 rows × 28 columns


In [ ]:
df.occur_dt.map(lambda d: d.year).unique()

In [43]:
df['Year'] = df.occur_dt.map(lambda d: d.year)
df2 = df[(df.Year>=2010) & (df.Year<=2017)]
df2.shape, df.shape


Out[43]:
((17418, 28), (17425, 28))

In [ ]:


In [ ]:


In [44]:
df_LarcenyFromVehicle = df2[(df2['UC2 Literal']=='LARCENY-FROM VEHICLE')&(df2.Year==2017)].copy()
agr_LarcenyFromVehicle = df_LarcenyFromVehicle.set_index('occur_dt').resample('W').offense_id.count()
agr_LarcenyFromVehicle


Out[44]:
occur_dt
2017-01-01     21
2017-01-08    157
2017-01-15    154
2017-01-22    178
2017-01-29    182
2017-02-05    161
2017-02-12    137
2017-02-19    190
2017-02-26    135
2017-03-05    142
2017-03-12    112
2017-03-19    118
2017-03-26    117
2017-04-02    124
2017-04-09    165
2017-04-16    191
2017-04-23    181
2017-04-30    186
2017-05-07    198
2017-05-14    180
2017-05-21    191
2017-05-28    190
2017-06-04    171
2017-06-11    196
2017-06-18    180
2017-06-25    194
2017-07-02    183
2017-07-09    188
2017-07-16    211
2017-07-23    196
2017-07-30    188
2017-08-06    167
2017-08-13    161
2017-08-20    226
2017-08-27    201
2017-09-03    187
2017-09-10     61
Freq: W-SUN, Name: offense_id, dtype: int64

In [30]:
df_LarcenyFromVehicle["Hour"] = df_LarcenyFromVehicle.occur_dt.map(lambda d: d.hour)

In [33]:
df_LarcenyFromVehicle.groupby("Hour").offense_id.count()


Out[33]:
Hour
0     290
1     220
2     121
3      94
4      55
5      57
6      85
7     113
8     210
9     198
10    195
11    258
12    317
13    278
14    254
15    310
16    288
17    339
18    431
19    483
20    441
21    363
22    403
23    317
Name: offense_id, dtype: int64

In [53]:
hourly = df_LarcenyFromVehicle.resample('H', on='occur_dt').offense_id.count()

In [71]:
hourly.reset_index().occur_dt.map(lambda d: d.week)


Out[71]:
0       52
1       52
2       52
3       52
4       52
5       52
6       52
7       52
8       52
9       52
10      52
11      52
12      52
13      52
14      52
15      52
16      52
17      52
18      52
19      52
20      52
21      52
22      52
23      52
24       1
25       1
26       1
27       1
28       1
29       1
        ..
5942    36
5943    36
5944    36
5945    36
5946    36
5947    36
5948    36
5949    36
5950    36
5951    36
5952    36
5953    36
5954    36
5955    36
5956    36
5957    36
5958    36
5959    36
5960    36
5961    36
5962    36
5963    36
5964    36
5965    36
5966    36
5967    36
5968    36
5969    36
5970    36
5971    36
Name: occur_dt, dtype: int64

In [64]:
df3 = pd.DataFrame({"N": hourly})
##df3['Day'] = df3.reset_index().occur_dt ##.map(lambda d: d.day)
df3


Out[64]:
N
occur_dt
2017-01-01 00:00:00 2
2017-01-01 01:00:00 1
2017-01-01 02:00:00 0
2017-01-01 03:00:00 0
2017-01-01 04:00:00 0
2017-01-01 05:00:00 0
2017-01-01 06:00:00 0
2017-01-01 07:00:00 0
2017-01-01 08:00:00 0
2017-01-01 09:00:00 0
2017-01-01 10:00:00 1
2017-01-01 11:00:00 0
2017-01-01 12:00:00 2
2017-01-01 13:00:00 0
2017-01-01 14:00:00 3
2017-01-01 15:00:00 3
2017-01-01 16:00:00 0
2017-01-01 17:00:00 0
2017-01-01 18:00:00 1
2017-01-01 19:00:00 0
2017-01-01 20:00:00 2
2017-01-01 21:00:00 3
2017-01-01 22:00:00 2
2017-01-01 23:00:00 1
2017-01-02 00:00:00 1
2017-01-02 01:00:00 0
2017-01-02 02:00:00 0
2017-01-02 03:00:00 1
2017-01-02 04:00:00 1
2017-01-02 05:00:00 0
... ...
2017-09-05 14:00:00 0
2017-09-05 15:00:00 2
2017-09-05 16:00:00 0
2017-09-05 17:00:00 2
2017-09-05 18:00:00 3
2017-09-05 19:00:00 3
2017-09-05 20:00:00 2
2017-09-05 21:00:00 2
2017-09-05 22:00:00 1
2017-09-05 23:00:00 0
2017-09-06 00:00:00 0
2017-09-06 01:00:00 0
2017-09-06 02:00:00 0
2017-09-06 03:00:00 1
2017-09-06 04:00:00 0
2017-09-06 05:00:00 0
2017-09-06 06:00:00 0
2017-09-06 07:00:00 2
2017-09-06 08:00:00 0
2017-09-06 09:00:00 0
2017-09-06 10:00:00 1
2017-09-06 11:00:00 0
2017-09-06 12:00:00 0
2017-09-06 13:00:00 0
2017-09-06 14:00:00 2
2017-09-06 15:00:00 0
2017-09-06 16:00:00 0
2017-09-06 17:00:00 1
2017-09-06 18:00:00 0
2017-09-06 19:00:00 2

5972 rows × 1 columns


In [73]:
ls


COBRA_Data_Dictionary.csv  CrimeData_orig.ipynb  datadict.py  HW06/  README.md

In [ ]:
fig = plt.figure(figsize=(10,6)) # 10inx10in
#plt.plot(resdf['BURGLARY-RESIDENCE'].index, resdf['BURGLARY-RESIDENCE'])
plt.scatter(resdf['BURGLARY-RESIDENCE'].index, resdf['BURGLARY-RESIDENCE'], marker='x')
plt.scatter(resdf['BURGLARY-NONRES'].index, resdf['BURGLARY-NONRES'], marker='o')

plt.ylim(0, 500)
plt.title('BURGLARY-RESIDENCE')
plt.xticks(range(13), ['', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
fig.savefig('BurglaryResidence_over_month.svg')
x = 1

In [ ]:
def getTheMonth(x):
    return x.month

df['occur_month'] = df['occur_ts'].map(getTheMonth)

In [ ]:
resdf = df.groupby(['UC2 Literal', 'occur_month']).offense_id.count()
fig = plt.figure(figsize=(10,6))
plt.scatter(resdf['BURGLARY-RESIDENCE'].index, resdf['BURGLARY-RESIDENCE'], marker='x')
plt.ylim(0, 500)
plt.title('BURGLARY-RESIDENCE')
plt.xticks(range(13), ['', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.savefig('quiz3-burglary-residence.png')

Seasonal Model


In [ ]:
fig = plt.figure(figsize=(40,30))
crime_types = crime_year.index.levels[0]
years = crime_year.index.levels[1]
for c in range(len(crime_types)):
    y_max = max(crime_year.loc[crime_types[c]])
    
    plt.subplot(4,3,c+1)
    plt.hlines(crime_year.loc[crime_types[c]].iloc[-1]*100/y_max, years[0], years[-1], linestyles="dashed", color="r")
    plt.bar(crime_year.loc[crime_types[c]].index, crime_year.loc[crime_types[c]]*100/y_max, label=crime_types[c], alpha=0.5)
    ##plt.legend()
    plt.ylim(0, 100)
    plt.xticks(years+0.4, [str(int(y)) for y in years], rotation=0, fontsize=24)
    plt.yticks([0,20,40,60,80,100], ['0%','20%','40%','60%','80%','100%'], fontsize=24)
    plt.title(crime_types[c], fontsize=30)
    None

In [ ]:
c = 3 ## 'BURGLARY-RESIDENCE'
resburglaries = crime_year_month.loc[crime_types[c]]
fig = plt.figure(figsize=(20,10))
for y in years:
    plt.plot(resburglaries.loc[y].index, resburglaries.loc[y], label=("%4.0f"%y))
plt.legend()
plt.title("Seasonal Trends - %s"%crime_types[c], fontsize=20)
plt.xticks(range(13), ['', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.xlim(0,13)
None

In [ ]:
c = 3 ## 'BURGLARY-RESIDENCE'
fig = plt.figure(figsize=(20,10))
for y in years:
    avg = resburglaries.loc[y].mean()
    std = resburglaries.loc[y].std()
    ##plt.hlines(avg, 1, 13, linestyle='dashed')
    plt.plot(resburglaries.loc[y].index, (resburglaries.loc[y]-avg)/std, label=("%4.0f"%y))
plt.legend()
plt.title("Seasonal Trends - %s (normalized)"%crime_types[c], fontsize=20)
plt.xticks(list(range(1,13)), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.xlim(0,13)
plt.ylabel("Standard deviations $\sigma_y$")
None

In [ ]:


In [ ]:


In [ ]: