Crime In Detroit (2009 to Present)

Course: Data Bootcamp
Name: Gillian Berberich
Project Date: May 4th, 2016
New York University
Leonard N. Stern School of Business

This project examines crime in Detroit, Michigan from January 1, 2009 to the present, using data published on the Detroit Open Data website. The main crime file has over 1 million records and is updated frequently. The other datasets used give the locations of schools, police stations, and fire stations within Detroit city limits.

This version: Dave Backus messing around to see if we can overcome the row limits on downloading data.

Importing Packages

This project uses several Python packages. The cell below also records the Python and pandas versions in use and the date the scripts were run.


In [4]:
import pandas as pd             # data package
import matplotlib.pyplot as plt # graphics 
import sys                      # system module, used to get Python version 
import datetime as dt           # date tools, used to note current date
#import geopy as geo             # geographical package

%matplotlib inline 

print('\nPython version: ', sys.version) 
print('Pandas version: ', pd.__version__)
print("Today's date:", dt.date.today())


Python version:  3.5.1 |Anaconda 2.5.0 (64-bit)| (default, Jan 29 2016, 15:01:46) [MSC v.1900 64 bit (AMD64)]
Pandas version:  0.17.1
Today's date: 2016-04-24

Reading Data

The datasets are read live from Detroit Open Data in JSON format. Each dataset's page provides a link for accessing the data (shown as url, url1, url2, and url3 below). For one of the datasets no link was available, but I was able to construct one from the dataset's resource code.

Follow these instructions to access the data:

  • Open the web address https://data.detroitmi.gov/
  • Click “Public Safety”
  • Locate “DPD: All Crime Incidents, 2009 – Present (Provisional)” on the list.
  • Click the “API Docs” link at the bottom right of that block.
  • Scroll down to “Getting Started” and you will see a link that ends in .json.
  • Copy the link. This is the link that IPython will pull the data from.
  • Repeat the process for “DPD Stations” and “DFD Stations”.

Follow these instructions to access the Detroit Schools data:

  • Click the “Education” link on the left-hand bar under “Categories”.
  • Locate “Detroit Schools” on the list.
  • Click on the dataset.
  • On the bar at the top right, click “Export”.
  • Under “Download as”, right click on “JSON” and click “Copy Link Location”.
  • Paste the link into a Word document and copy the resource code portion (four characters, a hyphen, and four more characters).
  • Add this resource code to the end of https://data.detroitmi.gov/resource/
  • Finally, add .json to the end of it. This is the link that IPython will pull data from.
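
The last two steps above can be sketched in code. This is a minimal illustration, not part of the original workflow: the helper name resource_url is hypothetical, and the check that a resource code is made of lowercase letters and digits is an assumption based on the four codes used in this notebook.

```python
import re

BASE = 'https://data.detroitmi.gov/resource/'

def resource_url(resource_code, base=BASE):
    """Build the .json link from a resource code (hypothetical helper)."""
    # Assumed pattern: four characters, a hyphen, four more characters
    if not re.fullmatch(r'[a-z0-9]{4}-[a-z0-9]{4}', resource_code):
        raise ValueError('unexpected resource code: ' + resource_code)
    return base + resource_code + '.json'

# Resource code of the Detroit Schools dataset used later in this notebook
url3 = resource_url('8xpr-6ij9')
print(url3)
```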

Reported Crimes


In [ ]:
%%time
url = 'https://data.detroitmi.gov/resource/i9ph-uyrp.json'
crime = pd.read_json(url)
crime = crime.rename(columns={'caseid':'Case ID',
                              'address':'Address',
                              'hour':'Hour',
                              'incidentdate':'Incident Date',
                              'lat':'Latitude',
                              'lon':'Longitude',
                              'neighborhood':'Neighborhood',
                              'category':'Category',
                              'offensedescription':'Offense Description'})

print('Dimensions:', crime.shape)
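
The /resource/....json links follow the pattern of the Socrata Open Data API (SODA), which returns only 1000 rows per request unless the URL says otherwise. One possible way around the row limit is to page through the data with SODA's $limit and $offset query parameters. The sketch below rests on several assumptions: the function names, the page size of 50000, and the max_pages cap are all hypothetical choices, and $order=:id is added only to keep the row order stable between pages.

```python
import pandas as pd

def page_url(base_url, limit, offset):
    """Build the URL for one page of rows using SODA query parameters."""
    return '{}?$limit={}&$offset={}&$order=:id'.format(base_url, limit, offset)

def read_all_pages(base_url, page_size=50000, max_pages=100, reader=pd.read_json):
    """Read pages until one comes back short, then stack them together."""
    pages = []
    for page in range(max_pages):
        df = reader(page_url(base_url, page_size, page * page_size))
        if len(df) > 0:
            pages.append(df)
        if len(df) < page_size:
            break   # a short (or empty) page means there is nothing left
    return pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()

# Usage (hits the network and may take a while, so it is left commented out):
# crime_all = read_all_pages('https://data.detroitmi.gov/resource/i9ph-uyrp.json')
# print('Dimensions:', crime_all.shape)
```

The reader argument exists so the paging logic can be tried out with a stand-in function before making any real requests.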

In [6]:
csv = pd.read_csv('DPD__All_Crime_Incidents__Provisional_.csv')
print('Dimensions:', csv.shape)


Dimensions: (1126021, 22)
C:\Users\dbackus\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2902: DtypeWarning: Columns (3,17,18,21) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
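
The DtypeWarning above reports that columns 3, 17, 18, and 21 hold mixed types, and suggests either the dtype option or low_memory=False. A small sketch of the dtype approach follows. The stand-in CSV text and its column names are made up for illustration; for the real file, the affected column names would have to be looked up first.

```python
import io
import pandas as pd

# Stand-in CSV: 'precinct' (hypothetical column) mixes numbers and text
raw = io.StringIO('caseid,precinct\n1,07\n2,UNK\n3,12\n')

# Reading the mixed column as text keeps values like '07' intact
demo = pd.read_csv(raw, dtype={'precinct': str})
print(demo.dtypes)
```

Passing low_memory=False instead makes pandas infer each column's type from the whole file at once, at the cost of more memory.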

In [2]:
crime = crime[['Case ID','Longitude','Latitude','Address','Incident Date','Hour','Neighborhood','Category','Offense Description']].set_index('Case ID')

crime.head(2)


Out[2]:
         Longitude  Latitude  Address           Incident Date            Hour  Neighborhood   Category        Offense Description
Case ID
2048319  -83.2584   42.4210   18000 LAHSER      2016-01-30T00:00:00.000  5     OLD REDFORD    STOLEN VEHICLE  VEHICLE THEFT
2055559  -83.2332   42.3912   00 ACACIA MINOCK  2016-02-20T00:00:00.000  2     WESTWOOD PARK  MISCELLANEOUS   MISCELLANEOUS - IMPOUNDED VEHICLE
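
In the output above, Incident Date arrives as text and Hour as a separate number. If date arithmetic or grouping by month is needed later, one option is to convert the text to real timestamps. The sketch below reuses the two dates and hours shown above; the frame name demo and the combined Timestamp column are hypothetical additions.

```python
import pandas as pd

demo = pd.DataFrame({
    'Incident Date': ['2016-01-30T00:00:00.000', '2016-02-20T00:00:00.000'],
    'Hour': [5, 2],
})

# Parse the text dates, then add the hour of day to get one timestamp
demo['Incident Date'] = pd.to_datetime(demo['Incident Date'])
demo['Timestamp'] = demo['Incident Date'] + pd.to_timedelta(demo['Hour'], unit='h')
print(demo)
```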

Police Station Locations


In [3]:
url1 = 'https://data.detroitmi.gov/resource/3n6r-g9kp.json'
police = pd.read_json(url1)
police = police.rename(columns={'address_1':'Address',
                                'zip_code':'Zip Code',
                                'id':'ID'})

police.insert(1, 'Longitude', 0.0)
police.insert(2, 'Latitude', 0.0)

for (i, ps) in police.iterrows():
    # Each 'location' entry is a dictionary
    curr_dict = ps['location']
    # Its 'coordinates' entry is a [longitude, latitude] pair
    coord = curr_dict['coordinates']

    # Overwrite the 0.0 placeholders for this row
    police.at[i, 'Longitude'] = coord[0]
    police.at[i, 'Latitude'] = coord[1]

    
police = police[['ID','Longitude','Latitude','Address','Zip Code']].set_index('ID')

police.head(2)


Out[3]:
    Longitude   Latitude   Address       Zip Code
ID
1   -83.045241  42.326325  20 Atwater    48226
2   -83.179933  42.385553  13530 Lesure  48227
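
The row-by-row loop above works, but the same coordinates can be pulled out in one step without iterrows. The sketch below is one possible alternative, shown on a stand-in frame. The two coordinate pairs are the ones printed above, while the 'type' key in each dictionary is an assumption about the shape of the location field.

```python
import pandas as pd

# Stand-in for the downloaded police data (two rows only)
police_demo = pd.DataFrame({
    'ID': [1, 2],
    'location': [
        {'type': 'Point', 'coordinates': [-83.045241, 42.326325]},
        {'type': 'Point', 'coordinates': [-83.179933, 42.385553]},
    ],
})

# Collect the [longitude, latitude] pairs and spread them into two columns
pairs = [d['coordinates'] for d in police_demo['location']]
coords = pd.DataFrame(pairs, index=police_demo.index,
                      columns=['Longitude', 'Latitude'])
police_demo = police_demo.join(coords)
print(police_demo[['ID', 'Longitude', 'Latitude']])
```

The same pattern would apply to the fire station and school data, which store their coordinates in the same kind of dictionary.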

Fire Station Locations


In [4]:
url2 = 'https://data.detroitmi.gov/resource/hz79-58xh.json'
fire = pd.read_json(url2)
fire = fire.rename(columns={'station':'Station',
                            'full_address_address':'Address',
                            'full_address_zip':'Zip Code'})

fire.insert(1, 'Longitude', 0.0)
fire.insert(2, 'Latitude', 0.0)

for (i, fs) in fire.iterrows():
    # Each 'full_address' entry is a dictionary
    curr_dict = fs['full_address']
    # Its 'coordinates' entry is a [longitude, latitude] pair
    coord = curr_dict['coordinates']

    # Overwrite the 0.0 placeholders for this row
    fire.at[i, 'Longitude'] = coord[0]
    fire.at[i, 'Latitude'] = coord[1]

fire = fire[['Station','Longitude','Latitude','Address','Zip Code']].set_index('Station')

fire.head(2)


Out[4]:
         Longitude   Latitude   Address                    Zip Code
Station
E50      -82.985474  42.420406  12985 Houston Whittier St  48205
E42      -83.138740  42.366575  6324 W Chicago             48204

School Locations


In [5]:
url3 = 'https://data.detroitmi.gov/resource/8xpr-6ij9.json'
school = pd.read_json(url3)
school = school.rename(columns={'entityoffi':'School',
                                'the_geom':'Location',
                                'entityphys':'Address',
                                'entityph_4':'Zip Code'})

school.insert(1, 'Longitude', 0.0)
school.insert(2, 'Latitude', 0.0)

for (i, s) in school.iterrows():
    # Each 'Location' entry is a dictionary
    curr_dict = s['Location']
    # Its 'coordinates' entry is a [longitude, latitude] pair
    coord = curr_dict['coordinates']

    # Overwrite the 0.0 placeholders for this row
    school.at[i, 'Longitude'] = coord[0]
    school.at[i, 'Latitude'] = coord[1]

school = school[['School', 'Longitude', 'Latitude', 'Address', 'Zip Code']].set_index('School')
school.head(2)


Out[5]:
                                  Longitude   Latitude   Address             Zip Code
School
Pulaski Elementary-Middle School  -82.999392  42.441115  19725 STRASBURG ST  482051633
Sampson Academy                   -83.118454  42.353458  4700 TIREMAN ST     482044243
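
The school zip codes above have nine digits (ZIP+4 with no hyphen), while the police and fire station tables use five. If the tables are ever compared by zip code, one option is to keep only the first five digits. The sketch below uses the two schools shown above; the frame name school_demo is hypothetical.

```python
import pandas as pd

school_demo = pd.DataFrame({
    'School': ['Pulaski Elementary-Middle School', 'Sampson Academy'],
    'Zip Code': [482051633, 482044243],
})

# Convert to text and keep the five-digit zip code
school_demo['Zip Code'] = school_demo['Zip Code'].astype(str).str[:5]
print(school_demo)
```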

Crime Graphs


In [6]:
crime.count()


Out[6]:
Longitude              1000
Latitude               1000
Address                1000
Incident Date          1000
Hour                   1000
Neighborhood            978
Category               1000
Offense Description    1000
dtype: int64
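
Because count() reports the non-null entries in each column, the output above implies that 22 of the 1000 downloaded rows (1000 - 978) have no Neighborhood value. A minimal sketch of counting missing values directly is given below; the three-row frame is illustrative stand-in data, not rows from the Detroit file.

```python
import pandas as pd

crime_demo = pd.DataFrame({
    'Neighborhood': ['OLD REDFORD', None, 'WESTWOOD PARK'],
    'Category': ['STOLEN VEHICLE', 'MISCELLANEOUS', 'MISCELLANEOUS'],
})

# Number of missing entries per column
missing = crime_demo.isnull().sum()
print(missing)
```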

In [ ]: