This project examines crime data for Detroit, Michigan, from January 1, 2009 to the present, using Detroit Open Data. Several datasets on the Detroit Open Data website are of interest. The main crime file has over 1 million records and is updated frequently. The other datasets provide location data for schools, police stations, and fire stations within Detroit city limits.
This version: Dave Backus messing around to see if we can overcome the row limits on downloading data.
In [4]:
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics
import sys # system module, used to get Python version
import datetime as dt # date tools, used to note current date
#import geopy as geo # geographical package
%matplotlib inline
print('\nPython version: ', sys.version)
print('Pandas version: ', pd.__version__)
print("Today's date:", dt.date.today())
The datasets are accessed live from Detroit Open Data in JSON format. Each dataset provides an API link that explains how to access its data; these links appear below as url, url1, url2, and url3. For one of the datasets no link was available, so I constructed one from the dataset's resource number.
Follow these instructions to access the data:
Follow these instructions to access the Detroit School data.
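The endpoint URLs used below follow the Socrata pattern `/resource/<id>.json`, which accepts SoQL query parameters such as `$limit` and `$offset` in the query string. A minimal sketch of attaching them, assuming the Detroit portal honours these parameters; the helper name `soql_url` and the 50,000 value are hypothetical choices for illustration.

```python
from urllib.parse import urlencode


def soql_url(base_url, **params):
    """Append SoQL parameters (passed without the leading '$') to an endpoint URL.

    Hypothetical helper: soql_url(url, limit=50000) gives url + '?$limit=50000'.
    """
    # Prefix each key with '$' as SoQL expects, and leave '$' unescaped
    query = urlencode({'$' + key: value for key, value in params.items()}, safe='$')
    return base_url + '?' + query if query else base_url


url = soql_url('https://data.detroitmi.gov/resource/i9ph-uyrp.json', limit=50000)
print(url)
```

The resulting string can be passed to `pd.read_json` exactly as the plain URLs are in the cells below.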
In [ ]:
%%time
# Socrata-style JSON endpoint: without a $limit parameter it returns only the first 1,000 rows
url = 'https://data.detroitmi.gov/resource/i9ph-uyrp.json'
crime = pd.read_json(url)
# Replace the API field names with readable column labels
crime = crime.rename(columns={'caseid': 'Case ID',
                              'address': 'Address',
                              'hour': 'Hour',
                              'incidentdate': 'Incident Date',
                              'lat': 'Latitude',
                              'lon': 'Longitude',
                              'neighborhood': 'Neighborhood',
                              'category': 'Category',
                              'offensedescription': 'Offense Description'})
print('Dimensions:', crime.shape)
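One way to get past the row limit is to request the data page by page. The sketch below assumes the endpoint honours SoQL's `$limit`, `$offset` and `$order` parameters and that ordering by the Socrata system field `:id` keeps pages stable between requests; the function name `fetch_all_pages`, the page size, and the injectable `fetch` argument (which makes it testable offline) are hypothetical.

```python
import pandas as pd


def fetch_all_pages(base_url, page_size=50000, fetch=pd.read_json):
    """Download a Socrata-style JSON endpoint page by page (hypothetical helper).

    Assumes the endpoint supports $limit, $offset and $order=:id.
    `fetch` maps a URL to a DataFrame; pd.read_json by default.
    """
    pages = []
    offset = 0
    while True:
        page_url = (f'{base_url}?$limit={page_size}&$offset={offset}'
                    f'&$order=:id')
        page = fetch(page_url)
        pages.append(page)
        # A page shorter than page_size means there is nothing left to fetch
        if len(page) < page_size:
            break
        offset += page_size
    return pd.concat(pages, ignore_index=True)
```

In the crime cell above, `crime = fetch_all_pages(url)` could stand in for `pd.read_json(url)` before the rename step; with over a million rows this means roughly twenty requests at the assumed page size.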
In [6]:
csv = pd.read_csv('DPD__All_Crime_Incidents__Provisional_.csv')
print('Dimensions:', csv.shape)
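With over a million records, the CSV export can also be processed in pieces rather than loaded whole. A minimal sketch using `pd.read_csv(..., chunksize=...)`, which yields one DataFrame per chunk; the helper `count_by_column` is hypothetical, and the column name `CATEGORY` in the sample is an assumed header, not checked against the real export.

```python
import io

import pandas as pd


def count_by_column(source, column, chunksize=100_000):
    """Tally the values of one CSV column chunk by chunk (hypothetical helper).

    `source` is a path or file-like object; `column` must match the CSV header.
    """
    totals = pd.Series(dtype='int64')
    # usecols reads only the needed column; chunksize makes read_csv yield DataFrames lazily
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        totals = totals.add(chunk[column].value_counts(), fill_value=0)
    return totals.astype('int64').sort_values(ascending=False)


# Small in-memory stand-in for the real CSV file
sample = io.StringIO('CATEGORY,OTHER\nASSAULT,1\nLARCENY,2\nASSAULT,3\n')
print(count_by_column(sample, 'CATEGORY', chunksize=2))
```

Memory use then scales with `chunksize` rather than with the file size.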
In [2]:
crime = crime[['Case ID','Longitude','Latitude','Address','Incident Date','Hour','Neighborhood','Category','Offense Description']].set_index('Case ID')
crime.head(2)
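Because `incidentdate` is not one of the column names that `pd.read_json` converts to dates by default, `Incident Date` likely arrives as text. A sketch of converting it so incidents can be filtered by period, assuming ISO-8601-style strings; the helper name and the toy values are made up for illustration.

```python
import pandas as pd


def parse_incident_dates(df, column='Incident Date'):
    """Return a copy of df with `column` converted to pandas datetimes.

    Unparseable entries become NaT (errors='coerce') instead of raising.
    """
    out = df.copy()
    out[column] = pd.to_datetime(out[column], errors='coerce')
    return out


# Toy rows standing in for the crime frame (values are invented)
toy = pd.DataFrame({'Incident Date': ['2009-01-01T00:00:00.000',
                                      '2016-07-04T00:00:00.000',
                                      'not a date']})
parsed = parse_incident_dates(toy)
# Keep incidents from 2010 onward; NaT compares as False and is dropped
recent = parsed[parsed['Incident Date'] >= '2010-01-01']
print(recent)
```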
In [3]:
url1 = 'https://data.detroitmi.gov/resource/3n6r-g9kp.json'
police = pd.read_json(url1)
police = police.rename(columns={'address_1': 'Address',
                                'zip_code': 'Zip Code',
                                'id': 'ID'})
police.insert(1, 'Longitude', 0.0)
police.insert(2, 'Latitude', 0.0)
for (i, ps) in police.iterrows():
    # Pull out the GeoJSON point stored in the 'location' column
    curr_dict = ps['location']
    # GeoJSON orders coordinates as [longitude, latitude]
    coord = curr_dict['coordinates']
    # .at writes a single cell (DataFrame.set_value was removed in pandas 1.0)
    police.at[i, 'Longitude'] = coord[0]
    police.at[i, 'Latitude'] = coord[1]
police = police[['ID', 'Longitude', 'Latitude', 'Address', 'Zip Code']].set_index('ID')
police.head(2)
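The row-by-row loop can also be written as a column-wise operation. A minimal sketch, assuming each cell of the geometry column is a GeoJSON point such as `{'type': 'Point', 'coordinates': [lon, lat]}` and that missing locations show up as NaN; the helper name `add_lon_lat` is hypothetical.

```python
import math

import pandas as pd


def add_lon_lat(df, geo_column):
    """Return a copy of df with Longitude/Latitude taken from GeoJSON points.

    Cells that are not dicts with 'coordinates' (e.g. NaN) give NaN.
    """
    def coord(point, k):
        # k = 0 for longitude, 1 for latitude (GeoJSON order)
        if isinstance(point, dict) and 'coordinates' in point:
            return point['coordinates'][k]
        return math.nan

    out = df.copy()
    out['Longitude'] = out[geo_column].map(lambda p: coord(p, 0))
    out['Latitude'] = out[geo_column].map(lambda p: coord(p, 1))
    return out
```

With it, the loop in the police cell could become `police = add_lon_lat(police, 'location')`, and the same call would cover `fire` with `'full_address'` and `school` with `'Location'`; the later column selection puts the new columns in the same order as before.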
In [4]:
url2 = 'https://data.detroitmi.gov/resource/hz79-58xh.json'
fire = pd.read_json(url2)
fire = fire.rename(columns={'station': 'Station',
                            'full_address_address': 'Address',
                            'full_address_zip': 'Zip Code'})
fire.insert(1, 'Longitude', 0.0)
fire.insert(2, 'Latitude', 0.0)
for (i, fs) in fire.iterrows():
    # Pull out the GeoJSON point stored in the 'full_address' column
    curr_dict = fs['full_address']
    # GeoJSON orders coordinates as [longitude, latitude]
    coord = curr_dict['coordinates']
    # .at writes a single cell (DataFrame.set_value was removed in pandas 1.0)
    fire.at[i, 'Longitude'] = coord[0]
    fire.at[i, 'Latitude'] = coord[1]
fire = fire[['Station', 'Longitude', 'Latitude', 'Address', 'Zip Code']].set_index('Station')
fire.head(2)
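Since each loop relies on GeoJSON's [longitude, latitude] order, a quick range check can catch swapped or missing coordinates. Detroit lies at roughly 42.3° N, 83.0° W, so its longitude is negative and its latitude is about +42. The bounding box below is a deliberately loose assumption, not an official city boundary, and `outside_detroit` is a hypothetical helper.

```python
import pandas as pd

# Loose box around Detroit (assumed bounds, intentionally wide)
DETROIT_LAT = (42.0, 42.6)
DETROIT_LON = (-83.5, -82.8)


def outside_detroit(df):
    """Return rows whose Latitude/Longitude fall outside the assumed box.

    Swapped coordinates and missing (NaN) values are both returned.
    """
    in_lat = df['Latitude'].between(*DETROIT_LAT)
    in_lon = df['Longitude'].between(*DETROIT_LON)
    return df[~(in_lat & in_lon)]


toy = pd.DataFrame({'Longitude': [-83.05, 42.33], 'Latitude': [42.33, -83.05]},
                   index=['ok', 'swapped'])
print(outside_detroit(toy))
```

Running `outside_detroit(fire)`, `outside_detroit(police)` or `outside_detroit(school)` should return an empty frame if the extraction worked and every site lies inside the box.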
In [5]:
url3 = 'https://data.detroitmi.gov/resource/8xpr-6ij9.json'
school = pd.read_json(url3)
school = school.rename(columns={'entityoffi': 'School',
                                'the_geom': 'Location',
                                'entityphys': 'Address',
                                'entityph_4': 'Zip Code'})
school.insert(1, 'Longitude', 0.0)
school.insert(2, 'Latitude', 0.0)
for (i, s) in school.iterrows():
    # Pull out the GeoJSON point stored in the 'Location' column
    curr_dict = s['Location']
    # GeoJSON orders coordinates as [longitude, latitude]
    coord = curr_dict['coordinates']
    # .at writes a single cell (DataFrame.set_value was removed in pandas 1.0)
    school.at[i, 'Longitude'] = coord[0]
    school.at[i, 'Latitude'] = coord[1]
school = school[['School', 'Longitude', 'Latitude', 'Address', 'Zip Code']].set_index('School')
school.head(2)
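The commented-out geopy import and the station and school datasets suggest relating each crime to nearby facilities. As one possible approach, and not necessarily the author's, the sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km to find the nearest site for each point by brute force. The function names are hypothetical, and rows with missing coordinates should be dropped first.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, NumPy-broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_site(points, sites):
    """For each row of `points`, find the closest row of `sites`.

    Both frames need Latitude/Longitude columns without NaNs. This builds an
    (n_points x n_sites) distance matrix, so large inputs should be chunked.
    """
    d = haversine_km(points['Latitude'].to_numpy()[:, None],
                     points['Longitude'].to_numpy()[:, None],
                     sites['Latitude'].to_numpy()[None, :],
                     sites['Longitude'].to_numpy()[None, :])
    idx = d.argmin(axis=1)
    return pd.DataFrame({'Nearest': sites.index[idx],
                         'Distance (km)': d[np.arange(len(points)), idx]},
                        index=points.index)
```

For example, `nearest_site(crime.dropna(subset=['Latitude', 'Longitude']), police)` would label each incident with its closest police station; for the full crime table, applying it to slices of a few hundred thousand rows keeps the distance matrix manageable.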
In [6]:
crime.count()
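`crime.count()` reports the number of non-null values in each column, so comparing it with the row count shows where data are missing. A small sketch that puts the counts and the missing share side by side; `missing_summary` is a hypothetical helper and the toy frame is invented.

```python
import pandas as pd


def missing_summary(df):
    """Non-null count, missing count and missing share for each column."""
    n_missing = df.isna().sum()
    return pd.DataFrame({'Non-null': df.count(),
                         'Missing': n_missing,
                         'Share missing': n_missing / len(df)})


toy = pd.DataFrame({'Latitude': [42.3, None, 42.4, 42.5],
                    'Neighborhood': ['A', 'B', None, None]})
print(missing_summary(toy))
```

Applied to `crime`, this would show, for instance, how many incidents lack coordinates and so could not be mapped.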