In [1]:
import pandas as pd

In [3]:
# Raw 1.3 GB CSV downloaded from Chicago's data portal.
# To get the latest copy, go to
# https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
# and export the dataset as CSV.
raw_crimes = pd.read_csv('../data/Crimes_-_2001_to_present.csv')


/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py:1164: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
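
The DtypeWarning above means pandas inferred more than one type for column 13 while reading the file in chunks. The sketch below shows one way to avoid that guesswork by declaring the column's dtype up front. The small in-memory CSV and its 'Ward' column are made up for illustration. For the real file you would look up the actual name with raw_crimes.columns[13] and decide which type suits it.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: the 'Ward' column mixes numbers and text.
sample_csv = io.StringIO(
    "ID,Primary Type,Ward\n"
    "1,THEFT,12\n"
    "2,BATTERY,unknown\n"
    "3,ARSON,7\n"
)

# Declaring a dtype for the mixed column keeps pandas from inferring it per chunk.
sample = pd.read_csv(sample_csv, dtype={'Ward': str})
print(sample['Ward'].tolist())
```

Passing low_memory=False, as the warning itself suggests, is the other option. It makes pandas infer types from the whole file at once, at the cost of more memory during the read.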

In [4]:
crime_type_set = set(raw_crimes['Primary Type'])
crime_type_set


Out[4]:
{'ARSON',
 'ASSAULT',
 'BATTERY',
 'BURGLARY',
 'CONCEALED CARRY LICENSE VIOLATION',
 'CRIM SEXUAL ASSAULT',
 'CRIMINAL DAMAGE',
 'CRIMINAL TRESPASS',
 'DECEPTIVE PRACTICE',
 'DOMESTIC VIOLENCE',
 'GAMBLING',
 'HOMICIDE',
 'HUMAN TRAFFICKING',
 'INTERFERENCE WITH PUBLIC OFFICER',
 'INTIMIDATION',
 'KIDNAPPING',
 'LIQUOR LAW VIOLATION',
 'MOTOR VEHICLE THEFT',
 'NARCOTICS',
 'NON - CRIMINAL',
 'NON-CRIMINAL',
 'NON-CRIMINAL (SUBJECT SPECIFIED)',
 'OBSCENITY',
 'OFFENSE INVOLVING CHILDREN',
 'OTHER NARCOTIC VIOLATION',
 'OTHER OFFENSE',
 'PROSTITUTION',
 'PUBLIC INDECENCY',
 'PUBLIC PEACE VIOLATION',
 'RITUALISM',
 'ROBBERY',
 'SEX OFFENSE',
 'STALKING',
 'THEFT',
 'WEAPONS VIOLATION'}

From here, I dumped the types into crime_bins.csv and categorized them by hand, using this breakdown as a guide.
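
A minimal sketch of what that dump step could look like. The original notebook does not show it, so the layout here is an assumption: one row per distinct type plus an empty 'Bin' column (a hypothetical name) to fill in by hand. A tiny stand-in frame replaces the real raw_crimes so the snippet runs on its own.

```python
import pandas as pd

# Tiny stand-in for raw_crimes; the real frame comes from the Chicago export.
raw_crimes = pd.DataFrame(
    {'Primary Type': ['THEFT', 'BATTERY', 'THEFT', 'ARSON']}
)

# One row per distinct crime type, sorted alphabetically.
crime_bins = pd.DataFrame(
    {'Primary Type': sorted(raw_crimes['Primary Type'].dropna().unique())}
)

# Hypothetical empty column to hold the hand-assigned category.
crime_bins['Bin'] = ''

# Write the file that gets edited by hand afterwards.
crime_bins.to_csv('crime_bins.csv', index=False)
```

Once the 'Bin' column is filled in, the file can be read back with pd.read_csv and joined to the raw data on 'Primary Type'.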