In [1]:

    
# Allow us to load `open_cp` without installing
import sys, os.path
sys.path.insert(0, os.path.abspath(os.path.join("..","..")))

Explore the geo-coding of the chicago dataset

The chicago crime data provides both:

block-level addresses, sanitised to the form "010XX N CENTRAL PARK AVE"
location data in "coordinates" and "longitude/latitude"

We shall see that the coordinates and lon/lat data are linked by projecting in EPSG:3435. However, the relationship between the block-level addresses and the coordinates is complicated, to say the least.



In [2]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
import os, lzma

import open_cp.sources.chicago as chicago

Check the projection

Our library does not return the "coordinates", so we shall manually read them from the CSV file and join them.

A bit of browsing around lead me to EPSG 3435 which is "NAD83 / Illinois East (ftUS)"

Of note is that this website says the data is in "US feet"
Out of the box, this just doesn't work.
But if I manually correct for the units (i.e. convert from meters to US feet) then it gives an exact agreement.

Here, by "exact" agreement we mean actual equality, once we round to the nearest foot.

Going the other way, and starting with the x/y coords and projecting back to longitude / latitude leads to small rounding errors.

This suggests that the lon/lat data is "primary", and the coordinates are computed from it.



In [3]:

    
# Set "empty_geometry" to allow us to re-project; but we cannot save such a file!
frame = chicago.load_to_geoDataFrame(empty_geometry="empty")
frame.head()









    Out[3]:






  
    
      
      address
      case
      crime
      geometry
      location
      timestamp
      type
    
  
  
    
      0
      010XX N CENTRAL PARK AVE
      HZ560767
      OTHER OFFENSE
      POINT (-87.71645415899999 41.899712716)
      APARTMENT
      2016-12-22T02:55:00
      VIOLATE ORDER OF PROTECTION
    
    
      1
      051XX S WASHTENAW AVE
      HZ561134
      BATTERY
      POINT (-87.691539994 41.800445234)
      RESIDENTIAL YARD (FRONT/BACK)
      2016-12-22T11:17:00
      AGGRAVATED: OTHER FIREARM
    
    
      2
      059XX W DIVERSEY AVE
      HZ565584
      DECEPTIVE PRACTICE
      POINT (-87.774165121 41.931166274)
      RESIDENCE
      2016-12-09T12:00:00
      FINANCIAL IDENTITY THEFT $300 AND UNDER
    
    
      3
      001XX N STATE ST
      HZ561772
      THEFT
      POINT (-87.62787669799999 41.883500187)
      DEPARTMENT STORE
      2016-12-22T18:50:00
      RETAIL THEFT
    
    
      4
      008XX N MICHIGAN AVE
      HZ561969
      THEFT
      POINT (-87.624095634 41.897982937)
      SMALL RETAIL STORE
      2016-12-22T19:20:00
      RETAIL THEFT



In [4]:

    
import csv

xcoords, ycoords = [], []
with open(chicago._default_filename, "rt") as file:
    reader = csv.reader(file)
    header = next(reader)
    xi = header.index("X COORDINATE")
    yi = header.index("Y COORDINATE")
    for row in reader:
        if row[xi] is not "":
            xcoords.append(int(row[xi]))
            ycoords.append(int(row[yi]))
        else:
            xcoords.append(None)
            ycoords.append(None)

frame["xcoord"] = xcoords
frame["ycoord"] = ycoords
frame.head()









    Out[4]:






  
    
      
      address
      case
      crime
      geometry
      location
      timestamp
      type
      xcoord
      ycoord
    
  
  
    
      0
      010XX N CENTRAL PARK AVE
      HZ560767
      OTHER OFFENSE
      POINT (-87.71645415899999 41.899712716)
      APARTMENT
      2016-12-22T02:55:00
      VIOLATE ORDER OF PROTECTION
      1152189.0
      1906649.0
    
    
      1
      051XX S WASHTENAW AVE
      HZ561134
      BATTERY
      POINT (-87.691539994 41.800445234)
      RESIDENTIAL YARD (FRONT/BACK)
      2016-12-22T11:17:00
      AGGRAVATED: OTHER FIREARM
      1159242.0
      1870526.0
    
    
      2
      059XX W DIVERSEY AVE
      HZ565584
      DECEPTIVE PRACTICE
      POINT (-87.774165121 41.931166274)
      RESIDENCE
      2016-12-09T12:00:00
      FINANCIAL IDENTITY THEFT $300 AND UNDER
      1136403.0
      1918003.0
    
    
      3
      001XX N STATE ST
      HZ561772
      THEFT
      POINT (-87.62787669799999 41.883500187)
      DEPARTMENT STORE
      2016-12-22T18:50:00
      RETAIL THEFT
      1176352.0
      1900927.0
    
    
      4
      008XX N MICHIGAN AVE
      HZ561969
      THEFT
      POINT (-87.624095634 41.897982937)
      SMALL RETAIL STORE
      2016-12-22T19:20:00
      RETAIL THEFT
      1177338.0
      1906213.0



In [6]:

    
frame[frame.address=="019XX E 73RD ST"].head()









    Out[6]:






  
    
      
      address
      case
      crime
      geometry
      location
      timestamp
      type
      xcoord
      ycoord
    
  
  
    
      4209
      019XX E 73RD ST
      HZ567200
      BURGLARY
      POINT (-87.578760989 41.76248689)
      VACANT LOT/LAND
      2016-12-27T17:00:00
      FORCIBLE ENTRY
      1190114.0
      1856944.0
    
    
      21540
      019XX E 73RD ST
      JA208153
      ASSAULT
      POINT (-87.578980927 41.76248559)
      STREET
      2017-04-01T03:08:00
      AGGRAVATED: HANDGUN
      1190054.0
      1856943.0
    
    
      48182
      019XX E 73RD ST
      HZ431786
      BURGLARY
      POINT (-87.576891409 41.762507528)
      OTHER
      2016-09-13T05:00:00
      FORCIBLE ENTRY
      1190624.0
      1856956.0
    
    
      67209
      019XX E 73RD ST
      HZ285311
      BATTERY
      POINT (-87.577639248 41.762498728)
      STREET
      2016-05-29T22:09:00
      DOMESTIC BATTERY SIMPLE
      1190420.0
      1856951.0
    
    
      96195
      019XX E 73RD ST
      HZ464497
      DECEPTIVE PRACTICE
      POINT (-87.577639248 41.762498728)
      RESIDENCE
      2016-10-04T10:50:00
      FRAUD OR CONFIDENCE GAME
      1190420.0
      1856951.0



In [ ]:



In [ ]:



In [5]:

    
# Check geometry is empty if and only if coords are
geo_null = frame.geometry.map(lambda pt : pt.is_empty)
np.all(frame.xcoord.isnull() == geo_null), np.all(frame.ycoord.isnull() == geo_null), np.sum(~geo_null)









    Out[5]:





(True, True, 254965)



In [6]:

    
frame = frame.dropna()
len(frame)









    Out[6]:





254965



In [7]:

    
# The x/y coordinates are integers in the CSV, so treat them as such
frame.xcoord = frame.xcoord.astype(np.int)
frame.ycoord = frame.ycoord.astype(np.int)



In [8]:

    
# Backup the lon/lat coords and the project
frame["coords"] = frame.geometry.copy()
frame = frame.to_crs({'init': 'epsg:3435', "preserve_units":True})



In [9]:

    
np.max((frame.xcoord - frame.geometry.map(lambda pt : pt.coords[0][0]))**2
    + (frame.ycoord - frame.geometry.map(lambda pt : pt.coords[0][1]))**2)









    Out[9]:





5.2056829767243174e-08

Rounding

Suppose we round the projected coordinates to the nearest foot. We find that we get an exact agreement.



In [10]:

    
np.all( frame.xcoord - np.round(frame.geometry.map(lambda pt : pt.coords[0][0])) == 0 )









    Out[10]:





True



In [11]:

    
np.all( frame.ycoord - np.round(frame.geometry.map(lambda pt : pt.coords[0][1])) == 0 )









    Out[11]:





True

Starting with projectd coordinates

Suppose instead we start with the projected coordinates, and convert these back to lon/lat. In the CSV file, the lon/lat coords are rounded to 9 decimal places. We find that there are small rounding errors in this case.



In [12]:

    
import shapely.geometry

frame.geometry = [shapely.geometry.Point(x,y) for x, y in zip(frame.xcoord, frame.ycoord)]



In [13]:

    
frame = frame.to_crs({'init': 'epsg:4326'})
frame.head()









    Out[13]:






  
    
      
      address
      case
      crime
      geometry
      location
      timestamp
      type
      xcoord
      ycoord
      coords
    
  
  
    
      0
      010XX N CENTRAL PARK AVE
      HZ560767
      OTHER OFFENSE
      POINT (-87.71645415873341 41.89971271596254)
      APARTMENT
      2016-12-22T02:55:00
      VIOLATE ORDER OF PROTECTION
      1152189
      1906649
      POINT (-87.71645415899999 41.899712716)
    
    
      1
      051XX S WASHTENAW AVE
      HZ561134
      BATTERY
      POINT (-87.69153999399755 41.8004452337576)
      RESIDENTIAL YARD (FRONT/BACK)
      2016-12-22T11:17:00
      AGGRAVATED: OTHER FIREARM
      1159242
      1870526
      POINT (-87.691539994 41.800445234)
    
    
      2
      059XX W DIVERSEY AVE
      HZ565584
      DECEPTIVE PRACTICE
      POINT (-87.77416512059659 41.93116627430756)
      RESIDENCE
      2016-12-09T12:00:00
      FINANCIAL IDENTITY THEFT $300 AND UNDER
      1136403
      1918003
      POINT (-87.774165121 41.931166274)
    
    
      3
      001XX N STATE ST
      HZ561772
      THEFT
      POINT (-87.62787669753108 41.88350018711407)
      DEPARTMENT STORE
      2016-12-22T18:50:00
      RETAIL THEFT
      1176352
      1900927
      POINT (-87.62787669799999 41.883500187)
    
    
      4
      008XX N MICHIGAN AVE
      HZ561969
      THEFT
      POINT (-87.62409563378577 41.89798293721393)
      SMALL RETAIL STORE
      2016-12-22T19:20:00
      RETAIL THEFT
      1177338
      1906213
      POINT (-87.624095634 41.897982937)



In [14]:

    
frame[frame.coords.map(lambda pt : pt.coords[0][0]) !=
      frame.geometry.map(lambda pt : float("{:.9f}".format(pt.coords[0][0] )))].head()









    Out[14]:






  
    
      
      address
      case
      crime
      geometry
      location
      timestamp
      type
      xcoord
      ycoord
      coords
    
  
  
    
      97005
      013XX W ANCONA ST
      HZ379958
      BATTERY
      POINT (-87.66050584649999 41.89391280219293)
      RESIDENCE PORCH/HALLWAY
      2016-08-05T19:55:00
      AGGRAVATED: OTHER DANG WEAPON
      1167437
      1904650
      POINT (-87.660505847 41.893912802)
    
    
      132485
      061XX W EDDY ST
      HZ490603
      BATTERY
      POINT (-87.78022858449999 41.94481522745974)
      RESIDENCE
      2016-10-26T21:00:00
      AGGRAVATED DOMESTIC BATTERY: OTHER DANG WEAPON
      1134721
      1922966
      POINT (-87.780228585 41.944815227)
    
    
      137638
      013XX W ANCONA ST
      JA157851
      THEFT
      POINT (-87.66050584649999 41.89391280219293)
      RESIDENTIAL YARD (FRONT/BACK)
      2017-02-16T10:30:00
      OVER $500
      1167437
      1904650
      POINT (-87.660505847 41.893912802)
    
    
      152276
      013XX W ANCONA ST
      JA167559
      BURGLARY
      POINT (-87.66050584649999 41.89391280219293)
      RESIDENCE
      2017-02-25T19:00:00
      UNLAWFUL ENTRY
      1167437
      1904650
      POINT (-87.660505847 41.893912802)
    
    
      152867
      013XX W ANCONA ST
      JA167123
      CRIMINAL DAMAGE
      POINT (-87.66050584649999 41.89391280219293)
      APARTMENT
      2017-02-25T17:54:00
      TO PROPERTY
      1167437
      1904650
      POINT (-87.660505847 41.893912802)

Visualise the geocoding

We'll use TileMapBase to visualise where the coordinates land. Use EPSG:3857 to conform the "web mercator" projection.

(Sadly, plotting from geoPandas seems very slow. So we cheat, and use some knowledge.)



In [15]:

    
import tilemapbase
tilemapbase.start_logging()



In [16]:

    
frame = chicago.load_to_geoDataFrame().dropna()
frame = frame.to_crs({"init":"EPSG:3857"})



In [17]:

    
lat, lon = 41.895937, -87.677092
region = 0.01

extent1 = tilemapbase.extent(lon-region, lon+region, lat-region, lat+region, 500, 500)
extent1.project_3857()

lat, lon = 41.895379, -87.721192
extent2 = tilemapbase.extent(lon-region, lon+region, lat-region, lat+region, 500, 500)
extent2.project_3857()

fig, ax = plt.subplots(ncols=2, figsize=(16,8))
extent1.plothq(ax[0], tilemapbase.tiles.OSM)
extent2.plothq(ax[1], tilemapbase.tiles.OSM)

x = frame.geometry.map(lambda pt : pt.coords[0][0])
y = frame.geometry.map(lambda pt : pt.coords[0][1])
ax[0].scatter(x, y, color="black", marker="x", alpha=0.5)
ax[1].scatter(x, y, color="black", marker="x", alpha=0.5)
None



In [18]:

    
lat, lon = 41.8786, -87.7493
region = 0.005
extent1 = tilemapbase.extent(lon-region, lon+region, lat-region, lat+region, 500, 500)
extent1.project_3857()

fig, ax = plt.subplots(ncols=2, figsize=(16,8))
extent1.plothq(ax[0], tilemapbase.tiles.OSM)
ax[0].scatter(x, y, color="black", marker="x", alpha=0.5)
None

Re-create plots from Rosser, Cheng

Let us try to recreate the plots from page 3 (Figure 1) of:

Rosser, Cheng, "Improving the Robustness and Accuracy of Crime Prediction with the Self-Exciting Point Process Through Isotropic Triggering" Appl. Spatial Analysis DOI:10.1007/s12061-016-9198-y

For the purposes of this study, we use open access crime data records from the City of Chicago covering a 1 year period starting on 1st March 2011



In [19]:

    
import datetime

filename_all = os.path.join("..", "..", "open_cp", "sources", "chicago_all.csv.xz")

def in_time_range(time_string):
    dt = datetime.datetime.strptime(time_string, "%Y-%m-%dT%H:%M:%S")
    delta = dt - datetime.datetime(2011, 3, 1)
    return delta >= datetime.timedelta(0) and delta <= datetime.timedelta(days=365)

with lzma.open(filename_all, "rt") as file:
    data2011 = [ event for event in chicago.generate_GeoJSON_Features(file, type="all")
                if event["properties"]["crime"] == "BURGLARY"
                    and in_time_range(event["properties"]["timestamp"]) ]

data2011 = gpd.GeoDataFrame.from_features(data2011)
data2011.crs = {"init":"EPSG:4326"} # Lon/Lat native coords
data2011.head()









    Out[19]:






  
    
      
      address
      case
      crime
      geometry
      location
      timestamp
      type
    
  
  
    
      0
      010XX N HAMLIN AVE
      HT346569
      BURGLARY
      POINT (-87.72134357 41.900059954)
      RESIDENCE
      2011-06-13T17:30:00
      FORCIBLE ENTRY
    
    
      1
      105XX S CORLISS AVE
      HT457173
      BURGLARY
      None
      VACANT LOT/LAND
      2011-08-10T15:15:00
      FORCIBLE ENTRY
    
    
      2
      023XX E 70TH ST
      HT183222
      BURGLARY
      POINT (-87.567306284 41.768069819)
      APARTMENT
      2011-03-01T00:45:00
      HOME INVASION
    
    
      3
      0000X S WESTERN AVE
      HT183281
      BURGLARY
      POINT (-87.68642135499999 41.880842701)
      SMALL RETAIL STORE
      2011-03-01T04:07:00
      FORCIBLE ENTRY
    
    
      4
      054XX W DEVON AVE
      HT183321
      BURGLARY
      POINT (-87.764348762 41.997327626)
      RESTAURANT
      2011-03-01T01:40:00
      FORCIBLE ENTRY



In [20]:

    
data2011 = data2011.dropna()
data2011 = data2011.to_crs({"init":"EPSG:3857"})



In [21]:

    
top, left = 41.8019, -87.6258
bottom, right = 41.7724, -87.5865
region = max(top - bottom, right - left) / 2
lon, lat = (left + right)/2, (top + bottom)/2
extent1 = tilemapbase.extent(lon-region, lon+region, lat-region, lat+region, 500, 500)
extent1.project_3857()

top, left = 41.7125, -87.7203
bottom, right = 41.6796, -87.6793
region = max(top - bottom, right - left) / 2
lon, lat = (left + right)/2, (top + bottom)/2
extent2 = tilemapbase.extent(lon-region, lon+region, lat-region, lat+region, 500, 500)
extent2.project_3857()

fig, ax = plt.subplots(ncols=2, figsize=(16,8))
extent1.zoom += 1
extent1.plothq(ax[0], tilemapbase.tiles.Stamen_Toner)
extent2.zoom += 1
extent2.plothq(ax[1], tilemapbase.tiles.Stamen_Toner)

x = data2011.geometry.map(lambda pt : pt.coords[0][0])
y = data2011.geometry.map(lambda pt : pt.coords[0][1])
ax[0].scatter(x, y, color="red", marker="o", alpha=0.5)
ax[1].scatter(x, y, color="red", marker="o", alpha=0.5)
None

Explore unusual encoding issues

These have been discovered from the work on the other notebook, trying to link the points to "edges" in the TIGER address dataset from the US Census Bureau.



In [22]:

    
def plot_block(block_name, buffer, ax):
    block = frame[frame.address==block_name]
    extent = tilemapbase.extent_from_frame(block, 600, buffer)
    extent.plothq(ax, tilemapbase.tiles.OSM, alpha=0.7)
    x = block.geometry.map(lambda pt : pt.coords[0][0])
    y = block.geometry.map(lambda pt : pt.coords[0][1])
    ax.scatter(x, y, color="black", marker="x", alpha=1)

Split roads

Where the east/west (or north/south) lanes of a major road are divided. Are the 2 sets of events really disjoint?



In [23]:

    
fig, ax = plt.subplots(ncols=2, figsize=(16,7))
plot_block('002XX S WACKER DR', 60, ax[0])
plot_block('030XX S DR MARTIN LUTHER KING JR DR', 700, ax[1])

What look like errors in the dataset



In [24]:

    
fig, ax = plt.subplots(ncols=2, figsize=(16,7))
plot_block('0000X E GRAND AVE', 20, ax[0])
plot_block('016XX W 66TH ST', 20, ax[1])

Where the geocoding disagrees with OpenStreetMap

Here we mis-align with roads, but this appears to be a disagreement between the TIGER data and OSM.



In [25]:

    
fig, ax = plt.subplots(ncols=1, figsize=(16,7))
plot_block('092XX S PRINCETON AVE', 100, ax)

Unusual geography

Especially in the 1st plot, are we meant to infer anything from the coordinates?



In [27]:

    
fig, ax = plt.subplots(ncols=2, figsize=(16,7))
plot_block('130XX S DR MARTIN LUTHER KING JR DR', 100, ax[0])
plot_block('042XX W 77TH ST', 2000, ax[1])

Other strange codings; work in progress



In [28]:

    
block = frame[frame.address=='067XX S STONY ISLAND AVE']

extent = tilemapbase.extent_from_frame(block, 600, 100)

fig, ax = plt.subplots(figsize=(8,8))
extent.plothq(ax, tilemapbase.tiles.OSM)

x = block.geometry.map(lambda pt : pt.coords[0][0])
y = block.geometry.map(lambda pt : pt.coords[0][1])
ax.scatter(x, y, color="black", marker="x", alpha=0.5)
None



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

Looking at the distribution of locations by block

We'll now (re)load the data, project the same way but in meters. Then for each block, we'll look at the all coordinates. We'll then translate these to have zero mean (i.e. measure the distance from the centroid).



In [29]:

    
frame = chicago.load_to_geoDataFrame(empty_geometry="empty")



In [30]:

    
# Meters units and projection suitable for Illinois
frame = frame.to_crs({"init":"EPSG:3435"})
frame.head()









    Out[30]:






  
    
      
      address
      case
      crime
      geometry
      location
      timestamp
      type
    
  
  
    
      0
      010XX N CENTRAL PARK AVE
      HZ560767
      OTHER OFFENSE
      POINT (1152188.999927331 1906649.000013127)
      APARTMENT
      2016-12-22T02:55:00
      VIOLATE ORDER OF PROTECTION
    
    
      1
      051XX S WASHTENAW AVE
      HZ561134
      BATTERY
      POINT (1159241.999998677 1870526.000088323)
      RESIDENTIAL YARD (FRONT/BACK)
      2016-12-22T11:17:00
      AGGRAVATED: OTHER FIREARM
    
    
      2
      059XX W DIVERSEY AVE
      HZ565584
      DECEPTIVE PRACTICE
      POINT (1136402.999890962 1918002.999887207)
      RESIDENCE
      2016-12-09T12:00:00
      FINANCIAL IDENTITY THEFT $300 AND UNDER
    
    
      3
      001XX N STATE ST
      HZ561772
      THEFT
      POINT (1176351.999872658 1900926.999957381)
      DEPARTMENT STORE
      2016-12-22T18:50:00
      RETAIL THEFT
    
    
      4
      008XX N MICHIGAN AVE
      HZ561969
      THEFT
      POINT (1177337.999942323 1906212.999921561)
      SMALL RETAIL STORE
      2016-12-22T19:20:00
      RETAIL THEFT



In [31]:

    
frame.location.unique()









    Out[31]:





array(['APARTMENT', 'RESIDENTIAL YARD (FRONT/BACK)', 'RESIDENCE',
       'DEPARTMENT STORE', 'SMALL RETAIL STORE', 'VACANT LOT/LAND',
       'HOSPITAL BUILDING/GROUNDS', 'OTHER', 'CHA PARKING LOT/GROUNDS',
       'RESIDENCE-GARAGE', 'CHA APARTMENT', 'STREET',
       'SCHOOL, PUBLIC, GROUNDS', 'RESTAURANT', 'ALLEY',
       'PARKING LOT/GARAGE(NON.RESID.)', 'COMMERCIAL / BUSINESS OFFICE',
       'SIDEWALK', 'RESIDENCE PORCH/HALLWAY', '', 'CLEANING STORE',
       'NURSING HOME/RETIREMENT HOME', 'ATM (AUTOMATIC TELLER MACHINE)',
       'VEHICLE NON-COMMERCIAL', 'SCHOOL, PUBLIC, BUILDING', 'LIBRARY',
       'BANK', 'GAS STATION', 'FACTORY/MANUFACTURING BUILDING', 'TAXICAB',
       'GROCERY FOOD STORE', 'GOVERNMENT BUILDING/PROPERTY', 'CTA TRAIN',
       'BARBERSHOP', 'VEHICLE-COMMERCIAL', 'CTA BUS', 'AIRCRAFT',
       'AIRPORT TERMINAL UPPER LEVEL - NON-SECURE AREA', 'ATHLETIC CLUB',
       'AIRPORT VENDING ESTABLISHMENT', 'POLICE FACILITY/VEH PARKING LOT',
       'DRUG STORE', 'CTA PLATFORM', 'CTA BUS STOP', 'AIRPORT PARKING LOT',
       'AIRPORT TERMINAL UPPER LEVEL - SECURE AREA', 'CURRENCY EXCHANGE',
       'HOTEL/MOTEL', 'PARK PROPERTY', 'OTHER COMMERCIAL TRANSPORTATION',
       'MEDICAL/DENTAL OFFICE', 'BAR OR TAVERN', 'HOUSE',
       'CONVENIENCE STORE', 'CTA STATION',
       'AIRPORT TERMINAL LOWER LEVEL - NON-SECURE AREA',
       'TAVERN/LIQUOR STORE', 'FIRE STATION', 'SCHOOL, PRIVATE, BUILDING',
       'ABANDONED BUILDING', 'BOWLING ALLEY',
       'CHURCH/SYNAGOGUE/PLACE OF WORSHIP',
       'CHA HALLWAY/STAIRWELL/ELEVATOR', 'DRIVEWAY - RESIDENTIAL',
       'SPORTS ARENA/STADIUM',
       'AIRPORT BUILDING NON-TERMINAL - NON-SECURE AREA', 'CAR WASH',
       'VEHICLE - OTHER RIDE SERVICE', 'WAREHOUSE',
       'JAIL / LOCK-UP FACILITY', 'CTA GARAGE / OTHER PROPERTY',
       'OTHER RAILROAD PROP / TRAIN DEPOT', 'AIRPORT/AIRCRAFT', 'YARD',
       'PORCH', 'APPLIANCE STORE', 'AIRPORT TRANSPORTATION SYSTEM (ATS)',
       'MOVIE HOUSE/THEATER', 'AUTO', 'CONSTRUCTION SITE',
       'DAY CARE CENTER', 'AIRPORT BUILDING NON-TERMINAL - SECURE AREA',
       'AIRPORT EXTERIOR - NON-SECURE AREA', 'SCHOOL, PRIVATE, GROUNDS',
       'COIN OPERATED MACHINE', 'CHA PARKING LOT', 'SAVINGS AND LOAN',
       'COLLEGE/UNIVERSITY GROUNDS', 'GAS STATION DRIVE/PROP.',
       'PAWN SHOP', 'AIRPORT TERMINAL LOWER LEVEL - SECURE AREA',
       'ANIMAL HOSPITAL', 'AIRPORT TERMINAL MEZZANINE - NON-SECURE AREA',
       'OFFICE', 'COLLEGE/UNIVERSITY RESIDENCE HALL',
       'AIRPORT EXTERIOR - SECURE AREA', 'POOL ROOM', 'CEMETARY',
       'BOAT/WATERCRAFT', 'VACANT LOT', 'BRIDGE',
       'LAKEFRONT/WATERFRONT/RIVERBANK', 'HIGHWAY/EXPRESSWAY', 'TAVERN',
       'CREDIT UNION', 'FEDERAL BUILDING', 'CLUB', 'NEWSSTAND',
       'CHA HALLWAY', 'STAIRWELL', 'NURSING HOME', 'PARKING LOT',
       'VEHICLE - DELIVERY TRUCK', 'BARBER SHOP/BEAUTY SALON', 'VESTIBULE',
       'HALLWAY', 'CTA TRACKS - RIGHT OF WAY', 'RETAIL STORE',
       'SCHOOL YARD', 'FOREST PRESERVE', 'GARAGE',
       'AUTO / BOAT / RV DEALERSHIP', 'DRIVEWAY', 'LAGOON',
       'CTA "L" TRAIN', 'GANGWAY'], dtype=object)

Find deltas by block



In [32]:

    
import collections

by_block = collections.defaultdict(list)
for _, row in frame.iterrows():
    if row.geometry.is_empty:
        continue
    #if row.crime not in {"ROBBERY", "THEFT"}:
    #    continue
    by_block[row.address].append( list(row.geometry.coords)[0] )



In [33]:

    
for key in set(by_block):
    x = np.asarray(by_block[key])
    x -= np.average(x, axis=0)
    by_block[key] = x

Single out north-south

We choose just the blocks which seem to be north/south (bases on the deltas) and then normalise each block to be in the range [-1,1]



In [34]:

    
def normalise(x, y):
    value = np.vstack([x,y]).T
    if min(y) >= 0:
        if max(y) == 0:
            raise Exception()
        v = value / max(y)
    elif max(y) <= 0:
        if min(y) == 0:
            raise Exception()
        v = value / (-min(y))
    else:
        m = max(max(y), -min(y))
        if m == 0:
            raise Exception()
        v = value / m
    return v

value = np.asarray([[-1,2],[-3,4],[5,6]])
normalise(value[:,0], value[:,1])









    Out[34]:





array([[-0.16666667,  0.33333333],
       [-0.5       ,  0.66666667],
       [ 0.83333333,  1.        ]])



In [35]:

    
ns = []
for key, value in by_block.items():
    x, y = value[:,0], value[:,1]
    if len(x) < 10:
        continue
    dx = max(x) - min(x)
    dy = max(y) - min(y)
    if dy > 10 * dx:
        ns.append(normalise(x,y))

Let's plot all the deltas. These show a strong north/south bias, as we hope. They are on an angle, but I believe this is simply because Chicago is aligned not exactly north/south, but also at a slight angle.

That the line is somewhat think I cannot account for.

The clumps at y=-1 and y=1 are statistical artifacts of rescaling the data to exactly fit into [-1,1].



In [36]:

    
x = np.asarray([x for value in ns for x in value])

fig, ax = plt.subplots(figsize=(12,12))
ax.scatter(x[:,0], x[:,1], alpha=0.002)
ax.set_aspect(.1)
ax.set(ylim=[-1.1,1.1])
None



In [37]:

    
import scipy.stats

Let's look at the y coordinate only, and use a KDE to estimate the density.



In [38]:

    
y_marginals = x[:,1]
kernel = scipy.stats.kde.gaussian_kde(y_marginals)

xc = np.linspace(-1,1,100)
plt.plot(xc, kernel(xc))









    Out[38]:





[<matplotlib.lines.Line2D at 0x22edecf30b8>]

Simulate some data

Let's suppose the data really came from a uniform distribution.

For each entry in ns we simulate the same number of points from a uniform distribution
Convert to having mean zero
Then re-scale in the same way

We also do this with a normal distribution, instead of a uniform distribution. This shows much better agreement with the real data, suggesting that maybe a normal distribution is used?



In [39]:

    
def mean_zero_rescale(values):
    values -= np.average(values)
    v = normalise(np.zeros_like(values), values)
    return v[:,1]

def sim(length):
    values = np.random.random(length)
    return mean_zero_rescale(values)

def sim_gaussian(length):
    values = np.random.normal(size=length)
    return mean_zero_rescale(values)

sim_ns = []

for value in ns:
    length = value.shape[0]
    sim_ns.append(sim_gaussian(length))



In [40]:

    
sim_x = np.asarray([x for value in sim_ns for x in value])
kernel_sim = scipy.stats.kde.gaussian_kde(sim_x)

xc = np.linspace(-1,1,100)
plt.plot(xc, kernel(xc))
plt.plot(xc, kernel_sim(xc))
plt.legend(["Real data", "Simulated gaussian data"])
plt.ylim([0,1])
None

	address	case	crime	geometry	location	timestamp	type
0	010XX N CENTRAL PARK AVE	HZ560767	OTHER OFFENSE	POINT (-87.71645415899999 41.899712716)	APARTMENT	2016-12-22T02:55:00	VIOLATE ORDER OF PROTECTION
1	051XX S WASHTENAW AVE	HZ561134	BATTERY	POINT (-87.691539994 41.800445234)	RESIDENTIAL YARD (FRONT/BACK)	2016-12-22T11:17:00	AGGRAVATED: OTHER FIREARM
2	059XX W DIVERSEY AVE	HZ565584	DECEPTIVE PRACTICE	POINT (-87.774165121 41.931166274)	RESIDENCE	2016-12-09T12:00:00	FINANCIAL IDENTITY THEFT $300 AND UNDER
3	001XX N STATE ST	HZ561772	THEFT	POINT (-87.62787669799999 41.883500187)	DEPARTMENT STORE	2016-12-22T18:50:00	RETAIL THEFT
4	008XX N MICHIGAN AVE	HZ561969	THEFT	POINT (-87.624095634 41.897982937)	SMALL RETAIL STORE	2016-12-22T19:20:00	RETAIL THEFT

	address	case	crime	geometry	location	timestamp	type	xcoord	ycoord
4209	019XX E 73RD ST	HZ567200	BURGLARY	POINT (-87.578760989 41.76248689)	VACANT LOT/LAND	2016-12-27T17:00:00	FORCIBLE ENTRY	1190114.0	1856944.0
21540	019XX E 73RD ST	JA208153	ASSAULT	POINT (-87.578980927 41.76248559)	STREET	2017-04-01T03:08:00	AGGRAVATED: HANDGUN	1190054.0	1856943.0
48182	019XX E 73RD ST	HZ431786	BURGLARY	POINT (-87.576891409 41.762507528)	OTHER	2016-09-13T05:00:00	FORCIBLE ENTRY	1190624.0	1856956.0
67209	019XX E 73RD ST	HZ285311	BATTERY	POINT (-87.577639248 41.762498728)	STREET	2016-05-29T22:09:00	DOMESTIC BATTERY SIMPLE	1190420.0	1856951.0
96195	019XX E 73RD ST	HZ464497	DECEPTIVE PRACTICE	POINT (-87.577639248 41.762498728)	RESIDENCE	2016-10-04T10:50:00	FRAUD OR CONFIDENCE GAME	1190420.0	1856951.0

	address	case	crime	geometry	location	timestamp	type	xcoord	ycoord	coords
97005	013XX W ANCONA ST	HZ379958	BATTERY	POINT (-87.66050584649999 41.89391280219293)	RESIDENCE PORCH/HALLWAY	2016-08-05T19:55:00	AGGRAVATED: OTHER DANG WEAPON	1167437	1904650	POINT (-87.660505847 41.893912802)
132485	061XX W EDDY ST	HZ490603	BATTERY	POINT (-87.78022858449999 41.94481522745974)	RESIDENCE	2016-10-26T21:00:00	AGGRAVATED DOMESTIC BATTERY: OTHER DANG WEAPON	1134721	1922966	POINT (-87.780228585 41.944815227)
137638	013XX W ANCONA ST	JA157851	THEFT	POINT (-87.66050584649999 41.89391280219293)	RESIDENTIAL YARD (FRONT/BACK)	2017-02-16T10:30:00	OVER $500	1167437	1904650	POINT (-87.660505847 41.893912802)
152276	013XX W ANCONA ST	JA167559	BURGLARY	POINT (-87.66050584649999 41.89391280219293)	RESIDENCE	2017-02-25T19:00:00	UNLAWFUL ENTRY	1167437	1904650	POINT (-87.660505847 41.893912802)
152867	013XX W ANCONA ST	JA167123	CRIMINAL DAMAGE	POINT (-87.66050584649999 41.89391280219293)	APARTMENT	2017-02-25T17:54:00	TO PROPERTY	1167437	1904650	POINT (-87.660505847 41.893912802)

	address	case	crime	geometry	location	timestamp	type
0	010XX N HAMLIN AVE	HT346569	BURGLARY	POINT (-87.72134357 41.900059954)	RESIDENCE	2011-06-13T17:30:00	FORCIBLE ENTRY
1	105XX S CORLISS AVE	HT457173	BURGLARY	None	VACANT LOT/LAND	2011-08-10T15:15:00	FORCIBLE ENTRY
2	023XX E 70TH ST	HT183222	BURGLARY	POINT (-87.567306284 41.768069819)	APARTMENT	2011-03-01T00:45:00	HOME INVASION
3	0000X S WESTERN AVE	HT183281	BURGLARY	POINT (-87.68642135499999 41.880842701)	SMALL RETAIL STORE	2011-03-01T04:07:00	FORCIBLE ENTRY
4	054XX W DEVON AVE	HT183321	BURGLARY	POINT (-87.764348762 41.997327626)	RESTAURANT	2011-03-01T01:40:00	FORCIBLE ENTRY