Copyright 2017, Kevin Kane. This program is distributed under the terms of the GNU General Public License.
Often times in social science research we need to join data "up" to larger units of analysis. For example, data on crime, business establishments, or property transactions is geocoded to a point, but we are interested in analyzing how that information varies over blocks or census tracts.
This script takes a point shapefile and aggregates the points as an attribute of a shapefile of Census Blocks. This code can be executed (with the least amount of setup) using the python terminal in ArcGIS. ArcCatalog is recommended since ArcMap will generally render intermediate layers.
Please note that each original shapefile has been given a unique identifier of FIDnum. This is necesarry for the numpy-based step of joining the aggregated point totals back into blocks, so please ensure that a unique identifier field is used or that a field called FIDnum is made.
This tutorial contains four sections. Run the preamble and file declaration codeblocks, then one of these:
(1) How many points are in each block?
(2) How many of a certain type of point are in each block?
(3) How many of all types are in each block?
(4) Sum the attributes of a point into each block
In [5]:
# Preamble to use for all the below scripts:
import arcpy
import numpy as np
import datetime
from arcpy.sa import*
arcpy.env.overwriteOutput = True
The example here uses blocks in Irvine, CA (from the US Census TIGER files) and a file of points in Irvine intended to represent business establishments business establishments from ReferenceUSA. Business establishments have a (fake) category field bus_type and a (fake) number of employees field EMPfixed
In [22]:
# Set working directory (using double forward slashes) and files
arcpy.env.workspace = "C:\\Users\\Research\\Project"
points = "\\Irvine_businesses_randomized.shp"
arcpy.MakeFeatureLayer_management(points, "points_lyr")
blocks = "\\Irvine_blocks.shp"
output = "\\outfile.shp"
In [3]:
#### (1) How many points are in each block? ####
print 'start: ', datetime.datetime.now()
# Execute spatial join
arcpy.SpatialJoin_analysis(blocks, "points_lyr", output, "#", "#", "#", "CONTAINS")
ids = [row.getValue("FIDnum") for row in arcpy.SearchCursor(blocks)]
jc = [row.getValue("Join_Count") for row in arcpy.SearchCursor(output)]
array = np.array([ids, jc]).transpose()
mytype = np.dtype([('FIDnum', np.int32), ('pts_count', np.int32)])
myrecord = np.core.records.array(list(tuple(array.transpose())), dtype=mytype)
arcpy.da.ExtendTable(blocks, 'FIDnum', myrecord, 'FIDnum')
arcpy.Delete_management(output)
print 'end: ', datetime.datetime.now()
In [9]:
#### (2) How many of a certain type of point are in each block? (e.g. only industry code 54) ####
print 'start: ', datetime.datetime.now()
# Isolate desired subset of points
arcpy.SelectLayerByAttribute_management("points_lyr", "NEW_SELECTION", " \"NAICS2\" = 54 ")
# Execute spatial join
arcpy.SpatialJoin_analysis(blocks, "points_lyr", output, "#", "#", "#", "CONTAINS")
ids = [row.getValue("FIDnum") for row in arcpy.SearchCursor(blocks)]
jc = [row.getValue("Join_Count") for row in arcpy.SearchCursor(output)]
array = np.array([ids, jc]).transpose()
mytype = np.dtype([('FIDnum', np.int32), ('naics54', np.int32)])
myrecord = np.core.records.array(list(tuple(array.transpose())), dtype=mytype)
arcpy.da.ExtendTable(blocks, 'FIDnum', myrecord, 'FIDnum')
arcpy.Delete_management(output)
print 'end: ', datetime.datetime.now()
In [13]:
#### (3) How many of all types are in each block? ####
print 'start: ', datetime.datetime.now()
# Create function to identify all industry codes:
def unique_values(table, field):
with arcpy.da.SearchCursor(table, [field]) as cursor:
return sorted({row[0] for row in cursor})
codes = unique_values(points, "bus_type")
# Execute spatial join FOR JUST THE FIRST FOUR INDUSTRY CODES:
for i in codes[0:3]:
arcpy.SelectLayerByAttribute_management("points_lyr", "NEW_SELECTION", " \"bus_type\" = %s"%i)
arcpy.SpatialJoin_analysis(blocks, "points_lyr", output, "#", "#", "#", "CONTAINS")
ids = [row.getValue("FIDnum") for row in arcpy.SearchCursor(blocks)]
jc = [row.getValue("Join_Count") for row in arcpy.SearchCursor(output)]
array = np.array([ids, jc]).transpose()
mytype = np.dtype([('FIDnum', np.int32), ('estab_%s'%i, np.int32)])
myrecord = np.core.records.array(list(tuple(array.transpose())), dtype=mytype)
arcpy.da.ExtendTable(blocks, 'FIDnum', myrecord, 'FIDnum')
arcpy.Delete_management(output)
print 'end: ', datetime.datetime.now()
In [ ]:
#### (4) Sum the attributes of a point into each block (e.g., number of employees) ####
print 'start: ', datetime.datetime.now()
# Define field mappings
fieldmappings = arcpy.FieldMappings()
fieldmappings.addTable(blocks) # target layer name
fieldmappings.addTable("points_lyr") # join layer name
EmpFieldIndex = fieldmappings.findFieldMapIndex("EMPfixed") # name of the points file's attribute field
fieldmap = fieldmappings.getFieldMap(EmpFieldIndex)
field = fieldmap.outputField
field.name = "sumEmp" # name of the output field
fieldmap.outputField = field
fieldmap.mergeRule = "sum" # merge rule - options also include first, last, sum, mean, median, mode, min, max, count, range
fieldmappings.replaceFieldMap(EmpFieldIndex, fieldmap)
# Execute spatial join
arcpy.SpatialJoin_analysis(blocks, "points_lyr", output, "#", "#", "#", "CONTAINS")
ids = [row.getValue("FIDnum") for row in arcpy.SearchCursor(blocks)]
jc = [row.getValue("Join_Count") for row in arcpy.SearchCursor(output)]
emp = [row.getValue("sumEmp") for row in arcpy.SearchCursor(output)]
array = np.array([ids, jc, emp]).transpose()
mytype = np.dtype([('FIDnum', np.int32), ('pts_count', np.int32), ('sumEmp', np.int32)])
myrecord = np.core.records.array(list(tuple(array.transpose())), dtype=mytype)
arcpy.da.ExtendTable(blocks, 'FIDnum', myrecord, 'FIDnum')
arcpy.Delete_management(output)
print 'end: ', datetime.datetime.now()