Using the Spatial Statistics Data Object (SSDataObject) Makes Feature IO Simple

  • SSDataObject does the read/write and accounting of feature/attribute and NumPy Array order
  • Write/Utilize methods that take NumPy Arrays

Using NumPy as the common denominator

  • Could use the ArcPy Data Access Module directly, but there are host of issues/information one must take into account:
    • How to deal with projections and other environment settings?
    • How Cursors affect the accounting of features?
    • How to deal with bad records/bad data and error handling?
    • How to honor/account for full field object control?
    • How do I create output features that correspond to my inputs?
      • Points are easy, what about Polygons and Polylines?
  • Spatial Statistics Data Object (SSDataObject)
    • Almost 30 Spatial Statistics Tools written in Python that ${\bf{must}}$ behave like traditional GP Tools
    • Use SSDataObject and your code should adhere

Basic Imports


In [15]:
import arcpy as ARCPY
import numpy as NUM
import SSDataObject as SSDO

Initialize and Load Fields into Spatial Statsitics Data Object

  • The Unique ID Field ("MYID" in this example) will keep track of the order of your features
    • You can use ssdo.oidName as your Unique ID Field
    • You have no control over Object ID Fields. It is quick, assures "uniqueness", but can't assume they will not get "scrambled" during copies.
    • To assure full control I advocate the "Add Field (LONG)" --> "Calculate Field (From Object ID)" workflow.

In [16]:
inputFC = r'../data/CA_Polygons.shp'
ssdo = SSDO.SSDataObject(inputFC)
ssdo.obtainData("MYID", ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'])
df = ssdo.getDataFrame()
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969
158  0.011426  0.176233      37.0  1060099
159 -0.137376  0.214186      38.3      398
160 -0.188417  0.067722      41.4    11240
161 -0.085070 -0.118248      42.9   101057
162 -0.049022 -0.081377      48.1    13328

You can get your data using the core NumPy Arrays

  • Use .data to get the native data type
  • Use the returnDouble() function to cast explicitly to float

In [17]:
pop69 = ssdo.fields['POP1969']
nativePop69 = pop69.data
floatPop69 = pop69.returnDouble()
print(floatPop69[0:5])


[  1.06009900e+06   3.98000000e+02   1.12400000e+04   1.01057000e+05
   1.33280000e+04]

You can get your data in a PANDAS Data Frame

  • Note the Unique ID Field is used as the Index

In [18]:
df = ssdo.getDataFrame()
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969
158  0.011426  0.176233      37.0  1060099
159 -0.137376  0.214186      38.3      398
160 -0.188417  0.067722      41.4    11240
161 -0.085070 -0.118248      42.9   101057
162 -0.049022 -0.081377      48.1    13328

By default the SSDataObject only stores the centroids of the features


In [19]:
df['XCoords'] = ssdo.xyCoords[:,0]
df['YCoords'] = ssdo.xyCoords[:,1]
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969       XCoords       YCoords
158  0.011426  0.176233      37.0  1060099 -1.356736e+07  4.503012e+06
159 -0.137376  0.214186      38.3      398 -1.333797e+07  4.637142e+06
160 -0.188417  0.067722      41.4    11240 -1.343007e+07  4.615529e+06
161 -0.085070 -0.118248      42.9   101057 -1.353566e+07  4.789809e+06
162 -0.049022 -0.081377      48.1    13328 -1.341895e+07  4.581597e+06

You can get the core ArcPy Geometries if desired

  • Set requireGeometry = True

In [20]:
ssdo = SSDO.SSDataObject(inputFC)
ssdo.obtainData("MYID", ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'],
               requireGeometry = True)
df = ssdo.getDataFrame()
shapes = NUM.array(ssdo.shapes, dtype = object)
df['shapes'] = shapes
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969  \
158  0.011426  0.176233      37.0  1060099   
159 -0.137376  0.214186      38.3      398   
160 -0.188417  0.067722      41.4    11240   
161 -0.085070 -0.118248      42.9   101057   
162 -0.049022 -0.081377      48.1    13328   

                                                shapes  
158  (<geoprocessing array object object at 0x00000...  
159  (<geoprocessing array object object at 0x00000...  
160  (<geoprocessing array object object at 0x00000...  
161  (<geoprocessing array object object at 0x00000...  
162  (<geoprocessing array object object at 0x00000...  

Coming Soon... ArcPy Geometry Data Frame Integration

  • In conjunction with the ArcGIS Python SDK
  • Spatial operators on ArcGIS Data Frames: selection, clip, intersection etc.

Creating Output Feature Classes

  • Simple Example: Adding a field of random standard normal values to your input/output
  • appendFields can be used to copy over any fields from the input whether you read them into the SSDataObject or not.
  • E.g. 'NEW_NAME' was never read into Python but it will be copied to the output. This can save you a lot of memory.

In [21]:
import numpy.random as RAND
import os as OS
ARCPY.env.overwriteOutput = True
outArray = RAND.normal(0,1, (ssdo.numObs,))
outDict = {}
outField = SSDO.CandidateField('STDNORM', 'DOUBLE', outArray, alias = 'Standard Normal')
outDict[outField.name] = outField
outputFC = OS.path.abspath(r'../data/testMyOutput.shp')
ssdo.output2NewFC(outputFC, outDict, appendFields = ['GROWTH', 'PERCNOHS', 'NEW_NAME'])

In [ ]: