Using the Spatial Statistics Data Object (SSDataObject) Makes Feature IO Simple

  • SSDataObject does the read/write and accounting of feature/attribute and NumPy Array order
  • Write/Utilize methods that take NumPy Arrays

Using NumPy as the common denominator

  • Could use the ArcPy Data Access Module directly, but there are host of issues/information one must take into account:
    • How to deal with projections and other environment settings?
    • How Cursors affect the accounting of features?
    • How to deal with bad records/bad data and error handling?
    • How to honor/account for full field object control?
    • How do I create output features that correspond to my inputs?
      • Points are easy, what about Polygons and Polylines?
  • Spatial Statistics Data Object (SSDataObject)
    • Almost 30 Spatial Statistics Tools written in Python that ${\bf{must}}$ behave like traditional GP Tools
    • Use SSDataObject and your code should adhere

Basic Imports


In [1]:
import arcpy as ARCPY
import numpy as NUM
import SSDataObject as SSDO

Initialize and Load Fields into Spatial Statsitics Data Object

  • The Unique ID Field ("MYID" in this example) will keep track of the order of your features
    • You can use ssdo.oidName as your Unique ID Field
    • You have no control over Object ID Fields. It is quick, assures "uniqueness", but can't assume they will not get "scrambled" during copies.
    • To assure full control I advocate the "Add Field (LONG)" --> "Calculate Field (From Object ID)" workflow.

In [2]:
inputFC = r'../data/CA_Polygons.shp'
ssdo = SSDO.SSDataObject(inputFC)
ssdo.obtainData("MYID", ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'])
df = ssdo.getDataFrame()
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969
158  0.011426  0.176233      37.0  1060099
159 -0.137376  0.214186      38.3      398
160 -0.188417  0.067722      41.4    11240
161 -0.085070 -0.118248      42.9   101057
162 -0.049022 -0.081377      48.1    13328

You can get your data using the core NumPy Arrays

  • Use .data to get the native data type
  • Use the returnDouble() function to cast explicitly to float

In [4]:
pop69 = ssdo.fields['POP1969']
nativePop69 = pop69.data
floatPop69 = pop69.returnDouble()
print(floatPop69[0:5])


[  1.06009900e+06   3.98000000e+02   1.12400000e+04   1.01057000e+05
   1.33280000e+04]

You can get your data in a PANDAS Data Frame

  • Note the Unique ID Field is used as the Index

In [5]:
df = ssdo.getDataFrame()
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969
158  0.011426  0.176233      37.0  1060099
159 -0.137376  0.214186      38.3      398
160 -0.188417  0.067722      41.4    11240
161 -0.085070 -0.118248      42.9   101057
162 -0.049022 -0.081377      48.1    13328

By default the SSDataObject only stores the centroids of the features


In [6]:
df['XCoords'] = ssdo.xyCoords[:,0]
df['YCoords'] = ssdo.xyCoords[:,1]
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969       XCoords       YCoords
158  0.011426  0.176233      37.0  1060099 -1.356736e+07  4.503012e+06
159 -0.137376  0.214186      38.3      398 -1.333797e+07  4.637142e+06
160 -0.188417  0.067722      41.4    11240 -1.343007e+07  4.615529e+06
161 -0.085070 -0.118248      42.9   101057 -1.353566e+07  4.789809e+06
162 -0.049022 -0.081377      48.1    13328 -1.341895e+07  4.581597e+06

You can get the core ArcPy Geometries if desired

  • Set requireGeometry = True

In [9]:
ssdo = SSDO.SSDataObject(inputFC)
ssdo.obtainData("MYID", ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'],
               requireGeometry = True)
df = ssdo.getDataFrame()
shapes = NUM.array(ssdo.shapes, dtype = object)
df['shapes'] = shapes
print(df.head())


       GROWTH  LOGPCR69  PERCNOHS  POP1969  \
158  0.011426  0.176233      37.0  1060099   
159 -0.137376  0.214186      38.3      398   
160 -0.188417  0.067722      41.4    11240   
161 -0.085070 -0.118248      42.9   101057   
162 -0.049022 -0.081377      48.1    13328   

                                                shapes  
158  (<geoprocessing array object object at 0x00000...  
159  (<geoprocessing array object object at 0x00000...  
160  (<geoprocessing array object object at 0x00000...  
161  (<geoprocessing array object object at 0x00000...  
162  (<geoprocessing array object object at 0x00000...  

Coming Soon... ArcPy Geometry Data Frame Integration

  • In conjunction with the ArcGIS Python SDK
  • Spatial operators on ArcGIS Data Frames: selection, clip, intersection etc.

Creating Output Feature Classes

  • Simple Example: Adding a field of random standard normal values to your input/output
  • appendFields can be used to copy over any fields from the input whether you read them into the SSDataObject or not.
  • E.g. 'NEW_NAME' was never read into Python but it will be copied to the output. This can save you a lot of memory.

In [ ]:
import numpy.random as RAND
import os as OS
ARCPY.env.overwriteOutput = True
outArray = RAND.normal(0,1, (ssdo.numObs,))
outDict = {}
outField = SSDO.CandidateField('STDNORM', 'DOUBLE', outArray, alias = 'Standard Normal')
outDict[outField.name] = outField
outputFC = OS.path.abspath(r'../data/testMyOutput.shp')
ssdo.output2NewFC(outputFC, outDict, appendFields = ['GROWTH', 'PERCNOHS', 'NEW_NAME'])

Advanced Analysis Example: Principle Component Analysis (PCA) via the statsmodels Package

Although the technique is more advanced, the Data IO remains the same.

  1. Use the SSDataObject to read the data you need.
  2. Pass the NumPy Array(s) to the method/class.
  3. Create a CandidateField for every output array that you want to save as a field in the output attribute table.
  4. Let the SSDataObject copy over the input field(s)/shapes and match them to the correct rows in your NumPy Arrays.

In [ ]:
outputFC = OS.path.abspath(r'../data/testMyRollingMeanInfo.shp')
outFields = [ "SLOPE", "INTERCEPT", "R_SQRAURED", "P_VALUE", "STD_ERR" ]
outDict = {}
for fieldInd, fieldName in enumerate(outFields):
    outDict[fieldName] = SSDO.CandidateField(fieldName, "DOUBLE", outArray[:,fieldInd])
ssdo.output2NewFC(outputFC, outDict, fieldOrder = outFields)
del ssdo

Example: Max(p) Regional Clustering


In [ ]:
import statsmodels as STATSMODELS
X = df.as_matrix(columns = ['GROWTH', 'LOGPCR69', 'PERCNOHS', 'POP1969'])
pca = STATSMODELS.PCA(X)

In [ ]:
outputFC = OS.path.abspath(r'../data/testMaxPInfo.shp')
outDict = {}
outDict["REGIONID"] = SSDO.CandidateField("REGIONID", "DOUBLE", outArray)
ssdo.output2NewFC(outputFC, outDict, appendFields = ['GROWTH', 'POP1970', 'PERCNOHS'])
del ssdo

In [ ]: