The basic function pycno() accepts a GeoPandas GeoDataFrame, the column name of the field that you wish to smooth and the desired cell size of the output raster.
The algorithm then iterative smooths the chosen data values, adjusting for non-negativity and preserving volume. The parameter r allows the adjustment of the realxation parameter, but in practice the default is fine. The converge parameter sets the stopping point, which is calculated as the initial maximum value * 10 raise to the negative of the converge parameter. Generally, 3 is a good default, but it may need adjustment for some applications, generally the larger the converge parameter, the more iterations are required and the longer the function takes to complete and return an output.
The handle_nulls parameter is based on Tobler's discussion of boundary conditions. The basic implementation (handle_nulls = False) uses the standard numpy convolve function and sets nodata values to 0. Tobler calls this the Dirichlet condition, acknowledging that values other than 0 are possible dependent upon context, although not implemented here. The other option, Tobler calls the Neumann condition, in which the density gradient at edges is set to zero. This is not implemented here, instead the astropy convolve function is used in preference, which has much the same effect. The astropy convolve function replaces null values using the kernel as an interpolation function (handle_nulls = True), this is the default option.
The function outputs a 2D numpy array, geotransformation information, and the coordinate reference system.
As the main output is a numpy array, the matplotlib.pyplot function imshow() can be used to get a quick look at the output.
The save_pycno() helper function takes the outputs from pycno() and saves the array as a Raster using Rasterio, which can then be used in a GIS etc.
The extract_values() function allows for the estimation of value sums from the smooth pycnophylactic interpolation surface to a set of polygons.
NB The field used for smoothing must be non-negative, it is also assumed that an equal-area projection is being used. The GeoDataFrame should be structured so that 1 row is 1 record. Multipolygons are fine, but if they are split into singleparts you will likely get an incorrect output.
Tobler W. 1979. Smooth Pycnophylactic Interpolation for Geographical Regions. Journal of the American Statistical Association. 74: 367: 519-530.
The code here is (for the most part) adapted from Chris Brunsdon's (https://github.com/chrisbrunsdon) R package 'pycno' - https://github.com/cran/pycno
In [2]:
# Some required libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
# import the pycnophylactic smoothing function
from pycno.pycno import pycno, save_pycno, extract_values
%matplotlib inline
In [2]:
# Get some data for London
# Read in London Boroughs geojson and project to British National Grid.
ldn_boro = gpd.read_file('data/LDN_Boro.geojson').to_crs(epsg=27700)
# Get some population data for London Boroughs
ldn_pop = pd.read_excel('data/Pandas_Lon_Pop.xlsx')
# Merge the population data with the geospatial data
ldn_boro = ldn_boro.merge(ldn_pop, how='left', left_on='GSS_CODE', right_on='New Code')
In [5]:
# Create the smooth pycnophylactic surface
res, trans, crs = pycno(ldn_boro,2015,50,converge=4,verbose=False)
# NB runtime warnings derive from np.nan not being handled by the < and abs operators. Nothing to worry about here.
# Save the output as a raster
save_pycno(res,trans,crs,'london_pycno.tif')
In [7]:
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(16,8))
# ax1
ldn_boro['popdense2015'] = ldn_boro[2015]/(ldn_boro['geometry'].area * 1000000)
ldn_boro.plot(column = 'popdense2015',cmap = 'Reds', scheme = 'Quantiles', k=5,ax=ax1)
ax1.axis('equal')
# ax2
ax2.imshow(res, cmap = 'Reds')
ax2.axis('equal')
Out[7]:
In [12]:
# Check that volumes have been preserved by extracting the estimates back to the original
ldn_pop_est = extract_values(res,ldn_boro,trans)
ldn_pop_est.head()
Out[12]: