Comparison of different Python APIs for working with geospatial data

Some performance suggestions:

For Shapely and GeoPandas, use shapely.speedups.enable().
For PostgreSQL/PostGIS, tune the database settings. A quick way to get faster performance is to use the settings suggested by PgTune. Select "Data warehouses" for "DB Type" and minimize the number of connections. If using a fast hard drive, like an SSD, you can also adjust seq_page_cost and random_page_cost.



In [ ]:

    
import logging
import os

from spandex.utils import logger, DataLoader


# Set log level to INFO to see informational messages.
logger.setLevel(logging.INFO)

loader = DataLoader()

# Load boundaries.
for boundary in ['cities', 'tracts', 'zones']:
    filename = os.path.join('boundaries', '%s.shp' % boundary)
    loader.load_shp(filename, boundary, drop=True)

# Load zoning.
loader.load_shp('zoning/zoning_geom.shp', 'zoning', drop=True)

# Load parcels.
for county in ['SantaFe', 'Torrance', 'Valencia', 'Bernalillo']:
    filename = os.path.join('parcel', '{0}_ed.shp'.format(county))
    tablename = 'parcels_{0}'.format(county.lower())
    loader.load_shp(filename, tablename, drop=True)

# Load land use.
for county in ['santa_fe', 'torrance', 'valencia', 'bernalillo', 'edgewood']:
    filename = os.path.join('landuse', '{0}_lu.shp'.format(county))
    tablename = 'landuse_{0}'.format(county.replace('_', ''))
    loader.load_shp(filename, tablename, drop=True)



In [ ]:

    
# More info about the DataLoader and how to use it.
# Override configuration in config/user.cfg.
DataLoader?

Psycopg

Psycopg is a driver for interacting with PostgreSQL. It does not provide any abstraction for SQL beyond passing parameters to SQL queries.

Advantages:

Robust.
PostGIS and Psycopg are both well-documented and well-supported.

Disadvantages:

Python/SQL mixed code is verbose and harder to read/maintain.



In [ ]:

    
from spandex.utils import DataLoader


loader = DataLoader()
with loader.database.cursor() as cur:
    cur.execute("ALTER TABLE parcels_valencia ADD zone_uid varchar(19)")
    cur.execute(
        """UPDATE parcels_valencia SET zone_uid = zoning.zone_uid
        FROM zoning WHERE
        ST_Intersects(ST_Centroid(parcels_valencia.geom), zoning.geom);""")

GeoAlchemy 2 ORM

GeoAlchemy 2 ORM provides a high-level, object-oriented Python API that abstracts SQL expressions.

Advantages:

Cleaner Python syntax.
Still exposes SQL functions, including PostGIS spatial functions, directly.
Can coexist alongside Psycopg.
Provides Shapely integration.
Easier to write Python functions that wrap SQL operations.

Disadvantages:

Requires ORM mappings to describe database tables, although reflection (autoload=True) will build it from database schema, so this might not be an issue.
Difficult to modify table schema, e.g., adding columns, after creating the table. Best to continue doing this using Psycopg (ALTER TABLE).
Still "feels" like SQL, with added overhead/complexity.

GeoAlchemy 2 Core

GeoAlchemy 2 Core is similar to ORM, but its syntax is much less abstracted from SQL. SQL expressions are represented as Python constructs, not classes.

GeoDjango

GeoDjango is a geographic web framework. It is part of Django, the most well-known Python model-view-controller web framework.

Advantages:

Included and maintained as part of established software.
Includes admin web interface and OpenLayers map for editing tables/records/geometries.

Disadvantages:

Better suited for GIS web applications.
More limited compared to GeoAlchemy.
More difficult to autoload schema compared to GeoAlchemy.

GeoPandas

Advantages:

Does not depend on PostGIS.
Clean syntax based on Pandas, Shapely, Fiona.

Disadvantages:

Slow. CPU and memory intensive.
Might not be as practical for exploring large tables in a notebook as we would hope.
Moving target with development in early stages.



In [ ]:

    
import os

from geopandas import GeoDataFrame, GeoSeries
from shapely import speedups

from spandex.utils import load_config


config = load_config()
basedir = config.get('data', 'directory')

# Enable Shapely performance enhancements written in C.
speedups.enable()

# Load zone boundaries.
filename = os.path.join(basedir, 'boundaries', 'zones.shp')
zones = GeoDataFrame.from_file(filename)
zones.head()

Other software that was not considered

PPyGIS: PostGIS adapter for Psycopg that translates geometry objects from EWKB to Python
Shapely directly instead of via GeoPandas
GeoKettle amd Talend spatial extension: like FME, written in Java