Comparison of different Python APIs for working with geospatial data

Some performance suggestions:

  • For Shapely and GeoPandas, use shapely.speedups.enable().
  • For PostgreSQL/PostGIS, tune the database settings. A quick way to get faster performance is to use the settings suggested by PgTune. Select "Data warehouses" for "DB Type" and minimize the number of connections. If using a fast hard drive, like an SSD, you can also adjust seq_page_cost and random_page_cost.

In [ ]:
import logging
import os

from spandex.utils import logger, DataLoader


# Set log level to INFO to see informational messages.
logger.setLevel(logging.INFO)

loader = DataLoader()

# Load boundaries.
for boundary in ['cities', 'tracts', 'zones']:
    filename = os.path.join('boundaries', '%s.shp' % boundary)
    loader.load_shp(filename, boundary, drop=True)

# Load zoning.
loader.load_shp('zoning/zoning_geom.shp', 'zoning', drop=True)

# Load parcels.
for county in ['SantaFe', 'Torrance', 'Valencia', 'Bernalillo']:
    filename = os.path.join('parcel', '{0}_ed.shp'.format(county))
    tablename = 'parcels_{0}'.format(county.lower())
    loader.load_shp(filename, tablename, drop=True)

# Load land use.
for county in ['santa_fe', 'torrance', 'valencia', 'bernalillo', 'edgewood']:
    filename = os.path.join('landuse', '{0}_lu.shp'.format(county))
    tablename = 'landuse_{0}'.format(county.replace('_', ''))
    loader.load_shp(filename, tablename, drop=True)

In [ ]:
# More info about the DataLoader and how to use it.
# Override configuration in config/user.cfg.
DataLoader?

Psycopg

Psycopg is a driver for interacting with PostgreSQL. It does not provide any abstraction for SQL beyond passing parameters to SQL queries.

Advantages:

  • Robust.
  • PostGIS and Psycopg are both well-documented and well-supported.

Disadvantages:

  • Python/SQL mixed code is verbose and harder to read/maintain.

In [ ]:
from spandex.utils import DataLoader


loader = DataLoader()
with loader.database.cursor() as cur:
    cur.execute("ALTER TABLE parcels_valencia ADD zone_uid varchar(19)")
    cur.execute(
        """UPDATE parcels_valencia SET zone_uid = zoning.zone_uid
        FROM zoning WHERE
        ST_Intersects(ST_Centroid(parcels_valencia.geom), zoning.geom);""")

GeoAlchemy 2 ORM

GeoAlchemy 2 ORM provides a high-level, object-oriented Python API that abstracts SQL expressions.

Advantages:

  • Cleaner Python syntax.
  • Still exposes SQL functions, including PostGIS spatial functions, directly.
  • Can coexist alongside Psycopg.
  • Provides Shapely integration.
  • Easier to write Python functions that wrap SQL operations.

Disadvantages:

  • Requires ORM mappings to describe database tables, although reflection (autoload=True) will build it from database schema, so this might not be an issue.
  • Difficult to modify table schema, e.g., adding columns, after creating the table. Best to continue doing this using Psycopg (ALTER TABLE).
  • Still "feels" like SQL, with added overhead/complexity.

Recommended reading:


In [ ]:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from geoalchemy2 import Geometry

from spandex.utils import DataLoader


# Build engine and session for ORM.
loader = DataLoader()
engine = create_engine('postgresql://', creator=lambda: loader.database._connection)
Base = declarative_base(engine)
Session = sessionmaker(bind=engine)
session = Session()

# Reflect all tables in the database.
Base.metadata.reflect()
tables = Base.metadata.tables

# Map parcels and zoning tables to classes.
class Parcels(Base):
    __table__ = tables['parcels_valencia']
class Zoning(Base):
    __table__ = tables['zoning']

# Dynamically map all tables to classes.
maps = {}
for (name, table) in tables.items():
    maps[name] = type(str(name), (Base,), {'__table__': table})

In [ ]:
# TODO: Example spatial operations

GeoAlchemy 2 Core

GeoAlchemy 2 Core is similar to ORM, but its syntax is much less abstracted from SQL. SQL expressions are represented as Python constructs, not classes.

Recommended reading:


In [ ]:
from sqlalchemy import create_engine, Table, MetaData
from geoalchemy2 import Geometry

from spandex.utils import DataLoader


# Build engine and metadata for core.
loader = DataLoader()
engine = create_engine('postgresql://', creator=lambda: loader.database._connection)
metadata = MetaData(engine)

# Reflect all tables in the database.
metadata.reflect()
tables = metadata.tables

parcels = tables['parcels_valencia']
zoning = tables['zoning']

In [ ]:
# TODO: Example spatial operations

GeoDjango

GeoDjango is a geographic web framework. It is part of Django, the most well-known Python model-view-controller web framework.

Advantages:

  • Included and maintained as part of established software.
  • Includes admin web interface and OpenLayers map for editing tables/records/geometries.

Disadvantages:

  • Better suited for GIS web applications.
  • More limited compared to GeoAlchemy.
  • More difficult to autoload schema compared to GeoAlchemy.

GeoPandas

Advantages:

  • Does not depend on PostGIS.
  • Clean syntax based on Pandas, Shapely, Fiona.

Disadvantages:

  • Slow. CPU and memory intensive.
  • Might not be as practical for exploring large tables in a notebook as we would hope.
  • Moving target with development in early stages.

In [ ]:
import os

from geopandas import GeoDataFrame, GeoSeries
from shapely import speedups

from spandex.utils import load_config


config = load_config()
basedir = config.get('data', 'directory')

# Enable Shapely performance enhancements written in C.
speedups.enable()

# Load zone boundaries.
filename = os.path.join(basedir, 'boundaries', 'zones.shp')
zones = GeoDataFrame.from_file(filename)
zones.head()

Other software that was not considered

  • PPyGIS: PostGIS adapter for Psycopg that translates geometry objects from EWKB to Python
  • Shapely directly instead of via GeoPandas
  • GeoKettle amd Talend spatial extension: like FME, written in Java