Querying 1 775 965 000 points with postgres-pointcloud

442 tiles, ~1.7 billion points.

What do we want to do?

  • test point query timing
  • select some tiles which contain a lot of trees
  • see how LandSAT and LiDAR tree cover data match up

Gathering modules


In [1]:
import os

import psycopg2 as ppg
import numpy as np
import ast
from osgeo import ogr

import shapely as sp
from shapely.geometry import Point,Polygon,asShape
from shapely.wkt import loads as wkt_loads
from shapely import speedups


import cartopy as cp
import cartopy.crs as ccrs

import pandas as pd
import pandas.io.sql as pdsql

import geopandas as gp

from matplotlib.collections import PatchCollection
from mpl_toolkits.basemap import Basemap

import fiona

from descartes import PolygonPatch

import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
speedups.available


Out[2]:
True

In [3]:
speedups.enable()

Making a postgres connection with psycopg2


In [4]:
#  PGPASSWORD=pointy_cloudy psql -h localhost -d pcbm_pc -U pcbm
pg_connection = "dbname=pcbm_pc user=pcbm password=pointy_cloudy host=130.56.244.246"
conn = ppg.connect(pg_connection)
cursor = conn.cursor()

First query for some blocks - this gets them all!


In [5]:
bbox_query = 'SELECT ST_Extent(pa)::geometry(Polygon, 28355) as table_extent FROM ga_ld_pc;'

%time bbox = gp.read_postgis(bbox_query, conn)


---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
/Users/adam/anaconda/lib/python3.5/site-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
   1563             else:
-> 1564                 cur.execute(*args)
   1565             return cur

ProgrammingError: function st_extent(pcpatch) does not exist
LINE 1: SELECT ST_Extent(pa)::geometry(Polygon, 28355) as table_exte...
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.


During handling of the above exception, another exception occurred:

DatabaseError                             Traceback (most recent call last)
<ipython-input-5-24620290b6e7> in <module>()
      1 bbox_query = 'SELECT ST_Extent(pa)::geometry(Polygon, 28355) as table_extent FROM ga_ld_pc;'
      2 
----> 3 get_ipython().magic('time bbox = gp.read_postgis(bbox_query, conn)')

/Users/adam/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2161         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2162         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2163         return self.run_line_magic(magic_name, magic_arg_s)
   2164 
   2165     #-------------------------------------------------------------------------

/Users/adam/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2082                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2083             with self.builtin_trap:
-> 2084                 result = fn(*args,**kwargs)
   2085             return result
   2086 

<decorator-gen-60> in time(self, line, cell, local_ns)

/Users/adam/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/Users/adam/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
   1175         else:
   1176             st = clock2()
-> 1177             exec(code, glob, local_ns)
   1178             end = clock2()
   1179             out = None

<timed exec> in <module>()

/Users/adam/anaconda/lib/python3.5/site-packages/geopandas/io/sql.py in read_postgis(sql, con, geom_col, crs, index_col, coerce_float, params)
     32     """
     33     df = read_sql(sql, con, index_col=index_col, coerce_float=coerce_float,
---> 34                   params=params)
     35 
     36     if geom_col not in df:

/Users/adam/anaconda/lib/python3.5/site-packages/pandas/io/sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    497             sql, index_col=index_col, params=params,
    498             coerce_float=coerce_float, parse_dates=parse_dates,
--> 499             chunksize=chunksize)
    500 
    501     try:

/Users/adam/anaconda/lib/python3.5/site-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
   1597 
   1598         args = _convert_params(sql, params)
-> 1599         cursor = self.execute(*args)
   1600         columns = [col_desc[0] for col_desc in cursor.description]
   1601 

/Users/adam/anaconda/lib/python3.5/site-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
   1574             ex = DatabaseError(
   1575                 "Execution failed on sql '%s': %s" % (args[0], exc))
-> 1576             raise_with_traceback(ex)
   1577 
   1578     @staticmethod

/Users/adam/anaconda/lib/python3.5/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
    331         if traceback == Ellipsis:
    332             _, _, traceback = sys.exc_info()
--> 333         raise exc.with_traceback(traceback)
    334 else:
    335     # this version of raise is a syntax error in Python 3

/Users/adam/anaconda/lib/python3.5/site-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
   1562                 cur.execute(*args, **kwargs)
   1563             else:
-> 1564                 cur.execute(*args)
   1565             return cur
   1566         except Exception as exc:

DatabaseError: Execution failed on sql 'SELECT ST_Extent(pa)::geometry(Polygon, 28355) as table_extent FROM ga_ld_pc;': function st_extent(pcpatch) does not exist
LINE 1: SELECT ST_Extent(pa)::geometry(Polygon, 28355) as table_exte...
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

In [6]:
#blocks_query = "SELECT pa::geometry(Polygon, 28355) AS geom, PC_PatchAvg(pa, 'Z') AS elevation, id FROM act_patches;"

#blocks_query = "SELECT st_astext(PC_envelope(pa)::geometry(Polygon, 28355)) AS geom, PC_PatchAvg(pa, 'Z') AS elevation, id FROM ga_ld_pc;"

blocks_query = "select PC_envelope(pa)::geometry(Polygon, 28355) AS geom, PC_PatchAvg(pa, 'Z') AS elevation from ga_ld_pc limit 1000;"

%time blocks = gp.read_postgis(blocks_query, conn)


CPU times: user 25.8 ms, sys: 6.91 ms, total: 32.7 ms
Wall time: 3.74 s

how many rows (where a block of points is a row)


In [7]:
blocks.size


Out[7]:
2000

In [8]:
blocks.head()


Out[8]:
geom elevation
0 POLYGON ((600000 6270000, 600000 6270050.78, 6... 55.757960
1 POLYGON ((600032.09 6270000.02, 600032.09 6270... 56.148638
2 POLYGON ((600000 6270050.79, 600000 6270075.98... 57.707492
3 POLYGON ((600000 6270075.99, 600000 6270125.60... 59.521752
4 POLYGON ((600032.27 6270075.98, 600032.27 6270... 59.748096

In [9]:
polys = gp.GeoSeries(blocks.geom)
patches = gp.GeoDataFrame({'geometry': polys, 'elevation': blocks.elevation})

In [10]:
patches.head()


Out[10]:
elevation geometry
0 55.757960 POLYGON ((600000 6270000, 600000 6270050.78, 6...
1 56.148638 POLYGON ((600032.09 6270000.02, 600032.09 6270...
2 57.707492 POLYGON ((600000 6270050.79, 600000 6270075.98...
3 59.521752 POLYGON ((600000 6270075.99, 600000 6270125.60...
4 59.748096 POLYGON ((600032.27 6270075.98, 600032.27 6270...

In [ ]:
blocks_query = "SELECT pa::geometry(Polygon, 28355) AS geom, PC_PatchAvg(pa, 'Z') AS elevation, id FROM ga_ld_pc where PC_PatchAvg(pa, 'Z') > 400;"

%time thepatches = gp.read_postgis(blocks_query, conn)

In [ ]:
thepatches.head()

In [ ]:
%time highpatches = thepatches.query('elevation > 820')
highpatches.head()

Let's map the patches of data we collected - Black Mountain, above 820m high


In [ ]:
highpatches.plot(column='elevation',colormap='BrBG')

Now collect the points from the same region


In [ ]:
points_query = "with pts as(SELECT PC_Explode(pa) as pt FROM act_patches where PC_PatchAvg(pa, 'Z') > 820 ) select st_astext(pt::geometry) from pts;"

#get raw point data, not as a geometry
#points_query =  "SELECT PC_astext(PC_Explode(pa)) as pt FROM act_patches where PC_PatchAvg(pa, 'Z') > 700 ;"

%time pts = pdsql.read_sql(points_query, conn)

# point storage schema:
# 1 = intens, 2 = ReturnNo, 3 = Numreturns, 4 = scandirectionflag, 5 = edgeofflightline
# 6 = classification (ASPRS), 7 = scananglerank, 8 = user data, 9 = pointsourceID
# 10 = R, 11 = G, 12 = B, 13 = GPSTime, 14 = X, 15 = Y, 16 = Z

In [ ]:
#how many points did we get?
pts.size

In [ ]:
#had to check the schema to find point order...
schema_query =  "SELECT * FROM pointcloud_formats where pcid = 4;"

schm = pdsql.read_sql(schema_query, conn)
print(schm.schema)

In [ ]:
pts.head()

In [ ]:
thepoints = []

for point in pts.st_astext:
    this = wkt_loads(point)
    thepoints.append([this.x,this.y,this.z])

In [ ]:
thepoints = np.squeeze(thepoints)

Plot the points


In [ ]:
plt.scatter(thepoints[:,0], thepoints[:,1], c = thepoints[:,2], lw=0, s=5, cmap='BrBG')

Now make a pretty plot - points, patches in the subset, and all the patches in the region


In [ ]:
fig = plt.figure()
fig.set_size_inches(25/2.51, 25/2.51)

BLUE = '#6699cc'
RED = '#cc6699'

ax = fig.gca() 
ax.scatter(thepoints[:,0], thepoints[:,1], c = thepoints[:,2], lw=0, s=3, cmap='BrBG')
for patch in thepatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=BLUE, alpha=0.2, zorder=2 ))
    
for patch in highpatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=RED, alpha=0.2, zorder=2 ))

Selecting points by classification


In [ ]:
#ASPRS class 6 - buildings
bldng_query = "WITH filtered_patch AS (SELECT PC_FilterEquals(pa, 'Classification', 6) as f_patch FROM act_patches where PC_PatchAvg(pa, 'Z') > 820) SELECT st_astext(point::geometry) FROM filtered_patch, pc_explode(f_patch) AS point;" 
%time bld_pts = pdsql.read_sql(bldng_query, conn)

In [ ]:
bld_pts.head()

bldpoints = []

for point in bld_pts.st_astext:
    this = wkt_loads(point)
    bldpoints.append([this.x,this.y,this.z])
bldpoints = np.squeeze(bldpoints)

In [ ]:
#ASPRS class 2 - ground
grnd_query = "WITH filtered_patch AS (SELECT PC_FilterEquals(pa, 'Classification', 2) as f_patch FROM act_patches where PC_PatchAvg(pa, 'Z') > 820) SELECT st_astext(point::geometry) FROM filtered_patch, pc_explode(f_patch) AS point;" 
%time grnd_pts = pdsql.read_sql(grnd_query, conn)

In [ ]:
grnd_pts.head()

grndpoints = []

for point in grnd_pts.st_astext:
    this = wkt_loads(point)
    grndpoints.append([this.x,this.y,this.z])
grndpoints = np.squeeze(grndpoints)

In [ ]:
#ASPRS class 5 - high vegetation
hv_query = "WITH filtered_patch AS (SELECT PC_FilterEquals(pa, 'Classification', 5) as f_patch FROM ga_ld_pc where PC_PatchAvg(pa, 'Z') > 40) SELECT st_astext(point::geometry) FROM filtered_patch, pc_explode(f_patch) AS point;" 
%time hv_pts = pdsql.read_sql(hv_query, conn)

In [ ]:
#ASPRS class 5 - high vegetation
hv_query = "WITH filtered_patch AS (SELECT PC_FilterEquals(pa, 'Classification', 5) as f_patch FROM ga_ld_pc where PC_Intersection(pa, 'SRID=4326;POLYGON((-126.451 45.552, -126.42 47.55, -126.40 45.552, -126.451 45.552))'::geometry
) SELECT st_astext(point::geometry) FROM filtered_patch, pc_explode(f_patch) AS point;" 
%time hv_pts = pdsql.read_sql(hv_query, conn)

In [ ]:
hv_pts.head()

hvpoints = []

for point in hv_pts.st_astext:
    this = wkt_loads(point)
    hvpoints.append([this.x,this.y,this.z])
hvpoints = np.squeeze(hvpoints)

In [ ]:
fig = plt.figure()
fig.set_size_inches(25/2.51, 25/2.51)

BLUE = '#6699cc'
RED = '#cc6699'

ax = fig.gca() 
ax.scatter(grndpoints[:,0], grndpoints[:,1], c = grndpoints[:,2], lw=0, s=3, cmap='plasma')
ax.scatter(bldpoints[:,0], bldpoints[:,1], c = bldpoints[:,2], lw=0, s=3, cmap='viridis')
ax.scatter(hvpoints[:,0], hvpoints[:,1], c = hvpoints[:,2], lw=0, s=3, cmap='BrBG')

for patch in thepatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=BLUE, alpha=0.2, zorder=2 ))
    
for patch in highpatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=RED, alpha=0.2, zorder=2 ))

Add a 3D plot


In [ ]:
#set up for 3d plots
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pylab as pylab

In [ ]:
plt_az=300
plt_elev = 50.
plt_s = 2

cb_fmt = '%.1f'

fig = plt.figure()
fig.set_size_inches(30/2.51, 25/2.51)

ax0 = fig.add_subplot(111, projection='3d')
ax0.scatter(grndpoints[:,0], grndpoints[:,1],grndpoints[:,2], c=np.ndarray.tolist(grndpoints[:,2]),\
                lw=0, s=plt_s, cmap='plasma')
ax0.scatter(bldpoints[:,0], bldpoints[:,1],bldpoints[:,2], c=np.ndarray.tolist(bldpoints[:,2]),\
                lw=0, s=plt_s, cmap='viridis')
ax0.scatter(hvpoints[:,0], hvpoints[:,1],hvpoints[:,2], c=np.ndarray.tolist(hvpoints[:,2]),\
                lw=0, s=plt_s-1, cmap='BrBG')

Export to three.js?


In [ ]:
import vtk

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:
np.savetxt('ground_points.txt', grndpoints, delimiter=',')
np.savetxt('bld_points.txt', bldpoints, delimiter=',')
np.savetxt('hv_points.txt', hvpoints, delimiter=',')

To do:

  • too many things!
  • select from database by:
    • class (demonstrated here)
    • height above ground (need to integrate PDAL and PCL)
    • tree cover
    • intersection with objects
  • things on the list:
    • comparing LandSAT bare ground and LIDAR bare ground
    • tree heights and geophysical properties
    • ...

All very cool but why?

National elevation maps - storing and managing many billions of points as a coherent dataset for precise elevation estimation. Also aiming to store provenance - if there's a data issue, we need more than just points. We need to figure out why the issue occurred, and fix it. We can also store things like point accuracy, some QC metrics, whatever point attributes we like! Or points from manifold sources:

  • airborne LiDAR
  • terrestrial scanners
  • 3D photogrammetry
  • geophysical datasets (already as points in netCDF)
  • output of discrete element models (eg. Kool et al, or new sea ice models in development)

In [ ]: