Some tests with pgpointcloud

  • LAS files are hard to subset and visualise without LAStools, Terrasolid, Bentley Map, or similar tools, which are excellent but costly.
  • Further, LAS files don't let us store arbitrary extra data alongside the points; for example, we already have trouble storing digitised waveforms with the ACT dataset.
  • So what can we do? Here are some experiments with pgpointcloud (PostGIS pointcloud), one approach to point cloud data management.

Some things about this exercise:

  • I am a PostGIS/pgpointcloud n00b. An SQL ninja could probably do this a lot better and a lot faster!
  • The dataset used here contains ~406 000 000 points
  • made from nine ACT 8pt tiles (no waveform data, because stuff)
  • ingested using a PDAL pipeline (http://pdal.io); a sketch of the pipeline follows this list
  • LAS storage is ~20 GB
  • the pgpointcloud table is ~5 GB, so compression is reasonably close to .LAZ
  • there are 82 634 'patches' of 5 000 points each
  • patches are the primary query unit, so we index over 82 634 rows, arranged in a quasi-space-filling curve
  • there is a tradeoff between the number of points per patch and the number of rows in the DB
  • but generally speaking it scales (nearly... research shows we will hit a limit a few hundred million points from now)
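
For reference, the ingestion step was roughly of this shape. This is a sketch rather than the exact pipeline used: filters.chipper and writers.pgpointcloud are real PDAL stages, and chipper's capacity sets the points-per-patch (5000 here, matching the figures above), but the input file name and connection details are illustrative.

# hypothetical PDAL pipeline: chip a LAS tile into 5 000-point patches and
# write them to the act_patches table (run with: pdal pipeline ingest.json)
ingest_pipeline = """
{
  "pipeline": [
    "ACT-tile.las",
    { "type": "filters.chipper", "capacity": 5000 },
    {
      "type": "writers.pgpointcloud",
      "connection": "dbname=pcbm_pc user=pcbm host=130.56.244.246",
      "table": "act_patches",
      "srid": "28355"
    }
  ]
}
"""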

Gathering modules


In [1]:
import numpy as np
import psycopg2 as ppg

# pandas for plain SQL reads, geopandas for geometry-aware reads
import pandas.io.sql as pdsql
import geopandas as gp

# shapely for WKT parsing, with C speedups where available
from shapely.wkt import loads as wkt_loads
from shapely import speedups

# descartes turns shapely polygons into matplotlib patch artists
from descartes import PolygonPatch

import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
speedups.available


Out[2]:
True

In [3]:
speedups.enable()

Making a postgres connection with psycopg2


In [4]:
#  PGPASSWORD=pointy_cloudy psql -h localhost -d pcbm_pc -U pcbm
pg_connection = "dbname=pcbm_pc user=pcbm password=pointy_cloudy host=130.56.244.246"
conn = ppg.connect(pg_connection)
cursor = conn.cursor()

First query for some blocks - this gets them all!


In [5]:
# alternative: return the patches themselves as geometries
#blocks_query = "SELECT pa::geometry(Polygon, 28355) AS geom, PC_PatchAvg(pa, 'Z') AS elevation, id FROM act_patches;"

# return each patch's envelope as WKT, plus its mean Z and its id
blocks_query = "SELECT st_astext(PC_envelope(pa)::geometry(Polygon, 28355)) AS geom, PC_PatchAvg(pa, 'Z') AS elevation, id FROM act_patches;"

%time blks = pdsql.read_sql(blocks_query, conn)


CPU times: user 327 ms, sys: 186 ms, total: 514 ms
Wall time: 3min 33s

Here I was trying to convert a pandas DataFrame to a GeoPandas DataFrame with a geometry column


In [6]:
gblocks = gp.GeoDataFrame(blks)

In [7]:
gblocks.head()


Out[7]:
geom elevation id
0 POLYGON((688000.16 6092000,688000.16 6092017.0... 598.078240 1
1 POLYGON((688000.1 6092017.05,688000.1 6092034.... 598.497148 2
2 POLYGON((688032.8 6092000,688032.8 6092017.39,... 595.365976 3
3 POLYGON((688032.82 6092017.39,688032.82 609203... 595.920190 4
4 POLYGON((688000.04 6092034.24,688000.04 609205... 599.188634 5

In [8]:
polys = gp.GeoSeries(blks.geom)
patches = gp.GeoDataFrame({'geometry': polys, 'elevation': blks.elevation})

In [9]:
patches.head()


Out[9]:
elevation geometry
0 598.078240 POLYGON((688000.16 6092000,688000.16 6092017.0...
1 598.497148 POLYGON((688000.1 6092017.05,688000.1 6092034....
2 595.365976 POLYGON((688032.8 6092000,688032.8 6092017.39,...
3 595.920190 POLYGON((688032.82 6092017.39,688032.82 609203...
4 599.188634 POLYGON((688000.04 6092034.24,688000.04 609205...

...but I gave up and ingested the data straight into a GeoPandas frame instead
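
In hindsight, the conversion above probably failed because the geom column holds WKT strings rather than shapely geometries. A minimal sketch of the missing parsing step (untested here, reusing the wkt_loads import from In [1]):

# parse the WKT strings into shapely geometries before building the frame
polys = gp.GeoSeries(blks.geom.apply(wkt_loads))
patches = gp.GeoDataFrame({'geometry': polys, 'elevation': blks.elevation})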


In [10]:
blocks_query = "SELECT pa::geometry(Polygon, 28355) AS geom, PC_PatchAvg(pa, 'Z') AS elevation, id FROM act_patches where PC_PatchAvg(pa, 'Z') > 700;"

%time thepatches = gp.read_postgis(blocks_query, conn)


CPU times: user 139 ms, sys: 18.2 ms, total: 157 ms
Wall time: 21.2 s

In [11]:
thepatches.head()


Out[11]:
geom elevation id
0 POLYGON ((689303.09 6095473.11, 689303.09 6095... 701.567520 20393
1 POLYGON ((689316.71 6095456.79, 689316.71 6095... 700.136368 20397
2 POLYGON ((689329.8200000001 6095456.79, 689329... 700.521912 20398
3 POLYGON ((689316.71 6095469.78, 689316.71 6095... 704.410690 20399
4 POLYGON ((689328.4 6095469.78, 689328.4 609549... 705.962588 20400

In [12]:
%time highpatches = thepatches.query('elevation > 820')
highpatches.head()


CPU times: user 41.9 ms, sys: 2.3 ms, total: 44.2 ms
Wall time: 44.2 ms
Out[12]:
geom elevation id
2230 POLYGON ((690562.53 6094433.22, 690562.53 6094... 821.132130 66998
2232 POLYGON ((690570.28 6094426.43, 690570.28 6094... 820.424162 67000
2239 POLYGON ((690589.48 6094433.04, 690589.48 6094... 820.406540 67007
2363 POLYGON ((690762.2000000001 6094355.310000001,... 820.228386 67131
2365 POLYGON ((690776.96 6094349.38, 690776.96 6094... 823.405956 67133

Let's map the patches of data we collected: Black Mountain, above 820 m elevation


In [13]:
highpatches.plot(column='elevation', cmap='BrBG')

Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x126610588>

Now collect the points from the same region


In [14]:
points_query = "with pts as(SELECT PC_Explode(pa) as pt FROM act_patches where PC_PatchAvg(pa, 'Z') > 820 ) select st_astext(pt::geometry) from pts;"

#get raw point data, not as a geometry
#points_query =  "SELECT PC_astext(PC_Explode(pa)) as pt FROM act_patches where PC_PatchAvg(pa, 'Z') > 700 ;"

%time pts = pdsql.read_sql(points_query, conn)

# point storage schema:
# 1 = intens, 2 = ReturnNo, 3 = Numreturns, 4 = scandirectionflag, 5 = edgeofflightline
# 6 = classification (ASPRS), 7 = scananglerank, 8 = user data, 9 = pointsourceID
# 10 = R, 11 = G, 12 = B, 13 = GPSTime, 14 = X, 15 = Y, 16 = Z


CPU times: user 472 ms, sys: 195 ms, total: 667 ms
Wall time: 25.8 s
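
As an aside, the commented-out raw query above returns each point via PC_AsText, which (per the pgpointcloud docs) emits a small JSON document per point, e.g. {"pcid":4,"pt":[...16 values...]}. A sketch of parsing that form, assuming that output shape:

import json

# each row is a JSON document; 'pt' holds the 16 values in the schema order below
raw = pdsql.read_sql("SELECT PC_AsText(PC_Explode(pa)) AS pt FROM act_patches LIMIT 5;", conn)
vals = np.array([json.loads(p)['pt'] for p in raw.pt])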

In [15]:
#how many points did we get?
pts.size


Out[15]:
879988

In [16]:
#had to check the schema to find point order...
schema_query =  "SELECT * FROM pointcloud_formats where pcid = 4;"

schm = pdsql.read_sql(schema_query, conn)
print(schm.schema)


0    <?xml version="1.0" encoding="UTF-8"?>\n<pc:Po...
Name: schema, dtype: object
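
To actually read dimension names out of that XML, something like this sketch should work (the pc: namespace URI is an assumption based on the pgpointcloud schema examples):

import xml.etree.ElementTree as ET

# list dimension names in storage order
ns = {'pc': 'http://pds.nasa.gov/pds4/pc/v1'}
root = ET.fromstring(schm.schema[0].encode('utf-8'))  # encode: the doc carries an XML declaration
dims = {int(d.find('pc:position', ns).text): d.find('pc:name', ns).text
        for d in root.findall('pc:dimension', ns)}
print([dims[i] for i in sorted(dims)])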

In [17]:
pts.head()


Out[17]:
st_astext
0 POINT Z (690565.21 6094433.22 820.59)
1 POINT Z (690566.35 6094433.22 826.48)
2 POINT Z (690566.11 6094433.22 816.35)
3 POINT Z (690565.12 6094433.23 826.28)
4 POINT Z (690565.01 6094433.23 816.04)

In [18]:
thepoints = []

# parse each WKT point into an [x, y, z] triple
for point in pts.st_astext:
    this = wkt_loads(point)
    thepoints.append([this.x, this.y, this.z])

In [19]:
thepoints = np.squeeze(thepoints)
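
An alternative that skips the WKT round-trip entirely (untested here, but PC_Get is a standard pgpointcloud accessor) would be to pull the coordinates straight out of each point on the database side:

# fetch X, Y, Z as numeric columns instead of parsing WKT in Python
xyz_query = ("WITH pts AS (SELECT PC_Explode(pa) AS pt FROM act_patches "
             "WHERE PC_PatchAvg(pa, 'Z') > 820) "
             "SELECT PC_Get(pt, 'X') AS x, PC_Get(pt, 'Y') AS y, PC_Get(pt, 'Z') AS z FROM pts;")
xyz = pdsql.read_sql(xyz_query, conn)
points_alt = xyz[['x', 'y', 'z']].values  # same N x 3 array as thepoints above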

Plot the points


In [20]:
plt.scatter(thepoints[:,0], thepoints[:,1], c = thepoints[:,2], lw=0, s=5, cmap='BrBG')


Out[20]:
<matplotlib.collections.PathCollection at 0x12688cf98>

Now make a pretty plot - points, patches in the subset, and all the patches in the region


In [21]:
fig = plt.figure()
fig.set_size_inches(25/2.54, 25/2.54)  # 25 cm square (2.54 cm per inch)

BLUE = '#6699cc'
RED = '#cc6699'

ax = fig.gca() 
ax.scatter(thepoints[:,0], thepoints[:,1], c = thepoints[:,2], lw=0, s=3, cmap='BrBG')
for patch in thepatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=BLUE, alpha=0.2, zorder=2 ))
    
for patch in highpatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=RED, alpha=0.2, zorder=2 ))


Selecting points by classification


In [22]:
#ASPRS class 6 - buildings
bldng_query = "WITH filtered_patch AS (SELECT PC_FilterEquals(pa, 'Classification', 6) as f_patch FROM act_patches where PC_PatchAvg(pa, 'Z') > 820) SELECT st_astext(point::geometry) FROM filtered_patch, pc_explode(f_patch) AS point;" 
%time bld_pts = pdsql.read_sql(bldng_query, conn)


CPU times: user 12.7 ms, sys: 6.76 ms, total: 19.5 ms
Wall time: 20 s

In [23]:
bld_pts.head()

bldpoints = []

for point in bld_pts.st_astext:
    this = wkt_loads(point)
    bldpoints.append([this.x,this.y,this.z])
bldpoints = np.squeeze(bldpoints)

In [24]:
#ASPRS class 2 - ground
grnd_query = "WITH filtered_patch AS (SELECT PC_FilterEquals(pa, 'Classification', 2) as f_patch FROM act_patches where PC_PatchAvg(pa, 'Z') > 820) SELECT st_astext(point::geometry) FROM filtered_patch, pc_explode(f_patch) AS point;" 
%time grnd_pts = pdsql.read_sql(grnd_query, conn)


CPU times: user 100 ms, sys: 51.2 ms, total: 151 ms
Wall time: 22.4 s

In [25]:
grnd_pts.head()

grndpoints = []

for point in grnd_pts.st_astext:
    this = wkt_loads(point)
    grndpoints.append([this.x,this.y,this.z])
grndpoints = np.squeeze(grndpoints)

In [26]:
#ASPRS class 5 - high vegetation
hv_query = "WITH filtered_patch AS (SELECT PC_FilterEquals(pa, 'Classification', 5) as f_patch FROM act_patches where PC_PatchAvg(pa, 'Z') > 820) SELECT st_astext(point::geometry) FROM filtered_patch, pc_explode(f_patch) AS point;" 
%time hv_pts = pdsql.read_sql(hv_query, conn)


CPU times: user 202 ms, sys: 99 ms, total: 301 ms
Wall time: 24.6 s

In [27]:
hv_pts.head()

hvpoints = []

for point in hv_pts.st_astext:
    this = wkt_loads(point)
    hvpoints.append([this.x,this.y,this.z])
hvpoints = np.squeeze(hvpoints)
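
The three class queries above differ only in the ASPRS class code; a small helper in the same pattern would avoid the repetition (a sketch, using only the calls already used above):

def points_by_class(conn, asprs_class, min_elev=820):
    """Fetch [x, y, z] points of one ASPRS class from patches above min_elev."""
    q = ("WITH filtered_patch AS ("
         "SELECT PC_FilterEquals(pa, 'Classification', {c}) AS f_patch "
         "FROM act_patches WHERE PC_PatchAvg(pa, 'Z') > {z}) "
         "SELECT st_astext(point::geometry) FROM filtered_patch, "
         "pc_explode(f_patch) AS point;").format(c=asprs_class, z=min_elev)
    df = pdsql.read_sql(q, conn)
    return np.squeeze([[p.x, p.y, p.z] for p in map(wkt_loads, df.st_astext)])

# equivalent to the three cells above:
# grndpoints, hvpoints, bldpoints = (points_by_class(conn, c) for c in (2, 5, 6))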

In [28]:
fig = plt.figure()
fig.set_size_inches(25/2.54, 25/2.54)  # 25 cm square (2.54 cm per inch)

BLUE = '#6699cc'
RED = '#cc6699'

ax = fig.gca() 
ax.scatter(grndpoints[:,0], grndpoints[:,1], c = grndpoints[:,2], lw=0, s=3, cmap='plasma')
ax.scatter(bldpoints[:,0], bldpoints[:,1], c = bldpoints[:,2], lw=0, s=3, cmap='viridis')
ax.scatter(hvpoints[:,0], hvpoints[:,1], c = hvpoints[:,2], lw=0, s=3, cmap='BrBG')

for patch in thepatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=BLUE, alpha=0.2, zorder=2 ))
    
for patch in highpatches.geom:
    ax.add_patch(PolygonPatch(patch, fc=BLUE, ec=RED, alpha=0.2, zorder=2 ))


Add a 3D plot


In [29]:
#set up for 3d plots
# Axes3D registers the '3d' projection with matplotlib
from mpl_toolkits.mplot3d import Axes3D

In [37]:
plt_az = 300
plt_elev = 50.
plt_s = 2

fig = plt.figure()
fig.set_size_inches(30/2.54, 25/2.54)

ax0 = fig.add_subplot(111, projection='3d')
ax0.view_init(elev=plt_elev, azim=plt_az)  # apply the viewing angles set above
ax0.scatter(grndpoints[:,0], grndpoints[:,1], grndpoints[:,2], c=grndpoints[:,2],
            lw=0, s=plt_s, cmap='plasma')
ax0.scatter(bldpoints[:,0], bldpoints[:,1], bldpoints[:,2], c=bldpoints[:,2],
            lw=0, s=plt_s, cmap='viridis')
ax0.scatter(hvpoints[:,0], hvpoints[:,1], hvpoints[:,2], c=hvpoints[:,2],
            lw=0, s=plt_s-1, cmap='BrBG')


Out[37]:
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x143ad3470>

Export to three.js?


In [39]:
import vtk


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-39-b7e11aadda62> in <module>()
----> 1 import vtk

ImportError: No module named 'vtk'

In [31]:
np.savetxt('ground_points.txt', grndpoints, delimiter=',')
np.savetxt('bld_points.txt', bldpoints, delimiter=',')
np.savetxt('hv_points.txt', hvpoints, delimiter=',')
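
A stopgap for the three.js idea: three.js can build a point cloud from a flat position array (via THREE.BufferGeometry / BufferAttribute on the JavaScript side), so a plain JSON dump gets the data to the browser. A hypothetical sketch:

import json

# flatten to [x0, y0, z0, x1, y1, z1, ...] for a three.js BufferAttribute
with open('ground_points.json', 'w') as f:
    json.dump({'positions': grndpoints.ravel().tolist()}, f)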

To do:

  • too many things!
  • select from the database by:
    • class (demonstrated here)
    • height above ground (need to integrate PDAL and PCL)
    • tree cover
    • intersection with objects (see the sketch after this list)
  • things on the list:
    • comparing Landsat bare ground and LiDAR bare ground
    • tree heights and geophysical properties
    • ...
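
For the 'intersection with objects' item, pointcloud_postgis provides PC_Intersects(patch, geometry), so selecting points inside an arbitrary footprint could look like this sketch (the polygon here is made up):

# hypothetical region of interest in EPSG:28355 map coordinates
roi_wkt = ('POLYGON((690500 6094300, 690800 6094300, '
           '690800 6094500, 690500 6094500, 690500 6094300))')
roi_query = ("SELECT st_astext(PC_Explode(pa)::geometry) FROM act_patches "
             "WHERE PC_Intersects(pa, ST_GeomFromText('{w}', 28355));").format(w=roi_wkt)
roi_pts = pdsql.read_sql(roi_query, conn)

This filters at patch level; PC_Intersection would clip patches to the exact polygon.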

All very cool but why?

National elevation maps: storing and managing many billions of points as a coherent dataset for precise elevation estimation. We also aim to store provenance, because if there's a data issue we need more than just the points; we need to work out why the issue occurred and fix it. And we can store point accuracy, QC metrics, whatever point attributes we like, mixing points from many sources:

  • airborne LiDAR
  • terrestrial scanners
  • 3D photogrammetry
  • geophysical datasets (already as points in netCDF)
  • output of discrete element models (e.g. Kool et al., or new sea ice models in development)
