What is the proposed task:

  • ingest some LiDAR points into an HDF file
  • ingest the aircraft trajectory into the file
  • anything else

...and then extract data from the HDF file at different rates using a spatial 'query'

What do the data look like?

ASCII point clouds with the following attributes, currently used as a full M x N array:

  • time (GPS second of week, float)
  • X coordinate (UTM metres, float)
  • Y coordinate (UTM metres, float)
  • Z coordinate (ellipsoidal height, float)
  • return intensity (unscaled, float)
  • scan angle (degrees, float)

Everything above is easily stored in the binary .LAS format (or .LAZ). It is kept in ASCII because the following additional data have no slots in .LAS:

  • X uncertainty (aircraft reference frame, m, float)
  • Y uncertainty (aircraft reference frame, m, float)
  • Z uncertainty (aircraft reference frame, m, float)
  • 3D uncertainty (metres, float)

...and optionally (depending on the use case):

  • aircraft trajectory height (to ITRF08, metres)
  • aircraft position uncertainty X (metres, relative to aircraft position)
  • aircraft and sensor attributes

...and derived data (all pulled together in a sketch after these lists):

  • sea ice elevations (m, float)
  • estimated snow depths (m, float)
  • estimated snow depth uncertainty (m, float)
  • estimated ice thickness (m, float)
  • estimated ice thickness uncertainty (m, float)
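
Pulling those lists together: one way to picture the full per-point record is as a NumPy structured dtype. This is only an illustration - the field names and float widths below are my own guesses, not part of any existing format.

import numpy as np

# Illustrative only: one possible layout for a single point record.
point_dtype = np.dtype([
    ('time', 'f8'),                                  # GPS second of week
    ('x', 'f8'), ('y', 'f8'), ('z', 'f8'),           # UTM metres, ellipsoidal height
    ('intensity', 'f4'), ('scan_angle', 'f4'),
    ('x_u', 'f4'), ('y_u', 'f4'), ('z_u', 'f4'),     # per-axis uncertainty, aircraft frame
    ('u_3d', 'f4'),                                  # combined 3D uncertainty
    ('ice_elev', 'f4'),                              # derived: sea ice elevation
    ('snow_depth', 'f4'), ('snow_depth_u', 'f4'),    # derived: snow depth and its uncertainty
    ('ice_thick', 'f4'), ('ice_thick_u', 'f4'),      # derived: ice thickness and its uncertainty
])

points = np.zeros(10, dtype=point_dtype)             # ten empty records, as an example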

So, you can see how quickly .LAS loses its value. ASCII point clouds are conceptually simple, but very big - and not well suited to use in an HPC context. There is too much storage overhead, and you have to read the entire file in order to extract a subset. Six million points comes to around 50 MB, which is pretty inefficient.

So, let's look at about 6 million points...

Let's look at a small set of points

Scenario: a small set of LiDAR and 3D photogrammetry points collected adjacent to a ship (the RSV Aurora Australis) parked in sea ice.

Source: AAD's APPLS system (http://seaice.acecrc.org.au/crrs/)

https://data.aad.gov.au/aadc/metadata/metadata.cfm?entry_id=SIPEX_II_RAPPLS

https://data.aad.gov.au/aadc/metadata/metadata.cfm?entry_id=SIPEX_LiDAR_sea_ice

Pretty cover photo:


In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
%matplotlib inline
#import plot_lidar
from datetime import datetime

Import a LiDAR swath


In [3]:
swath = np.genfromtxt('../../PhD/python-phd/swaths/is6_f11_pass1_aa_nr2_522816_523019_c.xyz')

In [4]:
import pandas as pd

columns = ['time', 'X', 'Y', 'Z', 'I','A', 'x_u', 'y_u', 'z_u', '3D_u']
swath = pd.DataFrame(swath, columns=columns)

In [5]:
swath[1:5]


Out[5]:
time X Y Z I A x_u y_u z_u 3D_u
1 522816.00544 -113.710607 -6077.889536 -27.795405 8995.0 56.595811 0.2184 0.3094 0.1440 0.4052
2 522816.00547 -112.621341 -6077.699073 -27.805177 9252.0 56.795901 0.2184 0.3077 0.1431 0.4036
3 522816.00550 -111.563030 -6077.515571 -27.856036 8995.0 56.995991 0.2185 0.3060 0.1422 0.4020
4 522816.00554 -110.491277 -6077.328057 -27.878673 9252.0 57.196111 0.2185 0.3044 0.1413 0.4004

Now load up the aircraft trajectory


In [6]:
air_traj = np.genfromtxt('../../PhD/is6_f11/trajectory/is6_f11_pass1_local_ice_rot.3dp')

In [7]:
columns = ['time', 'X', 'Y', 'Z', 'R', 'P', 'H', 'x_u', 'y_u', 'z_u', 'r_u', 'p_u', 'h_u']
air_traj = pd.DataFrame(air_traj, columns=columns)

In [8]:
air_traj[1:5]


Out[8]:
time X Y Z R P H x_u y_u z_u r_u p_u h_u
1 522815.012 30.90322 -6092.56695 193.840 4.073 -3.345 -8.85948 0.054 0.083 0.053 0.034 0.037 0.104
2 522815.016 30.89646 -6092.37386 193.842 4.068 -3.339 -8.85748 0.054 0.083 0.053 0.034 0.037 0.104
3 522815.020 30.89081 -6092.18033 193.846 4.053 -3.327 -8.85448 0.054 0.083 0.053 0.034 0.037 0.104
4 522815.024 30.88405 -6091.98725 193.848 4.027 -3.304 -8.85048 0.054 0.083 0.053 0.034 0.037 0.104

Take a quick look at the data


In [9]:
fig = plt.figure(figsize = ([30/2.54, 6/2.54]))
ax0 = fig.add_subplot(111)      
a0 = ax0.scatter(swath['Y'],  swath['X'], c=swath['Z'] - np.min(swath['Z']), cmap = 'gist_earth',
                vmin=0, vmax=10, edgecolors=None,lw=0, s=0.6)
a1 = ax0.scatter(air_traj['Y'], air_traj['X'], c=air_traj['Z'], cmap = 'Reds',
                lw=0, s=1)
plt.tight_layout()


Making an HDF file out of those points


In [10]:
import h5py

In [11]:
#create a file instance, with the intention to write it out
lidar_test = h5py.File('lidar_test.hdf5', 'w')

In [12]:
swath_data = lidar_test.create_group('swath_data')

In [13]:
swath_data.create_dataset('GPS_SOW', data=swath['time'])


Out[13]:
<HDF5 dataset "GPS_SOW": shape (1957932,), type "<f8">

In [14]:
#some data
swath_data.create_dataset('UTM_X', data=swath['X'])
swath_data.create_dataset('UTM_Y', data=swath['Y'])
swath_data.create_dataset('Z', data=swath['Z'])
swath_data.create_dataset('INTENS', data=swath['I'])
swath_data.create_dataset('ANGLE', data=swath['A'])
swath_data.create_dataset('X_UNCERT', data=swath['x_u'])
swath_data.create_dataset('Y_UNCERT', data=swath['y_u'])
swath_data.create_dataset('Z_UNCERT', data=swath['z_u'])
swath_data.create_dataset('3D_UNCERT', data=swath['3D_u'])


Out[14]:
<HDF5 dataset "3D_UNCERT": shape (1957932,), type "<f8">

In [15]:
#some attributes
lidar_test.attrs['file_name'] = 'lidar_test.hdf5'

lidar_test.attrs['codebase'] = 'https://github.com/adamsteer/matlab_LIDAR'
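
Not done in this notebook, but cheap to add: the same .attrs mechanism works per dataset, so each channel can carry its own units and reference frame. The attribute names here are only a suggestion, not a standard.

swath_data['X_UNCERT'].attrs['units'] = 'm'
swath_data['X_UNCERT'].attrs['reference_frame'] = 'aircraft body frame'
swath_data['3D_UNCERT'].attrs['units'] = 'm'
swath_data['ANGLE'].attrs['units'] = 'degrees'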

That's some swath data, now some trajectory data at a different sampling rate


In [16]:
traj_data = lidar_test.create_group('traj_data')

In [17]:
#some attributes
traj_data.attrs['flight'] = 11
traj_data.attrs['pass'] = 1
traj_data.attrs['source'] = 'RAPPLS flight 11, SIPEX-II 2012'

In [18]:
#some data
traj_data.create_dataset('pos_x', data = air_traj['X'])
traj_data.create_dataset('pos_y', data =  air_traj['Y'])
traj_data.create_dataset('pos_z', data =  air_traj['Z'])


Out[18]:
<HDF5 dataset "pos_z": shape (51495,), type "<f8">

Close and write the file out


In [19]:
lidar_test.close()

OK, that's an arbitrary HDF file built

The generated file is substantially smaller than the combined sources - 158 MB, down from 193 MB - with no attention paid to optimisation.

The .LAZ version of the input text file here is 66 MB. More compact, but we can't query it directly - and we have to fake fields! Everything in the swath dataset can be stored, but we would need to pretend the uncertainties are RGB values - so if someone comes along and doesn't read the metadata carefully, they get crazy colours, call us up and complain. Or we use the .LAZ 'extra bytes' mechanism, and deal with awkward ways of describing things.

It's also probably a terrible HDF, with no respect to CF compliance at all. That's to come :)

And now we add some 3D photogrammetry at about 80 points/m^2:


In [20]:
photo = np.genfromtxt('/Users/adam/Documents/PhD/is6_f11/photoscan/is6_f11_photoscan_Cloud.txt',skip_header=1)

In [21]:
columns = ['X', 'Y', 'Z', 'R', 'G', 'B']
photo = pd.DataFrame(photo[:,0:6], columns=columns)

In [22]:
#create a file instance, with the intention to write it out
lidar_test = h5py.File('lidar_test.hdf5', 'r+')

In [23]:
photo_data = lidar_test.create_group('3d_photo')

In [24]:
photo_data.create_dataset('UTM_X', data=photo['X'])
photo_data.create_dataset('UTM_Y', data=photo['Y'])
photo_data.create_dataset('Z', data=photo['Z'])
photo_data.create_dataset('R', data=photo['R'])
photo_data.create_dataset('G', data=photo['G'])
photo_data.create_dataset('B', data=photo['B'])


Out[24]:
<HDF5 dataset "B": shape (20151164,), type "<f8">

In [24]:
#del lidar_test['3d_photo']

In [25]:
lidar_test.close()

Storage is a bit less efficient here.

  • ASCII cloud: 2.1 GB
  • .LAZ format with same data: 215 MB
  • HDF file containing LiDAR, trajectory, 3D photo cloud: 1.33 GB

So, there's probably a case for keeping super dense clouds in different files (along with all their ancillary data). Note that .LAZ is able to store all the data used for the super dense cloud here. But - how do we query it efficiently?

Also, this is just a demonstration, so we push on!

Now, let's look at the HDF file... and get stuff


In [26]:
from netCDF4 import Dataset

In [27]:
thedata = Dataset('lidar_test.hdf5', 'r')

In [28]:
thedata


Out[28]:
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    file_name: lidar_test.hdf5
    codebase: https://github.com/adamsteer/matlab_LIDAR
    dimensions(sizes): 
    variables(dimensions): 
    groups: 3d_photo, swath_data, traj_data

There are the three groups: 3d_photo, swath_data and traj_data


In [29]:
swath = thedata['swath_data']

In [30]:
swath


Out[30]:
<class 'netCDF4._netCDF4.Group'>
group /swath_data:
    dimensions(sizes): phony_dim_1(1957932)
    variables(dimensions): float64 3D_UNCERT(phony_dim_1), float64 ANGLE(phony_dim_1), float64 GPS_SOW(phony_dim_1), float64 INTENS(phony_dim_1), float64 UTM_X(phony_dim_1), float64 UTM_Y(phony_dim_1), float64 X_UNCERT(phony_dim_1), float64 Y_UNCERT(phony_dim_1), float64 Z(phony_dim_1), float64 Z_UNCERT(phony_dim_1)
    groups: 

In [31]:
utm_xy = np.column_stack((swath['UTM_X'],swath['UTM_Y']))

In [32]:
idx = np.where((utm_xy[:,0] > -100) & (utm_xy[:,0] < 200) & (utm_xy[:,1] > -100) & (utm_xy[:,1] < 200)  )

In [33]:
chunk_z = swath['Z'][idx]
chunk_z.size


Out[33]:
43264

In [34]:
max(chunk_z)


Out[34]:
1.1131660000000001

In [35]:
chunk_x = swath['UTM_X'][idx]
chunk_x.size


Out[35]:
43264

In [36]:
chunk_y = swath['UTM_Y'][idx]
chunk_y.size


Out[36]:
43264

In [37]:
chunk_uncert = swath['Z_UNCERT'][idx]
chunk_uncert.size


Out[37]:
43264

In [38]:
plt.scatter(chunk_x, chunk_y, c=chunk_z, lw=0, s=2)


Out[38]:
<matplotlib.collections.PathCollection at 0x11e2a9e48>

That gave us a small chunk of LiDAR points without loading the whole point dataset - only the two coordinate columns were read in full to build the index. Neat!

...but being continually dissatisfied, we want more! Let's get just the corresponding trajectory:


In [39]:
traj = thedata['traj_data']

In [40]:
traj


Out[40]:
<class 'netCDF4._netCDF4.Group'>
group /traj_data:
    flight: 11
    pass: 1
    source: RAPPLS flight 11, SIPEX-II 2012
    dimensions(sizes): phony_dim_2(51495)
    variables(dimensions): float64 pos_x(phony_dim_2), float64 pos_y(phony_dim_2), float64 pos_z(phony_dim_2)
    groups: 

Because there's essentially no X extent in the flight data, only the Y coordinate is needed for the query...


In [41]:
pos_y = traj['pos_y']

In [42]:
idx = np.where((pos_y[:] > -100.) & (pos_y[:] < 200.))

In [43]:
cpos_x = traj['pos_x'][idx]

In [44]:
cpos_y = traj['pos_y'][idx]

In [45]:
cpos_z = traj['pos_z'][idx]

Now plot the flight line and LiDAR together


In [46]:
plt.scatter(chunk_x, chunk_y, c=chunk_z, lw=0, s=3, cmap='gist_earth')
plt.scatter(cpos_x, cpos_y, c=cpos_z, lw=0, s=5, cmap='Oranges')


Out[46]:
<matplotlib.collections.PathCollection at 0x393cb0320>

...and prove that we are looking at a trajectory and some LiDAR


In [47]:
from mpl_toolkits.mplot3d import Axes3D

In [48]:
#set up a plot
plt_az=310
plt_elev = 40.
plt_s = 3
cb_fmt = '%.1f'

cmap1 = plt.get_cmap('gist_earth', 10)

#make a plot
fig = plt.figure()
fig.set_size_inches(35/2.51, 20/2.51)
ax0 = fig.add_subplot(111, projection='3d')
a0 = ax0.scatter(chunk_x, chunk_y, (chunk_z-min(chunk_z))*2,
                 c=np.ndarray.tolist((chunk_z-min(chunk_z))*2),\
                cmap=cmap1,lw=0, vmin = -0.5, vmax = 5, s=plt_s)
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
                cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)
ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()


Plot coloured by point uncertainty


In [49]:
#set up a plot
plt_az=310
plt_elev = 40.
plt_s = 3
cb_fmt = '%.1f'

cmap1 = plt.get_cmap('gist_earth', 30)

#make a plot
fig = plt.figure()
fig.set_size_inches(35/2.51, 20/2.51)
ax0 = fig.add_subplot(111, projection='3d')
a0 = ax0.scatter(chunk_x, chunk_y, (chunk_z-min(chunk_z))*2,
                 c=np.ndarray.tolist(chunk_uncert),\
                 cmap=cmap1, lw=0, vmin = 0, vmax = 0.2, s=plt_s)
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
                cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)
ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()
plt.savefig('thefig.png')


Now pull in the photogrammetry cloud

This gets a little messy, since it appears we still need to read the full X and Y columns - so still 20 x 10^6 x 2 values. Better than 20 x 10^6 x 6, but I wonder if I'm missing something about indexing.
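
One possible workaround, sketched as an untested idea rather than a recommendation: read the coordinate variables in blocks, build the index piecewise, and never hold the full X and Y columns at once. block_query is a hypothetical helper, not something used elsewhere in this sheet.

def block_query(x_var, y_var, xmin, xmax, ymin, ymax, block=1000000):
    """Return sorted indices of points inside the bounding box, reading `block` points at a time."""
    hits = []
    n = x_var.shape[0]
    for start in range(0, n, block):
        stop = min(start + block, n)
        x = x_var[start:stop]                        # only this slice is read from disk
        y = y_var[start:stop]
        mask = (x > xmin) & (x < xmax) & (y > ymin) & (y < ymax)
        hits.append(np.nonzero(mask)[0] + start)
    return np.concatenate(hits)

# e.g., with the 3d_photo group opened below:
# idx_p = block_query(photo['UTM_X'], photo['UTM_Y'], 0, 100, 0, 100)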


In [50]:
photo = thedata['3d_photo']

In [51]:
photo


Out[51]:
<class 'netCDF4._netCDF4.Group'>
group /3d_photo:
    dimensions(sizes): phony_dim_0(20151164)
    variables(dimensions): float64 B(phony_dim_0), float64 G(phony_dim_0), float64 R(phony_dim_0), float64 UTM_X(phony_dim_0), float64 UTM_Y(phony_dim_0), float64 Z(phony_dim_0)
    groups: 

In [52]:
photo_xy = np.column_stack((photo['UTM_X'],photo['UTM_Y']))

In [53]:
idx_p = np.where((photo_xy[:,0] > 0) & (photo_xy[:,0] < 100) & (photo_xy[:,1] > 0) & (photo_xy[:,1] < 100)  )

In [54]:
plt.scatter(photo['UTM_X'][idx_p], photo['UTM_Y'][idx_p], c = photo['Z'][idx_p],\
                cmap='hot',vmin=-1, vmax=1, lw=0, s=plt_s)


Out[54]:
<matplotlib.collections.PathCollection at 0x130b9e828>

In [55]:
p_x = photo['UTM_X'][idx_p]
p_y = photo['UTM_Y'][idx_p]
p_z = photo['Z'][idx_p]

In [56]:
plt_az=310
plt_elev = 70.
plt_s = 2

#make a plot
fig = plt.figure()
fig.set_size_inches(25/2.51, 10/2.51)
ax0 = fig.add_subplot(111, projection='3d')

#LiDAR points
ax0.scatter(chunk_x, chunk_y, chunk_z-50, \
            c=np.ndarray.tolist(chunk_z),\
            cmap=cmap1, vmin=-30, vmax=2, lw=0, s=plt_s)

#3D photogrammetry points
ax0.scatter(p_x, p_y, p_z, 
            c=np.ndarray.tolist(p_z),\
            cmap='hot', vmin=-1, vmax=1, lw=0, s=5)

#aircraft trajectory
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
                cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)


ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()
plt.savefig('with_photo.png')


This is kind of a clunky plot - but you get the idea (I hope). LiDAR is in blues, while the 100 x 100 m photogrammetry patch and the trajectory are both in oranges (they share the 'hot' colormap). Different data sources, different resolutions, extracted using pretty much the same set of queries.


In [57]:
print('LiDAR points: {0}\nphotogrammetry points: {1}\ntrajectory points: {2}'.
      format(len(chunk_x), len(p_x), len(cpos_x) ))


LiDAR points: 43264
photogrammetry points: 817370
trajectory points: 1448

So what's happened here:

Using HDF as the storage medium, this worksheet has shown how to:

  • put point clouds into HDF, storing many attributes that are not possible with .LAS
  • start on provenance data, including the trajectory that LiDAR points were generated from (this could be expanded to other properties, eg airborne GPS positions, IMU observations, survey marks, versions of trajectories, ...)
  • query for a point subset without loading all the point cloud
  • and query for trajectory data in the same bounding box without loading the whole trajectory.

What could be better

  • Compression for bigger point clouds. .LAZ is clearly more compact; how can HDF/NetCDF storage be made more efficient? (See the sketch below.)
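
A first, untested lever: h5py can chunk and gzip-compress each dataset at creation time, and the shuffle filter often helps float columns. A minimal sketch below, reusing the Z column already open via thedata; the settings are guesses, not benchmark results.

import numpy as np
import h5py

z = np.asarray(thedata['swath_data']['Z'][:])        # read the column back as a plain array
with h5py.File('lidar_test_compressed.hdf5', 'w') as f:
    g = f.create_group('swath_data')
    g.create_dataset('Z', data=z, chunks=True,
                     compression='gzip', compression_opts=4, shuffle=True)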

Next steps:

  • recreate this sheet using a single NetCDF file per dataset, with proper metadata (a rough sketch follows this list)
  • break up different point cloud sources (makes obvious sense, put them together here just because)
  • recreate this example using Davis Station data, which has topography
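
For the first of those steps, a very rough sketch of what 'proper metadata' might look like, using the netCDF4 API against the swath group already open as thedata. The variable names, attributes and CF version are placeholders, not a validated encoding.

from netCDF4 import Dataset

src = thedata['swath_data']
nc = Dataset('swath_cf_sketch.nc', 'w', format='NETCDF4')
nc.Conventions = 'CF-1.6'                            # aspirational, not run through a checker
nc.title = 'RAPPLS LiDAR swath, SIPEX-II 2012, flight 11, pass 1'

nc.createDimension('point', src['Z'].shape[0])

z = nc.createVariable('height', 'f8', ('point',), zlib=True)
z.units = 'm'
z.long_name = 'height above ellipsoid'
z[:] = src['Z'][:]

z_u = nc.createVariable('height_uncertainty', 'f8', ('point',), zlib=True)
z_u.units = 'm'
z_u[:] = src['Z_UNCERT'][:]

nc.close()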

Use cases to test:

  • Say I want a region that is broken up across different flights/surveys/tiles. How can I get consistent data? And how can I find out whether different regions of data have different uncertainties attached? (A sketch follows this list.)
  • ???
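
One hedged sketch of how that multi-survey question might be attacked, assuming each flight/survey lives in its own HDF file laid out like lidar_test.hdf5 (the glob pattern and group names here are hypothetical):

import glob
import numpy as np
import h5py

def query_surveys(pattern, xmin, xmax, ymin, ymax):
    """Run the same bounding-box query over several files and summarise each file's uncertainty."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with h5py.File(path, 'r') as f:
            g = f['swath_data']
            x = g['UTM_X'][:]
            y = g['UTM_Y'][:]
            idx = np.nonzero((x > xmin) & (x < xmax) & (y > ymin) & (y < ymax))[0]
            if idx.size:
                results[path] = {'n_points': int(idx.size),
                                 'mean_3d_uncert': float(np.mean(g['3D_UNCERT'][idx]))}
    return results

# e.g. query_surveys('surveys/*.hdf5', -100, 200, -100, 200)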

Dreams:

  • integration with the plasio viewer (http://plas.io)? This is a WebGL point cloud viewer, currently reading .LAZ files. I will contact the developers about reading from other formats. Obviously useful for data preview, but not an infrastructure priority.
