What is the proposed task:

  • ingest some LiDAR points into an HDF file
  • ingest the aircraft trajectory into the file
  • anything else

...and then extract data from the HDF file at different rates using a spatial 'query'

What do the data look like?

ASCII point clouds with the following attributes, currently used as a full M x N array:

  • time (GPS second of week, float)
  • X coordinate (UTM metres, float)
  • Y coordinate (UTM metres, float)
  • Z coordinate (ellipsoidal height, float)
  • return intensity (unscaled, float)
  • scan angle (degrees, float)

Everything above is easily stored in the binary .LAS format (or .LAZ). It is kept in ASCII because the following additional data have no slots in .LAS:

  • X uncertainty (aircraft reference frame, m, float)
  • Y uncertainty (aircraft reference frame, m, float)
  • Z uncertainty (aircraft reference frame, m, float)
  • 3D uncertainty (metres, float)

...and optionally (depending on the use case):

  • aircraft trajectory height (to ITRF08, metres)
  • aircraft position uncertainty X (metres, relative to aircraft position)
  • aircraft and sensor attributes

...and derived data (all pulled together in a sketch after these lists):

  • sea ice elevations (m, float)
  • estimated snow depths (m, float)
  • estimated snow depth uncertainty (m, float)
  • estimated ice thickness (m, float)
  • estimated ice thickness uncertainty (m, float)
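
Pulling those lists together: one way to picture the full per-point record is as a NumPy structured dtype. This is only an illustration - the field names and float widths below are my own guesses, not part of any existing format.

import numpy as np

# Illustrative only: one possible layout for a single point record.
point_dtype = np.dtype([
    ('time', 'f8'),                                  # GPS second of week
    ('x', 'f8'), ('y', 'f8'), ('z', 'f8'),           # UTM metres, ellipsoidal height
    ('intensity', 'f4'), ('scan_angle', 'f4'),
    ('x_u', 'f4'), ('y_u', 'f4'), ('z_u', 'f4'),     # per-axis uncertainty, aircraft frame
    ('u_3d', 'f4'),                                  # combined 3D uncertainty
    ('ice_elev', 'f4'),                              # derived: sea ice elevation
    ('snow_depth', 'f4'), ('snow_depth_u', 'f4'),    # derived: snow depth and its uncertainty
    ('ice_thick', 'f4'), ('ice_thick_u', 'f4'),      # derived: ice thickness and its uncertainty
])

points = np.zeros(10, dtype=point_dtype)             # ten empty records, as an example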

So, you can see how quickly .LAS loses its value. ASCII point clouds are conceptually simple, but very big - and not well suited to use in an HPC context. There is too much storage overhead, and you have to read the entire file in order to extract a subset. Six million points comes to around 50 MB, which is pretty inefficient.

So, let's look at about 6 million points...

Let's look at a small set of points

Scenario: a small set of LiDAR and 3D photogrammetry points collected adjacent to a ship (the RSV Aurora Australis) parked in sea ice.

Source: AAD's APPLS system (http://seaice.acecrc.org.au/crrs/)

https://data.aad.gov.au/aadc/metadata/metadata.cfm?entry_id=SIPEX_II_RAPPLS

https://data.aad.gov.au/aadc/metadata/metadata.cfm?entry_id=SIPEX_LiDAR_sea_ice

Pretty cover photo:


In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
%matplotlib inline
#import plot_lidar
from datetime import datetime

Import a LiDAR swath


In [3]:
swath = np.genfromtxt('../../PhD/python-phd/swaths/is6_f11_pass1_aa_nr2_522816_523019_c.xyz')

In [4]:
import pandas as pd

columns = ['time', 'X', 'Y', 'Z', 'I','A', 'x_u', 'y_u', 'z_u', '3D_u']
swath = pd.DataFrame(swath, columns=columns)

In [5]:
swath[1:5]


Out[5]:
time X Y Z I A x_u y_u z_u 3D_u
1 522816.00544 -113.710607 -6077.889536 -27.795405 8995.0 56.595811 0.2184 0.3094 0.1440 0.4052
2 522816.00547 -112.621341 -6077.699073 -27.805177 9252.0 56.795901 0.2184 0.3077 0.1431 0.4036
3 522816.00550 -111.563030 -6077.515571 -27.856036 8995.0 56.995991 0.2185 0.3060 0.1422 0.4020
4 522816.00554 -110.491277 -6077.328057 -27.878673 9252.0 57.196111 0.2185 0.3044 0.1413 0.4004

Now load up the aircraft trajectory


In [6]:
air_traj = np.genfromtxt('../../PhD/is6_f11/trajectory/is6_f11_pass1_local_ice_rot.3dp')

In [7]:
columns = ['time', 'X', 'Y', 'Z', 'R', 'P', 'H', 'x_u', 'y_u', 'z_u', 'r_u', 'p_u', 'h_u']
air_traj = pd.DataFrame(air_traj, columns=columns)

In [8]:
air_traj[1:5]


Out[8]:
time X Y Z R P H x_u y_u z_u r_u p_u h_u
1 522815.012 30.90322 -6092.56695 193.840 4.073 -3.345 -8.85948 0.054 0.083 0.053 0.034 0.037 0.104
2 522815.016 30.89646 -6092.37386 193.842 4.068 -3.339 -8.85748 0.054 0.083 0.053 0.034 0.037 0.104
3 522815.020 30.89081 -6092.18033 193.846 4.053 -3.327 -8.85448 0.054 0.083 0.053 0.034 0.037 0.104
4 522815.024 30.88405 -6091.98725 193.848 4.027 -3.304 -8.85048 0.054 0.083 0.053 0.034 0.037 0.104

Take a quick look at the data


In [9]:
fig = plt.figure(figsize = ([30/2.54, 6/2.54]))
ax0 = fig.add_subplot(111)      
a0 = ax0.scatter(swath['Y'],  swath['X'], c=swath['Z'] - np.min(swath['Z']), cmap = 'gist_earth',
                vmin=0, vmax=10, edgecolors=None,lw=0, s=0.6)
a1 = ax0.scatter(air_traj['Y'], air_traj['X'], c=air_traj['Z'], cmap = 'Reds',
                lw=0, s=1)
plt.tight_layout()


Making an HDF file out of those points


In [10]:
import h5py

In [11]:
#create a file instance, with the intention to write it out
lidar_test = h5py.File('lidar_test.hdf5', 'w')

In [12]:
swath_data = lidar_test.create_group('swath_data')

In [13]:
swath_data.create_dataset('GPS_SOW', data=swath['time'])


Out[13]:
<HDF5 dataset "GPS_SOW": shape (1957932,), type "<f8">

In [14]:
#some data
swath_data.create_dataset('UTM_X', data=swath['X'])
swath_data.create_dataset('UTM_Y', data=swath['Y'])
swath_data.create_dataset('Z', data=swath['Z'])
swath_data.create_dataset('INTENS', data=swath['I'])
swath_data.create_dataset('ANGLE', data=swath['A'])
swath_data.create_dataset('X_UNCERT', data=swath['x_u'])
swath_data.create_dataset('Y_UNCERT', data=swath['y_u'])
swath_data.create_dataset('Z_UNCERT', data=swath['z_u'])
swath_data.create_dataset('3D_UNCERT', data=swath['3D_u'])


Out[14]:
<HDF5 dataset "3D_UNCERT": shape (1957932,), type "<f8">

In [15]:
#some attributes
lidar_test.attrs['file_name'] = 'lidar_test.hdf5'

lidar_test.attrs['codebase'] = 'https://github.com/adamsteer/matlab_LIDAR'
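
Not done in this notebook, but cheap to add: the same .attrs mechanism works per dataset, so each channel can carry its own units and reference frame. The attribute names here are only a suggestion, not a standard.

swath_data['X_UNCERT'].attrs['units'] = 'm'
swath_data['X_UNCERT'].attrs['reference_frame'] = 'aircraft body frame'
swath_data['3D_UNCERT'].attrs['units'] = 'm'
swath_data['ANGLE'].attrs['units'] = 'degrees'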

That's some swath data, now some trajectory data at a different sampling rate


In [16]:
traj_data = lidar_test.create_group('traj_data')

In [17]:
#some attributes
traj_data.attrs['flight'] = 11
traj_data.attrs['pass'] = 1
traj_data.attrs['source'] = 'RAPPLS flight 11, SIPEX-II 2012'

In [18]:
#some data
traj_data.create_dataset('pos_x', data = air_traj['X'])
traj_data.create_dataset('pos_y', data =  air_traj['Y'])
traj_data.create_dataset('pos_z', data =  air_traj['Z'])


Out[18]:
<HDF5 dataset "pos_z": shape (51495,), type "<f8">

Close and write the file out


In [19]:
lidar_test.close()

OK, that's an arbitrary HDF file built

The generated file is substantially smaller than the combined sources - 158 MB, down from 193 MB - with no attention paid to optimisation.

The .LAZ version of the input text file here is 66 MB. More compact, but we can't query it directly - and we have to fake fields! Everything in the swath dataset can be stored, but we would need to pretend the uncertainties are RGB values - so if someone comes along and doesn't read the metadata carefully, they get crazy colours, call us up and complain. Or we use the .LAZ 'extra bytes' mechanism, and deal with awkward ways of describing things.

It's also probably a terrible HDF, with no respect to CF compliance at all. That's to come :)

And now we add some 3D photogrammetry at about 80 points/m^2:


In [20]:
photo = np.genfromtxt('/Users/adam/Documents/PhD/is6_f11/photoscan/is6_f11_photoscan_Cloud.txt',skip_header=1)

In [21]:
columns = ['X', 'Y', 'Z', 'R', 'G', 'B']
photo = pd.DataFrame(photo[:,0:6], columns=columns)

In [22]:
#create a file instance, with the intention to write it out
lidar_test = h5py.File('lidar_test.hdf5', 'r+')

In [23]:
photo_data = lidar_test.create_group('3d_photo')

In [24]:
photo_data.create_dataset('UTM_X', data=photo['X'])
photo_data.create_dataset('UTM_Y', data=photo['Y'])
photo_data.create_dataset('Z', data=photo['Z'])
photo_data.create_dataset('R', data=photo['R'])
photo_data.create_dataset('G', data=photo['G'])
photo_data.create_dataset('B', data=photo['B'])


Out[24]:
<HDF5 dataset "B": shape (20151164,), type "<f8">

In [24]:
#del lidar_test['3d_photo']

In [25]:
lidar_test.close()

Storage is a bit less efficient here.

  • ASCII cloud: 2.1 GB
  • .LAZ format with same data: 215 MB
  • HDF file containing LiDAR, trajectory, 3D photo cloud: 1.33 GB

So, there's probably a case for keeping super dense clouds in different files (along with all their ancillary data). Note that .LAZ is able to store all the data used for the super dense cloud here. But - how do we query it efficiently?

Also, this is just a demonstration, so we push on!

Now, let's look at the HDF file... and get stuff


In [26]:
from netCDF4 import Dataset

In [27]:
thedata = Dataset('lidar_test.hdf5', 'r')

In [28]:
thedata


Out[28]:
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    file_name: lidar_test.hdf5
    codebase: https://github.com/adamsteer/matlab_LIDAR
    dimensions(sizes): 
    variables(dimensions): 
    groups: 3d_photo, swath_data, traj_data

There are the three groups: 3d_photo, swath_data and traj_data


In [29]:
swath = thedata['swath_data']

In [30]:
swath


Out[30]:
<class 'netCDF4._netCDF4.Group'>
group /swath_data:
    dimensions(sizes): phony_dim_1(1957932)
    variables(dimensions): float64 3D_UNCERT(phony_dim_1), float64 ANGLE(phony_dim_1), float64 GPS_SOW(phony_dim_1), float64 INTENS(phony_dim_1), float64 UTM_X(phony_dim_1), float64 UTM_Y(phony_dim_1), float64 X_UNCERT(phony_dim_1), float64 Y_UNCERT(phony_dim_1), float64 Z(phony_dim_1), float64 Z_UNCERT(phony_dim_1)
    groups: 

In [31]:
utm_xy = np.column_stack((swath['UTM_X'],swath['UTM_Y']))

In [32]:
idx = np.where((utm_xy[:,0] > -100) & (utm_xy[:,0] < 200) & (utm_xy[:,1] > -100) & (utm_xy[:,1] < 200)  )

In [33]:
chunk_z = swath['Z'][idx]
chunk_z.size


Out[33]:
43264

In [34]:
max(chunk_z)


Out[34]:
1.1131660000000001

In [35]:
chunk_x = swath['UTM_X'][idx]
chunk_x.size


Out[35]:
43264

In [36]:
chunk_y = swath['UTM_Y'][idx]
chunk_y.size


Out[36]:
43264

In [37]:
chunk_uncert = swath['Z_UNCERT'][idx]
chunk_uncert.size


Out[37]:
43264

In [38]:
plt.scatter(chunk_x, chunk_y, c=chunk_z, lw=0, s=2)


Out[38]:
<matplotlib.collections.PathCollection at 0x11e2a9e48>

That gave us a small chunk of LiDAR points without loading the whole point dataset - only the two coordinate columns were read in full to build the index. Neat!

...but being continually dissatisfied, we want more! Let's get just the corresponding trajectory:


In [39]:
traj = thedata['traj_data']

In [40]:
traj


Out[40]:
<class 'netCDF4._netCDF4.Group'>
group /traj_data:
    flight: 11
    pass: 1
    source: RAPPLS flight 11, SIPEX-II 2012
    dimensions(sizes): phony_dim_2(51495)
    variables(dimensions): float64 pos_x(phony_dim_2), float64 pos_y(phony_dim_2), float64 pos_z(phony_dim_2)
    groups: 

Because there's essentially no X extent in the flight data, only the Y coordinate is needed for the query...


In [41]:
pos_y = traj['pos_y']

In [42]:
idx = np.where((pos_y[:] > -100.) & (pos_y[:] < 200.))

In [43]:
cpos_x = traj['pos_x'][idx]

In [44]:
cpos_y = traj['pos_y'][idx]

In [45]:
cpos_z = traj['pos_z'][idx]

Now plot the flight line and LiDAR together


In [46]:
plt.scatter(chunk_x, chunk_y, c=chunk_z, lw=0, s=3, cmap='gist_earth')
plt.scatter(cpos_x, cpos_y, c=cpos_z, lw=0, s=5, cmap='Oranges')


Out[46]:
<matplotlib.collections.PathCollection at 0x393cb0320>

...and prove that we are looking at a trajectory and some LiDAR


In [47]:
from mpl_toolkits.mplot3d import Axes3D

In [48]:
#set up a plot
plt_az=310
plt_elev = 40.
plt_s = 3
cb_fmt = '%.1f'

cmap1 = plt.get_cmap('gist_earth', 10)

#make a plot
fig = plt.figure()
fig.set_size_inches(35/2.51, 20/2.51)
ax0 = fig.add_subplot(111, projection='3d')
a0 = ax0.scatter(chunk_x, chunk_y, (chunk_z-min(chunk_z))*2,
                 c=np.ndarray.tolist((chunk_z-min(chunk_z))*2),\
                cmap=cmap1,lw=0, vmin = -0.5, vmax = 5, s=plt_s)
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
                cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)
ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()


Plot coloured by point uncertainty


In [49]:
#set up a plot
plt_az=310
plt_elev = 40.
plt_s = 3
cb_fmt = '%.1f'

cmap1 = plt.get_cmap('gist_earth', 30)

#make a plot
fig = plt.figure()
fig.set_size_inches(35/2.51, 20/2.51)
ax0 = fig.add_subplot(111, projection='3d')
a0 = ax0.scatter(chunk_x, chunk_y, (chunk_z-min(chunk_z))*2,
                 c=np.ndarray.tolist(chunk_uncert),\
                 cmap=cmap1, lw=0, vmin = 0, vmax = 0.2, s=plt_s)
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
                cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)
ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()
plt.savefig('thefig.png')


Now pull in the photogrammetry cloud

This gets a little messy, since it appears we still need to read the full X and Y columns - so still 20 x 10^6 x 2 values. Better than 20 x 10^6 x 6, but I wonder if I'm missing something about indexing.
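
One possible workaround, sketched as an untested idea rather than a recommendation: read the coordinate variables in blocks, build the index piecewise, and never hold the full X and Y columns at once. block_query is a hypothetical helper, not something used elsewhere in this sheet.

def block_query(x_var, y_var, xmin, xmax, ymin, ymax, block=1000000):
    """Return sorted indices of points inside the bounding box, reading `block` points at a time."""
    hits = []
    n = x_var.shape[0]
    for start in range(0, n, block):
        stop = min(start + block, n)
        x = x_var[start:stop]                        # only this slice is read from disk
        y = y_var[start:stop]
        mask = (x > xmin) & (x < xmax) & (y > ymin) & (y < ymax)
        hits.append(np.nonzero(mask)[0] + start)
    return np.concatenate(hits)

# e.g., with the 3d_photo group opened below:
# idx_p = block_query(photo['UTM_X'], photo['UTM_Y'], 0, 100, 0, 100)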


In [50]:
photo = thedata['3d_photo']

In [51]:
photo


Out[51]:
<class 'netCDF4._netCDF4.Group'>
group /3d_photo:
    dimensions(sizes): phony_dim_0(20151164)
    variables(dimensions): float64 B(phony_dim_0), float64 G(phony_dim_0), float64 R(phony_dim_0), float64 UTM_X(phony_dim_0), float64 UTM_Y(phony_dim_0), float64 Z(phony_dim_0)
    groups: 

In [52]:
photo_xy = np.column_stack((photo['UTM_X'],photo['UTM_Y']))

In [53]:
idx_p = np.where((photo_xy[:,0] > 0) & (photo_xy[:,0] < 100) & (photo_xy[:,1] > 0) & (photo_xy[:,1] < 100)  )

In [54]:
plt.scatter(photo['UTM_X'][idx_p], photo['UTM_Y'][idx_p], c = photo['Z'][idx_p],\
                cmap='hot',vmin=-1, vmax=1, lw=0, s=plt_s)


Out[54]:
<matplotlib.collections.PathCollection at 0x130b9e828>

In [55]:
p_x = photo['UTM_X'][idx_p]
p_y = photo['UTM_Y'][idx_p]
p_z = photo['Z'][idx_p]

In [56]:
plt_az=310
plt_elev = 70.
plt_s = 2

#make a plot
fig = plt.figure()
fig.set_size_inches(25/2.51, 10/2.51)
ax0 = fig.add_subplot(111, projection='3d')

#LiDAR points
ax0.scatter(chunk_x, chunk_y, chunk_z-50, \
            c=np.ndarray.tolist(chunk_z),\
            cmap=cmap1, vmin=-30, vmax=2, lw=0, s=plt_s)

#3D photogrammetry points
ax0.scatter(p_x, p_y, p_z, 
            c=np.ndarray.tolist(p_z),\
            cmap='hot', vmin=-1, vmax=1, lw=0, s=5)

#aircraft trajectory
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
                cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)


ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()
plt.savefig('with_photo.png')


This is kind of a clunky plot - but you get the idea (I hope). LiDAR is in blues, while the 100 x 100 m photogrammetry patch and the trajectory are both in oranges (they share the 'hot' colormap). Different data sources, different resolutions, extracted using pretty much the same set of queries.


In [57]:
print('LiDAR points: {0}\nphotogrammetry points: {1}\ntrajectory points: {2}'.
      format(len(chunk_x), len(p_x), len(cpos_x) ))


LiDAR points: 43264
photogrammetry points: 817370
trajectory points: 1448

So what's happened here:

Using HDF as the storage medium, this worksheet has shown how to:

  • put point clouds into HDF, storing many attributes that are not possible with .LAS
  • start on provenance data, including the trajectory that LiDAR points were generated from (this could be expanded to other properties, eg airborne GPS positions, IMU observations, survey marks, versions of trajectories, ...)
  • query for a point subset without loading all the point cloud
  • and query for trajectory data in the same bounding box without loading the whole trajectory.

What could be better

  • Compression for bigger point clouds. .LAZ is clearly more compact; how can HDF/NetCDF storage be made more efficient? (See the sketch below.)
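
A first, untested lever: h5py can chunk and gzip-compress each dataset at creation time, and the shuffle filter often helps float columns. A minimal sketch below, reusing the Z column already open via thedata; the settings are guesses, not benchmark results.

import numpy as np
import h5py

z = np.asarray(thedata['swath_data']['Z'][:])        # read the column back as a plain array
with h5py.File('lidar_test_compressed.hdf5', 'w') as f:
    g = f.create_group('swath_data')
    g.create_dataset('Z', data=z, chunks=True,
                     compression='gzip', compression_opts=4, shuffle=True)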

Next steps:

  • recreate this sheet using a single NetCDF file per dataset, with proper metadata (a rough sketch follows this list)
  • break up different point cloud sources (makes obvious sense, put them together here just because)
  • recreate this example using Davis Station data, which has topography
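
For the first of those steps, a very rough sketch of what 'proper metadata' might look like, using the netCDF4 API against the swath group already open as thedata. The variable names, attributes and CF version are placeholders, not a validated encoding.

from netCDF4 import Dataset

src = thedata['swath_data']
nc = Dataset('swath_cf_sketch.nc', 'w', format='NETCDF4')
nc.Conventions = 'CF-1.6'                            # aspirational, not run through a checker
nc.title = 'RAPPLS LiDAR swath, SIPEX-II 2012, flight 11, pass 1'

nc.createDimension('point', src['Z'].shape[0])

z = nc.createVariable('height', 'f8', ('point',), zlib=True)
z.units = 'm'
z.long_name = 'height above ellipsoid'
z[:] = src['Z'][:]

z_u = nc.createVariable('height_uncertainty', 'f8', ('point',), zlib=True)
z_u.units = 'm'
z_u[:] = src['Z_UNCERT'][:]

nc.close()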

Use cases to test:

  • Say I want a region that is broken up across different flights/surveys/tiles. How can I get consistent data? And how can I find out whether different regions of data have different uncertainties attached? (A sketch follows this list.)
  • ???
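
One hedged sketch of how that multi-survey question might be attacked, assuming each flight/survey lives in its own HDF file laid out like lidar_test.hdf5 (the glob pattern and group names here are hypothetical):

import glob
import numpy as np
import h5py

def query_surveys(pattern, xmin, xmax, ymin, ymax):
    """Run the same bounding-box query over several files and summarise each file's uncertainty."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with h5py.File(path, 'r') as f:
            g = f['swath_data']
            x = g['UTM_X'][:]
            y = g['UTM_Y'][:]
            idx = np.nonzero((x > xmin) & (x < xmax) & (y > ymin) & (y < ymax))[0]
            if idx.size:
                results[path] = {'n_points': int(idx.size),
                                 'mean_3d_uncert': float(np.mean(g['3D_UNCERT'][idx]))}
    return results

# e.g. query_surveys('surveys/*.hdf5', -100, 200, -100, 200)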

Dreams:

  • integration with the plasio viewer (http://plas.io)? This is a WebGL point cloud viewer, currently reading .LAZ files. I will contact the developers about reading from other formats. Obviously useful for data preview, but not an infrastructure priority.
