...and then extract data from the HDF file at different rates using a spatial 'query'
ASCII point clouds with the following attributes, currently used as a full M x N array:
Everything above is easily stored in the binary .LAS format (or .LAZ). It is kept in ASCII because the following additional data have no slots in .LAS:
...and optionally (depending on the use case):
...and derived data:
So, you can see how quickly .LAS loses it's value. ASCII point clouds are conceptually simple, but very big - and not well suited to use in a HPC context. Too much storage overhead, and you have to read the entire file in order to extract a subset. Six million points gets to around 50MB, it's pretty inefficient.
So lets look at about 6 million points...
Scenario: A small set of LiDAR and 3D photogrammetry points collected adjacent to a ship (RSV Aurora Australia) parked in sea ice.
Source: AAD's APPLS system (http://seaice.acecrc.org.au/crrs/)
https://data.aad.gov.au/aadc/metadata/metadata.cfm?entry_id=SIPEX_II_RAPPLS
https://data.aad.gov.au/aadc/metadata/metadata.cfm?entry_id=SIPEX_LiDAR_sea_ice
Pretty cover photo:
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
%matplotlib inline
#import plot_lidar
from datetime import datetime
In [3]:
swath = np.genfromtxt('../../PhD/python-phd/swaths/is6_f11_pass1_aa_nr2_522816_523019_c.xyz')
In [4]:
import pandas as pd
columns = ['time', 'X', 'Y', 'Z', 'I','A', 'x_u', 'y_u', 'z_u', '3D_u']
swath = pd.DataFrame(swath, columns=columns)
In [5]:
swath[1:5]
Out[5]:
In [6]:
air_traj = np.genfromtxt('../../PhD/is6_f11/trajectory/is6_f11_pass1_local_ice_rot.3dp')
In [7]:
columns = ['time', 'X', 'Y', 'Z', 'R', 'P', 'H', 'x_u', 'y_u', 'z_u', 'r_u', 'p_u', 'h_u']
air_traj = pd.DataFrame(air_traj, columns=columns)
In [8]:
air_traj[1:5]
Out[8]:
In [9]:
fig = plt.figure(figsize = ([30/2.54, 6/2.54]))
ax0 = fig.add_subplot(111)
a0 = ax0.scatter(swath['Y'], swath['X'], c=swath['Z'] - np.min(swath['Z']), cmap = 'gist_earth',
vmin=0, vmax=10, edgecolors=None,lw=0, s=0.6)
a1 = ax0.scatter(air_traj['Y'], air_traj['X'], c=air_traj['Z'], cmap = 'Reds',
lw=0, s=1)
plt.tight_layout()
In [10]:
import h5py
In [11]:
#create a file instance, with the intention to write it out
lidar_test = h5py.File('lidar_test.hdf5', 'w')
In [12]:
swath_data = lidar_test.create_group('swath_data')
In [13]:
swath_data.create_dataset('GPS_SOW', data=swath['time'])
Out[13]:
In [14]:
#some data
swath_data.create_dataset('UTM_X', data=swath['X'])
swath_data.create_dataset('UTM_Y', data=swath['Y'])
swath_data.create_dataset('Z', data=swath['Z'])
swath_data.create_dataset('INTENS', data=swath['I'])
swath_data.create_dataset('ANGLE', data=swath['A'])
swath_data.create_dataset('X_UNCERT', data=swath['x_u'])
swath_data.create_dataset('Y_UNCERT', data=swath['y_u'])
swath_data.create_dataset('Z_UNCERT', data=swath['z_u'])
swath_data.create_dataset('3D_UNCERT', data=swath['3D_u'])
Out[14]:
In [15]:
#some attributes
lidar_test.attrs['file_name'] = 'lidar_test.hdf5'
lidar_test.attrs['codebase'] = 'https://github.com/adamsteer/matlab_LIDAR'
In [16]:
traj_data = lidar_test.create_group('traj_data')
In [17]:
#some attributes
traj_data.attrs['flight'] = 11
traj_data.attrs['pass'] = 1
traj_data.attrs['source'] = 'RAPPLS flight 11, SIPEX-II 2012'
In [18]:
#some data
traj_data.create_dataset('pos_x', data = air_traj['X'])
traj_data.create_dataset('pos_y', data = air_traj['Y'])
traj_data.create_dataset('pos_z', data = air_traj['Z'])
Out[18]:
In [19]:
lidar_test.close()
The generated file is substantially smaller than the combined sources - 158 MB from 193, with no attention paid to optimisation.
The .LAZ version of the input text file here is 66 MB. More compact, but we can't query it directly - and we have to fake fields! Everything in the swath dataset can be stored, but we need to pretend uncertainties are RGB, so if person X comes along and doesn't read the metadata well, they get crazy colours, call us up and complain. Or we need to use .LAZ extra bits, and deal with awkward ways of describing things.
It's also probably a terrible HDF, with no respect to CF compliance at all. That's to come :)
In [20]:
photo = np.genfromtxt('/Users/adam/Documents/PhD/is6_f11/photoscan/is6_f11_photoscan_Cloud.txt',skip_header=1)
In [21]:
columns = ['X', 'Y', 'Z', 'R', 'G', 'B']
photo = pd.DataFrame(photo[:,0:6], columns=columns)
In [22]:
#create a file instance, with the intention to write it out
lidar_test = h5py.File('lidar_test.hdf5', 'r+')
In [23]:
photo_data = lidar_test.create_group('3d_photo')
In [24]:
photo_data.create_dataset('UTM_X', data=photo['X'])
photo_data.create_dataset('UTM_Y', data=photo['Y'])
photo_data.create_dataset('Z', data=photo['Z'])
photo_data.create_dataset('R', data=photo['R'])
photo_data.create_dataset('G', data=photo['G'])
photo_data.create_dataset('B', data=photo['B'])
Out[24]:
In [24]:
#del lidar_test['3d_photo']
In [25]:
lidar_test.close()
Storage is a bit less efficient here.
So, there's probably a case for keeping super dense clouds in different files (along with all their ancillary data). Note that .LAZ is able to store all the data used for the super dense cloud here. But - how do we query it efficiently?
Also, this is just a demonstration, so we push on!
In [26]:
from netCDF4 import Dataset
In [27]:
thedata = Dataset('lidar_test.hdf5', 'r')
In [28]:
thedata
Out[28]:
There are the two groups - swath_data and traj_data
In [29]:
swath = thedata['swath_data']
In [30]:
swath
Out[30]:
In [31]:
utm_xy = np.column_stack((swath['UTM_X'],swath['UTM_Y']))
In [32]:
idx = np.where((utm_xy[:,0] > -100) & (utm_xy[:,0] < 200) & (utm_xy[:,1] > -100) & (utm_xy[:,1] < 200) )
In [33]:
chunk_z = swath['Z'][idx]
chunk_z.size
Out[33]:
In [34]:
max(chunk_z)
Out[34]:
In [35]:
chunk_x = swath['UTM_X'][idx]
chunk_x.size
Out[35]:
In [36]:
chunk_y = swath['UTM_Y'][idx]
chunk_y.size
Out[36]:
In [37]:
chunk_uncert = swath['Z_UNCERT'][idx]
chunk_uncert.size
Out[37]:
In [38]:
plt.scatter(chunk_x, chunk_y, c=chunk_z, lw=0, s=2)
Out[38]:
In [39]:
traj = thedata['traj_data']
In [40]:
traj
Out[40]:
Because there's essentiually no X extent for flight data, only the Y coordinate of the flight data are needed...
In [41]:
pos_y = traj['pos_y']
In [42]:
idx = np.where((pos_y[:] > -100.) & (pos_y[:] < 200.))
In [43]:
cpos_x = traj['pos_x'][idx]
In [44]:
cpos_y = traj['pos_y'][idx]
In [45]:
cpos_z = traj['pos_z'][idx]
In [46]:
plt.scatter(chunk_x, chunk_y, c=chunk_z, lw=0, s=3, cmap='gist_earth')
plt.scatter(cpos_x, cpos_y, c=cpos_z, lw=0, s=5, cmap='Oranges')
Out[46]:
...and prove that we are looking at a trajectory and some LiDAR
In [47]:
from mpl_toolkits.mplot3d import Axes3D
In [48]:
#set up a plot
plt_az=310
plt_elev = 40.
plt_s = 3
cb_fmt = '%.1f'
cmap1 = plt.get_cmap('gist_earth', 10)
#make a plot
fig = plt.figure()
fig.set_size_inches(35/2.51, 20/2.51)
ax0 = fig.add_subplot(111, projection='3d')
a0 = ax0.scatter(chunk_x, chunk_y, (chunk_z-min(chunk_z))*2,
c=np.ndarray.tolist((chunk_z-min(chunk_z))*2),\
cmap=cmap1,lw=0, vmin = -0.5, vmax = 5, s=plt_s)
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)
ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()
In [49]:
#set up a plot
plt_az=310
plt_elev = 40.
plt_s = 3
cb_fmt = '%.1f'
cmap1 = plt.get_cmap('gist_earth', 30)
#make a plot
fig = plt.figure()
fig.set_size_inches(35/2.51, 20/2.51)
ax0 = fig.add_subplot(111, projection='3d')
a0 = ax0.scatter(chunk_x, chunk_y, (chunk_z-min(chunk_z))*2,
c=np.ndarray.tolist(chunk_uncert),\
cmap=cmap1, lw=0, vmin = 0, vmax = 0.2, s=plt_s)
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)
ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()
plt.savefig('thefig.png')
In [50]:
photo = thedata['3d_photo']
In [51]:
photo
Out[51]:
In [52]:
photo_xy = np.column_stack((photo['UTM_X'],photo['UTM_Y']))
In [53]:
idx_p = np.where((photo_xy[:,0] > 0) & (photo_xy[:,0] < 100) & (photo_xy[:,1] > 0) & (photo_xy[:,1] < 100) )
In [54]:
plt.scatter(photo['UTM_X'][idx_p], photo['UTM_Y'][idx_p], c = photo['Z'][idx_p],\
cmap='hot',vmin=-1, vmax=1, lw=0, s=plt_s)
Out[54]:
In [55]:
p_x = photo['UTM_X'][idx_p]
p_y = photo['UTM_Y'][idx_p]
p_z = photo['Z'][idx_p]
In [56]:
plt_az=310
plt_elev = 70.
plt_s = 2
#make a plot
fig = plt.figure()
fig.set_size_inches(25/2.51, 10/2.51)
ax0 = fig.add_subplot(111, projection='3d')
#LiDAR points
ax0.scatter(chunk_x, chunk_y, chunk_z-50, \
c=np.ndarray.tolist(chunk_z),\
cmap=cmap1, vmin=-30, vmax=2, lw=0, s=plt_s)
#3D photogrammetry pointd
ax0.scatter(p_x, p_y, p_z,
c=np.ndarray.tolist(p_z),\
cmap='hot', vmin=-1, vmax=1, lw=0, s=5)
#aicraft trajectory
ax0.scatter(cpos_x, cpos_y, cpos_z, c=np.ndarray.tolist(cpos_z),\
cmap='hot', lw=0, vmin = 250, vmax = 265, s=10)
ax0.view_init(elev=plt_elev, azim=plt_az)
plt.tight_layout()
plt.savefig('with_photo.png')
This is kind of a clunky plot - but you get the idea (I hope). LiDAR is in blues, the 100 x 100 photogrammetry patch in orange, trajectory in orange. Different data sources, different resolutions, extracted using pretty much the same set of queries.
In [57]:
print('LiDAR points: {0}\nphotogrammetry points: {1}\ntrajectory points: {2}'.
format(len(chunk_x), len(p_x), len(cpos_x) ))
Using HDF as the storage medium, this worksheet has shown how to:
In [ ]: