Plotting massive data sets

This notebook plots about half a million LIDAR points around Karlsruhe, Germany, from the KITTI data set. (Source) The data is a sequence of frames meant to be played back over time. With pydeck, we can render these points and interact with them.

Cleaning the data

First we need to import the data. Each row represents one x/y/z coordinate of a point in space at a particular moment. The source column identifies the frame (moment) a point belongs to, and each frame contains about 115,000 points.
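To make the row-per-point layout concrete, here is a small sketch on a made-up DataFrame (the values are invented, not KITTI data) that uses the same source/x/y/z columns as the notebook. Counting rows per source value gives the number of points in each frame, and filtering on one value selects a single frame.

```python
import pandas as pd

# Hypothetical toy data with the notebook's column layout:
# one row per point, 'source' holds the frame number
toy = pd.DataFrame({
    'source': [42, 42, 42, 56, 56],
    'x': [1.0, 2.0, 3.0, 1.5, 2.5],
    'y': [0.5, 0.7, 0.9, 0.6, 0.8],
    'z': [-1.6, -1.5, -1.7, -1.6, -1.4],
})

# Number of points in each frame
points_per_frame = toy.groupby('source').size()
print(points_per_frame.to_dict())

# All points belonging to one frame
frame_42 = toy[toy['source'] == 42]
print(len(frame_42))  # 3
```

On the real data, all_lidar.groupby('source').size() should show how many points each frame contains.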

We also need to scale the points so they plot close together on a map. The coordinates are not given in latitude and longitude, so as a workaround we divide x and y by 10,000, which places the points very close to (0, 0) on the globe.

In future versions of pydeck, other viewports, such as a flat plane, will be supported out of the box. For now, we'll make do with scaling the points.
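To get a rough sense of what dividing by 10,000 does, the sketch below assumes the x/y values are in meters (an assumption about this CSV subset) and uses the approximation that one degree near the equator spans about 111.32 km. Under those assumptions, a point 50 units from the origin lands about 557 m away on the map, so the horizontal layout is stretched by roughly 11x. plotted_distance_m is a hypothetical helper name.

```python
# Approximate ground distance spanned by one degree near the equator (meters)
METERS_PER_DEGREE = 111_320
# Same factor the notebook divides x and y by
SCALE = 10_000


def plotted_distance_m(coord: float) -> float:
    """Ground distance on the map for a coordinate value after dividing by SCALE.

    Assumes the point sits near (0, 0), where degrees of latitude and
    longitude are about the same length.
    """
    degrees = coord / SCALE
    return degrees * METERS_PER_DEGREE


# How much larger the scene appears on the map than its (assumed metric) size
stretch = METERS_PER_DEGREE / SCALE
print(round(plotted_distance_m(50), 1))  # 556.6
print(round(stretch, 2))                 # 11.13
```

This is only an approximation: a degree of longitude shrinks away from the equator, which is one more reason to keep the points near (0, 0).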


In [ ]:
import pandas as pd

# The subset is split across four CSV files; stack them into one DataFrame
all_lidar = pd.concat([
    pd.read_csv('https://raw.githubusercontent.com/ajduberstein/kitti_subset/master/kitti_1.csv'),
    pd.read_csv('https://raw.githubusercontent.com/ajduberstein/kitti_subset/master/kitti_2.csv'),
    pd.read_csv('https://raw.githubusercontent.com/ajduberstein/kitti_subset/master/kitti_3.csv'),
    pd.read_csv('https://raw.githubusercontent.com/ajduberstein/kitti_subset/master/kitti_4.csv'),
], ignore_index=True)

# Filter to one frame of data. .copy() makes an independent DataFrame so the
# assignment below doesn't trigger pandas' SettingWithCopyWarning
lidar = all_lidar[all_lidar['source'] == 136].copy()
# Shrink x/y so the points sit very close to (0, 0) on the map
lidar.loc[:, ['x', 'y']] = lidar[['x', 'y']] / 10000

Plotting the data

We'll define a single PointCloudLayer and plot it.

By default, pydeck expects the value of get_position to be a string naming a single column that holds each point's full position. For convenience, you can instead pass a string expression that assembles the position from the X/Y/Z columns, here get_position='@@=[x, y, z * 10]'. The @@= prefix marks the string as an expression for deck.gl's small expression parser, which also supports arithmetic: here we multiply the z coordinate by 10 to exaggerate the height of each point.
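The accessor expression can also be mirrored in pandas. The sketch below builds a hypothetical position column holding [x, y, z * 10] for each row, the same per-point value the '@@=[x, y, z * 10]' expression produces. The toy values are invented.

```python
import pandas as pd

# Toy stand-in for the filtered, scaled lidar frame
points = pd.DataFrame({
    'x': [0.0001, 0.0002],
    'y': [0.0003, 0.0004],
    'z': [-1.5, 0.25],
})

# Hypothetical 'position' column: one [x, y, z * 10] list per row,
# matching what the '@@=[x, y, z * 10]' accessor computes per point
points['position'] = [
    [x, y, z * 10] for x, y, z in zip(points['x'], points['y'], points['z'])
]
print(points['position'].tolist())
```

With such a column in the layer's data, the single-column form get_position='position' should read it directly. The expression form avoids materializing an extra column for every point.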

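Using pydeck.data_utils.compute_view, we'll center and zoom the view on the data. Its second argument, 0.9, asks it to fit roughly the 90% of points closest to the center, so a few outliers don't force the view to zoom out.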
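The 0.9 proportion can be pictured with a simplified NumPy sketch. It illustrates the idea only and is not pydeck's implementation; central_bbox is a hypothetical name. The sketch keeps the given proportion of points nearest the centroid and takes their bounding box, so a distant outlier no longer sets the extent.

```python
import numpy as np


def central_bbox(points: np.ndarray, proportion: float):
    """Bounding box of the `proportion` of points closest to the centroid.

    points: array of shape (n, 2) holding x/y (or lng/lat) pairs.
    Returns ((min_x, min_y), (max_x, max_y)).
    """
    centroid = points.mean(axis=0)
    distances = np.linalg.norm(points - centroid, axis=1)
    # Keep at least one point, at most all of them
    keep = min(len(points), max(1, int(len(points) * proportion)))
    nearest = points[np.argsort(distances)[:keep]]
    lo = nearest.min(axis=0)
    hi = nearest.max(axis=0)
    return (float(lo[0]), float(lo[1])), (float(hi[0]), float(hi[1]))


# A 3x3 grid of points plus one far-away outlier
pts = np.array([[x, y] for x in range(3) for y in range(3)] + [[100, 100]], dtype=float)
print(central_bbox(pts, 1.0))  # ((0.0, 0.0), (100.0, 100.0)): stretched to the outlier
print(central_bbox(pts, 0.9))  # ((0.0, 0.0), (2.0, 2.0)): outlier dropped
```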


In [ ]:
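import pydeck as pdk


point_cloud = pdk.Layer(
    'PointCloudLayer',
    lidar[['x', 'y', 'z']],
    # Expression accessor: build each position from the x/y/z columns,
    # exaggerating height by 10x
    get_position='@@=[x, y, z * 10]',
    # Every point gets an upward-facing normal, used for lighting
    get_normal=[0, 0, 1],
    # RGBA color shared by all points
    get_color=[255, 0, 100, 200],
    # Enable hover picking and highlight the point under the cursor
    pickable=True,
    auto_highlight=True,
    point_size=1)


# Center and zoom on roughly the nearest 90% of the points
view_state = pdk.data_utils.compute_view(lidar[['x', 'y']], 0.9)
# Raise the pitch limit so the steep camera angle below is allowed
view_state.max_pitch = 360
view_state.pitch = 80
view_state.bearing = 120

r = pdk.Deck(
    point_cloud,
    initial_view_state=view_state,
    # No basemap: the points are not real geographic coordinates
    map_style='')
r.show()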

In [ ]:
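import time
from collections import deque

# Choose a handful of frames (values of the 'source' column) to loop through
frame_buffer = deque([42, 56, 81, 95])

# all_lidar holds unscaled coordinates, so apply the /10000 scaling in the expression
r.layers[0].get_position = '@@=[x / 10000, y / 10000, z * 10]'

print('Interrupt the kernel (stop button) to exit')
try:
    while True:
        current_frame = frame_buffer[0]
        frame = all_lidar[all_lidar['source'] == current_frame]
        # Send only the columns the layer uses to keep each update small
        r.layers[0].data = frame[['x', 'y', 'z']].to_dict(orient='records')
        # rotate(-1) moves the next frame to the front, so frames play in order
        frame_buffer.rotate(-1)
        r.update()
        time.sleep(0.5)
except KeyboardInterrupt:
    pass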
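The frame order in the playback loop depends on the direction the deque is rotated. deque.rotate() with no argument rotates one step to the right, moving the last frame to the front, so frames play in reverse. rotate(-1) moves the second frame to the front instead. A quick check with the same frame numbers (playback_order is a hypothetical helper):

```python
from collections import deque


def playback_order(frames, step, n):
    """Return the first n values seen at index 0 when rotating by `step` each time."""
    buf = deque(frames)
    seen = []
    for _ in range(n):
        seen.append(buf[0])
        buf.rotate(step)
    return seen


print(playback_order([42, 56, 81, 95], step=1, n=5))   # [42, 95, 81, 56, 42]
print(playback_order([42, 56, 81, 95], step=-1, n=5))  # [42, 56, 81, 95, 42]
```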