Finding an Offset Between a Video file and an XIO file with tracking data

Mumodo Demo Notebook -- Updated on 24.04.2015

Summary: This notebook explains how to synchronize data that comes from different sensors. There is tracking data stored in an XIO file (see the relevant notebook) and audio-video data. The method described uses a timecode visible in the video that represents the simultaneous timestamp of the tracking data.

(c) Dialogue Systems Group, University of Bielefeld


In [1]:
%matplotlib inline
from IPython.display import Image
from mumodo.xiofile import XIOFile
from mumodo.mumodoIO import open_streamframe_from_xiofile
from mumodo.plotting import plot_scalar
import pandas as pd

After recording audio and video with a camera and tracking data with venice.hub, we need to synchronize the two data channels. We use a visual synchronization method: the timestamp of the computer logging the data with venice.hub is displayed in the view of the video camera.


In [2]:
Image(filename="sampledata/testimage.png")


Out[2]:

We load the video in an editor with frame-by-frame seeking capabilities and find a frame in which the numbers on the screen can be read clearly (such as the one above). We store this number together with its time in the video (converted to milliseconds), e.g.
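
The conversion from the editor's frame-based position to milliseconds is not part of mumodo; below is a minimal sketch, assuming (hypothetically) that the editor reports the position as minutes, seconds and a frame number, and that the frame rate is a known constant of 25 fps. Both the helper function and the numbers are illustrative only.

    # hypothetical helper: convert an editor position (minutes, seconds, frame number)
    # to milliseconds, assuming a constant frame rate (25 fps is an assumption)
    def video_position_to_ms(minutes, seconds, frame, fps=25):
        return int(1000 * (60 * minutes + seconds + float(frame) / fps))

    video_position_to_ms(0, 2, 10)  # -> 2400 ms for a frame 2.4 s into the video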


In [3]:
Timepoints = []
Timepoints.append({'video_time': 2408, 'timestamp': 9192636608})
Timepoints


Out[3]:
[{'timestamp': 9192636608, 'video_time': 2408}]

We now turn to our recorded tracking data, of which we want to find the minimum time, i.e. the *first timestamp in the XIO file*, as follows:


In [4]:
minXIOtime = XIOFile("sampledata/test.xio.gz").min_time
minXIOtime


opening compressed file ...
opening file without indexing
Out[4]:
1429192624579

We see that the last 10 digits of the timestamp were visible on the screen, so we keep only those for our minXIOtime:


In [5]:
minXIOtime = minXIOtime % 10000000000
minXIOtime


Out[5]:
9192624579

Next, we adjust the timestamps in Timepoints to be relative to this initial timestamp, e.g.


In [6]:
for tp in Timepoints:
    tp['timestamp'] -= minXIOtime
Timepoints


Out[6]:
[{'timestamp': 12029, 'video_time': 2408}]

Finally, we compute the offset between these two times: the timestamp is relative to the beginning of the XIO file, and the 'video_time' is relative to the beginning of the video file. Hence, their difference is the offset between the two files.


In [7]:
for tp in Timepoints:
    tp['offset'] = tp['video_time'] - tp['timestamp']
Timepoints


Out[7]:
[{'offset': -9621, 'timestamp': 12029, 'video_time': 2408}]

So we compute an offset of -9621 ms, which can be used in venice.hub for synchronized playback of the data, or to analyze the tracking data we have recorded at times of interest taken from the audio/video (e.g. transcriptions).
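
For the second use, the offset maps times between the two timelines: since offset = video_time - timestamp, a time of interest taken from the video corresponds to video_time - offset on the relative tracking timeline. A minimal, hypothetical example (the 12500 ms time of interest is made up):

    # map a made-up video time of interest onto the relative XIO timeline
    offset = -9621                  # ms, video_time - timestamp, as computed above
    video_time_of_interest = 12500  # ms into the video, e.g. from a transcription
    xio_relative_time = video_time_of_interest - offset  # = 22121 ms after the first XIO timestamp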

It is possible to do this at more than one timepoint, both to make sure it was done correctly and to be more precise, by computing an average offset, e.g.:


In [8]:
Timepoints = []
Timepoints.append({'video_time': 2408, 'timestamp': 9192636608})
Timepoints.append({'video_time': 4015, 'timestamp': 9192638239})
Timepoints.append({'video_time': 7965, 'timestamp': 9192642139})
Timepoints.append({'video_time': 15170, 'timestamp': 9192649352})
for tp in Timepoints:
    tp['timestamp'] -= minXIOtime
    tp['offset'] = tp['video_time'] - tp['timestamp']
TimepointsFrame = pd.DataFrame(Timepoints)
TimepointsFrame


Out[8]:
   offset  timestamp  video_time
0   -9621      12029        2408
1   -9645      13660        4015
2   -9595      17560        7965
3   -9603      24773       15170

The offset can now be computed more precisely:


In [9]:
TimepointsFrame['offset'].mean(), TimepointsFrame['offset'].std()


Out[9]:
(-9616.0, 22.181073012818835)

Replaying the data with venice.hub

    java -jar venicehub.jar -i Disk -o IIO -f test.xio.gz --offset -9616

Importing tracking data with known offset

The following command imports the skeleton data of all skeletons into one mumodo StreamFrame. The arguments are:

  • the name of the XIO file
  • the sensor name as one XIO file may contain data from several sensors
  • the offset, which we computed above

In [10]:
skeletons = open_streamframe_from_xiofile("sampledata/test.xio.gz", 'VeniceHubReplay/Venice/Body1', timestamp_offset=-9616)


opening compressed file ...
opening file without indexing

In [11]:
skeletons.ix[:, ['JointPositions3', 'JointPositions4']].ix[0:100]


Out[11]:
JointPositions3 JointPositions4
30 [0.932158 -0.138976 1.84316, 0.935288 0.190036... [0.201554 -0.136232 2.08095, 0.186003 0.185379...
64 [0.931839 -0.138824 1.84315, 0.934947 0.190308... [0.201571 -0.136225 2.08102, 0.186005 0.185381...
96 [0.931625 -0.138733 1.84325, 0.934714 0.190464... [0.201583 -0.136239 2.08114, 0.186037 0.185401...
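
Since the import above already applied the offset, the resulting index should be on the video timeline (in milliseconds), so times of interest taken from the audio/video can be used to slice the StreamFrame directly. A hypothetical example, assuming a transcription interval from 2.0 s to 2.5 s of the video:

    # select the skeleton data for a (made-up) transcription interval, given in video time
    interval_start, interval_end = 2000, 2500  # ms on the video timeline
    skeletons.ix[interval_start:interval_end, ['JointPositions3', 'JointPositions4']]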

Syncing Data from sensors that have inherent lag

The Facelab eye tracker has an inherent lag due to its tracking filters when logging the 'accurate', rather than the 'realtime', data. That is, the timestamp in the index (the time of logging) is not the same as the timestamp of the event described by the data point.

Luckily, the data point itself contains the time at which the camera frame was captured.

The timestamp_offset kwarg allows us to import the raw timestamps, so that we can compute the lag!


In [12]:
SensorStream = open_streamframe_from_xiofile("sampledata/othersensor.xio.gz", 
                                             "Sensors/EyeTracker", 
                                             timestamp_offset=None)
SensorStream.dropna(inplace=True)


opening compressed file ...
opening file without indexing
non-int offset in input: raw timestamps from the file will be used

In the table below, the index is the (raw) time at which the event was logged.

However, frameTimeSeconds and frameTimeMilliSeconds jointly store the time at which the video frame the data is based on was captured.


In [13]:
SensorStream[:10]


Out[13]:
frameTimeMilliSeconds frameTimeSeconds gazeCalibrated gazeQualityLevelLeft gazeQualityLevelRight gazedObject headPosition headPositionConfidence headRotation headTrackerState ... modelQualityLevel rightEyePosition rightEyeRotation rightHeadEyePosition rightPupilDiameter rightPupilPosition saccade time unifiedEyePosition unifiedEyeRotation
1341393414837 10 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414837 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414856 26 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414856 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414870 43 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414870 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414884 60 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414884 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414901 76 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414901 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414915 93 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414915 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414932 110 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414932 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414947 126 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414947 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414963 143 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414963 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1341393414978 160 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414978 0.0 0.0 0.0 0.0 0.0 0.0 0.0

10 rows × 26 columns


In [14]:
#Compute the time of the camera frame capture, from the two relevant fields 
SensorStream['time_real'] = 1000 * SensorStream['frameTimeSeconds'] + SensorStream['frameTimeMilliSeconds']

In [15]:
#plot the lag between the frame capture time and the time of logging
SensorStream['lag'] = SensorStream['time'] - SensorStream['time_real']
plot_scalar(SensorStream, ['lag']) #pandas' SensorStream['lag'].plot() can also be used
print 'mean lag: {} \nstd of lag: {}'.format(SensorStream['lag'].mean(), SensorStream['lag'].std())


mean lag: 1824.78114478 
std of lag: 5.78005362735

We see that the lag is about 1825 ms on average. We now re-sync this input by using the real event time as the index. Again, we subtract the first (logged) timestamp to get relative times.


In [16]:
SensorStream.drop_duplicates(subset=['time_real'], inplace=True)
SensorStream.index = SensorStream['time_real'].map(lambda x: int(x) - SensorStream.index[0]) 
SensorStream = SensorStream.sort_index()

In [17]:
SensorStream[:10]


Out[17]:
frameTimeMilliSeconds frameTimeSeconds gazeCalibrated gazeQualityLevelLeft gazeQualityLevelRight gazedObject headPosition headPositionConfidence headRotation headTrackerState ... rightEyeRotation rightHeadEyePosition rightPupilDiameter rightPupilPosition saccade time unifiedEyePosition unifiedEyeRotation time_real lag
time_real
-1827 10 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414837 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413010 1827
-1811 26 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414856 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413026 1830
-1794 43 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414870 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413043 1827
-1777 60 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414884 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413060 1824
-1761 76 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414901 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413076 1825
-1744 93 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414915 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413093 1822
-1727 110 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414932 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413110 1822
-1711 126 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414947 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413126 1821
-1694 143 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414963 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413143 1820
-1677 160 1341393413 False 0 0 no object 0.0 0.0 0.0 0 0.0 0.0 0.0 1.0 False ... 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0 0.0 0.0 0.0 False 1341393414978 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1341393413160 1818

10 rows × 28 columns

As the negative indices now show, the event that was logged at time zero actually occurred 1827 ms earlier, and so on. In other words, each data point below is indexed by the time at which it actually occurred, not the time at which it was logged.


In [18]:
SensorStream.ix[10:25]


Out[18]:
frameTimeMilliSeconds frameTimeSeconds gazeCalibrated gazeQualityLevelLeft gazeQualityLevelRight gazedObject headPosition headPositionConfidence headRotation headTrackerState ... rightEyeRotation rightHeadEyePosition rightPupilDiameter rightPupilPosition saccade time unifiedEyePosition unifiedEyeRotation time_real lag
time_real
23 860 1341393414 True 1 1 no object -0.058093723 0.27434936 0.8690072 0.2778256 -0.15590253 -0.24577384 0.0857443 0.95285755 True ... -0.13766831 -0.24356543 -0.034949202 0.9594279 -0.04074008 0.2856072 0.8914654 0 0.0 0.0 0.0 False 1341393416680 -0.058093723 0.27434936 0.8690072 -0.13766831 -0.24356543 -0.034949202 0.9594279 1341393414860 1820

1 rows × 28 columns

If we already know the lag, then we can import the file and synchronize the times directly as follows:


In [19]:
SensorStream = open_streamframe_from_xiofile("sampledata/othersensor.xio.gz", 
                                             "Sensors/EyeTracker", 
                                             timestamp_offset=-1825)
SensorStream.dropna(inplace=True)


opening compressed file ...
opening file without indexing

In [20]:
SensorStream.ix[0:20]


Out[20]:
frameTimeMilliSeconds frameTimeSeconds gazeCalibrated gazeQualityLevelLeft gazeQualityLevelRight gazedObject headPosition headPositionConfidence headRotation headTrackerState ... modelQualityLevel rightEyePosition rightEyeRotation rightHeadEyePosition rightPupilDiameter rightPupilPosition saccade time unifiedEyePosition unifiedEyeRotation
18 860 1341393414 True 1 1 no object -0.058093723 0.27434936 0.8690072 0.2778256 -0.15590253 -0.24577384 0.0857443 0.95285755 True ... 100 -0.08762002 0.26899412 0.86431086 -0.13766831 -0.24356543 -0.034949202 0.9594279 -0.04074008 0.2856072 0.8914654 0 0.0 0.0 0.0 False 1341393416680 -0.058093723 0.27434936 0.8690072 -0.13766831 -0.24356543 -0.034949202 0.9594279

1 rows × 26 columns

This small difference of a few milliseconds from the previous method arises because in the second method we synchronized all points with the same mean lag, whereas in the first method we knew the lag of each individual point (which is also what allowed us to compute the mean lag). In practice it is more common to only have a rough idea of the lag, so the second method is the one built into open_streamframe_from_xiofile() as standard.
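
A quick way to see how much precision is lost by applying a single mean lag is to compare the per-point lag with its mean; a minimal sketch, recomputing the lag on the frame imported in In [19]:

    # per-point error of the constant (mean) lag correction, in ms
    time_real = 1000 * SensorStream['frameTimeSeconds'] + SensorStream['frameTimeMilliSeconds']
    lag = SensorStream['time'] - time_real
    residual = lag - lag.mean()
    print 'largest per-point error: {} ms'.format(residual.abs().max())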