Working with Real Tracking Data

Mumodo Demo Notebook - Updated on 24.04.2015

Summary: This notebook showcases working with real tracking data. In particular, data from a Kinect V2 sensor is imported and analyzed.

(c) Dialogue Systems Group, University of Bielefeld


In [1]:
%matplotlib inline
import math
import matplotlib.pyplot as plt
from mumodo.mumodoIO import open_streamframe_from_xiofile, open_intervalframe_from_textgrid
from mumodo.plotting import plot_scalar, plot_annotations
from mumodo.analysis import create_intervalframe_from_streamframe, convert_times_of_tier

We are going to import data recorded with a Microsoft Kinect V2 for Windows sensor from an XIO file:


In [2]:
KinectData = open_streamframe_from_xiofile("sampledata/test.xio.gz", 'VeniceHubReplay/Venice/Body1')


opening compressed file ...
opening file without indexing

In [3]:
KinectData[:2]


Out[3]:
JointPositions1 JointPositions2 JointPositions3 JointPositions4 JointPositions5 JointPositions6 time
0 [] [] [0.957876 -0.152858 1.7562, 0.945315 0.162776 ... [-0.345256 -0.750549 0.922279, -0.359958 -0.48... [] [] 1429192624579
18 [] [] [0.957869 -0.152829 1.75661, 0.945076 0.162929... [-0.345244 -0.750662 0.922202, -0.359946 -0.48... [] [] 1429192624597

The sensor can record data for up to 6 bodies, but here it has recorded data for only two. So why not keep only those two?


In [4]:
skeletons = KinectData.ix[:, ['JointPositions3', 'JointPositions4']]
skeletons[:2]


Out[4]:
JointPositions3 JointPositions4
0 [0.957876 -0.152858 1.7562, 0.945315 0.162776 ... [-0.345256 -0.750549 0.922279, -0.359958 -0.48...
18 [0.957869 -0.152829 1.75661, 0.945076 0.162929... [-0.345244 -0.750662 0.922202, -0.359946 -0.48...

We commonly need to look for (and drop) NaN values:


In [5]:
print len(skeletons), len(skeletons.dropna())
skeletons.dropna(inplace=True)


2016 2015

Each cell in the table has a snapshot of the whole skeleton (25 joints) at that moment in time.


In [6]:
set(skeletons['JointPositions3'].map(lambda x: len(x))), set(skeletons['JointPositions4'].map(lambda x: len(x)))


Out[6]:
({25}, {25})

We can decode the cells of each column into individual joints using the following enumeration from the sensor's documentation:

From http://msdn.microsoft.com/en-us/library/microsoft.kinect.kinect.jointtype.aspx

typedef enum _JointType
{
    JointType_SpineBase = 0,
    JointType_SpineMid = 1,
    JointType_Neck = 2,
    JointType_Head = 3,
    JointType_ShoulderLeft = 4,
    JointType_ElbowLeft = 5,
    JointType_WristLeft = 6,
    JointType_HandLeft = 7,
    JointType_ShoulderRight = 8,
    JointType_ElbowRight = 9,
    JointType_WristRight = 10,
    JointType_HandRight = 11,
    JointType_HipLeft = 12,
    JointType_KneeLeft = 13,
    JointType_AnkleLeft = 14,
    JointType_FootLeft = 15,
    JointType_HipRight = 16,
    JointType_KneeRight = 17,
    JointType_AnkleRight = 18,
    JointType_FootRight = 19,
    JointType_SpineShoulder = 20,
    JointType_HandTipLeft = 21,
    JointType_ThumbLeft = 22,
    JointType_HandTipRight = 23,
    JointType_ThumbRight = 24,
    JointType_Count = (JointType_ThumbRight+1)
}

In [7]:
#Create new columns for some of the joints we are interested in
skeletons['HandRight3'] = skeletons['JointPositions3'].map(lambda x: x[11])
skeletons['HandRight4'] = skeletons['JointPositions4'].map(lambda x: x[11])
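
Instead of hard-coding index 11 (JointType_HandRight), we could mirror the enumeration above as a Python dict and look joints up by name. This is a convenience sketch, not part of mumodo:

#name-to-index mapping, transcribed from the enumeration above
JOINT_INDEX = {'SpineBase': 0, 'SpineMid': 1, 'Neck': 2, 'Head': 3,
               'ShoulderLeft': 4, 'ElbowLeft': 5, 'WristLeft': 6, 'HandLeft': 7,
               'ShoulderRight': 8, 'ElbowRight': 9, 'WristRight': 10, 'HandRight': 11,
               'HipLeft': 12, 'KneeLeft': 13, 'AnkleLeft': 14, 'FootLeft': 15,
               'HipRight': 16, 'KneeRight': 17, 'AnkleRight': 18, 'FootRight': 19,
               'SpineShoulder': 20, 'HandTipLeft': 21, 'ThumbLeft': 22,
               'HandTipRight': 23, 'ThumbRight': 24}

#equivalent to the cell above
skeletons['HandRight3'] = skeletons['JointPositions3'].map(lambda x: x[JOINT_INDEX['HandRight']])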

Example Analysis

During the demo video included in the sample data, the two people do a "high-five" clap three times, using their right hands. We would like to measure the distance between their hands during this joint gesture.


In [8]:
clap_times = open_intervalframe_from_textgrid("sampledata/test.TextGrid")['CLAPS']
clap_times


Out[8]:
time mark
0 11.654230 First Clap
1 33.824485 Second Clap
2 47.672685 Third Clap

We want to see the distance before and after the clap instant, so let's turn this into an IntervalFrame instead:


In [9]:
context = 2 #duration (in seconds) before and after the clap
clap_times['start_time'] = clap_times['time'] - context
clap_times['end_time'] = clap_times['time'] + context
del clap_times['time']
clap_times['text'] = clap_times['mark']
del clap_times['mark']
clap_times = clap_times.ix[:, ['start_time', 'end_time', 'text']]
clap_times


Out[9]:
start_time end_time text
0 9.654230 13.654230 First Clap
1 31.824485 35.824485 Second Clap
2 45.672685 49.672685 Third Clap

In addition, we need to offset the tracking data in order to synchronize it with these times. See the notebook "ComputingOffset" for more details.


In [10]:
skeletons.index -= 9616
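
The constant 9616 is in milliseconds: conceptually, it is the difference between the tracking clock (the StreamFrame index) and the annotation clock (the TextGrid times) at a shared reference event. A minimal sketch with hypothetical values (the actual computation is in "ComputingOffset"):

#hypothetical values, for illustration only
event_in_tracking = 21270   #ms, a reference event as seen in the tracking data index
event_in_textgrid = 11654   #ms, the same event in the TextGrid (11.654 s)
offset = event_in_tracking - event_in_textgrid   #9616 ms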

Next we define a function to compute the distance (or we can use the one from scipy or numpy):


In [11]:
def euclidean_distance(a, b):
    """compute the euclidean distance between two SFVec3f"""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)
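
Equivalently, the distance can be computed with numpy (a sketch; it assumes the same SFVec3f fields x, y, z as above):

import numpy as np

def euclidean_distance_np(a, b):
    """compute the euclidean distance between two SFVec3f using numpy"""
    return np.linalg.norm(np.array([a.x, a.y, a.z]) - np.array([b.x, b.y, b.z]))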

In [12]:
#create and populate the new column with the hand-to-hand distance
skeletons['HandDistance'] = skeletons.apply(
    lambda row: euclidean_distance(row['HandRight3'], row['HandRight4']), axis=1)

In [13]:
plot_scalar(skeletons, ['HandDistance'])


We can also plot the data around the clap episodes only:


In [14]:
episode_no = 2 #this is the index of the interval in the IntervalFrame with clap episodes
start = int(1000 * clap_times['start_time'].ix[episode_no]) #convert start and end times to ms
end = int(1000 * clap_times['end_time'].ix[episode_no]) 
plot_scalar(skeletons, ['HandDistance'], start, end)


Conversely, we can try to locate the claps automatically from the distance measure, e.g. by thresholding it:


In [15]:
detected_claps = create_intervalframe_from_streamframe(skeletons, 'HandDistance', lambda x: x < 0.2, 40)
convert_times_of_tier(detected_claps, lambda x: float(x) / 1000) #convert times from ms to seconds
detected_claps


Out[15]:
start_time end_time text
0 11.827 11.891 True
1 33.883 34.046 True
2 47.706 47.807 True
3 47.873 47.906 True
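
Note that the third clap was detected as two short intervals (rows 2 and 3), separated by a gap of about 66 ms. Such fragments could be merged in plain pandas; a minimal sketch, where max_gap is an assumption:

import pandas as pd

#merge consecutive intervals whose gap is shorter than max_gap (in seconds)
max_gap = 0.1
merged = []
for _, row in detected_claps.iterrows():
    if merged and row['start_time'] - merged[-1]['end_time'] < max_gap:
        merged[-1]['end_time'] = row['end_time']
    else:
        merged.append(dict(row))
merged_claps = pd.DataFrame(merged, columns=['start_time', 'end_time', 'text'])
merged_claps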

In [16]:
#plot the detected claps as well as the annotated claps
plot_annotations({'annotated': open_intervalframe_from_textgrid("sampledata/test.TextGrid")['CLAPS'],
                  'detected': detected_claps}, linespan = 10, hscale=2)