Working with Real Tracking Data

Mumodo Demo Notebook - Updated on 24.04.2015

Summary: This notebook showcases working with real tracking data. In particular, data from a Kinect V2 sensor is imported and analyzed.

(c) Dialogue Systems Group, University of Bielefeld


In [1]:
%matplotlib inline
import math
import matplotlib.pyplot as plt
from mumodo.mumodoIO import open_streamframe_from_xiofile, open_intervalframe_from_textgrid
from mumodo.plotting import plot_scalar, plot_annotations
from mumodo.analysis import create_intervalframe_from_streamframe, convert_times_of_tier

We are going to import data recorded with a Microsoft Kinect V2 for Windows sensor from an XIO file:


In [2]:
KinectData = open_streamframe_from_xiofile("sampledata/test.xio.gz", 'VeniceHubReplay/Venice/Body1')


opening compressed file ...
opening file without indexing

In [3]:
KinectData[:2]


Out[3]:
JointPositions1 JointPositions2 JointPositions3 JointPositions4 JointPositions5 JointPositions6 time
0 [] [] [0.957876 -0.152858 1.7562, 0.945315 0.162776 ... [-0.345256 -0.750549 0.922279, -0.359958 -0.48... [] [] 1429192624579
18 [] [] [0.957869 -0.152829 1.75661, 0.945076 0.162929... [-0.345244 -0.750662 0.922202, -0.359946 -0.48... [] [] 1429192624597

The sensor can record data for up to 6 bodies, but here it has recorded data for only two. So why not keep only those two?


In [4]:
skeletons = KinectData.ix[:, ['JointPositions3', 'JointPositions4']]
skeletons[:2]


Out[4]:
JointPositions3 JointPositions4
0 [0.957876 -0.152858 1.7562, 0.945315 0.162776 ... [-0.345256 -0.750549 0.922279, -0.359958 -0.48...
18 [0.957869 -0.152829 1.75661, 0.945076 0.162929... [-0.345244 -0.750662 0.922202, -0.359946 -0.48...

We commonly need to look for (and drop) NaN values:


In [5]:
print len(skeletons), len(skeletons.dropna())
skeletons.dropna(inplace=True)


2016 2015

Each cell in the table has a snapshot of the whole skeleton (25 joints) at that moment in time.


In [6]:
set(skeletons['JointPositions3'].map(lambda x: len(x))), set(skeletons['JointPositions4'].map(lambda x: len(x)))


Out[6]:
({25}, {25})

We can decode the cells of each column into individual joints using the following enumeration from the sensor's documentation:

From http://msdn.microsoft.com/en-us/library/microsoft.kinect.kinect.jointtype.aspx

typedef enum _JointType
{
    JointType_SpineBase = 0,
    JointType_SpineMid = 1,
    JointType_Neck = 2,
    JointType_Head = 3,
    JointType_ShoulderLeft = 4,
    JointType_ElbowLeft = 5,
    JointType_WristLeft = 6,
    JointType_HandLeft = 7,
    JointType_ShoulderRight = 8,
    JointType_ElbowRight = 9,
    JointType_WristRight = 10,
    JointType_HandRight = 11,
    JointType_HipLeft = 12,
    JointType_KneeLeft = 13,
    JointType_AnkleLeft = 14,
    JointType_FootLeft = 15,
    JointType_HipRight = 16,
    JointType_KneeRight = 17,
    JointType_AnkleRight = 18,
    JointType_FootRight = 19,
    JointType_SpineShoulder = 20,
    JointType_HandTipLeft = 21,
    JointType_ThumbLeft = 22,
    JointType_HandTipRight = 23,
    JointType_ThumbRight = 24,
    JointType_Count = (JointType_ThumbRight+1)
}

In [7]:
#Create new columns for some of the joints we are interested in
skeletons['HandRight3'] = skeletons['JointPositions3'].map(lambda x: x[11])
skeletons['HandRight4'] = skeletons['JointPositions4'].map(lambda x: x[11])
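
Instead of hard-coding index 11 (JointType_HandRight), we could mirror the enumeration above as a Python dict and look joints up by name. This is a convenience sketch, not part of mumodo:

#name-to-index mapping, transcribed from the enumeration above
JOINT_INDEX = {'SpineBase': 0, 'SpineMid': 1, 'Neck': 2, 'Head': 3,
               'ShoulderLeft': 4, 'ElbowLeft': 5, 'WristLeft': 6, 'HandLeft': 7,
               'ShoulderRight': 8, 'ElbowRight': 9, 'WristRight': 10, 'HandRight': 11,
               'HipLeft': 12, 'KneeLeft': 13, 'AnkleLeft': 14, 'FootLeft': 15,
               'HipRight': 16, 'KneeRight': 17, 'AnkleRight': 18, 'FootRight': 19,
               'SpineShoulder': 20, 'HandTipLeft': 21, 'ThumbLeft': 22,
               'HandTipRight': 23, 'ThumbRight': 24}

#equivalent to the cell above
skeletons['HandRight3'] = skeletons['JointPositions3'].map(lambda x: x[JOINT_INDEX['HandRight']])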

Example Analysis

During the demo video included in the sample data, the two people do a "high-five" clap three times, using their right hands. We would like to measure the distance between their hands during this joint gesture.


In [8]:
clap_times = open_intervalframe_from_textgrid("sampledata/test.TextGrid")['CLAPS']
clap_times


Out[8]:
time mark
0 11.654230 First Clap
1 33.824485 Second Clap
2 47.672685 Third Clap

We want to see the distance before and after the clap instant, so let's turn this into an IntervalFrame instead:


In [9]:
context = 2 #duration (in seconds) before and after the clap
clap_times['start_time'] = clap_times['time'] - context
clap_times['end_time'] = clap_times['time'] + context
del clap_times['time']
clap_times['text'] = clap_times['mark']
del clap_times['mark']
clap_times = clap_times.ix[:, ['start_time', 'end_time', 'text']]
clap_times


Out[9]:
start_time end_time text
0 9.654230 13.654230 First Clap
1 31.824485 35.824485 Second Clap
2 45.672685 49.672685 Third Clap

In addition, we need to offset the tracking data in order to synchronize it with these times. See the notebook "ComputingOffset" for more details.


In [10]:
skeletons.index -= 9616
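
The constant 9616 is in milliseconds: conceptually, it is the difference between the tracking clock (the StreamFrame index) and the annotation clock (the TextGrid times) at a shared reference event. A minimal sketch with hypothetical values (the actual computation is in "ComputingOffset"):

#hypothetical values, for illustration only
event_in_tracking = 21270   #ms, a reference event as seen in the tracking data index
event_in_textgrid = 11654   #ms, the same event in the TextGrid (11.654 s)
offset = event_in_tracking - event_in_textgrid   #9616 ms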

Next we define a function to compute the distance (or we can use the one from scipy or numpy):


In [11]:
def euclidean_distance(a, b):
    """compute the euclidean distance between two SFVec3f"""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)
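
Equivalently, the distance can be computed with numpy (a sketch; it assumes the same SFVec3f fields x, y, z as above):

import numpy as np

def euclidean_distance_np(a, b):
    """compute the euclidean distance between two SFVec3f using numpy"""
    return np.linalg.norm(np.array([a.x, a.y, a.z]) - np.array([b.x, b.y, b.z]))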

In [12]:
#create and populate the new column with the hand-to-hand distance
skeletons['HandDistance'] = skeletons.apply(
    lambda row: euclidean_distance(row['HandRight3'], row['HandRight4']), axis=1)

In [13]:
plot_scalar(skeletons, ['HandDistance'])


We can also plot the data around the clap episodes only:


In [14]:
episode_no = 2 #this is the index of the interval in the IntervalFrame with clap episodes
start = int(1000 * clap_times['start_time'].ix[episode_no]) #convert start and end times to ms
end = int(1000 * clap_times['end_time'].ix[episode_no]) 
plot_scalar(skeletons, ['HandDistance'], start, end)


Conversely, we can try to locate the claps automatically from the distance measure, e.g. by thresholding it:


In [15]:
detected_claps = create_intervalframe_from_streamframe(skeletons, 'HandDistance', lambda x: x < 0.2, 40)
convert_times_of_tier(detected_claps, lambda x: float(x) / 1000) #convert times from ms to seconds
detected_claps


Out[15]:
start_time end_time text
0 11.827 11.891 True
1 33.883 34.046 True
2 47.706 47.807 True
3 47.873 47.906 True
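
Note that the third clap was detected as two short intervals (rows 2 and 3), separated by a gap of about 66 ms. Such fragments could be merged in plain pandas; a minimal sketch, where max_gap is an assumption:

import pandas as pd

#merge consecutive intervals whose gap is shorter than max_gap (in seconds)
max_gap = 0.1
merged = []
for _, row in detected_claps.iterrows():
    if merged and row['start_time'] - merged[-1]['end_time'] < max_gap:
        merged[-1]['end_time'] = row['end_time']
    else:
        merged.append(dict(row))
merged_claps = pd.DataFrame(merged, columns=['start_time', 'end_time', 'text'])
merged_claps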

In [16]:
#plot the detected claps as well as the annotated claps
plot_annotations({'annotated': open_intervalframe_from_textgrid("sampledata/test.TextGrid")['CLAPS'],
                  'detected': detected_claps}, linespan = 10, hscale=2)