Mumodo Demo Notebook - Updated on 25.04.2015
Summary: This notebook describes how to work with mumodos: getting slices and concatenating the slices back together
(c) Dialogue Systems Group, University of Bielefeld
In [1]:
%matplotlib inline
import mumodo.corpus as cp
from mumodo.analysis import shift_tier
from moviepy.editor import ImageClip, concatenate
from mumodo.plotting import plot_scalar, plot_annotations
import pandas as pd
from IPython.display import HTML
NOTE: This notebook assumes basic knowledge of mumodo.corpus functionality. Please have a look at the notebook titled "Managing Corpora and Resources with mumodo" if you haven't already.
Let's imagine the following analysis scenario. We want to look at the three clap gestures in all of our recorded streams. We want to get three "slices" or cross-sections of our multimodal corpus and concatenate them together. The concatenation is needed for presentation purposes, but one could think of other uses, such as running analyses on specific parts of the corpus.
Let's import our mumodos!
In [2]:
inputmumodos = cp.read_mumodo_from_file('sampledata/test.mumodo')
When slicing, it is important to know what units we are using. Currently mumodo cannot automatically detect and convert time formats, so the units must be set explicitly. Our corpus only has resources in seconds and milliseconds, so we only need to convert between those.
In [3]:
#helper function for unit conversions
def convert_units(source, target, values):
    conversions = [('seconds', 'ms', lambda x: int(1000 * x)),
                   ('ms', 'seconds', lambda x: float(x) / 1000.)]
    for c in conversions:
        #print c[:2]
        if (source, target) == c[:2]:
            return map(c[2], values)
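For example (an illustrative call, not part of the original notebook), converting the default slice boundaries from seconds to milliseconds and back:
convert_units('seconds', 'ms', [0, 1.0])   # returns [0, 1000]
convert_units('ms', 'seconds', [0, 1000])  # returns [0.0, 1.0]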
Next, we test that all the resources can be retrieved from their respective files. In addition, we use "duck typing" (a pythonic approach) to find out which resources can be sliced.
NOTE: This operation also causes each resource to load into memory. Subsequent calls that access the resource objects will be considerably faster.
In [4]:
sliceable = []
for m in inputmumodos:
    boundaries = [0, 1.0]  #not necessary, but by convention we use float for seconds and int for ms
    units = 'seconds'  #explicitly say what units the values represent
    for r in m:
        if r.get_units() != units and r.get_units() is not None:
            start, end = convert_units(units, r.get_units(), boundaries)
        else:
            start, end = boundaries
        print r.get_name()
        #duck type to see if this resource can be sliced
        try:
            r.get_slice(start, end)
            sliceable.append(r.get_name())
        except AttributeError:
            print "cannot slice " + r.get_type()
print sliceable
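As a side note, if we assume that the only reason get_slice raises an AttributeError is that a resource simply does not define the method, the same list could also be built with an explicit attribute check (an illustrative sketch, not part of the original notebook):
#equivalent duck-typing check via hasattr (assumption: non-sliceable
#resources simply do not define get_slice)
sliceable_alt = [r.get_name() for m in inputmumodos for r in m
                 if hasattr(r, 'get_slice')]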
Next, we define the times of the slices around the CLAP points (from 6 seconds before to 2 seconds after each clap). This is best done as an IntervalFrame.
NOTE: The 'mumodo' column is not required, but it is used here to show that it is possible to get slices from several mumodos at once.
In [5]:
slices = inputmumodos[0]['clap_points'].get_tier().copy(deep=True)
slices['start_time'] = slices['time'] - 6.0
slices['end_time'] = slices['time'] + 2.0
slices['mumodo'] = inputmumodos[0].get_name()
slices = slices.ix[:, ['start_time', 'end_time', 'mumodo']]
slices
Out[5]:
The code below generates the concatenated slices based on the inputmumodos list and the slices IntervalFrame. In a nutshell: for each row of the slices IntervalFrame it computes the new boundary times of that slice after concatenation, looks up the mumodo the slice comes from, slices every sliceable resource, shifts the slice so that it starts at zero, and appends it to the running result (with silence in between).
The following parameters are of interest:
silence - the duration of the silence inserted between slices during concatenation
units - applies both to the silence and to the times of the slices
Finally, a black image file is required to insert silence in the concatenated video; it must have the same dimensions as the video.
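As a quick illustration of the boundary arithmetic performed below (not part of the original notebook): with the three 8-second slices defined above and 2 seconds of silence between them, the slices should end up at 0-8 s, 10-18 s and 20-28 s in the concatenated timeline.
#illustrative check of the expected boundary arithmetic, assuming three
#8-second slices and 2 seconds of silence between them (as set below)
expected = []
offset = 0
for dur in (slices['end_time'] - slices['start_time']):
    if expected:
        offset += 2  #silence between slices
    expected.append((offset, offset + dur))
    offset += dur
print expected  # [(0, 8.0), (10.0, 18.0), (20.0, 28.0)]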
In [6]:
silence = 2
silentclip = ImageClip("sampledata/black.jpg")
units = 'seconds'
result = dict()  #the concatenated sliced objects are stored here
newboundaries = []  #this will be the annotation of the slice boundaries after concatenation
for i in slices.index:
    #compute the new times of this slice AFTER concatenation
    boundaries = (slices['start_time'].ix[i], slices['end_time'].ix[i])
    mumodoid = slices['mumodo'].ix[i]
    if i > 0:
        cumulative_duration = (slices['end_time'].ix[:i-1] - slices['start_time'].ix[:i-1]).sum() + (i - 1) * silence
        newboundaries.append((silence + cumulative_duration,
                              silence + cumulative_duration + (boundaries[1] - boundaries[0])))
    else:
        cumulative_duration = 0
        newboundaries.append((0, boundaries[1] - boundaries[0]))
    #find which mumodo we are talking about!
    C = None
    for m in inputmumodos:
        if m.get_name() == mumodoid:
            C = m
            break
    #now loop through the resources in mumodo object C
    for r in C:
        #exclude non-sliceable resource types
        if r.get_name() not in sliceable:
            continue
        #convert units if required
        if r.get_units() != units:
            start, end = convert_units(units, r.get_units(), boundaries)
            nstart, nend = convert_units(units, r.get_units(), newboundaries[-1])
            silencedur = convert_units(units, r.get_units(), [silence])[0]
        else:
            start, end = boundaries
            nstart, nend = newboundaries[-1]
            silencedur = silence
        #get the new slice out of this resource
        #streamframes and intervalframes must not preserve their original times!
        right = r.get_slice(start, end)
        if r.get_type() == 'TierResource':
            right = right.copy(deep=True)  #need a copy of the object as times are shifted
            shift_tier(right, -start)
        if r.get_type() == 'StreamResource':
            right.index -= start
        #if this is the first slice of this resource, just put it in the result dict
        if r.get_name() not in result:
            result[r.get_name()] = right
        else:
            #temporary storage of the old value
            left = result[r.get_name()]
            #perform concatenation based on the resource type
            if r.get_type() == 'VideoResource':
                silentclip.duration = silencedur
                result[r.get_name()] = concatenate([left, silentclip, right])
            #This is a rather complicated way to concatenate audios with moviepy
            elif r.get_type() == 'AudioResource':
                silentclip.duration = silencedur
                leftclip = ImageClip("sampledata/black.jpg").set_duration(left.duration)
                leftclip.audio = left
                rightclip = ImageClip("sampledata/black.jpg").set_duration(right.duration)
                rightclip.audio = right
                result[r.get_name()] = concatenate([leftclip, silentclip, rightclip]).audio
            elif r.get_type() == 'StreamResource':
                right.index += nstart
                result[r.get_name()] = left.append(right)
            elif r.get_type() == 'TierResource':
                shift_tier(right, nstart)
                result[r.get_name()] = left.append(right, ignore_index=True)
Let's see what we got!
In [7]:
result.keys()
Out[7]:
We make the new slice boundaries into an IntervalFrame and plot the annotations
In [8]:
nb = pd.DataFrame(newboundaries, columns=['start_time', 'end_time'])
nb['text'] = ['1', '2', '3']
In [9]:
plot_annotations({'O': result['transcriptionO'],
                  'S': result['transcriptionS'],
                  'claps': result['clap_points'],
                  'slices': nb}, pointwidth=0.2,
                 linespan=40, hscale=0.475)
Let's look at the right hand Y trajectory for both kinect bodies
In [10]:
result['kinect_body']['rhandY1'] = result['kinect_body']['JointPositions3'].map(lambda x: x[11].y)
result['kinect_body']['rhandY2'] = result['kinect_body']['JointPositions4'].map(lambda x: x[11].y)
plot_scalar(result['kinect_body'], ['rhandY1', 'rhandY2'])
Finally, let's create a concatenated AV file
In [11]:
result['video'].audio = result['audio'] #this adds the audio to the video
result['video'].write_videofile("concatenated.mp4")
In [12]:
def show_html_video(fname, mimetype):
    """Load the video in the file `fname`, with given mimetype,
    and display as HTML5 video.
    """
    video_encoded = open(fname, "rb").read().encode("base64")
    video_tag = '<video controls alt="test" src="data:video/{0};base64,{1}">'.format(mimetype, video_encoded)
    return HTML(data=video_tag)
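Note that the str.encode("base64") call above relies on Python 2 codecs. An equivalent encoding step using the base64 module (an illustrative alternative, not part of the original notebook) would be:
import base64
with open("concatenated.mp4", "rb") as f:
    video_encoded = base64.b64encode(f.read())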
In [13]:
show_html_video("concatenated.mp4", 'mp4')
Out[13]: