Working with mumodo Slices

Mumodo Demo Notebook - Updated on 25.04.2015

Summary: This notebook describes how to work with mumodos: getting slices and concatenating those slices back together

(c) Dialogue Systems Group, University of Bielefeld


In [1]:
%matplotlib inline
import mumodo.corpus as cp
from mumodo.analysis import shift_tier
from moviepy.editor import ImageClip, concatenate
from mumodo.plotting import plot_scalar, plot_annotations
import pandas as pd
from IPython.display import HTML

NOTE: This notebook assumes basic knowledge of mumodo.corpus functionality. Please have a look at the notebook titled "Managing Corpora and Resources with mumodo" if you haven't already.

Let's imagine the following analysis scenario. We want to look at the three clap gestures in all of our recorded streams. We want to get three "slices", or cross-sections, of our multimodal corpus and concatenate them. We need the concatenation for presentation purposes, but one could think of other uses, such as running analyses on specific parts of the corpus.

Let's import our mumodos!


In [2]:
inputmumodos = cp.read_mumodo_from_file('sampledata/test.mumodo')

When slicing, it is important to know what units we are using. Currently mumodo cannot automatically detect and convert time formats, so the units must be set explicitly. Our corpus only has resources in seconds and milliseconds, so we only need to convert between those.


In [3]:
#helper function for unit conversions
def convert_units(source, target, values):
    """Convert an iterable of time values from source units to target units."""
    conversions = [('seconds', 'ms', lambda x: int(1000 * x)),
                   ('ms', 'seconds', lambda x: float(x) / 1000.)]
    for c in conversions:
        if (source, target) == c[:2]:
            return map(c[2], values)
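
As a quick sanity check of the helper (a small illustrative snippet, not one of the numbered cells; the values are arbitrary):

print convert_units('seconds', 'ms', [0, 1.0])   # [0, 1000]
print convert_units('ms', 'seconds', [0, 1000])  # [0.0, 1.0]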

Next, we test that all the resources can be retrieved from their respective files. In addition, we use "duck typing" (a pythonic approach) to find out which resources can be sliced.

NOTE: This operation also causes each resource to load into memory. Subsequent calls that access the resource objects will be considerably faster.


In [4]:
sliceable = []
for m in inputmumodos:
    boundaries = [0, 1.0] #not necessary, but by convention we use float for seconds and int for ms
    units = 'seconds' #explicitly say what units the values represent
    for r in m:
        if r.get_units() != units and r.get_units() is not None:
            start, end = convert_units(units, r.get_units(), boundaries)
        else:
            start, end = boundaries
        print r.get_name()
        #duck type to see if this resource can be sliced
        try:
            r.get_slice(start, end)
            sliceable.append(r.get_name())
        except AttributeError:
            print "cannot slice " + r.get_type()
print sliceable


clap_points
image
cannot slice ImageResource
transcriptionO
kinect_body
Parsing XIO file (will be done only once).
Please wait ...
opening compressed file ...
opening file without indexing
EAF
cannot slice GenericResource
transcriptionS
video
audio
['clap_points', 'transcriptionO', 'kinect_body', 'transcriptionS', 'video', 'audio']

Next, we define the times of the slices around the CLAP points (from 6 seconds before to 2 seconds after each point). This is best done as an IntervalFrame.

NOTE: The 'mumodo' column is not required, but it is used to show how it is possible to get slices from several mumodos at once
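
Purely as an illustration (this frame is not executed anywhere in this notebook, and 'Another corpus' is a hypothetical name), a slices IntervalFrame drawing on two different mumodos could look like this:

example_slices = pd.DataFrame([{'start_time': 5.0, 'end_time': 13.0, 'mumodo': 'A test corpus'},
                               {'start_time': 2.0, 'end_time': 10.0, 'mumodo': 'Another corpus'}],
                              columns=['start_time', 'end_time', 'mumodo'])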


In [5]:
slices = inputmumodos[0]['clap_points'].get_tier().copy(deep=True)
slices['start_time'] = slices['time'] - 6.0
slices['end_time'] = slices['time'] + 2.0
slices['mumodo'] = inputmumodos[0].get_name()
slices = slices.ix[:, ['start_time', 'end_time', 'mumodo']]
slices


Out[5]:
start_time end_time mumodo
0 5.654230 13.654230 A test corpus
1 27.824485 35.824485 A test corpus
2 41.672685 49.672685 A test corpus

The code below generates the concatenated slices based on the inputmumodos list and the slices IntervalFrame. Here is what it does in a nutshell:

  • loop through the rows of slices
  • for each row, read the start and end times of the slice, and compute the corresponding times after concatenation (see the sketch after this list)
  • look up the mumodo named in the mumodo column among the mumodos in the inputmumodos list
    • loop through the (sliceable) resources of the currently selected mumodo and extract a slice of each
    • concatenate the resource chunks together and store the results incrementally
  • repeat until all of the slices have been cut/pasted
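
The time remapping in the second bullet point boils down to the following arithmetic, shown here as a small sketch (it assumes a default integer index 0..n-1 and mirrors what the cell below computes inline):

def remap_slice(durations, i, silence):
    """Start/end of slice i in the concatenated timeline (illustrative sketch)."""
    new_start = sum(durations[:i]) + i * silence
    return new_start, new_start + durations[i]

#e.g. with three 8-second slices and 2 seconds of silence between them:
#remap_slice([8.0, 8.0, 8.0], 1, 2) returns (10.0, 18.0)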

The following parameters are of interest:

silence - the duration of the silence inserted between slices during concatenation

units - this applies both to the silence and to the times of the slices

Finally, a black image file is required to insert silence in the concatenated video, and it must have the same dimensions as the video
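
If you do not have such an image at hand, one way to create it is with PIL (a sketch; the 640x480 dimensions are an assumption and should match your actual video):

from PIL import Image
Image.new("RGB", (640, 480), (0, 0, 0)).save("sampledata/black.jpg")  #black frame sized like the video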


In [6]:
silence = 2 
silentclip = ImageClip("sampledata/black.jpg")
units = 'seconds'
result = dict() #the concatenated sliced objects are stored here
newboundaries = [] #this will be the annotation of the slice boundaries after concatenation
for i in slices.index:
    #compute the new times of this slice AFTER concatenation
    boundaries = (slices['start_time'].ix[i], slices['end_time'].ix[i])
    mumodoid = slices['mumodo'].ix[i]
    if i > 0:
        cumulative_duration = (slices['end_time'].ix[:i-1] - slices['start_time'].ix[:i-1]).sum() + (i -1) * silence
        newboundaries.append((silence + cumulative_duration, 
                              silence + cumulative_duration + (boundaries[1] - boundaries[0])))
    else:
        cumulative_duration = 0
        newboundaries.append((0, boundaries[1] - boundaries[0]))
    
    #find which mumodo we are talking about!
    C = None
    for m in inputmumodos:
        if m.get_name() == mumodoid:
            C = m
            break
            
    #now loop through the resources in mumodo object C
    for r in C:
        #exclude non-sliceable resource types
        if r.get_name() not in sliceable:
            continue    
        #convert units if required
        if r.get_units() != units:
            start, end = convert_units(units, r.get_units(), boundaries)
            nstart, nend = convert_units(units, r.get_units(), newboundaries[-1])
            silencedur = convert_units(units, r.get_units(), [silence])[0]
        else:
            start, end = boundaries
            nstart, nend = newboundaries[-1]
            silencedur = silence
        
        #get the new slice out of this resource
        #streamframes and intervalframes must not keep their original (absolute) times!
        right = r.get_slice(start, end)
        if r.get_type() == 'TierResource':
            right = right.copy(deep=True) #need a copy of the object as times are shifted
            shift_tier(right, -start)
        if r.get_type() == 'StreamResource':
            right.index -= start
        
        #if this is the first slice of this resource, just put it in the result dict
        if r.get_name() not in result:
            result[r.get_name()] = right
        else:
            #temporary storage of the old value
            left = result[r.get_name()]
            #perform concatenation based on the resource type
            if r.get_type() == 'VideoResource':
                silentclip.duration = silencedur
                result[r.get_name()] = concatenate([left, silentclip, right])
            #This is a rather complicated way to concatenate audios with moviepy
            elif r.get_type() == 'AudioResource':
                silentclip.duration = silencedur
                leftclip = ImageClip("sampledata/black.jpg").set_duration(left.duration)
                leftclip.audio = left
                rightclip = ImageClip("sampledata/black.jpg").set_duration(right.duration)
                rightclip.audio = right
                result[r.get_name()] = concatenate([leftclip, silentclip, rightclip]).audio
            elif r.get_type() == 'StreamResource':
                right.index += nstart
                result[r.get_name()] = left.append(right)
            elif r.get_type() == 'TierResource':
                shift_tier(right, nstart)
                result[r.get_name()] = left.append(right, ignore_index=True)

Let's see what we got!


In [7]:
result.keys()


Out[7]:
['clap_points',
 'transcriptionO',
 'kinect_body',
 'transcriptionS',
 'video',
 'audio']

We turn the new slice boundaries into an IntervalFrame and plot the annotations


In [8]:
nb = pd.DataFrame(newboundaries, columns=['start_time', 'end_time'])
nb['text'] = ['1', '2', '3']

In [9]:
plot_annotations({'O': result['transcriptionO'],
                  'S': result['transcriptionS'],
                  'claps': result['clap_points'],
                  'slices': nb}, pointwidth=0.2,
                 linespan=40, hscale=0.475)


Let's look at the right hand Y trajectory for both kinect bodies


In [10]:
result['kinect_body']['rhandY1'] = result['kinect_body']['JointPositions3'].map(lambda x: x[11].y)
result['kinect_body']['rhandY2'] = result['kinect_body']['JointPositions4'].map(lambda x: x[11].y)
plot_scalar(result['kinect_body'], ['rhandY1', 'rhandY2'])


Finally, let's create a concatenated AV file


In [11]:
result['video'].audio = result['audio']  #this adds the audio to the video
result['video'].write_videofile("concatenated.mp4")


MoviePy: building video file concatenated.mp4
----------------------------------------
Writing audio in concatenatedTEMP_MPY_write_videofile_SOUND.mp3
Done writing Audio in concatenatedTEMP_MPY_write_videofile_SOUND.mp3 !

Writing video into concatenated.mp4
Done writing video in concatenated.mp4 !
Your video is ready !

In [12]:
def show_html_video(fname, mimetype):
    """Load the video in the file `fname`, with given mimetype, and display as HTML5 video.
    """
    video_encoded = open(fname, "rb").read().encode("base64")
    video_tag = '<video controls alt="test" src="data:video/{0};base64,{1}">'.format(mimetype, video_encoded)
    return HTML(data=video_tag)
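
Note that str.encode("base64") only exists on Python 2. A sketch of the same function using the standard base64 module (which also works on Python 3):

import base64

def show_html_video_b64(fname, mimetype):
    """Same as above, but using the base64 module instead of str.encode("base64")."""
    with open(fname, "rb") as f:
        video_encoded = base64.b64encode(f.read()).decode("ascii")
    video_tag = '<video controls alt="test" src="data:video/{0};base64,{1}">'.format(mimetype, video_encoded)
    return HTML(data=video_tag)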

In [13]:
show_html_video("concatenated.mp4", 'mp4')


Out[13]: