Doing Analysis with Mumodo

Mumodo Demo Notebook - Updated on 24.04.2015

Summary: This notebook showcases some useful analysis functions offered by mumodo. See the documentation in the code for a comprehensive list of all functions.

(c) Dialogue Systems Group, University of Bielefeld


In [1]:
%matplotlib inline
from mumodo.analysis import create_streamframe_from_intervalframe, create_intervalframe_from_streamframe, \
                            intervalframe_overlaps, intervalframe_union, invert_intervalframe, \
                            slice_streamframe_on_intervals, slice_intervalframe_by_time, \
                            get_tier_boundaries, get_tier_type, convert_times_of_tier, shift_tier, convert_times_of_tiers
from mumodo.plotting import plot_annotations, plot_scalar
import pandas as pd
from random import random

Manipulating IntervalFrames

A typical analysis scenario involves manipulating IntervalFrames. Let's look at some ways we can do that in mumodo. Suppose we have the following two IntervalFrames:


In [2]:
FrameA = pd.DataFrame([(0.1, 1.19, 'A'),
                       (1.2, 4.16, 'A'),
                       (8.1, 12.14, 'A'),
                       (15.3, 16.1, 'A'),
                       (17.9, 20.7, 'A')], columns=['start_time', 'end_time', 'text'])
FrameB = pd.DataFrame([(3.2, 9.1, 'B'),
                       (13.9, 18.02, 'B'),
                       (21.06, 22.71, 'B')], columns=['start_time', 'end_time', 'text'])

In [3]:
FrameA


Out[3]:
start_time end_time text
0 0.1 1.19 A
1 1.2 4.16 A
2 8.1 12.14 A
3 15.3 16.10 A
4 17.9 20.70 A

In [4]:
FrameB


Out[4]:
start_time end_time text
0 3.20 9.10 B
1 13.90 18.02 B
2 21.06 22.71 B

Get a slice from an IntervalFrame


In [5]:
#Get a slice of an intervalframe
slice_intervalframe_by_time(FrameA, 1, 8)


Out[5]:
start_time end_time text
0 1.0 1.19 A
1 1.2 4.16 A

In [6]:
plot_annotations({'original A': FrameA, 'sliced A': slice_intervalframe_by_time(FrameA, 1, 8)}, linespan=30)
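
To make the slicing behaviour concrete, here is a minimal sketch on plain (start_time, end_time, text) tuples. It is not mumodo's implementation; slice_intervals is a hypothetical helper that assumes intervals reaching outside the window are clipped to it, which is what Out[5] suggests.

```python
def slice_intervals(intervals, start, end):
    """Keep the parts of each interval that fall inside [start, end]."""
    sliced = []
    for s, e, text in intervals:
        # Clip the interval to the requested window
        clipped_start = max(s, start)
        clipped_end = min(e, end)
        # Intervals entirely outside the window collapse and are dropped
        if clipped_start < clipped_end:
            sliced.append((clipped_start, clipped_end, text))
    return sliced


frame_a = [(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
           (15.3, 16.1, 'A'), (17.9, 20.7, 'A')]
sliced_a = slice_intervals(frame_a, 1, 8)
print(sliced_a)
```

With the same values as FrameA, only the first two intervals survive, and the first one now starts at the window boundary.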


Get the overlap/union of two IntervalFrames


In [7]:
plot_annotations({'A': FrameA,
                  'B': FrameB,
                  'overlap': intervalframe_overlaps(FrameA, FrameB),
                  'union': intervalframe_union(FrameA, FrameB)},
                 tierorder=['A', 'B', 'overlap', 'union'],
                 linespan=30)
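
The idea behind the two operations plotted above can be sketched on plain (start, end) tuples. This is an illustration only, not mumodo's code: overlaps and union are hypothetical helpers, both inputs are assumed to be sorted and free of internal overlaps, and mumodo may treat touching intervals or text labels differently.

```python
def overlaps(a, b):
    """Intersect two sorted interval lists with a two-pointer sweep."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def union(a, b):
    """Merge all intervals from both lists into non-overlapping spans."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Extend the current span if this interval touches or overlaps it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


a = [(0.1, 1.19), (1.2, 4.16), (8.1, 12.14), (15.3, 16.1), (17.9, 20.7)]
b = [(3.2, 9.1), (13.9, 18.02), (21.06, 22.71)]
print(overlaps(a, b))
print(union(a, b))
```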


Get the inverse of an IntervalFrame


In [8]:
plot_annotations({'original A': FrameA,
                  'inverted A': invert_intervalframe(FrameA)}, linespan=30)
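
Inverting amounts to collecting the gaps between consecutive intervals. The sketch below shows this on plain (start, end) tuples; invert is a hypothetical helper, and it assumes the inverse is bounded by the first start and the last end of the input. Whether mumodo also covers the time before the first or after the last interval is not shown here.

```python
def invert(intervals):
    """Return the gaps between consecutive intervals of a sorted list."""
    gaps = []
    # Pair every interval with its successor
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        if prev_end < next_start:
            gaps.append((prev_end, next_start))
    return gaps


a = [(0.1, 1.19), (1.2, 4.16), (8.1, 12.14), (15.3, 16.1), (17.9, 20.7)]
print(invert(a))
```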


Some useful functions for IntervalFrames


In [9]:
print(get_tier_type(FrameA))  #can be one of 'point' or 'interval'
print(get_tier_boundaries(FrameA))  #get the start and end time of an IntervalFrame or PointFrame


interval
(0.10000000000000001, 20.699999999999999)

In [10]:
A_in_ms = FrameA.copy(deep=True) #make a copy of Frame A
convert_times_of_tier(A_in_ms, lambda x: int(1000 * x)) #convert times to ms - changes the object 'in place'
A_in_ms


Out[10]:
start_time end_time text
0 100 1190 A
1 1200 4160 A
2 8100 12140 A
3 15300 16100 A
4 17900 20700 A
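
A note on the conversion function: int() truncates toward zero, and many decimal times have no exact binary floating-point representation, so 1000 * x can land slightly below the intended whole number. The values in FrameA convert cleanly, but for other inputs truncation could come out one millisecond short. The sketch below is independent of mumodo; to_ms_truncating and to_ms_rounding are hypothetical names used only to compare the two approaches.

```python
def to_ms_truncating(seconds):
    """Convert seconds to milliseconds by truncation, as in the cell above."""
    return int(1000 * seconds)


def to_ms_rounding(seconds):
    """Convert seconds to milliseconds, rounding to the nearest integer."""
    return int(round(1000 * seconds))


# Compare both conversions for every millisecond value up to 25 seconds
mismatches = [ms for ms in range(25000)
              if to_ms_truncating(ms / 1000.0) != ms]
print("values changed by truncation:", len(mismatches))
print("rounding recovers every value:",
      all(to_ms_rounding(ms / 1000.0) == ms for ms in range(25000)))
```

Passing a rounding lambda such as `lambda x: int(round(1000 * x))` to convert_times_of_tier would be one way to guard against this.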

In [11]:
A_shifted = FrameA.copy(deep=True) #make a copy of Frame A
shift_tier(A_shifted, 3) #shifts all times by 3 seconds - changes the object 'in place'
A_shifted


Out[11]:
start_time end_time text
0 3.1 4.19 A
1 4.2 7.16 A
2 11.1 15.14 A
3 18.3 19.10 A
4 20.9 23.70 A

In [12]:
plot_annotations({'original_A': FrameA,
                  'shifted_A': A_shifted}, linespan=30)


Manipulating StreamFrames

Let us consider a typical time series as our tracking data:


In [13]:
X = []
for i in range(0, 25000, 20):
    X.append((i, (i // 100) % 9, 3 * random() - 1.5))  #integer division keeps the sawtooth stepwise
Stream = pd.DataFrame(X, columns=['time', 'sawtooth', 'noise'])
Stream.index = Stream['time']
Stream.index.name = None
del Stream['time']

Because time is the index of the StreamFrame, Pandas makes slicing straightforward:


In [14]:
#Get data between 140 and 450 ms
Stream.loc[140:450]  #label-based slice on the time index; .ix is deprecated in newer Pandas


Out[14]:
sawtooth noise
140 1 1.122486
160 1 0.423973
180 1 -1.318555
200 2 -0.645495
220 2 0.869222
240 2 0.179023
260 2 -0.762345
280 2 -0.705659
300 3 1.259223
320 3 0.280827
340 3 -1.419144
360 3 0.679977
380 3 1.373179
400 4 -0.631176
420 4 0.768897
440 4 -1.005416

The boundaries are also easy to get, e.g.:


In [15]:
Stream.index.min(), Stream.index.max()  #alternatively: Stream.index[0], Stream.index[-1]


Out[15]:
(0, 24980)

Operations can also combine existing columns or create new ones, e.g.:


In [16]:
Stream['model'] = Stream['sawtooth'] + Stream['noise']

Pandas offers many methods that make such analysis convenient, for example summary statistics. Check the Pandas documentation for more information.


In [17]:
Stream['model'].describe()


Out[17]:
count    1250.000000
mean        3.947036
std         2.725404
min        -1.474620
25%         1.574892
50%         4.020390
75%         6.142063
max         9.497892
Name: model, dtype: float64

In [18]:
plot_scalar(Stream, ['model', 'sawtooth', 'noise'], 2000, 6000) #a sawtooth + noise model


Interaction between IntervalFrames and StreamFrames

Sometimes we want to abstract away from the raw data and instead create automatic annotations.

We can do this easily in mumodo by applying functions to StreamFrames in order to generate IntervalFrames, e.g.:


In [19]:
GeneratedIntervals = create_intervalframe_from_streamframe(Stream, 'model', lambda x: x > 6, 50) #apply a boolean function
convert_times_of_tier(GeneratedIntervals, lambda x: float(x) / 1000) #turn times into seconds

In [20]:
#plot the intervals in which the condition is satisfied
plot_annotations({'Generated': GeneratedIntervals}, 2, 6, hscale=3.5)
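
The general technique of turning a boolean condition on a time series into intervals can be sketched as follows. This is a simplified illustration on plain (time, value) pairs, not mumodo's implementation: intervals_where is a hypothetical helper, and it ignores the fourth argument passed to create_intervalframe_from_streamframe above, whose effect is not covered here.

```python
def intervals_where(samples, condition):
    """Group consecutive samples satisfying `condition` into (start, end) runs.

    `samples` is a list of (time, value) pairs sorted by time.
    """
    intervals = []
    run_start = None
    last_time = None
    for time, value in samples:
        if condition(value):
            if run_start is None:
                run_start = time  # a new run begins here
        elif run_start is not None:
            # The run ends at the first sample that fails the condition
            intervals.append((run_start, time))
            run_start = None
        last_time = time
    if run_start is not None:
        # Close a run that is still open at the end of the data
        intervals.append((run_start, last_time))
    return intervals


# The noise-free sawtooth from above, sampled every 20 ms for 2 seconds
samples = [(t, (t // 100) % 9) for t in range(0, 2000, 20)]
runs = intervals_where(samples, lambda v: v > 6)
print(runs)
```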


We may also want to examine the values of a StreamFrame within given intervals, e.g. what is the mean value of model during A, during B, and during overlaps of A and B? For this we can use slice_streamframe_on_intervals:


In [21]:
tierdict = {'A': FrameA.copy(deep=True),
            'B': FrameB.copy(deep=True),
            'O': intervalframe_overlaps(FrameA, FrameB)}
#convert all times to ms
convert_times_of_tiers(tierdict, lambda x: int(1000 * x))
#slice the StreamFrame on each IntervalFrame and get the mean value
for key in tierdict:
    avg = slice_streamframe_on_intervals(Stream, tierdict[key])['model'].mean()
    print("the mean value of model during intervals of {} is {}".format(key, avg))


the mean value of model during intervals of A is 3.81000510471
the mean value of model during intervals of B is 4.10299520612
the mean value of model during intervals of O is 3.76827686617
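
Conceptually, the computation above selects the samples whose timestamps fall inside any interval and then averages them. A minimal sketch on plain Python data follows; mean_during is a hypothetical helper, and treating both interval endpoints as inclusive is an assumption that may not match mumodo exactly.

```python
def mean_during(samples, intervals):
    """Average the values of samples whose time lies inside any interval.

    `samples` is a list of (time, value) pairs, `intervals` a list of
    (start, end) pairs in the same time unit. Returns None if nothing matches.
    """
    selected = [value for time, value in samples
                if any(start <= time <= end for start, end in intervals)]
    if not selected:
        return None
    return sum(selected) / float(len(selected))


samples = [(0, 1.0), (20, 2.0), (40, 3.0), (60, 4.0)]
print(mean_during(samples, [(20, 40)]))
```

Note that the samples and the intervals must use the same time unit, which is why the tiers are converted to milliseconds before slicing.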