Doing Analysis with Mumodo

Mumodo Demo Notebook - Updated on 24.04.2015

Summary: This notebook showcases some useful analysis functions offered by mumodo. For a comprehensive list of all functions, see the documentation in the code.

(c) Dialogue Systems Group, University of Bielefeld



In [1]:

    
%matplotlib inline
from mumodo.analysis import create_streamframe_from_intervalframe, create_intervalframe_from_streamframe, \
                            intervalframe_overlaps, intervalframe_union, invert_intervalframe, \
                            slice_streamframe_on_intervals, slice_intervalframe_by_time, \
                            get_tier_boundaries, get_tier_type, convert_times_of_tier, shift_tier, convert_times_of_tiers
from mumodo.plotting import plot_annotations, plot_scalar
import pandas as pd
from random import random

Manipulating IntervalFrames

A typical analysis scenario involves manipulating IntervalFrames. Let's look at some ways to do that in mumodo. Suppose we have the following two IntervalFrames:



In [2]:

    
FrameA = pd.DataFrame([(0.1, 1.19, 'A'),
                       (1.2, 4.16, 'A'),
                       (8.1, 12.14, 'A'),
                       (15.3, 16.1, 'A'),
                       (17.9, 20.7, 'A')], columns=['start_time', 'end_time', 'text'])
FrameB = pd.DataFrame([(3.2, 9.1, 'B'),
                       (13.9, 18.02, 'B'),
                       (21.06, 22.71, 'B')], columns=['start_time', 'end_time', 'text'])



In [3]:

    
FrameA



In [4]:

    
FrameB

Get a slice from an IntervalFrame



In [5]:

    
#Get a slice of an intervalframe
slice_intervalframe_by_time(FrameA, 1, 8)



In [6]:

    
plot_annotations({'original A': FrameA, 'sliced A': slice_intervalframe_by_time(FrameA, 1, 8)}, linespan=30)
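To make the slicing semantics concrete, here is a minimal pandas sketch of what such a slice can look like. The helper `slice_intervals` is hypothetical, not mumodo's implementation, and it assumes that intervals partially covering the window are clipped to it; whether `slice_intervalframe_by_time` clips or drops such intervals is not stated in this notebook.

```python
import pandas as pd

def slice_intervals(frame, start, end):
    """Keep the intervals that overlap [start, end], clipped to that window.

    Hypothetical helper for illustration, not mumodo's implementation.
    """
    # An interval overlaps the window if it starts before the window ends
    # and ends after the window starts
    mask = (frame['start_time'] < end) & (frame['end_time'] > start)
    sliced = frame[mask].copy()
    # Clip partially covered intervals to the window boundaries
    sliced['start_time'] = sliced['start_time'].clip(lower=start)
    sliced['end_time'] = sliced['end_time'].clip(upper=end)
    return sliced.reset_index(drop=True)

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')],
                       columns=['start_time', 'end_time', 'text'])
print(slice_intervals(frame_a, 1, 8))
```

With a window of 1 to 8 s, the first interval is clipped to start at 1.0 and the interval starting at 8.1 falls outside.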

Get the overlap/union of two IntervalFrames



In [7]:

    
plot_annotations({'A': FrameA,
                  'B': FrameB,
                  'overlap': intervalframe_overlaps(FrameA, FrameB),
                  'union': intervalframe_union(FrameA, FrameB)},
                 tierorder=['A', 'B', 'overlap', 'union'],
                 linespan=30)
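The overlap and union can be sketched in plain Python and pandas as follows. Both helpers (`overlaps`, `union`) are hypothetical illustrations, not mumodo's code: the overlap collects every non-empty pairwise intersection, and the union sorts all intervals and merges those that overlap or touch. Neither keeps the `text` column.

```python
import pandas as pd

def overlaps(a, b):
    """All non-empty pairwise intersections of two interval frames."""
    rows = []
    for s1, e1 in zip(a['start_time'], a['end_time']):
        for s2, e2 in zip(b['start_time'], b['end_time']):
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # keep only intersections with positive length
                rows.append((start, end))
    return pd.DataFrame(sorted(rows), columns=['start_time', 'end_time'])

def union(a, b):
    """Merge the intervals of both frames into disjoint spans."""
    starts = list(a['start_time']) + list(b['start_time'])
    ends = list(a['end_time']) + list(b['end_time'])
    merged = []
    for start, end in sorted(zip(starts, ends)):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return pd.DataFrame(merged, columns=['start_time', 'end_time'])

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')],
                       columns=['start_time', 'end_time', 'text'])
frame_b = pd.DataFrame([(3.2, 9.1, 'B'), (13.9, 18.02, 'B'), (21.06, 22.71, 'B')],
                       columns=['start_time', 'end_time', 'text'])
print(overlaps(frame_a, frame_b))
print(union(frame_a, frame_b))
```

The pairwise loop in `overlaps` is quadratic, which is fine for small annotation tiers; a single sweep over both sorted frames would scale better.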

Get the inverse of an IntervalFrame



In [8]:

    
plot_annotations({'original A': FrameA,
                  'inverted A': invert_intervalframe(FrameA)}, linespan=30)
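One way to read "inverse" is the set of gaps between consecutive intervals. A minimal sketch under that assumption follows; the helper `invert_intervals` is hypothetical, and mumodo's version may also cover the time before the first or after the last interval.

```python
import pandas as pd

def invert_intervals(frame):
    """Return the gaps between consecutive intervals.

    Hypothetical helper; assumes the intervals are sorted and non-overlapping.
    """
    # Each gap runs from the end of one interval to the start of the next
    starts = frame['end_time'].iloc[:-1].reset_index(drop=True)
    ends = frame['start_time'].iloc[1:].reset_index(drop=True)
    gaps = pd.DataFrame({'start_time': starts, 'end_time': ends})
    # Drop zero-length gaps where one interval ends exactly as the next begins
    return gaps[gaps['start_time'] < gaps['end_time']].reset_index(drop=True)

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')],
                       columns=['start_time', 'end_time', 'text'])
print(invert_intervals(frame_a))
```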

Some useful functions for IntervalFrames



In [9]:

    
print(get_tier_type(FrameA))  # can be one of 'point' or 'interval'
print(get_tier_boundaries(FrameA))  # start and end time of an IntervalFrame or PointFrame

interval
(0.10000000000000001, 20.699999999999999)
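For an IntervalFrame, the boundaries printed above correspond to the earliest start and the latest end. A pandas sketch of that reading (the helper name `tier_boundaries` is hypothetical); using min/max rather than the first and last rows keeps it correct for unsorted frames, and converting to `float` gives plain Python numbers.

```python
import pandas as pd

def tier_boundaries(frame):
    """(earliest start, latest end) of an interval frame."""
    return float(frame['start_time'].min()), float(frame['end_time'].max())

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')],
                       columns=['start_time', 'end_time', 'text'])
print(tier_boundaries(frame_a))  # (0.1, 20.7)
```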



In [10]:

    
A_in_ms = FrameA.copy(deep=True) #make a copy of Frame A
convert_times_of_tier(A_in_ms, lambda x: int(1000 * x)) #convert times to ms - changes the object 'in place'
A_in_ms
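`convert_times_of_tier` modifies the frame in place, which is why a deep copy is made first. If a function that returns a new frame is preferred, a small pandas sketch could look like this (`converted_copy` is hypothetical, not part of mumodo). It also rounds before casting, since `int()` truncates toward zero and a product that lands just below a whole number would otherwise lose a millisecond.

```python
import pandas as pd

def converted_copy(frame, func, columns=('start_time', 'end_time')):
    """Return a copy of frame with func applied to each time column."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].map(func)
    return out

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')],
                       columns=['start_time', 'end_time', 'text'])
# Seconds to milliseconds; round first because int() truncates toward zero
a_in_ms = converted_copy(frame_a, lambda x: int(round(1000 * x)))
print(a_in_ms)
```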



In [11]:

    
A_shifted = FrameA.copy(deep=True) #make a copy of Frame A
shift_tier(A_shifted, 3) #shifts all times by 3 seconds - changes the object 'in place'
A_shifted



In [12]:

    
plot_annotations({'original_A': FrameA,
                  'shifted_A': A_shifted}, linespan=30)
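Shifting amounts to adding a constant to both time columns. A copy-returning pandas sketch (the helper `shifted_copy` is hypothetical, not part of mumodo):

```python
import pandas as pd

def shifted_copy(frame, offset):
    """Return a copy of frame with both time columns moved by offset."""
    out = frame.copy()
    out[['start_time', 'end_time']] += offset
    return out

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')],
                       columns=['start_time', 'end_time', 'text'])
print(shifted_copy(frame_a, 3))  # original frame_a is left unchanged
```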

Manipulating StreamFrames

Let us consider a typical time series, with times in ms, as our tracking data:



In [13]:

    
X = []
for i in range(0, 25000, 20):  # one sample every 20 ms
    # integer sawtooth (0 to 8) and uniform noise in [-1.5, 1.5)
    X.append((i, (i // 100) % 9, 3 * random() - 1.5))
Stream = pd.DataFrame(X, columns=['time', 'sawtooth', 'noise'])
Stream.index = Stream['time']
Stream.index.name = None
del Stream['time']
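The same StreamFrame can be built without an explicit loop using NumPy. This sketch swaps Python's `random()` for NumPy's `np.random.uniform` (a different generator, same uniform range of [-1.5, 1.5)) and uses integer division `//` so the sawtooth stays integer-valued under Python 3.

```python
import numpy as np
import pandas as pd

# Sample times in ms: one sample every 20 ms, 0 to 24980
times = np.arange(0, 25000, 20)

stream = pd.DataFrame({
    # Integer sawtooth: rises by 1 every 100 ms and wraps after 8
    'sawtooth': (times // 100) % 9,
    # Uniform noise in [-1.5, 1.5)
    'noise': np.random.uniform(-1.5, 1.5, size=len(times)),
}, index=times)
print(stream.head())
```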

Because time is the index of the StreamFrame, slicing by time is handled directly by Pandas:



In [14]:

    
#Get data between 140 and 450 ms (label-based, both ends inclusive)
Stream.loc[140:450]
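Because the index holds times rather than row positions, slicing by label with `.loc` includes both endpoints when they are present and also accepts labels that are not in the index, as long as the index is sorted. A small self-contained check on a toy 20 ms grid:

```python
import pandas as pd

# A toy stream sampled every 20 ms, indexed by time
grid = range(0, 1000, 20)
stream = pd.DataFrame({'value': list(grid)}, index=list(grid))

# Label-based: 140 is included; 450 is not a label, so the slice stops at 440
window = stream.loc[140:450]
print(window.index[0], window.index[-1], len(window))

# Position-based slicing, by contrast, excludes the end position
same_rows = stream.iloc[7:23]
print(same_rows.equals(window))
```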

The boundaries are also easy to get, e.g.:



In [15]:

    
Stream.index.min(), Stream.index.max()  #alternatively: Stream.index[0], Stream.index[-1]

Out[15]:

(0, 24980)

Operations can also be performed between columns, e.g. to create new ones:



In [16]:

    
Stream['model'] = Stream['sawtooth'] + Stream['noise']

Many Pandas methods make analysis user-friendly; see the Pandas documentation for more information.



In [17]:

    
Stream['model'].describe()

Out[17]:

count    1250.000000
mean        3.947036
std         2.725404
min        -1.474620
25%         1.574892
50%         4.020390
75%         6.142063
max         9.497892
Name: model, dtype: float64



In [18]:

    
plot_scalar(Stream, ['model', 'sawtooth', 'noise'], 2000, 6000) #a sawtooth + noise model

Interaction between IntervalFrames and StreamFrames

Sometimes we want to abstract away from the raw data and instead create automatic annotations.

We can do this easily in mumodo by applying functions to StreamFrames in order to generate IntervalFrames, e.g.:



In [19]:

    
GeneratedIntervals = create_intervalframe_from_streamframe(Stream, 'model', lambda x: x > 6, 50) #apply a boolean function
convert_times_of_tier(GeneratedIntervals, lambda x: float(x) / 1000) #turn times into seconds

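Conceptually, this turns each run of consecutive samples that satisfy the condition into one interval. A minimal pandas sketch of that idea follows. `intervals_where` is hypothetical, each interval here runs from the first to the last matching timestamp, and the fourth argument of `create_intervalframe_from_streamframe` (50 in the call above) is not explained in this notebook, so it is not modelled.

```python
import pandas as pd

def intervals_where(stream, column, condition):
    """Turn runs of consecutive samples satisfying condition into intervals.

    Hypothetical helper: each interval spans from the first to the last
    timestamp of a run, so a single matching sample gives a zero-length interval.
    """
    mask = stream[column].map(condition).astype(bool)
    # Label consecutive runs: the id increments wherever the mask flips
    run_id = (mask != mask.shift()).cumsum()
    rows = [(run.index[0], run.index[-1])
            for _, run in stream[mask].groupby(run_id[mask])]
    return pd.DataFrame(rows, columns=['start_time', 'end_time'])

# Toy stream sampled every 20 ms
stream = pd.DataFrame({'model': [1, 7, 8, 2, 9, 9, 9, 0]}, index=range(0, 160, 20))
print(intervals_where(stream, 'model', lambda x: x > 6))
```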


In [20]:

    
#plot the intervals in which the condition is satisfied
plot_annotations({'Generated': GeneratedIntervals}, 2, 6, hscale=3.5)

We may also want to inspect the values of a StreamFrame within given intervals, e.g. what is the mean value of model during A, during B, and during the overlaps of A and B? For this we can use slice_streamframe_on_intervals. Since the StreamFrame is indexed in ms, the interval times are converted to ms first:



In [21]:

    
tierdict = {'A': FrameA.copy(deep=True),
            'B': FrameB.copy(deep=True),
            'O': intervalframe_overlaps(FrameA, FrameB)}
#convert all times to ms
convert_times_of_tiers(tierdict, lambda x: int(1000 * x))
#slice the StreamFrame on each IntervalFrame and get the mean value
for key in sorted(tierdict):
    avg = slice_streamframe_on_intervals(Stream, tierdict[key])['model'].mean()
    print("the mean value of model during intervals of {} is {}".format(key, avg))

the mean value of model during intervals of A is 3.81000510471
the mean value of model during intervals of B is 4.10299520612
the mean value of model during intervals of O is 3.76827686617
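The selection behind `slice_streamframe_on_intervals` can be sketched with a boolean mask over the time index. This is an assumption about the behaviour, not mumodo's code: `samples_in_intervals` is hypothetical, includes both interval endpoints, and returns a sample only once even if several intervals cover it. As in the cell above, the intervals must use the same unit as the stream's index.

```python
import pandas as pd

def samples_in_intervals(stream, intervals):
    """Rows of a time-indexed stream that fall inside any interval.

    Hypothetical helper: both interval endpoints are included, and a sample
    covered by several overlapping intervals is still returned only once.
    """
    mask = pd.Series(False, index=stream.index)
    for start, end in zip(intervals['start_time'], intervals['end_time']):
        mask |= (stream.index >= start) & (stream.index <= end)
    return stream[mask]

# Toy data: a stream sampled every 20 ms and two intervals, all in ms
stream = pd.DataFrame({'model': range(10)}, index=range(0, 200, 20))
intervals = pd.DataFrame({'start_time': [30, 150], 'end_time': [70, 160]})

selected = samples_in_intervals(stream, intervals)
print(list(selected.index))
print(selected['model'].mean())
```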

Out[10]: (A_in_ms)

   start_time  end_time text
0         100      1190    A
1        1200      4160    A
2        8100     12140    A
3       15300     16100    A
4       17900     20700    A

Out[11]: (A_shifted)

   start_time  end_time text
0         3.1      4.19    A
1         4.2      7.16    A
2        11.1     15.14    A
3        18.3     19.10    A
4        20.9     23.70    A

Out[14]: (Stream sliced between 140 and 450 ms)

     sawtooth     noise
140         1  1.122486
160         1  0.423973
180         1 -1.318555
200         2 -0.645495
220         2  0.869222
240         2  0.179023
260         2 -0.762345
280         2 -0.705659
300         3  1.259223
320         3  0.280827
340         3 -1.419144
360         3  0.679977
380         3  1.373179
400         4 -0.631176
420         4  0.768897
440         4 -1.005416