# Doing Analysis with Mumodo

Mumodo Demo Notebook - Updated on 24.04.2015

Summary: This notebook showcases some useful analysis functions offered by mumodo. For a complete list of functions, see the documentation in the code.

(c) Dialogue Systems Group, University of Bielefeld

``````

In [1]:

%matplotlib inline
from mumodo.analysis import (create_streamframe_from_intervalframe, create_intervalframe_from_streamframe,
                             intervalframe_overlaps, intervalframe_union, invert_intervalframe,
                             slice_streamframe_on_intervals, slice_intervalframe_by_time,
                             get_tier_boundaries, get_tier_type, convert_times_of_tier, shift_tier,
                             convert_times_of_tiers)
from mumodo.plotting import plot_annotations, plot_scalar
import pandas as pd
from random import random

``````

### Manipulating IntervalFrames

A typical analysis scenario involves manipulating IntervalFrames. Let's look at some ways to do that in mumodo. Suppose we have the following two IntervalFrames:

``````

In [2]:

FrameA = pd.DataFrame([(0.1, 1.19, 'A'),
                       (1.2, 4.16, 'A'),
                       (8.1, 12.14, 'A'),
                       (15.3, 16.1, 'A'),
                       (17.9, 20.7, 'A')], columns=['start_time', 'end_time', 'text'])
FrameB = pd.DataFrame([(3.2, 9.1, 'B'),
                       (13.9, 18.02, 'B'),
                       (21.06, 22.71, 'B')], columns=['start_time', 'end_time', 'text'])

``````
``````

In [3]:

FrameA

``````
``````

Out[3]:

   start_time  end_time text
0         0.1      1.19    A
1         1.2      4.16    A
2         8.1     12.14    A
3        15.3     16.10    A
4        17.9     20.70    A

``````
``````

In [4]:

FrameB

``````
``````

Out[4]:

   start_time  end_time text
0        3.20      9.10    B
1       13.90     18.02    B
2       21.06     22.71    B

``````

#### Get a slice from an IntervalFrame

``````

In [5]:

# Get the part of FrameA that lies between 1 and 8 seconds
slice_intervalframe_by_time(FrameA, 1, 8)

``````
``````

Out[5]:

   start_time  end_time text
0         1.0      1.19    A
1         1.2      4.16    A

``````
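The output shows how slicing treats boundary intervals: the interval starting at 8.1 lies outside the window and is dropped, while the first interval, which crosses the window start, is clipped to begin at 1.0. The plain-Python sketch below reproduces that behaviour on lists of `(start, end, text)` tuples. It assumes intervals crossing either edge are clipped (only the start edge is visible in the output above), and the helper `slice_intervals` is hypothetical, not mumodo's implementation.

```python
def slice_intervals(intervals, t0, t1):
    """Hypothetical helper: keep the parts of (start, end, text) intervals
    that fall inside the window [t0, t1], clipping at both edges (assumed)."""
    sliced = []
    for start, end, text in intervals:
        # An interval intersects the window only if it starts before t1
        # and ends after t0.
        if start < t1 and end > t0:
            sliced.append((max(start, t0), min(end, t1), text))
    return sliced


frame_a = [(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
           (15.3, 16.1, 'A'), (17.9, 20.7, 'A')]
print(slice_intervals(frame_a, 1, 8))  # [(1, 1.19, 'A'), (1.2, 4.16, 'A')]
```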
``````

In [6]:

plot_annotations({'original A': FrameA, 'sliced A': slice_intervalframe_by_time(FrameA, 1, 8)}, linespan=30)

``````

#### Get the overlap/union of two IntervalFrames

``````

In [7]:

plot_annotations({'A': FrameA,
                  'B': FrameB,
                  'overlap': intervalframe_overlaps(FrameA, FrameB),
                  'union': intervalframe_union(FrameA, FrameB)},
                 tierorder=['A', 'B', 'overlap', 'union'],
                 linespan=30)

``````
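Conceptually, the overlap of two tiers is the set of pairwise intersections of their intervals, and the union merges all intervals that overlap into single spans. The sketch below illustrates both ideas in plain Python on `(start, end)` tuples. The helpers `overlaps` and `union` are hypothetical, and the text column is left out because the plot does not show which labels mumodo assigns to the resulting intervals.

```python
def overlaps(a, b):
    """Hypothetical helper: pairwise intersections of two lists of
    (start, end) intervals."""
    result = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:  # keep only non-empty intersections
                result.append((lo, hi))
    return sorted(result)


def union(a, b):
    """Hypothetical helper: merge two lists of (start, end) intervals
    into non-overlapping spans."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


frame_a = [(0.1, 1.19), (1.2, 4.16), (8.1, 12.14), (15.3, 16.1), (17.9, 20.7)]
frame_b = [(3.2, 9.1), (13.9, 18.02), (21.06, 22.71)]
print(overlaps(frame_a, frame_b))
# [(3.2, 4.16), (8.1, 9.1), (15.3, 16.1), (17.9, 18.02)]
print(union(frame_a, frame_b))
# [(0.1, 1.19), (1.2, 12.14), (13.9, 20.7), (21.06, 22.71)]
```

The nested loop in `overlaps` is O(n·m); for long, sorted tiers a two-pointer sweep would find the same intersections in linear time.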

#### Get the inverse of an IntervalFrame

``````

In [8]:

plot_annotations({'original A': FrameA,
                  'inverted A': invert_intervalframe(FrameA)}, linespan=30)

``````
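Inverting a tier means annotating the gaps between its intervals. A minimal plain-Python sketch, assuming the intervals are sorted and non-overlapping and that the inverse only covers the gaps between the tier's first start and last end (whether mumodo also covers the time before the first interval is not shown here). The helper `invert_intervals` is hypothetical.

```python
def invert_intervals(intervals):
    """Hypothetical helper: return the gaps between consecutive
    (start, end) intervals, assumed sorted and non-overlapping."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        if next_start > prev_end:  # touching intervals leave no gap
            gaps.append((prev_end, next_start))
    return gaps


frame_a = [(0.1, 1.19), (1.2, 4.16), (8.1, 12.14), (15.3, 16.1), (17.9, 20.7)]
print(invert_intervals(frame_a))
# [(1.19, 1.2), (4.16, 8.1), (12.14, 15.3), (16.1, 17.9)]
```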

#### Some useful functions for IntervalFrames

``````

In [9]:

print(get_tier_type(FrameA))  # can be one of 'point' or 'interval'
print(get_tier_boundaries(FrameA))  # start and end time of an IntervalFrame or PointFrame

``````
``````

interval
(0.10000000000000001, 20.699999999999999)

``````
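The long decimals above are the 17-significant-digit representation of the floats 0.1 and 20.7, not different values. For an interval tier, the boundaries can be understood as the earliest start and the latest end; the sketch below takes the maximum end rather than the end of the last row, so it also works if rows are unsorted or one interval contains later ones. It is a plain-Python illustration with a hypothetical helper `tier_boundaries`, not mumodo's code.

```python
def tier_boundaries(intervals):
    """Hypothetical helper: earliest start and latest end of a list of
    (start, end, text) intervals."""
    starts = [start for start, _, _ in intervals]
    ends = [end for _, end, _ in intervals]
    return min(starts), max(ends)


frame_a = [(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
           (15.3, 16.1, 'A'), (17.9, 20.7, 'A')]
print(tier_boundaries(frame_a))  # (0.1, 20.7)
```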
``````

In [10]:

A_in_ms = FrameA.copy(deep=True) #make a copy of Frame A
convert_times_of_tier(A_in_ms, lambda x: int(1000 * x)) #convert times to ms - changes the object 'in place'
A_in_ms

``````
``````

Out[10]:

   start_time  end_time text
0         100      1190    A
1        1200      4160    A
2        8100     12140    A
3       15300     16100    A
4       17900     20700    A

``````
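Converting the times of a tier amounts to applying a function to every start and end time. The plain-Python sketch below shows the idea on `(start, end, text)` tuples; unlike convert_times_of_tier, which changes the frame in place, the hypothetical helper `convert_times` returns a new list. It also rounds before casting to int, an assumption added here because int() truncates, so a product that lands just below an integer in floating point would otherwise lose a millisecond.

```python
def convert_times(intervals, fn):
    """Hypothetical helper: apply fn to the start and end of each
    (start, end, text) interval and return a new list (input unchanged)."""
    return [(fn(start), fn(end), text) for start, end, text in intervals]


def to_ms(seconds):
    # Round first, then cast: int() alone truncates toward zero.
    return int(round(1000 * seconds))


frame_a = [(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
           (15.3, 16.1, 'A'), (17.9, 20.7, 'A')]
print(convert_times(frame_a, to_ms))
```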
``````

In [11]:

A_shifted = FrameA.copy(deep=True) #make a copy of Frame A
shift_tier(A_shifted, 3) #shifts all times by 3 seconds - changes the object 'in place'
A_shifted

``````
``````

Out[11]:

   start_time  end_time text
0         3.1      4.19    A
1         4.2      7.16    A
2        11.1     15.14    A
3        18.3     19.10    A
4        20.9     23.70    A

``````
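Shifting a tier is conceptually a time conversion that adds a constant offset to every time, which is useful when aligning annotations with a recording that started at a different moment. The sketch below uses a hypothetical helper `shift_intervals` that returns a new list instead of changing the input in place. Because float addition is not exactly reversible, shifting back by the negative offset recovers the original times only up to rounding, so the check uses a tolerance.

```python
import math


def shift_intervals(intervals, offset):
    """Hypothetical helper: move every (start, end, text) interval by
    `offset` seconds, returning a new list (the input is left unchanged)."""
    return [(start + offset, end + offset, text) for start, end, text in intervals]


frame_a = [(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
           (15.3, 16.1, 'A'), (17.9, 20.7, 'A')]
shifted = shift_intervals(frame_a, 3)
print([(round(s, 2), round(e, 2), t) for s, e, t in shifted])

# Shifting by the negative offset undoes the shift, up to float rounding.
restored = shift_intervals(shifted, -3)
print(all(math.isclose(a[0], b[0]) and math.isclose(a[1], b[1])
          for a, b in zip(frame_a, restored)))
```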
``````

In [12]:

plot_annotations({'original_A': FrameA,
                  'shifted_A': A_shifted}, linespan=30)

``````

### Manipulating StreamFrames

Let us consider a typical time series as our tracking data:

``````

In [13]:

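X = []
for i in range(0, 25000, 20):  # one sample every 20 ms for 25 s
    # sawtooth steps up by 1 every 100 ms and wraps after 9 steps;
    # noise is uniform in [-1.5, 1.5)
    X.append((i, (i // 100) % 9, 3 * random() - 1.5))
Stream = pd.DataFrame(X, columns=['time', 'sawtooth', 'noise'])
Stream.index = Stream['time']  # time (in ms) becomes the index
Stream.index.name = None
del Stream['time']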

``````
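The cell above samples every 20 ms from 0 up to (but not including) 25000 ms, which gives 1250 rows and a last timestamp of 24980 ms; both numbers reappear in the outputs below. The sketch rebuilds the same rows in plain Python, with a seeded random generator so the noise is reproducible. The helper `make_samples` and its parameters are hypothetical.

```python
import random


def make_samples(duration_ms=25000, step_ms=20, seed=0):
    """Hypothetical helper: build (time, sawtooth, noise) rows like the
    StreamFrame above, without pandas."""
    rng = random.Random(seed)  # seeded so the noise is reproducible
    rows = []
    for t in range(0, duration_ms, step_ms):
        sawtooth = (t // 100) % 9       # +1 every 100 ms, wraps every 900 ms
        noise = 3 * rng.random() - 1.5  # uniform in [-1.5, 1.5)
        rows.append((t, sawtooth, noise))
    return rows


rows = make_samples()
print(len(rows), rows[-1][0])  # 1250 24980
```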

Because time is the index of the StreamFrame, Pandas handles slicing directly. Label-based slicing includes both endpoints, so the result runs from 140 up to the last sample at or before 450:

``````

In [14]:

# Get data between 140 and 450 ms (.loc slices by index label; .ix is removed in recent Pandas)
Stream.loc[140:450]

``````
``````

Out[14]:

     sawtooth      noise
140         1   1.122486
160         1   0.423973
180         1  -1.318555
200         2  -0.645495
220         2   0.869222
240         2   0.179023
260         2  -0.762345
280         2  -0.705659
300         3   1.259223
320         3   0.280827
340         3  -1.419144
360         3   0.679977
380         3   1.373179
400         4  -0.631176
420         4   0.768897
440         4  -1.005416

``````
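A sorted time index is what makes this kind of slicing cheap: the window boundaries can be located by binary search instead of scanning every sample. The sketch below shows that idea with the standard-library `bisect` module on a plain sorted list; it illustrates the lookup, not how Pandas implements `.loc` internally. The helper `slice_by_time` is hypothetical, and like `.loc` it keeps both endpoints.

```python
from bisect import bisect_left, bisect_right


def slice_by_time(times, values, t0, t1):
    """Hypothetical helper: return the (time, value) pairs with
    t0 <= time <= t1, assuming `times` is sorted ascending."""
    lo = bisect_left(times, t0)   # first index with time >= t0
    hi = bisect_right(times, t1)  # first index with time > t1
    return list(zip(times[lo:hi], values[lo:hi]))


times = list(range(0, 25000, 20))
sawtooth = [(t // 100) % 9 for t in times]
window = slice_by_time(times, sawtooth, 140, 450)
print(len(window), window[0], window[-1])  # 16 (140, 1) (440, 4)
```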

The boundaries are also easy to get, e.g.:

``````

In [15]:

Stream.index.min(), Stream.index.max()  #alternatively: Stream.index[0], Stream.index[-1]

``````
``````

Out[15]:

(0, 24980)

``````

Arithmetic can be performed between columns, and the result stored in a new column, e.g.:

``````

In [16]:

Stream['model'] = Stream['sawtooth'] + Stream['noise']

``````

Pandas offers many methods that make analysis convenient, such as a statistical summary of a column. See the Pandas documentation for more info:

``````

In [17]:

Stream['model'].describe()

``````
``````

Out[17]:

count    1250.000000
mean        3.947036
std         2.725404
min        -1.474620
25%         1.574892
50%         4.020390
75%         6.142063
max         9.497892
Name: model, dtype: float64

``````
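To make the summary above concrete, the sketch below computes the same statistics with Python's standard `statistics` module (Python 3.8 or later for `quantiles`). Two details matter for matching Pandas: `std` is the sample standard deviation (divisor n − 1), and the quartiles use linear interpolation, which is what `method='inclusive'` provides. The helper `describe` is hypothetical and needs at least two values.

```python
import statistics


def describe(values):
    """Hypothetical helper: the summary statistics reported by
    Series.describe(), computed with the standard library."""
    q1, median, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'count': len(values),
        'mean': statistics.mean(values),
        'std': statistics.stdev(values),  # sample std (n - 1), as in Pandas
        'min': min(values),
        '25%': q1,
        '50%': median,
        '75%': q3,
        'max': max(values),
    }


print(describe([1, 2, 3, 4, 5]))
```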
``````

In [18]:

plot_scalar(Stream, ['model', 'sawtooth', 'noise'], 2000, 6000) #a sawtooth + noise model

``````

### Interaction between IntervalFrames and StreamFrames

Sometimes we want to abstract away from the raw data and instead create automatic annotations.

We can do this easily in mumodo by applying functions to StreamFrames in order to generate IntervalFrames, e.g:

``````

In [19]:

GeneratedIntervals = create_intervalframe_from_streamframe(Stream, 'model', lambda x: x > 6, 50) #apply a boolean function
convert_times_of_tier(GeneratedIntervals, lambda x: float(x) / 1000) #turn times into seconds

``````
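The core idea of turning a boolean condition on a stream into annotations is to group consecutive samples that satisfy the condition into intervals. The plain-Python sketch below does exactly that; it assumes an interval ends at the last sample that satisfies the condition, so a single-sample run yields a zero-length interval. The fourth argument passed to create_intervalframe_from_streamframe above (50) is not explained in this notebook, so the sketch does not model it. The helper `runs_where` is hypothetical.

```python
def runs_where(times, values, predicate):
    """Hypothetical helper: group consecutive samples for which
    predicate(value) is true into (first_time, last_time) intervals."""
    intervals = []
    run_start = prev_time = None
    for t, v in zip(times, values):
        if predicate(v):
            if run_start is None:
                run_start = t  # a new run begins
        elif run_start is not None:
            intervals.append((run_start, prev_time))  # run ended at previous sample
            run_start = None
        prev_time = t
    if run_start is not None:  # close a run still open at the end
        intervals.append((run_start, prev_time))
    return intervals


times = [0, 20, 40, 60, 80, 100]
values = [0, 7, 8, 1, 9, 9]
print(runs_where(times, values, lambda x: x > 6))  # [(20, 40), (80, 100)]
```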
``````

In [20]:

#plot the intervals in which the condition is satisfied
plot_annotations({'Generated': GeneratedIntervals}, 2, 6, hscale=3.5)

``````

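We may also want to look at the values of a StreamFrame within given intervals: for example, what is the mean value of 'model' during A, during B, and during the overlaps of A and B? For this we can use slice_streamframe_on_intervals. Since the StreamFrame is indexed in milliseconds, the IntervalFrames are first converted from seconds to milliseconds: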

``````

In [21]:

tierdict = {'A': FrameA.copy(deep=True),
            'B': FrameB.copy(deep=True),
            'O': intervalframe_overlaps(FrameA, FrameB)}
# convert all times from seconds to ms, to match the StreamFrame index
convert_times_of_tiers(tierdict, lambda x: int(1000 * x))
# slice the StreamFrame on each IntervalFrame and get the mean value of 'model'
for key, tier in tierdict.items():
    avg = slice_streamframe_on_intervals(Stream, tier)['model'].mean()
    print("the mean value of model during intervals of {} is {}".format(key, avg))

``````
``````

the mean value of model during intervals of A is 3.81000510471
the mean value of model during intervals of B is 4.10299520612
the mean value of model during intervals of O is 3.76827686617

``````
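The computation above boils down to selecting the samples whose timestamps fall inside any interval of a tier and averaging them, which only works if the intervals and the stream use the same time unit. The plain-Python sketch below illustrates this under two assumptions: interval ends are inclusive, and a sample covered by two overlapping intervals is counted once (the notebook does not show how mumodo handles either case). The helper `mean_within` is hypothetical; it returns NaN when no sample is selected.

```python
from statistics import mean


def mean_within(times, values, intervals):
    """Hypothetical helper: mean of the samples whose time falls inside
    any of the (start, end) intervals (both ends inclusive, assumed)."""
    selected = [v for t, v in zip(times, values)
                if any(start <= t <= end for start, end in intervals)]
    return mean(selected) if selected else float('nan')


times = list(range(0, 1000, 100))  # 0, 100, ..., 900 (ms)
values = [t / 100 for t in times]  # 0.0, 1.0, ..., 9.0
print(mean_within(times, values, [(0, 200), (700, 900)]))  # 4.5
```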