Mumodo Demo Notebook - Updated on 24.04.2015
Summary: This notebook showcases some of the analysis functions offered by mumodo. For a comprehensive list of all functions, consult the documentation in the code.
(c) Dialogue Systems Group, University of Bielefeld
In [1]:
%matplotlib inline
from mumodo.analysis import create_streamframe_from_intervalframe, create_intervalframe_from_streamframe, \
intervalframe_overlaps, intervalframe_union, invert_intervalframe, \
slice_streamframe_on_intervals, slice_intervalframe_by_time, \
get_tier_boundaries, get_tier_type, convert_times_of_tier, shift_tier, convert_times_of_tiers
from mumodo.plotting import plot_annotations, plot_scalar
import pandas as pd
from random import random
A typical analysis scenario involves manipulating IntervalFrames. Let's look at some ways we can do that in mumodo. Suppose we have the following two IntervalFrames:
In [2]:
FrameA = pd.DataFrame([(0.1, 1.19, 'A'),
                       (1.2, 4.16, 'A'),
                       (8.1, 12.14, 'A'),
                       (15.3, 16.1, 'A'),
                       (17.9, 20.7, 'A')], columns=['start_time', 'end_time', 'text'])
FrameB = pd.DataFrame([(3.2, 9.1, 'B'),
                       (13.9, 18.02, 'B'),
                       (21.06, 22.71, 'B')], columns=['start_time', 'end_time', 'text'])
In [3]:
FrameA
Out[3]:
In [4]:
FrameB
Out[4]:
In [5]:
#Get a slice of an intervalframe
slice_intervalframe_by_time(FrameA, 1, 8)
Out[5]:
In [6]:
plot_annotations({'original A': FrameA, 'sliced A': slice_intervalframe_by_time(FrameA, 1, 8)}, linespan=30)
In [7]:
plot_annotations({'A': FrameA,
'B': FrameB,
'overlap': intervalframe_overlaps(FrameA, FrameB),
'union': intervalframe_union(FrameA, FrameB)},
tierorder=['A', 'B', 'overlap', 'union'],
linespan=30)
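To make the two operations concrete, here is a minimal sketch of how overlaps and unions of IntervalFrames could be computed with plain pandas. It is not mumodo's implementation: the function names `sketch_overlaps` and `sketch_union` and the labels written to the `text` column are assumptions made for illustration only.

```python
import pandas as pd

COLUMNS = ['start_time', 'end_time', 'text']

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')], columns=COLUMNS)
frame_b = pd.DataFrame([(3.2, 9.1, 'B'), (13.9, 18.02, 'B'),
                        (21.06, 22.71, 'B')], columns=COLUMNS)


def sketch_overlaps(a, b):
    """Hypothetical helper: pairwise intersections of two interval tables."""
    rows = []
    for row_a in a.itertuples():
        for row_b in b.itertuples():
            start = max(row_a.start_time, row_b.start_time)
            end = min(row_a.end_time, row_b.end_time)
            if start < end:  # keep only non-empty intersections
                rows.append((start, end, 'overlap'))
    return pd.DataFrame(rows, columns=COLUMNS)


def sketch_union(a, b):
    """Hypothetical helper: merge all intervals that touch or overlap."""
    spans = sorted(list(zip(a['start_time'], a['end_time'])) +
                   list(zip(b['start_time'], b['end_time'])))
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # the interval starts inside the previous one: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return pd.DataFrame([(s, e, 'union') for s, e in merged], columns=COLUMNS)


overlap_frame = sketch_overlaps(frame_a, frame_b)
union_frame = sketch_union(frame_a, frame_b)
```

The pairwise loop is quadratic in the number of intervals, which is acceptable for short annotation tiers but would need a sweep over sorted boundaries for long ones.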
In [8]:
plot_annotations({'original A': FrameA,
'inverted A': invert_intervalframe(FrameA)}, linespan=30)
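Inverting an IntervalFrame amounts to finding the gaps between its intervals. The following is a minimal sketch in plain pandas, not mumodo's implementation: the name `sketch_invert` and the `'gap'` label are assumptions, and the sketch only returns gaps between consecutive intervals (it does not add anything before the first or after the last interval).

```python
import pandas as pd

COLUMNS = ['start_time', 'end_time', 'text']

frame_a = pd.DataFrame([(0.1, 1.19, 'A'), (1.2, 4.16, 'A'), (8.1, 12.14, 'A'),
                        (15.3, 16.1, 'A'), (17.9, 20.7, 'A')], columns=COLUMNS)


def sketch_invert(frame):
    """Hypothetical helper: return the gaps between consecutive intervals."""
    gaps = []
    previous_end = None
    for row in frame.sort_values('start_time').itertuples():
        if previous_end is not None and row.start_time > previous_end:
            gaps.append((previous_end, row.start_time, 'gap'))
        # track the furthest end seen so far, so nested intervals are handled
        if previous_end is None or row.end_time > previous_end:
            previous_end = row.end_time
    return pd.DataFrame(gaps, columns=COLUMNS)


inverted = sketch_invert(frame_a)
```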
In [9]:
print(get_tier_type(FrameA)) #can be one of 'point' or 'interval'
print(get_tier_boundaries(FrameA)) #get the start and end time of an IntervalFrame or PointFrame
In [10]:
A_in_ms = FrameA.copy(deep=True) #make a copy of Frame A
convert_times_of_tier(A_in_ms, lambda x: int(1000 * x)) #convert times to ms - changes the object 'in place'
A_in_ms
Out[10]:
In [11]:
A_shifted = FrameA.copy(deep=True) #make a copy of Frame A
shift_tier(A_shifted, 3) #shifts all times by 3 seconds - changes the object 'in place'
A_shifted
Out[11]:
In [12]:
plot_annotations({'original_A': FrameA,
'shifted_A': A_shifted}, linespan=30)
Let us consider a typical time series as our tracking data:
In [13]:
X = []
for i in range(0, 25000, 20):
    X.append((i, (i / 100) % 9, 3 * random() - 1.5))
Stream = pd.DataFrame(X, columns=['time', 'sawtooth', 'noise'])
Stream.index = Stream['time']
Stream.index.name = None
del Stream['time']
Because time is the index of the StreamFrame, slicing is handled directly by Pandas:
In [14]:
#Get data between 140 and 450 ms
Stream.loc[140:450]
Out[14]:
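As a small illustration of label-based slicing on a time index, the sketch below builds a toy frame sampled every 20 ms (the names `stream` and `value` are made up for this example) and slices it with `.loc`. Unlike positional slicing, a label slice includes both endpoints when they are present in the index, and a boundary that is not an actual label (such as 450 here) simply falls between samples, provided the index is sorted.

```python
import pandas as pd

# toy StreamFrame-like table: one sample every 20 ms, time as the index
times = list(range(0, 1000, 20))
stream = pd.DataFrame({'value': [t / 10.0 for t in times]}, index=times)

# label-based slice: rows whose time lies between 140 and 450 ms
window = stream.loc[140:450]

first_label = window.index[0]
last_label = window.index[-1]
```

Here the slice starts at 140 (an existing label) and stops at 440, the last sample not later than 450.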
The boundaries are also easy to get, e.g.:
In [15]:
Stream.index.min(), Stream.index.max() #alternatively: Stream.index[0], Stream.index[-1]
Out[15]:
Operations can also be applied between columns, and their results stored in new columns, e.g.:
In [16]:
Stream['model'] = Stream['sawtooth'] + Stream['noise']
Pandas offers many methods that make this kind of analysis convenient. Check the Pandas documentation for more information.
In [17]:
Stream['model'].describe()
Out[17]:
In [18]:
plot_scalar(Stream, ['model', 'sawtooth', 'noise'], 2000, 6000) #a sawtooth + noise model
Sometimes we want to abstract away from the raw data and instead create automatic annotations.
We can do this easily in mumodo by applying functions to StreamFrames in order to generate IntervalFrames, e.g.:
In [19]:
GeneratedIntervals = create_intervalframe_from_streamframe(Stream, 'model', lambda x: x > 6, 50) #apply a boolean function
convert_times_of_tier(GeneratedIntervals, lambda x: float(x) / 1000) #turn times into seconds
In [20]:
#plot the intervals in which the condition is satisfied
plot_annotations({'Generated': GeneratedIntervals}, 2, 6, hscale=3.5)
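The idea of turning a boolean condition on a stream into intervals can be sketched in plain pandas as run detection: consecutive samples that satisfy the condition are grouped into one interval. This is only an illustration under stated assumptions, not mumodo's implementation. The helper name `intervals_where` and the `'True'` label are hypothetical, and the sketch takes no extra numeric argument like the one passed to the mumodo function above.

```python
import pandas as pd


def intervals_where(series, predicate):
    """Hypothetical helper: one interval per run of samples satisfying predicate."""
    mask = series.map(predicate).astype(bool)
    # a new run starts wherever the boolean value changes
    run_id = mask.astype(int).diff().ne(0).cumsum()
    rows = []
    for _, run in mask.groupby(run_id):
        if run.iloc[0]:  # keep only the runs where the condition holds
            rows.append((run.index[0], run.index[-1], 'True'))
    return pd.DataFrame(rows, columns=['start_time', 'end_time', 'text'])


# toy signal sampled every 20 ms
demo = pd.Series([1, 7, 8, 2, 9, 9, 3], index=[0, 20, 40, 60, 80, 100, 120])
found = intervals_where(demo, lambda x: x > 6)
```

Each interval in this sketch runs from the first to the last sample of a run, so a run consisting of a single sample yields an interval of zero length.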
We may also want to examine the values of a StreamFrame within given intervals, e.g. what is the mean value of model during A, during B, and during overlaps of A and B? For this we can use slice_streamframe_on_intervals:
In [21]:
tierdict = {'A': FrameA.copy(deep=True),
'B': FrameB.copy(deep=True),
'O': intervalframe_overlaps(FrameA, FrameB)}
#convert all times to ms
convert_times_of_tiers(tierdict, lambda x: int(1000 * x))
#slice the StreamFrame on each IntervalFrame and get the mean value
for key in tierdict.keys():
    avg = slice_streamframe_on_intervals(Stream, tierdict[key])['model'].mean()
    print("the mean value of model during intervals of {} is {}".format(key, avg))
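For readers who want to see what slicing a stream on intervals involves, here is a minimal pandas-only sketch, assuming a time-indexed frame and an interval table in the same time unit. The helper `mean_during` is hypothetical and is not part of mumodo. It assumes a sorted index, non-overlapping intervals (overlapping ones would count samples twice), and at least one interval.

```python
import pandas as pd


def mean_during(stream, intervals, column):
    """Hypothetical helper: mean of `column` over all samples inside the intervals."""
    pieces = [stream.loc[row.start_time:row.end_time, column]
              for row in intervals.itertuples()]
    return pd.concat(pieces).mean()


# toy stream sampled every 20 ms and two intervals given in ms
stream = pd.DataFrame({'model': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]},
                      index=[0, 20, 40, 60, 80, 100])
intervals = pd.DataFrame([(20, 40, 'A'), (80, 100, 'A')],
                         columns=['start_time', 'end_time', 'text'])

result = mean_during(stream, intervals, 'model')
```

The samples at 20, 40, 80 and 100 ms fall inside the intervals, so the mean is taken over the values 1, 2, 4 and 5.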