This tutorial shows how to segment an audio document into scenes with pyannote, using MFCC features and sliding-window Gaussian divergence.
In [1]:
%pylab inline
In [2]:
# pyannote.core package provides core pyannote data structures.
# (available at http://github.com/pyannote)
from pyannote.core import Segment, Timeline
uri stands for uniform resource identifier: a unique name for the document being processed.
In [3]:
uri = 'GameOfThrones.Season01.Episode01'
Let's start by loading the reference (i.e. manual) segmentation into scenes.
It is stored in data/GameOfThrones.Season01.Episode01/scenes.txt.
In [4]:
with open('data/GameOfThrones.Season01.Episode01/scenes.txt', 'r') as f:
    lines = [line.split() for line in f.readlines()]
Timeline objects store a set of Segment instances (one per scene).
A Segment corresponds to a time range (with a start time and an end time, in seconds).
In [5]:
reference = Timeline(uri=uri)
for start_time, end_time, _ in lines:
    segment = Segment(start=float(start_time), end=float(end_time))
    reference.add(segment)
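For readers who want to see the idea behind these two data structures without installing pyannote, here is a hypothetical minimal stand-in (MiniSegment and MiniTimeline are illustrative names, not part of pyannote's API):

```python
# Hypothetical minimal stand-in for pyannote's Segment / Timeline,
# to illustrate the concept only; the real classes offer much more.

class MiniSegment(object):
    """A time range, in seconds."""
    def __init__(self, start, end):
        self.start = float(start)
        self.end = float(end)

    @property
    def duration(self):
        # length of the time range, in seconds
        return self.end - self.start


class MiniTimeline(object):
    """An ordered collection of segments for one document (uri)."""
    def __init__(self, uri=None):
        self.uri = uri
        self._segments = []

    def add(self, segment):
        self._segments.append(segment)

    def __len__(self):
        return len(self._segments)


timeline = MiniTimeline(uri='example')
timeline.add(MiniSegment(0.0, 12.5))
timeline.add(MiniSegment(12.5, 47.0))
```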
Now, we will initialize an extractor of MFCC features (energy plus the first 12 coefficients).
In [6]:
# pyannote.features package provides feature extraction tools.
# (available at http://github.com/pyannote)
from pyannote.features.audio.yaafe import YaafeMFCC
mfcc_extractor = YaafeMFCC(e=True, coefs=12)
Once initialized, it can be used to extract the actual features.
Beware: this may take a while (a few seconds for a one-hour episode).
In [7]:
features = mfcc_extractor.extract('data/GameOfThrones.Season01.Episode01/english.wav')
Features instances have several handy methods.
crop is one of them: it returns all the features within a given Segment as a numpy.array.
In [8]:
x = features.crop(Segment(0., 10.))
print x.shape
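Under the hood, a crop-like operation presumably maps the segment's time range to frame indices, given the feature frame rate. A hedged sketch (crop_features and the 100 frames-per-second rate are illustrative assumptions, not pyannote's actual implementation):

```python
import numpy as np

# Hedged sketch of a crop-like operation: convert a time range (seconds)
# into row indices of the feature matrix, assuming one frame every
# `frame_step` seconds.  Names here are illustrative, not pyannote's API.

def crop_features(features, start, end, frame_step=0.01):
    # round to the nearest frame index on both sides
    i0 = int(round(start / frame_step))
    i1 = int(round(end / frame_step))
    return features[i0:i1]

# 60 seconds of 13-dimensional frames at 100 frames/second (synthetic data)
features_matrix = np.random.randn(6000, 13)
cropped = crop_features(features_matrix, 0.0, 10.0)
# 10 seconds at 100 frames/second -> 1000 frames of 13 coefficients each
```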
Let's plot the audio signal energy (first feature dimension) between t = 40 s and t = 60 s.
In [9]:
plt.plot(features.crop(Segment(40, 60))[:,0])
Now, we are going to segment the episode using Gaussian divergence.
Two sliding windows (left and right) of 20 seconds each are used, with a step of 1 second.
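Conceptually, at each candidate boundary t, one Gaussian is fitted on the left window and one on the right window, and the two are compared; a large divergence suggests a scene change. The following self-contained sketch uses a simple diagonal-covariance divergence (the exact formula used by pyannote may differ):

```python
import numpy as np

# Hedged sketch of sliding-window Gaussian divergence segmentation.
# For each candidate boundary t, fit one Gaussian per window and score
# how different they are.  This uses a simple diagonal-covariance form:
#   d = sum_i (mu1_i - mu2_i)^2 / (s1_i * s2_i)
# which may differ from pyannote's exact divergence.

def gaussian_divergence(left, right):
    mu1, mu2 = left.mean(axis=0), right.mean(axis=0)
    # small epsilon avoids division by zero on constant windows
    s1, s2 = left.std(axis=0) + 1e-8, right.std(axis=0) + 1e-8
    return np.sum((mu1 - mu2) ** 2 / (s1 * s2))

def sliding_divergence(features, duration=20, step=1, rate=100):
    # duration/step in seconds, rate in frames per second
    win, hop = int(duration * rate), int(step * rate)
    n = features.shape[0]
    times, scores = [], []
    for t in range(win, n - win + 1, hop):
        d = gaussian_divergence(features[t - win:t], features[t:t + win])
        times.append(t / float(rate))
        scores.append(d)
    return times, scores

# Identical windows score zero divergence.
same = np.ones((2000, 13))
# gaussian_divergence(same, same) -> 0.0
```

On data with an abrupt change in feature statistics, the divergence curve peaks at the change point, which is exactly what the segmenter exploits.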
In [10]:
# pyannote.algorithms provides algorithms for multimedia document processing.
# (available at http://github.com/pyannote)
from pyannote.algorithms.segmentation.sliding_window import SegmentationGaussianDivergence
segmenter = SegmentationGaussianDivergence(duration=20, step=1)
One can use segmenter to compute the Gaussian divergence d between left and right windows for each position t of the sliding windows...
In [11]:
T, D = zip(*segmenter.iterdiff(features))
... and consequently plot $d = f(t)$ alongside the actual position of scene boundaries.
In [12]:
for segment in reference:
    plt.plot([segment.start, segment.start], [0, 20], 'r')
plt.plot(T, D)
plt.xlim(0, 2000)
plt.ylim(0, 20);
It looks like setting a detection threshold $\theta = 7$ might do (some of) the trick.
In [13]:
segmenter = SegmentationGaussianDivergence(duration=20, step=1, threshold=7)
hypothesis = segmenter.apply(features)
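The thresholding step can be pictured as keeping the local maxima of $d(t)$ that exceed $\theta$. A minimal sketch (pick_boundaries is an illustrative name; pyannote's apply() may add smoothing or minimum-duration constraints on top of this):

```python
# Hedged sketch of how a detection threshold turns the divergence curve
# into boundaries: keep local maxima of d(t) that exceed theta.

def pick_boundaries(times, scores, theta=7.0):
    boundaries = []
    for i in range(1, len(scores) - 1):
        # a peak is at least as high as both of its neighbours
        is_peak = scores[i] >= scores[i - 1] and scores[i] >= scores[i + 1]
        if is_peak and scores[i] > theta:
            boundaries.append(times[i])
    return boundaries

times = [0, 1, 2, 3, 4, 5, 6]
scores = [1.0, 3.0, 9.0, 4.0, 2.0, 8.0, 1.0]
# peaks above theta=7 occur at t=2 (d=9) and t=5 (d=8)
```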
Let's evaluate the results visually (reference in green, hypothesis in red).
In [14]:
for segment in hypothesis:
    plt.plot([segment.start, segment.start], [-10, -0.5], 'r')
for segment in reference:
    plt.plot([segment.start, segment.start], [0.5, 10], 'g')
plt.ylim(-11, 11);
plt.xlim(0, segment.end);
plt.xlabel('Time (seconds)');
One can also use evaluation metrics:
In [15]:
# pyannote.metrics provides evaluation metrics for various tasks.
# (available at http://github.com/pyannote)
from pyannote.metrics.segmentation import SegmentationPurity, SegmentationCoverage
from pyannote.metrics import f_measure
purity = SegmentationPurity()
coverage = SegmentationCoverage()
In [16]:
p = purity(reference, hypothesis)
c = coverage(reference, hypothesis)
f = f_measure(p, c)
print "Purity {p:.1f}% / Coverage {c:.1f}% / F-Measure {f:.1f}%".format(p=100*p, c=100*c, f=100*f)
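The F-measure combines purity and coverage into a single score. Assuming f_measure computes the unweighted harmonic mean (an assumption about pyannote.metrics' exact definition, which may support weighting), a minimal sketch:

```python
# Hedged sketch: combine purity (p) and coverage (c) as a harmonic mean.
# pyannote.metrics' f_measure may differ in details (e.g. a weighting
# parameter); this minimal version assumes equal weights.

def harmonic_f_measure(p, c):
    if p + c == 0.0:
        # both metrics are zero: define the F-measure as zero
        return 0.0
    return 2.0 * p * c / (p + c)
```

The harmonic mean penalizes imbalance: a high purity cannot compensate for a very low coverage, and vice versa.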