Segmentation into scenes using audio

This tutorial addresses the following points:

  • audio feature extraction
  • temporal segmentation using "sliding window" approach
  • evaluation of the segmentation result

In [1]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [2]:
# pyannote.core package provides core pyannote data structures.
# (available at http://github.com/pyannote)
from pyannote.core import Segment, Timeline

uri stands for uniform resource identifier: it uniquely identifies the document being processed.


In [3]:
uri = 'GameOfThrones.Season01.Episode01'

Let's start by loading the reference (i.e. manual) segmentation into scenes.
It is stored in data/GameOfThrones.Season01.Episode01/scenes.txt, with one scene per line: a start time, an end time (both in seconds), and a label.


In [4]:
with open('data/GameOfThrones.Season01.Episode01/scenes.txt', 'r') as f:
    lines = [line.split() for line in f.readlines()]

Timeline objects are used to store a set of Segment instances (one per scene).
A Segment corresponds to a time range, with a start time and an end time in seconds.
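For readers without pyannote installed, here is a minimal pure-Python sketch of these two concepts (hypothetical stand-ins for illustration only, not the actual pyannote.core API):

```python
from collections import namedtuple

# Stand-in for pyannote's Segment: a time range in seconds.
Segment = namedtuple('Segment', ['start', 'end'])

def duration(segment):
    """Duration of a segment, in seconds."""
    return segment.end - segment.start

# A Timeline is essentially an ordered collection of such segments,
# one per scene; here a plain list plays that role.
timeline = []
timeline.append(Segment(start=0.0, end=12.5))
timeline.append(Segment(start=12.5, end=47.0))

total = sum(duration(s) for s in timeline)  # total annotated duration
```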


In [5]:
reference = Timeline(uri=uri)
for start_time, end_time, _ in lines:
    segment = Segment(start=float(start_time), end=float(end_time))
    reference.add(segment)

Now, we will initialize an extractor of MFCC features (including energy and the first 12 coefficients).


In [6]:
# pyannote.features package provides feature extraction tools.
# (available at http://github.com/pyannote)
from pyannote.features.audio.yaafe import YaafeMFCC
mfcc_extractor = YaafeMFCC(e=True, coefs=12)

Once initialized, it can be used to extract the actual features.
Beware, it may take a while (a few seconds for a one-hour episode).


In [7]:
features = mfcc_extractor.extract('data/GameOfThrones.Season01.Episode01/english.wav')

Features instances have several handy methods.
crop is one of them: it returns all features for a given pyannote Segment as a numpy.array.


In [8]:
x = features.crop(Segment(0., 10.))
print(x.shape)


(625, 13)
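The (625, 13) shape corresponds to 10 seconds of 13-dimensional feature vectors, i.e. one frame every 16 ms. Assuming such a fixed frame step, crop essentially maps a time range to a slice of rows in the feature matrix. A sketch of the idea (not pyannote's actual implementation; the 16 ms step is inferred from the shape above):

```python
import numpy as np

FRAME_STEP = 0.016  # seconds per frame, inferred from 625 frames / 10 s

def crop(features, start, end):
    """Return the rows of `features` covering [start, end) seconds."""
    i = int(np.floor(start / FRAME_STEP))
    j = int(np.floor(end / FRAME_STEP))
    return features[i:j]

# Fake feature matrix: one hour of 13-dimensional frames.
features = np.zeros((int(3600 / FRAME_STEP), 13))
x = crop(features, 0.0, 10.0)  # 625 frames of 13 features
```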

Let's plot the audio signal energy (the first feature dimension) for a 20-second excerpt, from 40 s to 60 s.


In [9]:
plt.plot(features.crop(Segment(40, 60))[:,0])


Out[9]:
[<matplotlib.lines.Line2D at 0x11a434a90>]

Now, we are going to segment the episode using Gaussian divergence.
Two sliding windows (left and right) of 20 seconds each are used, with a step of 1 second.


In [10]:
# pyannote.algorithms provides algorithms for multimedia document processing.
# (available at http://github.com/pyannote)
from pyannote.algorithms.segmentation.sliding_window import SegmentationGaussianDivergence
segmenter = SegmentationGaussianDivergence(duration=20, step=1)

One can use segmenter to compute the Gaussian divergence d between the left and right windows for each position t of the sliding windows...
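The exact divergence formula used by pyannote is not spelled out here, but the sliding-window machinery can be sketched as follows: model each window as a diagonal Gaussian and compare the two with, for instance, a symmetric Kullback-Leibler divergence (the function names and the 16 ms frame step are assumptions for illustration, not pyannote's actual code):

```python
import numpy as np

def symmetric_kl(left, right, eps=1e-6):
    """Symmetric KL divergence between two diagonal Gaussians
    fitted on the left and right windows (frames x features)."""
    m1, m2 = left.mean(axis=0), right.mean(axis=0)
    v1 = left.var(axis=0) + eps  # eps avoids division by zero
    v2 = right.var(axis=0) + eps
    d = 0.5 * ((v1 + (m1 - m2) ** 2) / v2
               + (v2 + (m1 - m2) ** 2) / v1) - 1.0
    return d.sum()

def iterdiff(features, duration=20.0, step=1.0, frame_step=0.016):
    """Yield (t, d) for each position t of the two sliding windows."""
    n = int(duration / frame_step)  # frames per window
    hop = int(step / frame_step)    # frames per step
    for i in range(n, features.shape[0] - n, hop):
        yield i * frame_step, symmetric_kl(features[i - n:i],
                                           features[i:i + n])
```

On synthetic features with an abrupt change, the resulting d(t) curve peaks near the change point, which is exactly what the boundary detector looks for.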


In [11]:
T, D = zip(*segmenter.iterdiff(features))


/Volumes/home/Development/virtualenv/pyannote.algorithms/lib/python2.7/site-packages/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)

... and consequently plot $d = f(t)$ alongside the actual position of scene boundaries.


In [12]:
for segment in reference:
    plt.plot([segment.start, segment.start], [0, 20], 'r')
plt.plot(T, D)
plt.xlim(0, 2000)
plt.ylim(0,20);


It looks like setting a detection threshold $\theta = 7$ might do (some of) the trick.
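Internally, turning the d(t) curve into boundaries amounts to keeping the local maxima that exceed the threshold. A minimal sketch of that peak-picking step (assumed logic, not pyannote's actual apply implementation):

```python
import numpy as np

def pick_peaks(T, D, threshold=7.0):
    """Return the times t where d(t) is a local maximum above threshold."""
    D = np.asarray(D)
    boundaries = []
    for i in range(1, len(D) - 1):
        if D[i] >= threshold and D[i] > D[i - 1] and D[i] >= D[i + 1]:
            boundaries.append(T[i])
    return boundaries
```

Each returned time is a hypothesized scene boundary; segments are then obtained by cutting the episode at those times.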


In [13]:
segmenter = SegmentationGaussianDivergence(duration=20, step=1, threshold=7)
hypothesis = segmenter.apply(features)

Let's evaluate the results visually (reference in green, hypothesis in red).


In [14]:
for segment in hypothesis:
    plt.plot([segment.start, segment.start], [-10, -0.5], 'r')
for segment in reference:
    plt.plot([segment.start, segment.start], [0.5, 10], 'g')
plt.ylim(-11, 11);
plt.xlim(0, segment.end);
plt.xlabel('Time (seconds)');


One can also use evaluation metrics:


In [15]:
# pyannote.metrics provides evaluation metrics for various tasks.
# (available at http://github.com/pyannote)
from pyannote.metrics.segmentation import SegmentationPurity, SegmentationCoverage
from pyannote.metrics import f_measure
purity = SegmentationPurity()
coverage = SegmentationCoverage()

In [16]:
p = purity(reference, hypothesis)
c = coverage(reference, hypothesis)
f = f_measure(p, c)
print("Purity {p:.1f}% / Coverage {c:.1f}% / F-Measure {f:.1f}%".format(p=100*p, c=100*c, f=100*f))


Purity 77.6% / Coverage 85.9% / F-Measure 81.5%
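To build intuition for these numbers: coverage measures how well each reference scene is covered by a single hypothesis segment, purity is the dual (reference and hypothesis swapped), and f_measure is their harmonic mean. A toy sketch over plain (start, end) interval lists, using simplified definitions (pyannote.metrics handles the general case):

```python
def overlap(a, b):
    """Duration of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def coverage(reference, hypothesis):
    """For each reference segment, keep its largest overlap with a single
    hypothesis segment; normalize by total reference duration."""
    covered = sum(max(overlap(r, h) for h in hypothesis) for r in reference)
    total = sum(r[1] - r[0] for r in reference)
    return covered / total

def purity(reference, hypothesis):
    # Purity is the dual of coverage.
    return coverage(hypothesis, reference)

def f_measure(p, c):
    # Harmonic mean of purity and coverage.
    return 2 * p * c / (p + c)

reference = [(0.0, 10.0), (10.0, 30.0)]
hypothesis = [(0.0, 20.0), (20.0, 30.0)]
```

Here the hypothesis cuts at 20 s instead of 10 s, so both purity and coverage drop to 2/3.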