MFCC feature extraction using Yaafe in pyannote.features


In [1]:
from pyannote.features.audio.yaafe import YaafeMFCC
wav = '/Volumes/data/tvseries/TheBigBangTheory/wav/TheBigBangTheory.Season01.Episode01.en.wav'
mfcc = YaafeMFCC().extract(wav)

Ground-truth annotation using pyannote.core data structures


In [2]:
from pyannote.core import Annotation, Segment

In [3]:
reference = Annotation()
reference[Segment(325.000,329.110)] = 'sheldon'
reference[Segment(330.430, 331.770)] = 'penny'
reference[Segment(332.540, 333.680)] = 'leonard'
reference[Segment(334.110, 336.270)] = 'penny'
reference[Segment(336.380, 336.580)] = 'leonard'
reference[Segment(337.050, 339.980)] = 'penny'
reference[Segment(340.550, 342.190)] = 'sheldon'
reference


Out[3]:

In this example, we assume that the exact segmentation is already known; only the speaker label of each segment is unknown.


In [4]:
input_segmentation = reference.anonymize_tracks()
input_segmentation


Out[4]:

Let us initialize the BIC clustering algorithm from pyannote.algorithms.


In [5]:
from pyannote.algorithms.clustering.bic import BICClustering
# covariance_type can be 'full' or 'diag'
bicClustering = BICClustering(covariance_type='full', penalty_coef=1.2)
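
The kind of similarity that BIC clustering relies on can be sketched in a few lines. This is a minimal sketch, not pyannote's implementation: it assumes one Gaussian with diagonal covariance per cluster (the 'diag' option), the classic delta-BIC criterion with its sign flipped so that larger means more similar, and that penalty_coef plays the role of the usual penalty weight. The helper names diag_logdet and bic_similarity are hypothetical, and random toy frames stand in for MFCCs.

```python
import math
import random


def diag_logdet(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance."""
    n = len(frames)
    logdet = 0.0
    for k in range(len(frames[0])):
        column = [frame[k] for frame in frames]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        logdet += math.log(variance)
    return logdet


def bic_similarity(x, y, penalty_coef=1.2):
    """Negated delta-BIC between two clusters (assumed convention):
    a positive value suggests that merging x and y is worthwhile."""
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    n_params = 2 * len(x[0])  # one mean and one variance per dimension
    # Likelihood loss of modelling x and y with one Gaussian (always <= 0)
    gain = 0.5 * (n_x * diag_logdet(x) + n_y * diag_logdet(y)
                  - n * diag_logdet(x + y))
    # Reward for using one model instead of two
    penalty = 0.5 * penalty_coef * n_params * math.log(n)
    return gain + penalty


rng = random.Random(0)


def draw(n, mean):
    """Toy 2-dimensional frames around `mean` (stand-in for MFCCs)."""
    return [[rng.gauss(mean, 1.0) for _ in range(2)] for _ in range(n)]


a, b, c = draw(200, 0.0), draw(200, 0.0), draw(200, 5.0)
same = bic_similarity(a, b)       # both drawn from the same distribution
different = bic_similarity(a, c)  # drawn from two distant distributions
print(same > 0, different < 0)
```

Under these assumptions, a larger penalty coefficient makes merging more likely, which is why this parameter controls how many clusters remain at the end.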

We now apply BIC clustering.


In [6]:
hypothesis = bicClustering(input_segmentation, feature=mfcc)
hypothesis


Out[6]:

We can also analyse the behavior of BIC clustering, step by step.


In [7]:
bicClustering.initialize(input_segmentation, feature=mfcc)

In [8]:
# internal similarity matrix
bicClustering.matrix


Out[8]:
            #1          #2          #3          #4          #5          #6          #7
#1        -inf -353.749730 -313.916143 -566.119503 -223.892872 -431.409333  314.272050
#2 -353.749730        -inf -304.230939 -165.263249 -424.235203 -224.524699 -213.317525
#3 -313.916143 -304.230939        -inf -166.479384 -309.723275  -31.191449 -198.081470
#4 -566.119503 -165.263249 -166.479384        -inf -248.918573    2.028118 -320.425257
#5 -223.892872 -424.235203 -309.723275 -248.918573        -inf -185.454997 -348.267289
#6 -431.409333 -224.524699  -31.191449    2.028118 -185.454997        -inf -198.611312
#7  314.272050 -213.317525 -198.081470 -320.425257 -348.267289 -198.611312        -inf

7 rows × 7 columns
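
To read this matrix, it helps to look for its largest off-diagonal entry, since an agglomerative algorithm typically merges the most similar pair first. The sketch below does that on the upper triangle of the values displayed above; most_similar_pair is a hypothetical helper, not part of pyannote.

```python
labels = ['#1', '#2', '#3', '#4', '#5', '#6', '#7']

# Upper triangle of the 7 x 7 matrix: row i holds labels[i] vs labels[i+1:]
upper = [
    [-353.749730, -313.916143, -566.119503, -223.892872, -431.409333, 314.272050],
    [-304.230939, -165.263249, -424.235203, -224.524699, -213.317525],
    [-166.479384, -309.723275, -31.191449, -198.081470],
    [-248.918573, 2.028118, -320.425257],
    [-185.454997, -348.267289],
    [-198.611312],
]


def most_similar_pair(labels, upper):
    """Return (label_i, label_j, similarity) for the largest entry."""
    best = None
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + 1 + offset
            if best is None or value > best[2]:
                best = (labels[i], labels[j], value)
    return best


print(most_similar_pair(labels, upper))
```

The pair found is #1 and #7. Assuming the anonymized tracks follow the chronological order of the segments, these are the first and last segments, both labelled 'sheldon' in the reference, and #7 is indeed the row that disappears from the matrix after the first iteration.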


In [9]:
iterations = bicClustering.iterate(feature=mfcc)

In [10]:
current_hypothesis = next(iterations)
current_hypothesis


Out[10]:

In [11]:
bicClustering.matrix


Out[11]:
            #1          #2          #3          #4          #5          #6
#1        -inf -429.836131 -374.994923 -739.605318 -204.227083 -523.814877
#2 -429.836131        -inf -304.230939 -165.263249 -424.235203 -224.524699
#3 -374.994923 -304.230939        -inf -166.479384 -309.723275  -31.191449
#4 -739.605318 -165.263249 -166.479384        -inf -248.918573    2.028118
#5 -204.227083 -424.235203 -309.723275 -248.918573        -inf -185.454997
#6 -523.814877 -224.524699  -31.191449    2.028118 -185.454997        -inf

6 rows × 6 columns


In [12]:
current_hypothesis = next(iterations)
current_hypothesis


Out[12]:

In [13]:
bicClustering.matrix


Out[13]:
            #1          #2          #3          #4          #5
#1        -inf -429.836131 -374.994923 -962.168189 -204.227083
#2 -429.836131        -inf -304.230939 -230.412046 -424.235203
#3 -374.994923 -304.230939        -inf -110.987201 -309.723275
#4 -962.168189 -230.412046 -110.987201        -inf -148.334239
#5 -204.227083 -424.235203 -309.723275 -148.334239        -inf

5 rows × 5 columns


In [14]:
try:
    next(iterations)
except StopIteration:
    # the iterator is exhausted once no further merge is accepted
    print("Reached stopping criterion.")


Reached stopping criterion.
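
Why the third call to next fails can be illustrated with a small generator. The sketch below assumes that the stopping criterion is "the best remaining similarity is not positive", which is consistent with the three matrices displayed above (largest entries 314.272050, then 2.028118, then -110.987201) but is not taken from pyannote's source. The name merge_steps is hypothetical.

```python
def merge_steps(best_similarities):
    """Yield (step, similarity) for each accepted merge.

    Returning from a generator makes the next call to next() raise
    StopIteration, which is how the loop signals that it is done.
    """
    for step, value in enumerate(best_similarities, 1):
        if value <= 0:
            return  # assumed stopping criterion: no positive similarity left
        yield step, value


# Largest off-diagonal value of the 7 x 7, 6 x 6 and 5 x 5 matrices
steps = merge_steps([314.272050, 2.028118, -110.987201])

print(next(steps))  # first merge accepted
print(next(steps))  # second merge accepted
try:
    next(steps)
except StopIteration:
    print("Reached stopping criterion.")
```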

In [15]:
hypothesis = bicClustering.finalize(feature=mfcc)
hypothesis


Out[15]:

In [16]:
reference


Out[16]:

Let's evaluate the result numerically using the metrics available in pyannote.metrics.


In [17]:
from pyannote.metrics.diarization import DiarizationErrorRate, DiarizationPurity, DiarizationCoverage
der = DiarizationErrorRate()
purity = DiarizationPurity()
coverage = DiarizationCoverage()

In [18]:
p = purity(reference, hypothesis)
c = coverage(reference, hypothesis)
d = der(reference, hypothesis)
print("Purity {p:.1f}% / Coverage {c:.1f}% / Diarization error rate {d:.1f}%".format(p=100 * p, c=100 * c, d=100 * d))


Purity 100.0% / Coverage 88.6% / Diarization error rate 11.4%
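
These figures can be checked by hand from segment durations alone. The sketch below is a simplified re-computation, not pyannote.metrics itself. It assumes that the final hypothesis groups segments 1 and 7 together and segments 4 and 6 together (the two merges suggested by the similarity matrices), with hypothetical cluster names 'A' to 'E'. Because hypothesis and reference share the same segmentation, there is no missed speech or false alarm, so the diarization error rate reduces to the confusion left by the best one-to-one mapping between speakers and clusters.

```python
from itertools import permutations

# (duration in seconds, reference speaker, assumed hypothesis cluster)
segments = [
    (329.110 - 325.000, 'sheldon', 'A'),
    (331.770 - 330.430, 'penny', 'B'),
    (333.680 - 332.540, 'leonard', 'C'),
    (336.270 - 334.110, 'penny', 'D'),
    (336.580 - 336.380, 'leonard', 'E'),
    (339.980 - 337.050, 'penny', 'D'),
    (342.190 - 340.550, 'sheldon', 'A'),
]

total = sum(duration for duration, _, _ in segments)

# Co-occurrence duration of each (speaker, cluster) pair
overlap = {}
for duration, speaker, cluster in segments:
    key = (speaker, cluster)
    overlap[key] = overlap.get(key, 0.0) + duration

speakers = sorted(set(speaker for _, speaker, _ in segments))
clusters = sorted(set(cluster for _, _, cluster in segments))

# Purity: each cluster is credited with its dominant speaker
purity = sum(max(overlap.get((s, c), 0.0) for s in speakers)
             for c in clusters) / total

# Coverage: each speaker is credited with their dominant cluster
coverage = sum(max(overlap.get((s, c), 0.0) for c in clusters)
               for s in speakers) / total

# Confusion under the best one-to-one speaker/cluster mapping (brute force)
best_match = max(
    sum(overlap.get((s, c), 0.0) for s, c in zip(speakers, assignment))
    for assignment in permutations(clusters, len(speakers))
)
der = (total - best_match) / total

print("Purity {p:.1f}% / Coverage {c:.1f}% / Diarization error rate {d:.1f}%".format(
    p=100 * purity, c=100 * coverage, d=100 * der))
```

Under these assumptions the sketch gives the same three values as above: every cluster contains a single speaker, while 'penny' and 'leonard' are each split over two clusters, which lowers coverage and produces the confusion counted in the error rate.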