MFCC feature extraction using Yaafe in pyannote.features



In [1]:

    
from pyannote.features.audio.yaafe import YaafeMFCC
wav = '/Volumes/data/tvseries/TheBigBangTheory/wav/TheBigBangTheory.Season01.Episode01.en.wav'
mfcc = YaafeMFCC().extract(wav)
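YaafeMFCC hands the actual computation to the Yaafe library. For readers unfamiliar with MFCCs, here is a minimal NumPy sketch of the textbook pipeline: framing, windowing, power spectrum, mel filterbank, log and DCT. It is not Yaafe's implementation. The function names and all default values (16 kHz audio, 25 ms frames every 10 ms, 26 mel bands, 13 coefficients) are assumptions chosen for illustration, and options Yaafe may apply (pre-emphasis, liftering, energy or derivative features) are left out.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / float(max(center - left, 1))
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / float(max(right - center, 1))
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coefs=13):
    """Return an (n_frames, n_coefs) array of MFCCs (illustrative defaults)."""
    # 1. overlapping frames (25 ms every 10 ms at 16 kHz), Hamming-windowed
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. log energy in each mel band (floored to avoid log(0))
    energies = power.dot(mel_filterbank(n_filters, n_fft, sample_rate).T)
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 4. unnormalized DCT-II across bands decorrelates them; keep n_coefs
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_coefs)[:, None])
    return log_energies.dot(dct.T)


# usage: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / float(sr)
features = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate=sr)
print(features.shape)  # (98, 13)
```

The result is one 13-dimensional vector per 10 ms frame; the BIC clustering further down models such frame sequences with Gaussians.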

Ground-truth annotation using the pyannote.core data structure



In [2]:

    
from pyannote.core import Annotation, Segment



In [3]:

    
reference = Annotation()
# map each speech turn, Segment(start, end) in seconds, to its speaker
reference[Segment(325.000, 329.110)] = 'sheldon'
reference[Segment(330.430, 331.770)] = 'penny'
reference[Segment(332.540, 333.680)] = 'leonard'
reference[Segment(334.110, 336.270)] = 'penny'
reference[Segment(336.380, 336.580)] = 'leonard'
reference[Segment(337.050, 339.980)] = 'penny'
reference[Segment(340.550, 342.190)] = 'sheldon'
reference

In this example, we assume that the exact segmentation is available and that only the speaker labels are unknown.



In [4]:

    
input_segmentation = reference.anonymize_tracks()
input_segmentation
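anonymize_tracks() keeps every segment of reference but drops the speaker names, giving each segment (track) its own unique label; this is why the similarity matrices further down start with one row per speech turn (#1 to #7). A pure-Python analogue over (segment, speaker) pairs could look like this. The function name anonymize and the '#n' label format are hypothetical, chosen only to mirror those matrices.

```python
def anonymize(turns):
    """Hypothetical stand-in for Annotation.anonymize_tracks().

    turns: list of ((start, end), speaker) pairs.
    Returns the same segments, each with its own unique label,
    even when two segments belong to the same speaker.
    """
    return [(segment, '#{0}'.format(i))
            for i, (segment, _speaker) in enumerate(turns, start=1)]


# first four speech turns of the reference annotation above
reference_turns = [((325.000, 329.110), 'sheldon'),
                   ((330.430, 331.770), 'penny'),
                   ((332.540, 333.680), 'leonard'),
                   ((334.110, 336.270), 'penny')]
print(anonymize(reference_turns))
# [((325.0, 329.11), '#1'), ((330.43, 331.77), '#2'), ((332.54, 333.68), '#3'), ((334.11, 336.27), '#4')]
```

Penny's two turns end up as #2 and #4: recovering that they belong together is exactly the clustering task.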

Let us initialize the BIC clustering algorithm from pyannote.algorithms.



In [5]:

    
from pyannote.algorithms.clustering.bic import BICClustering
# covariance_type can be 'full' or 'diag'
bicClustering = BICClustering(covariance_type='full', penalty_coef=1.2)
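BIC clustering compares, for each pair of clusters, a single Gaussian fitted to their pooled MFCC frames against one Gaussian per cluster. The sketch below follows the classic ΔBIC criterion, ΔBIC = N/2·log|Σ| − N1/2·log|Σ1| − N2/2·log|Σ2| − λ·P, where P = ½·(number of model parameters)·log N and λ plays the role of penalty_coef. A 'full' Gaussian has d + d(d+1)/2 parameters, a 'diag' one 2d. The function name bic_merge_score and the choice to return −ΔBIC (so that larger means "more likely the same speaker") are assumptions; pyannote's exact scaling and sign convention may differ.

```python
import numpy as np


def _logdet_cov(x, covariance_type):
    """log-determinant of the maximum-likelihood covariance of frames x (n x d)."""
    if covariance_type == 'diag':
        # diagonal model: the determinant is the product of per-dimension variances
        return float(np.sum(np.log(np.var(x, axis=0))))
    _sign, logdet = np.linalg.slogdet(np.cov(x, rowvar=False, bias=True))
    return float(logdet)


def bic_merge_score(x1, x2, covariance_type='full', penalty_coef=1.2):
    """Return -deltaBIC for merging two clusters of feature frames (hypothetical helper).

    Positive: one Gaussian for both clusters is preferred (merge).
    Negative: two separate Gaussians are preferred (keep apart).
    """
    n1, d = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2
    if covariance_type == 'full':
        n_params = d + d * (d + 1) / 2.0   # mean + symmetric covariance
    else:
        n_params = 2.0 * d                 # mean + diagonal variances
    penalty = 0.5 * n_params * np.log(n)
    pooled = np.vstack([x1, x2])
    likelihood_term = 0.5 * (n * _logdet_cov(pooled, covariance_type)
                             - n1 * _logdet_cov(x1, covariance_type)
                             - n2 * _logdet_cov(x2, covariance_type))
    delta_bic = likelihood_term - penalty_coef * penalty
    return -delta_bic


# usage on synthetic 3-D "features"
rng = np.random.RandomState(0)
a = rng.randn(500, 3)          # speaker 1
b = rng.randn(500, 3)          # more frames from speaker 1
c = rng.randn(500, 3) + 5.0    # a clearly different speaker
print(bic_merge_score(a, b) > 0)   # True: merge
print(bic_merge_score(a, c) > 0)   # False: keep apart
```

Raising penalty_coef makes every merge more attractive (it subtracts more from ΔBIC), so it indirectly controls how many clusters remain.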

We now apply BIC clustering.



In [6]:

    
hypothesis = bicClustering(input_segmentation, feature=mfcc)
hypothesis

We can also analyse the behaviour of BIC clustering step by step.



In [7]:

    
bicClustering.initialize(input_segmentation, feature=mfcc)



In [8]:

    
# internal similarity matrix
bicClustering.matrix

    Out[8]:

	#1	#2	#3	#4	#5	#6	#7
#1	-inf	-353.749730	-313.916143	-566.119503	-223.892872	-431.409333	314.272050
#2	-353.749730	-inf	-304.230939	-165.263249	-424.235203	-224.524699	-213.317525
#3	-313.916143	-304.230939	-inf	-166.479384	-309.723275	-31.191449	-198.081470
#4	-566.119503	-165.263249	-166.479384	-inf	-248.918573	2.028118	-320.425257
#5	-223.892872	-424.235203	-309.723275	-248.918573	-inf	-185.454997	-348.267289
#6	-431.409333	-224.524699	-31.191449	2.028118	-185.454997	-inf	-198.611312
#7	314.272050	-213.317525	-198.081470	-320.425257	-348.267289	-198.611312	-inf

7 rows × 7 columns
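In the next matrix (Out[11]) column #7 has disappeared and only row #1 has changed, which is consistent with the algorithm merging the pair with the highest score, here #1 and #7. The snippet below locates that pair from the upper triangle of the matrix above (values copied from Out[8]). Assuming the #n labels follow the chronological order of the speech turns, the only two positive entries, (#1, #7) and (#4, #6), both pair turns by the same reference speaker (Sheldon and Penny respectively).

```python
# Upper triangle of the initial 7x7 similarity matrix (values from Out[8]).
scores = {
    ('#1', '#2'): -353.749730, ('#1', '#3'): -313.916143, ('#1', '#4'): -566.119503,
    ('#1', '#5'): -223.892872, ('#1', '#6'): -431.409333, ('#1', '#7'): 314.272050,
    ('#2', '#3'): -304.230939, ('#2', '#4'): -165.263249, ('#2', '#5'): -424.235203,
    ('#2', '#6'): -224.524699, ('#2', '#7'): -213.317525,
    ('#3', '#4'): -166.479384, ('#3', '#5'): -309.723275, ('#3', '#6'): -31.191449,
    ('#3', '#7'): -198.081470,
    ('#4', '#5'): -248.918573, ('#4', '#6'): 2.028118, ('#4', '#7'): -320.425257,
    ('#5', '#6'): -185.454997, ('#5', '#7'): -348.267289,
    ('#6', '#7'): -198.611312,
}

# pair merged at the first iteration (assumption: the highest score wins)
best_pair = max(scores, key=scores.get)
# pairs with a positive score in this initial matrix, best first
positive_pairs = sorted((pair for pair in scores if scores[pair] > 0),
                        key=scores.get, reverse=True)
print(best_pair)       # ('#1', '#7')
print(positive_pairs)  # [('#1', '#7'), ('#4', '#6')]
```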



In [9]:

    
iterations = bicClustering.iterate(feature=mfcc)
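iterations is consumed with next() in the following cells, one merge per call, and the call in In [14] raises an exception once the stopping criterion is reached, which is how a Python generator behaves. The sketch below reproduces that pattern with a generic greedy agglomerative loop: merge the most similar pair, yield the new clustering, and return (so next() raises StopIteration) when no pair scores above zero. The function agglomerate, the zero threshold and toy_similarity are hypothetical stand-ins, not pyannote's implementation. Judging by the matrices below, pyannote keeps bicClustering.matrix and only updates the row of the merged cluster, whereas this sketch rescores every pair for simplicity.

```python
def agglomerate(clusters, similarity):
    """Greedy agglomerative clustering, one merge per iteration (sketch).

    clusters:   initial list of clusters (each a list of items)
    similarity: function(cluster_a, cluster_b) -> float, higher = more alike
    """
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # score every pair and keep the most similar one
        score, i, j = max((similarity(clusters[p], clusters[q]), p, q)
                          for p in range(len(clusters))
                          for q in range(p + 1, len(clusters)))
        if score <= 0:
            return  # stopping criterion: further next() calls raise StopIteration
        clusters[i] = clusters[i] + clusters.pop(j)  # j > i, so index i is unaffected
        yield [list(c) for c in clusters]


def toy_similarity(a, b):
    # hypothetical stand-in for a BIC score: positive when means are within 1.0
    return 1.0 - abs(sum(a) / len(a) - sum(b) / len(b))


steps = agglomerate([[0.0], [0.2], [5.0], [5.3], [10.0]], toy_similarity)
print(next(steps))  # [[0.0, 0.2], [5.0], [5.3], [10.0]]
print(next(steps))  # [[0.0, 0.2], [5.0, 5.3], [10.0]]
print(list(steps))  # [] : no pair scores above zero any more
```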



In [10]:

    
current_hypothesis = next(iterations)
current_hypothesis



In [11]:

    
bicClustering.matrix

    Out[11]:

	#1	#2	#3	#4	#5	#6
#1	-inf	-429.836131	-374.994923	-739.605318	-204.227083	-523.814877
#2	-429.836131	-inf	-304.230939	-165.263249	-424.235203	-224.524699
#3	-374.994923	-304.230939	-inf	-166.479384	-309.723275	-31.191449
#4	-739.605318	-165.263249	-166.479384	-inf	-248.918573	2.028118
#5	-204.227083	-424.235203	-309.723275	-248.918573	-inf	-185.454997
#6	-523.814877	-224.524699	-31.191449	2.028118	-185.454997	-inf

6 rows × 6 columns



In [12]:

    
current_hypothesis = next(iterations)
current_hypothesis



In [13]:

    
bicClustering.matrix

    Out[13]:

	#1	#2	#3	#4	#5
#1	-inf	-429.836131	-374.994923	-962.168189	-204.227083
#2	-429.836131	-inf	-304.230939	-230.412046	-424.235203
#3	-374.994923	-304.230939	-inf	-110.987201	-309.723275
#4	-962.168189	-230.412046	-110.987201	-inf	-148.334239
#5	-204.227083	-424.235203	-309.723275	-148.334239	-inf

5 rows × 5 columns
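The second merge joined #4 and #6, the largest entry of the 6 × 6 matrix (2.028118); in the 5 × 5 matrix above every remaining score is negative. Assuming the algorithm only merges while the best score is positive, this explains why the next call to next() fails. The check on the upper triangle of the matrix above (values from Out[13]):

```python
# Upper triangle of the 5x5 similarity matrix (values from Out[13]).
scores = {
    ('#1', '#2'): -429.836131, ('#1', '#3'): -374.994923,
    ('#1', '#4'): -962.168189, ('#1', '#5'): -204.227083,
    ('#2', '#3'): -304.230939, ('#2', '#4'): -230.412046, ('#2', '#5'): -424.235203,
    ('#3', '#4'): -110.987201, ('#3', '#5'): -309.723275,
    ('#4', '#5'): -148.334239,
}

best_pair = max(scores, key=scores.get)
# assumed stopping rule: only merge while the best score is positive
keep_merging = scores[best_pair] > 0
print("best pair: %s, %s -> merge: %s" % (best_pair[0], best_pair[1], keep_merging))
# best pair: #3, #4 -> merge: False
```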



In [14]:

    
try:
    next(iterations)
except StopIteration:
    # an exhausted iterator raises StopIteration: no further merge is allowed
    print("Reached stopping criterion.")

Reached stopping criterion.



In [15]:

    
hypothesis = bicClustering.finalize(feature=mfcc)
hypothesis



In [16]:

    
reference

Let us now evaluate the result numerically, using the metrics available in pyannote.metrics.



In [17]:

    
from pyannote.metrics.diarization import DiarizationErrorRate, DiarizationPurity, DiarizationCoverage
der = DiarizationErrorRate()
purity = DiarizationPurity()
coverage = DiarizationCoverage()



In [18]:

    
p = purity(reference, hypothesis)
c = coverage(reference, hypothesis)
d = der(reference, hypothesis)
print("Purity {p:.1f}% / Coverage {c:.1f}% / Diarization error rate {d:.1f}%".format(
    p=100 * p, c=100 * c, d=100 * d))









    



Purity 100.0% / Coverage 88.6% / Diarization error rate 11.4%
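Since clustering only relabels the segments of input_segmentation, these numbers can be checked by hand. Assuming the two merges suggested by the similarity matrices (#1 with #7, then #4 with #6) and that #n numbers the turns chronologically, the final hypothesis has five clusters, named A to E here purely for illustration. The plain-Python sketch below applies the metric definitions: purity credits each cluster with its dominant speaker, coverage credits each speaker with its dominant cluster, and the diarization error rate reduces to speaker confusion under the best one-to-one speaker-to-cluster mapping (found by brute force here rather than with the Hungarian algorithm). It is not pyannote.metrics' code.

```python
from collections import defaultdict
from itertools import permutations

# (start, end, reference speaker, hypothesis cluster) for the seven turns.
# Cluster names A-E are hypothetical; they encode the assumed merges
# #1+#7 -> A and #4+#6 -> D.
turns = [
    (325.000, 329.110, 'sheldon', 'A'),  # #1
    (330.430, 331.770, 'penny', 'B'),    # #2
    (332.540, 333.680, 'leonard', 'C'),  # #3
    (334.110, 336.270, 'penny', 'D'),    # #4
    (336.380, 336.580, 'leonard', 'E'),  # #5
    (337.050, 339.980, 'penny', 'D'),    # #6
    (340.550, 342.190, 'sheldon', 'A'),  # #7
]

# overlap[speaker][cluster] = seconds of speech they share
overlap = defaultdict(lambda: defaultdict(float))
for start, end, speaker, cluster in turns:
    overlap[speaker][cluster] += end - start
total = sum(end - start for start, end, _, _ in turns)
speakers = sorted(overlap)
clusters = sorted({cluster for _, _, _, cluster in turns})

# purity: each cluster is credited with its dominant speaker
purity = sum(max(overlap[s][c] for s in speakers) for c in clusters) / total
# coverage: each speaker is credited with its dominant cluster
coverage = sum(max(overlap[s].values()) for s in speakers) / total
# DER: boundaries are identical, so only speaker confusion remains;
# brute-force the best one-to-one speaker -> cluster mapping
matched = max(sum(overlap[s][c] for s, c in zip(speakers, perm))
              for perm in permutations(clusters, len(speakers)))
der = (total - matched) / total

print("Purity {:.1f}% / Coverage {:.1f}% / DER {:.1f}%".format(
    100 * purity, 100 * coverage, 100 * der))
# Purity 100.0% / Coverage 88.6% / DER 11.4%
```

This matches the pyannote.metrics output above: Penny's first turn (1.34 s) and Leonard's second turn (0.20 s) stay in clusters of their own, i.e. 1.54 s out of 13.52 s of speech.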