In [44]:
%matplotlib inline
import mir_eval, librosa, numpy, matplotlib.pyplot as plt

Evaluation using mir_eval

mir_eval (documentation, paper) is a Python library containing evaluation functions for a variety of common audio and music processing tasks.

mir_eval was primarily created by Colin Raffel. This notebook was created by Brian McFee and edited by Steve Tjoa.

Why mir_eval?

Most tasks in MIR are complicated. Evaluation is also complicated!

Any given task has many ways to evaluate a system. There is no one right way.

For example, here are issues to consider when choosing an evaluation method (the sketch after this list shows how the tolerance window alone can change a score):

  • event matching
  • time padding
  • tolerance windows
  • vocabulary alignment
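
To make the tolerance-window point concrete, here is a minimal sketch with two hand-made onset lists (the times are invented for illustration): the same estimates score differently depending on the window.

In [ ]:
# Hand-made example: the score depends on the tolerance window.
ref = numpy.array([0.10, 0.50, 1.00])
est = numpy.array([0.12, 0.54, 1.00])

# With the default 50 ms window, all three estimates count as hits.
print(mir_eval.onset.f_measure(ref, est, window=0.05))

# With a stricter 25 ms window, the estimate at 0.54 s no longer matches 0.50 s.
print(mir_eval.onset.f_measure(ref, est, window=0.025))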

mir_eval tasks and submodules

  • onset, tempo, beat
  • chord, key
  • melody, multipitch
  • transcription
  • segment, hierarchy, pattern
  • separation (like bss_eval in Matlab)

Install mir_eval

pip install mir_eval

If that doesn't work:

pip install --no-deps mir_eval

Example: Onset Detection


In [5]:
y, sr = librosa.load('audio/simple_piano.wav')

In [13]:
# Estimate onsets.
est_onsets = librosa.onset.onset_detect(y=y, sr=sr, units='time')

In [14]:
est_onsets


Out[14]:
array([0.27863946, 0.510839  , 0.81269841, 1.021678  , 1.32353741,
       1.50929705, 1.83437642, 2.02013605, 2.36843537, 2.53097506,
       2.87927438, 3.0185941 , 3.36689342, 3.59909297])

In [15]:
# Reference onset annotation (hard-coded here for the demo).
ref_onsets = numpy.array([0.1, 0.21, 0.3])

In [19]:
mir_eval.onset.evaluate(ref_onsets, est_onsets)


Out[19]:
OrderedDict([('F-measure', 0.11764705882352941),
             ('Precision', 0.07142857142857142),
             ('Recall', 0.3333333333333333)])

mir_eval finds the largest feasible set of matches using the Hopcroft-Karp algorithm.
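
The matching step is also exposed directly in mir_eval.util; a minimal sketch, assuming the 50 ms tolerance window that the onset metrics use by default:

In [ ]:
# Pair reference and estimated onsets within a 50 ms window.
# Each match is a (reference index, estimate index) tuple.
mir_eval.util.match_events(ref_onsets, est_onsets, 0.05)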

Example: Beat Tracking


In [31]:
est_tempo, est_beats = librosa.beat.beat_track(y=y, sr=sr)
est_beats = librosa.frames_to_time(est_beats, sr=sr)

In [32]:
est_beats


Out[32]:
array([0.53405896, 1.021678  , 1.53251701, 2.04335601, 2.53097506])

In [33]:
# Reference beat annotation (hard-coded here for the demo).
ref_beats = numpy.array([0.53, 1.02])

In [34]:
mir_eval.beat.evaluate(ref_beats, est_beats)


/Users/stjoa/anaconda3/lib/python3.6/site-packages/mir_eval/beat.py:91: UserWarning: Reference beats are empty.
  warnings.warn("Reference beats are empty.")
/Users/stjoa/anaconda3/lib/python3.6/site-packages/mir_eval/beat.py:93: UserWarning: Estimated beats are empty.
  warnings.warn("Estimated beats are empty.")
Out[34]:
OrderedDict([('F-measure', 0.0),
             ('Cemgil', 0.0),
             ('Cemgil Best Metric Level', 0.0),
             ('Goto', 0.0),
             ('P-score', 0.0),
             ('Correct Metric Level Continuous', 0.0),
             ('Correct Metric Level Total', 0.0),
             ('Any Metric Level Continuous', 0.0),
             ('Any Metric Level Total', 0.0),
             ('Information gain', 0.0)])
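
The warnings and all-zero scores are expected here: following common practice, mir_eval.beat ignores beats in the first five seconds (see mir_eval.beat.trim_beats), and this clip is only a few seconds long, so both beat lists are trimmed to nothing before scoring. A quick sanity check:

In [ ]:
# Both beat lists become empty after the standard 5-second trim.
mir_eval.beat.trim_beats(ref_beats), mir_eval.beat.trim_beats(est_beats)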

Example: Chord Estimation


In [35]:
mir_eval.chord.evaluate()


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-16035b8d87a1> in <module>()
----> 1 mir_eval.chord.evaluate()

TypeError: evaluate() missing 4 required positional arguments: 'ref_intervals', 'ref_labels', 'est_intervals', and 'est_labels'
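
The traceback above shows the four required inputs. Here is a minimal sketch with hand-made annotations; the intervals and chord labels below are invented purely for illustration.

In [ ]:
# Hypothetical reference: (start, end) intervals in seconds plus one chord label per interval.
ref_intervals = numpy.array([[0.0, 2.0], [2.0, 4.0]])
ref_labels = ['C:maj', 'G:maj']

# Hypothetical estimate over the same time span.
est_intervals = numpy.array([[0.0, 1.8], [1.8, 4.0]])
est_labels = ['C:maj', 'G:min']

mir_eval.chord.evaluate(ref_intervals, ref_labels, est_intervals, est_labels)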

Hidden benefits

  • Input validation! Many errors can be traced back to ill-formatted data.
  • Standardized behavior, full test coverage.

More than metrics

mir_eval has tools for display and sonification.


In [38]:
import librosa.display
import mir_eval.display

Common plots:

  • events, labeled_intervals
  • pitch, multipitch, piano_roll
  • segments, hierarchy, separation

Example: Events


In [37]:
# Compute a mel spectrogram (in dB) to plot the beats against.
S = librosa.feature.melspectrogram(y=y, sr=sr)
librosa.display.specshow(librosa.power_to_db(S, ref=numpy.max), sr=sr, x_axis='time', y_axis='mel')

# Overlay the reference and estimated beats as vertical lines.
mir_eval.display.events(ref_beats, color='w', alpha=0.8, linewidth=3)
mir_eval.display.events(est_beats, color='c', alpha=0.8, linewidth=3, linestyle='--')

Example: Labeled Intervals
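
Here is a minimal sketch using mir_eval.display.labeled_intervals; the segment boundaries and labels below are invented for illustration.

In [ ]:
# Hypothetical labeled segments: (start, end) intervals in seconds plus one label per interval.
intervals = numpy.array([[0.0, 1.5], [1.5, 2.5], [2.5, 4.0]])
labels = ['verse', 'chorus', 'verse']

plt.figure(figsize=(12, 2))
mir_eval.display.labeled_intervals(intervals, labels)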

Example: Source Separation


In [39]:
y_harm, y_perc = librosa.effects.hpss(y, margin=8)

In [45]:
plt.figure(figsize=(12, 4))
mir_eval.display.separation([y_perc, y_harm], sr, labels=['percussive', 'harmonic'])
plt.legend()


Out[45]:
<matplotlib.legend.Legend at 0x117a2f048>
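
Beyond the plot, mir_eval.separation also provides the bss_eval metrics themselves (SDR, SIR, SAR). A minimal sketch, here treating the margin=8 HPSS outputs as references and a plain HPSS as the estimate, purely to exercise the API:

In [ ]:
# Score a rough separation (default HPSS) against the margin=8 separation used above.
est_harm, est_perc = librosa.effects.hpss(y)
sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(
    numpy.vstack([y_harm, y_perc]),
    numpy.vstack([est_harm, est_perc]))
sdr, sir, sar, perm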

In [ ]:
from IPython.display import Audio
# Listen to the separated signals as a stereo pair (percussive, harmonic).
Audio(data=numpy.vstack([y_perc, y_harm]), rate=sr)

In [ ]:
# Sonify the (hypothetical) reference chord annotation from the chord example above.
chord_audio = mir_eval.sonify.chords(ref_labels, ref_intervals, sr)
Audio(data=chord_audio, rate=sr)