This notebook demonstrates some of the basic functionality of librosa version 0.3.

Following through this example, you'll learn how to:

  • Load audio input
  • Compute mel spectrogram, MFCC, delta features, and chroma
  • Locate beat events
  • Compute beat-synchronous features
  • Display features
  • Save beat tracker output to a CSV file

In [1]:
# We'll need the os module for file path manipulation
import os

# And numpy for some mathematical operations
import numpy as np

# Librosa for audio
import librosa

# matplotlib for displaying the output
import matplotlib.pyplot as plt
%matplotlib inline

# and IPython.display for audio output
import IPython.display

In [2]:
audio_path = librosa.util.example_audio_file()

# or uncomment the line below and point it at your favorite song:
#
# audio_path = '/path/to/your/favorite/song.mp3'

y, sr = librosa.load(audio_path)

By default, librosa will resample the signal to 22050 Hz.

You can change this behavior by setting the sr parameter of librosa.load(), or disable resampling entirely by setting sr=None.
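
For example (a minimal sketch; the rate below is an arbitrary choice), you could reload the same file at a specific rate, or at its native rate with no resampling:

# Resample to 44100 Hz on load
y_44k, sr_44k = librosa.load(audio_path, sr=44100)

# Or keep the file's native sampling rate (no resampling)
y_native, sr_native = librosa.load(audio_path, sr=None)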

Warning

You might want to stop here to verify that scikits.samplerate is installed and functioning correctly. Without this, audio resampling will fall back on scipy.signal.resample(), which is rather inefficient.

You can test this by printing out the librosa.core._HAS_SAMPLERATE flag:


In [3]:
print 'HAS_SAMPLERATE: ', librosa.core._HAS_SAMPLERATE


HAS_SAMPLERATE:  True

Mel spectrogram

This first step will show how to compute a Mel spectrogram from an audio waveform.


In [4]:
# Let's make and display a mel-scaled power (energy-squared) spectrogram
# We use a small hop length of 64 here so that the frames line up with the beat tracker example below.
S = librosa.feature.melspectrogram(y, sr=sr, n_fft=2048, hop_length=64, n_mels=128)

# Convert to log scale (dB). We'll use the peak power as reference.
log_S = librosa.logamplitude(S, ref_power=np.max)

# Make a new figure
plt.figure(figsize=(12,4))

# Display the spectrogram on a mel scale
# sample rate and hop length parameters are used to render the time axis
librosa.display.specshow(log_S, sr=sr, hop_length=64, x_axis='time', y_axis='mel')

# Put a descriptive title on the plot
plt.title('mel power spectrogram')

# draw a color bar
plt.colorbar(format='%+02.0f dB')

# Make the figure layout compact
plt.tight_layout()


Harmonic-percussive source separation

Before doing any signal analysis, let's pull apart the harmonic and percussive components of the audio. This is pretty easy to do with the effects module.


In [5]:
y_harmonic, y_percussive = librosa.effects.hpss(y)
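
If you want to sanity-check the separation by ear, one option (a quick sketch, assuming this is running in an IPython/Jupyter notebook) is to play each component back with the IPython.display module imported above:

# Play back the harmonic component
# (only the last Audio object in a cell is rendered, so run these separately)
IPython.display.Audio(data=y_harmonic, rate=sr)

# Play back the percussive component
# IPython.display.Audio(data=y_percussive, rate=sr)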

In [6]:
# What do the spectrograms look like?
# Let's make and display a mel-scaled power (energy-squared) spectrogram
# We use a small hop length of 64 here so that the frames line up with the beat tracker example below.
S_harmonic   = librosa.feature.melspectrogram(y_harmonic, sr=sr, n_fft=2048, hop_length=64, n_mels=128)
S_percussive = librosa.feature.melspectrogram(y_percussive, sr=sr, n_fft=2048, hop_length=64, n_mels=128)

# Convert to log scale (dB). We'll use the peak power as reference.
log_Sh = librosa.logamplitude(S_harmonic, ref_power=np.max)
log_Sp = librosa.logamplitude(S_percussive, ref_power=np.max)

# Make a new figure
plt.figure(figsize=(12,6))

plt.subplot(2,1,1)
# Display the spectrogram on a mel scale
librosa.display.specshow(log_Sh, y_axis='mel')

# Put a descriptive title on the plot
plt.title('mel power spectrogram (Harmonic)')

# draw a color bar
plt.colorbar(format='%+02.0f dB')

plt.subplot(2,1,2)
librosa.display.specshow(log_Sp, sr=sr, hop_length=64, x_axis='time', y_axis='mel')

# Put a descriptive title on the plot
plt.title('mel power spectrogram (Percussive)')

# draw a color bar
plt.colorbar(format='%+02.0f dB')

# Make the figure layout compact
plt.tight_layout()


Chromagram

Next, we'll extract chroma features to represent pitch information.


In [7]:
# We'll use a longer FFT window here to better resolve low frequencies
# We'll use the harmonic component to avoid pollution from transients
C = librosa.feature.chromagram(y=y_harmonic, sr=sr, n_fft=4096, hop_length=64)

# Make a new figure
plt.figure(figsize=(12,4))

# Display the chromagram: the energy in each chromatic pitch class as a function of time
# To make sure that the colors span the full range of chroma values, set vmin and vmax
librosa.display.specshow(C, sr=sr, hop_length=64, x_axis='time', y_axis='chroma', vmin=0, vmax=1)

plt.title('Chromagram')
plt.colorbar()

plt.tight_layout()


MFCC

Mel-frequency cepstral coefficients are commonly used to represent the texture or timbre of a sound.


In [8]:
# Next, we'll extract the top 20 Mel-frequency cepstral coefficients (MFCCs)
mfcc        = librosa.feature.mfcc(S=log_S, n_mfcc=20)

# Let's also compute the first and second deltas while we're at it
delta_mfcc  = librosa.feature.delta(mfcc)
delta2_mfcc = librosa.feature.delta(mfcc, order=2)

# How do they look?  We'll show each in its own subplot
plt.figure(figsize=(12, 6))

plt.subplot(3,1,1)
librosa.display.specshow(mfcc)
plt.ylabel('MFCC')
plt.colorbar()

plt.subplot(3,1,2)
librosa.display.specshow(delta_mfcc)
plt.ylabel('MFCC-$\Delta$')
plt.colorbar()

plt.subplot(3,1,3)
librosa.display.specshow(delta2_mfcc, sr=sr, hop_length=64, x_axis='time')
plt.ylabel('MFCC-$\Delta^2$')
plt.colorbar()

plt.tight_layout()

# For future use, we'll stack these together into one matrix
M = np.vstack([mfcc, delta_mfcc, delta2_mfcc])


About color maps...

In the above examples, the Mel power spectrogram is negative-valued (dB relative to peak power), and specshow() defaults to a purple->white color gradient.

The chromagram example is positive-valued, and specshow() will default to a white->red color gradient.

If the input data has both positive and negative values, as in the MFCC example, then a purple->white->orange diverging color gradient will be used.

These defaults have been selected to ensure readability in print (grayscale) and are color-blind friendly.

Just as in pyplot.imshow(), the color map can be overridden by setting the cmap keyword argument.
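
For example, to redraw the log-power mel spectrogram from above with a reversed grayscale map (a minimal sketch; 'gray_r' is just one of matplotlib's built-in colormap names):

plt.figure(figsize=(12,4))
librosa.display.specshow(log_S, sr=sr, hop_length=64, x_axis='time', y_axis='mel', cmap='gray_r')
plt.title('mel power spectrogram (grayscale colormap)')
plt.colorbar(format='%+02.0f dB')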

Beat tracking

The beat tracker returns an estimate of the tempo (in beats per minute) and the frame indices of detected beat events.

The input can be either an audio time series (as we do below), or an onset strength envelope as calculated by librosa.onset.onset_strength().
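
For example, here's a rough sketch of the second option. Note that in recent librosa versions the envelope is passed via the onset_envelope keyword; older releases may name it differently, so check the documentation for your version:

# Compute an onset strength envelope from the percussive component
onset_env = librosa.onset.onset_strength(y=y_percussive, sr=sr, hop_length=64)

# Feed the envelope to the beat tracker instead of the raw audio
tempo_env, beats_env = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr, hop_length=64)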


In [9]:
# Now, let's run the beat tracker
# We'll use the percussive component for this part
tempo, beats = librosa.beat.beat_track(y=y_percussive, sr=sr, hop_length=64)

# Let's re-draw the spectrogram, but this time, overlay the detected beats
plt.figure(figsize=(12,4))
librosa.display.specshow(log_S, sr=sr, hop_length=64, x_axis='time', y_axis='mel')

# Let's draw lines with a drop shadow on the beat events
plt.vlines(beats, 0, log_S.shape[0], colors='k', linestyles='-', linewidth=2.5)
plt.vlines(beats, 0, log_S.shape[0], colors='w', linestyles='-', linewidth=1.5)
plt.axis('tight')

plt.tight_layout()


By default, the beat tracker will trim away any leading or trailing beats that don't appear strong enough.

To disable this behavior, call beat_track() with trim=False.
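
For example (the same call as in the cell above, just with trimming disabled):

# Keep weak leading and trailing beats as well
tempo_all, beats_all = librosa.beat.beat_track(y=y_percussive, sr=sr, hop_length=64, trim=False)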


In [10]:
print 'Estimated tempo:        %.2f BPM' % tempo

print 'First 5 beat frames:   ', beats[:5]

# Frame numbers are great and all, but when do those beats occur?
print 'First 5 beat times:    ', librosa.frames_to_time(beats[:5], sr=sr, hop_length=64)

# We could also get frame numbers from times by librosa.time_to_frames()


Estimated tempo:        103.88 BPM
First 5 beat frames:    [ 372  575  811 1010 1217]
First 5 beat times:     [ 1.07972789  1.66893424  2.3539229   2.93151927  3.5323356 ]
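
The last comment in the cell above mentions librosa.time_to_frames(); here is a quick sketch of that reverse conversion (the time values are arbitrary):

# Convert a few timestamps (in seconds) back to frame indices
frame_indices = librosa.time_to_frames([1.0, 2.0, 3.5], sr=sr, hop_length=64)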

Beat-synchronous feature aggregation

Once we've located the beat events, we can use them to summarize the feature content of each beat.

This can be useful for reducing data dimensionality and removing transient noise from the features.
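
Conceptually, the aggregation just applies a summary statistic (the mean, by default) to the feature columns that fall between consecutive beat frames. Here's a rough pure-numpy sketch of that idea, not the library's actual implementation; naive_beat_sync is a hypothetical helper name:

# Rough sketch: summarize feature columns between consecutive beat frames
def naive_beat_sync(features, beat_frames, aggregate=np.mean):
    # Treat the beat frames as segment boundaries, including the signal edges
    boundaries = np.concatenate([[0], beat_frames, [features.shape[1]]])
    segments = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if end > start:
            segments.append(aggregate(features[:, start:end], axis=1))
    return np.array(segments).T

# e.g., M_sync_naive = naive_beat_sync(M, beats)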


In [11]:
# feature.sync will summarize each beat event by the mean feature vector within that beat

M_sync = librosa.feature.sync(M, beats)

plt.figure(figsize=(12,6))

# Let's plot the original and beat-synchronous features against each other
plt.subplot(2,1,1)
librosa.display.specshow(M)
plt.title('MFCC-$\Delta$-$\Delta^2$')

# We can also use pyplot's xticks/yticks functions directly
# Let's mark off the raw MFCC and the delta features
plt.yticks(np.arange(0, M.shape[0], 20), ['MFCC', '$\Delta$', '$\Delta^2$'])

plt.colorbar()

plt.subplot(2,1,2)
librosa.display.specshow(M_sync)

# librosa can also generate axis ticks from arbitrary timestamps and beat events
librosa.display.time_ticks(librosa.frames_to_time(beats, sr=sr, hop_length=64))

plt.yticks(np.arange(0, M_sync.shape[0], 20), ['MFCC', '$\Delta$', '$\Delta^2$'])             
plt.title('Beat-synchronous MFCC-$\Delta$-$\Delta^2$')
plt.colorbar()

plt.tight_layout()



In [12]:
# Beat synchronization is flexible.
# Instead of computing the mean delta-MFCC within each beat, let's do beat-synchronous chroma
# We can replace the mean with any statistical aggregation function, such as min, max, or median.

C_sync = librosa.feature.sync(C, beats, aggregate=np.median)

plt.figure(figsize=(12,6))

plt.subplot(2, 1, 1)
librosa.display.specshow(C, sr=sr, hop_length=64, y_axis='chroma', vmin=0.0, vmax=1.0)
plt.title('Chroma')
plt.colorbar()

plt.subplot(2, 1, 2)
librosa.display.specshow(C_sync, y_axis='chroma', vmin=0.0, vmax=1.0)

beat_times = librosa.frames_to_time(beats, sr=sr, hop_length=64)
librosa.display.time_ticks(beat_times)

plt.title('Beat-synchronous Chroma (median aggregation)')

plt.colorbar()
plt.tight_layout()