In [1]:
import numpy, scipy, matplotlib.pyplot as plt, urllib, IPython.display
import sklearn.preprocessing, librosa, librosa.display  # submodules must be imported explicitly
import essentia, essentia.standard as ess
plt.rcParams['figure.figsize'] = (14,4)
The mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10-20) that concisely describe the overall shape of the spectral envelope. In MIR, they are often used to describe timbre.
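To make the definition concrete, here is a minimal NumPy-only sketch of how MFCCs can be computed for a single frame: power spectrum, triangular mel filterbank, logarithm, then a DCT. It is not the implementation used by librosa or essentia. The helper names, the HTK-style mel formula, the 40-filter bank, and the unnormalized DCT are all assumptions chosen for illustration, so the values will not match either library exactly.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one of several conventions in use)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii(v, n_out):
    # Unnormalized DCT-II, keeping only the first n_out coefficients
    n = v.size
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2.0 * n)).dot(v)

def mfcc_frame(frame, sr, n_filters=40, n_mfcc=13):
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energy = mel_filterbank(n_filters, frame.size, sr).dot(power)
    log_mel = np.log(mel_energy + 1e-10)  # small constant avoids log(0)
    return dct_ii(log_mel, n_mfcc)

# Hypothetical test frame: 1024 samples of a 440 Hz sine at 22050 Hz
sr = 22050
t = np.arange(1024) / float(sr)
frame = np.sin(2 * np.pi * 440.0 * t)
coeffs = mfcc_frame(frame, sr)
print(coeffs.shape)  # (13,)
```

Keeping only the first few DCT coefficients is what makes the description compact: the low-order coefficients capture the smooth envelope of the log-mel spectrum, while fine detail is discarded.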
Download an audio file:
In [2]:
import urllib.request  # urlretrieve lives here in Python 3
url = 'http://audio.musicinformationretrieval.com/simple_loop.wav'
urllib.request.urlretrieve(url, filename='simple_loop.wav')
Out[2]:
Plot the audio signal:
In [3]:
x, fs = librosa.load('simple_loop.wav')
# waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(x, sr=fs)
Out[3]:
Play the audio:
In [4]:
IPython.display.Audio(x, rate=fs)
Out[4]:
librosa.feature.mfcc computes MFCCs across an audio signal:
In [5]:
mfccs = librosa.feature.mfcc(y=x, sr=fs)
print(mfccs.shape)
In this case, mfcc computed 20 MFCCs over 130 frames.
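The frame count follows from the hop size. As a rough check, assuming librosa's default hop length of 512 samples and its default centered framing, the number of frames is one plus the signal length divided by the hop. The helper name and the 3-second clip length below are hypothetical; the actual length of simple_loop.wav is not stated here.

```python
def centered_frame_count(n_samples, hop_length=512):
    # With centered frames, there is one frame per full hop plus one
    return 1 + n_samples // hop_length

sr = 22050
n_samples = 3 * sr  # hypothetical 3-second clip at 22050 Hz
n_frames = centered_frame_count(n_samples)
print(n_frames)  # 130
```

The 20 rows come from the default number of coefficients, which can be changed with the n_mfcc argument.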
The very first MFCC, the 0th coefficient, does not convey information relevant to the overall shape of the spectrum. It only conveys a constant offset, i.e. the effect of adding a constant value to the entire log-spectrum, which mostly reflects overall signal level. Therefore, many practitioners discard the first MFCC when performing classification. For now, we will use the MFCCs as is.
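The claim about the 0th coefficient can be checked numerically. The sketch below applies an unnormalized DCT-II (an assumption; libraries often use a differently scaled DCT) to a random stand-in for a log-mel spectrum and to the same vector shifted by a constant. Only the 0th coefficient changes.

```python
import numpy as np

def dct_ii(v, n_out):
    # Unnormalized DCT-II, keeping only the first n_out coefficients
    n = v.size
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2.0 * n)).dot(v)

rng = np.random.RandomState(0)
log_mel = rng.randn(40)   # stand-in for a 40-band log-mel spectrum
offset = 5.0              # constant added to every band

a = dct_ii(log_mel, 13)
b = dct_ii(log_mel + offset, 13)

print(np.allclose(a[1:], b[1:]))                       # True: higher coefficients unchanged
print(np.isclose(b[0] - a[0], offset * log_mel.size))  # True: only coefficient 0 absorbs the offset
```

With a different DCT normalization the size of the change in coefficient 0 would differ, but the higher coefficients would still be unaffected.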
Display the MFCCs:
In [6]:
librosa.display.specshow(mfccs, sr=fs, x_axis='time')
Out[6]:
Let's scale the MFCCs such that each coefficient dimension has zero mean and unit variance:
In [7]:
mfccs = sklearn.preprocessing.scale(mfccs, axis=1)
print(mfccs.mean(axis=1))
print(mfccs.var(axis=1))
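For reference, the same standardization can be written by hand. This is a sketch of what scaling along axis=1 does for an array shaped (coefficients, frames), using random data as a stand-in for real MFCCs; the function name is hypothetical.

```python
import numpy as np

def standardize_rows(features):
    # Subtract each row's mean and divide by its (population) standard deviation
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # leave constant rows at zero instead of dividing by zero
    return (features - mean) / std

rng = np.random.RandomState(0)
fake_mfccs = rng.randn(20, 130) * 10.0 + 3.0  # stand-in with shape (20, 130)
scaled = standardize_rows(fake_mfccs)

print(np.allclose(scaled.mean(axis=1), 0.0))  # True
print(np.allclose(scaled.var(axis=1), 1.0))   # True
```

Scaling per coefficient matters because the low-order MFCCs typically have a much larger range than the high-order ones and would otherwise dominate distance-based classifiers.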
Display the scaled MFCCs:
In [8]:
librosa.display.specshow(mfccs, sr=fs, x_axis='time')
Out[8]:
We can also use essentia.standard.MFCC to compute MFCCs across a signal and display them as an "MFCC-gram":
In [9]:
hamming_window = ess.Windowing(type='hamming')
spectrum = ess.Spectrum() # we just want the magnitude spectrum
mfcc = ess.MFCC(numberCoefficients=13)
frame_sz = 1024
hop_sz = 500
# MFCC returns (mel band energies, MFCCs); index [1] keeps the MFCCs
mfccs = numpy.array([mfcc(spectrum(hamming_window(frame)))[1]
                     for frame in ess.FrameGenerator(x, frameSize=frame_sz, hopSize=hop_sz)])
print(mfccs.shape)
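The frame-by-frame loop above relies on slicing the signal into overlapping frames. As an illustration of that idea only, here is a plain NumPy framing sketch with no padding; the function name is hypothetical, and essentia's FrameGenerator may pad the signal edges depending on its settings, so its frame count can differ from this one.

```python
import numpy as np

def frame_signal(signal, frame_size, hop_size):
    # Consecutive frames start hop_size samples apart; incomplete trailing frames are dropped
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    return np.stack([signal[i * hop_size : i * hop_size + frame_size]
                     for i in range(n_frames)])

signal = np.arange(5000, dtype=np.float32)  # stand-in signal: sample index as value
frames = frame_signal(signal, frame_size=1024, hop_size=500)
print(frames.shape)  # (8, 1024)
```

Because the hop (500) is smaller than the frame size (1024), neighboring frames overlap by roughly half, and the resulting array has one row per frame. That matches the (frames, coefficients) layout of the essentia result, which is why the default axis is used when scaling it.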
Scale the MFCCs:
In [10]:
mfccs = sklearn.preprocessing.scale(mfccs)
Plot the MFCCs:
In [11]:
plt.imshow(mfccs.T, origin='lower', aspect='auto', interpolation='nearest')
plt.ylabel('MFCC Coefficient Index')
plt.xlabel('Frame Index')
Out[11]: