In [1]:
%matplotlib inline
import seaborn
import numpy, scipy, matplotlib.pyplot as plt, IPython.display as ipd, sklearn
import librosa, librosa.display
plt.rcParams['figure.figsize'] = (14, 5)

Spectral Features

For classification, we're going to be using new features in our arsenal: spectral moments (centroid, bandwidth, skewness, kurtosis) and other spectral statistics.

Moments is a term used in physics and statistics. There are raw moments and central moments.

You are probably already familiar with two examples of moments: mean and variance. The first raw moment is known as the mean. The second central moment is known as the variance.

Spectral Centroid

Load an audio file:


In [2]:
x, sr = librosa.load('audio/simple_loop.wav')
ipd.Audio(x, rate=sr)


Out[2]:

The spectral centroid (Wikipedia) indicates at which frequency the energy of a spectrum is centered upon. This is like a weighted mean:

$$ f_c = \frac{\sum_k S(k) f(k)}{\sum_k S(k)} $$

where $S(k)$ is the spectral magnitude at frequency bin $k$, $f(k)$ is the frequency at bin $k$.

librosa.feature.spectral_centroid computes the spectral centroid for each frame in a signal:


In [3]:
spectral_centroids = librosa.feature.spectral_centroid(x, sr=sr)[0]
spectral_centroids.shape


Out[3]:
(97,)

Compute the time variable for visualization:


In [4]:
frames = range(len(spectral_centroids))
t = librosa.frames_to_time(frames)

Define a helper function to normalize the spectral centroid for visualization:


In [5]:
def normalize(x, axis=0):
    return sklearn.preprocessing.minmax_scale(x, axis=axis)

Plot the spectral centroid along with the waveform:


In [6]:
librosa.display.waveplot(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_centroids), color='r') # normalize for visualization purposes


Out[6]:
[<matplotlib.lines.Line2D at 0x110d99390>]

Similar to the zero crossing rate, there is a spurious rise in spectral centroid at the beginning of the signal. That is because the silence at the beginning has such small amplitude that high frequency components have a chance to dominate. One hack around this is to add a small constant before computing the spectral centroid, thus shifting the centroid toward zero at quiet portions:


In [7]:
spectral_centroids = librosa.feature.spectral_centroid(x+0.01, sr=sr)[0]
librosa.display.waveplot(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_centroids), color='r') # normalize for visualization purposes


Out[7]:
[<matplotlib.lines.Line2D at 0x110de6a10>]

Spectral Bandwidth

librosa.feature.spectral_bandwidth computes the order-$p$ spectral bandwidth:

$$ \left( \sum_k S(k) \left(f(k) - f_c \right)^p \right)^{\frac{1}{p}} $$

where $S(k)$ is the spectral magnitude at frequency bin $k$, $f(k)$ is the frequency at bin $k$, and $f_c$ is the spectral centroid. When $p = 2$, this is like a weighted standard deviation.


In [8]:
spectral_bandwidth_2 = librosa.feature.spectral_bandwidth(x+0.01, sr=sr)[0]
spectral_bandwidth_3 = librosa.feature.spectral_bandwidth(x+0.01, sr=sr, p=3)[0]
spectral_bandwidth_4 = librosa.feature.spectral_bandwidth(x+0.01, sr=sr, p=4)[0]
librosa.display.waveplot(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_bandwidth_2), color='r')
plt.plot(t, normalize(spectral_bandwidth_3), color='g')
plt.plot(t, normalize(spectral_bandwidth_4), color='y')
plt.legend(('p = 2', 'p = 3', 'p = 4'))


Out[8]:
<matplotlib.legend.Legend at 0x1110e3a50>

Spectral Contrast

Spectral contrast considers the spectral peak, the spectral valley, and their difference in each frequency subband. For more information:

librosa.feature.spectral_contrast computes the spectral contrast for six subbands for each time frame:


In [9]:
spectral_contrast = librosa.feature.spectral_contrast(x, sr=sr)
spectral_contrast.shape


Out[9]:
(7, 97)

Display:


In [10]:
plt.imshow(normalize(spectral_contrast, axis=1), aspect='auto', origin='lower', cmap='coolwarm')


Out[10]:
<matplotlib.image.AxesImage at 0x113d29e50>

Spectral Rolloff

Spectral rolloff is the frequency below which a specified percentage of the total spectral energy, e.g. 85%, lies.

librosa.feature.spectral_rolloff computes the rolloff frequency for each frame in a signal:


In [11]:
spectral_rolloff = librosa.feature.spectral_rolloff(x+0.01, sr=sr)[0]
librosa.display.waveplot(x, sr=sr, alpha=0.4)
plt.plot(t, normalize(spectral_rolloff), color='r')


Out[11]:
[<matplotlib.lines.Line2D at 0x1141df690>]