In [1]:
import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa

Segmentation

In audio processing, it is common to operate on one frame at a time, using a constant frame size and hop size (i.e., the number of samples between the starts of consecutive frames). Frames are typically 10 to 100 ms long.
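For example, a 1024-sample frame at 44.1 kHz lasts about 23 ms. The sketch below converts between milliseconds and samples and counts how many complete frames fit in a signal. The helper names `ms_to_samples` and `num_full_frames` are hypothetical, and the count assumes frames start at sample 0 with no padding.

```python
def ms_to_samples(ms, fs):
    """Convert a duration in milliseconds to a whole number of samples (hypothetical helper)."""
    return int(round(ms * fs / 1000.0))

def num_full_frames(n_samples, frame_size, hop_size):
    """Count complete frames starting at 0, hop_size, 2*hop_size, ... (no padding)."""
    if n_samples < frame_size:
        return 0
    return 1 + (n_samples - frame_size) // hop_size

fs = 44100
frame_size, hop_size = 1024, 512
frame_ms = 1000.0 * frame_size / fs   # about 23.2 ms
hop_ms = 1000.0 * hop_size / fs       # about 11.6 ms
n_frames = num_full_frames(int(3.0 * fs), frame_size, hop_size)  # frames in 3 s of audio
print(round(frame_ms, 1), round(hop_ms, 1), n_frames)
```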

Let's create an audio sweep signal whose frequency rises exponentially from 110 Hz to 880 Hz (three octaves). Then, we will segment the signal and compute the zero-crossing rate for each frame.

First, set our parameters:


In [2]:
T = 3.0      # duration in seconds
fs = 44100.0 # sampling rate in Hertz
f0 = 440*numpy.logspace(-2, 1, int(T*fs), endpoint=False, base=2.0) # time-varying frequency (sample count must be an int)
print(f0.min(), f0.max()) # starts at 110 Hz, ends at 880 Hz


110.0 879.9861686
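The `logspace` call is an exponential sweep in disguise: element n equals 440·2^(-2 + 3n/N), i.e. 110·2^(3t/T) with t = n/fs, so the frequency doubles once per second for T = 3 s. A quick numerical check of that closed form (a sketch; the variable names are illustrative):

```python
import numpy

T, fs = 3.0, 44100.0
N = int(T * fs)
t = numpy.arange(N) / fs  # sample times in seconds

# The logspace form used in the cell above ...
f0_logspace = 440 * numpy.logspace(-2, 1, N, endpoint=False, base=2.0)
# ... written in closed form: start at 110 Hz and rise 3 octaves over T seconds.
f0_closed = 110.0 * 2.0 ** (3.0 * t / T)

print(numpy.allclose(f0_logspace, f0_closed))
# One octave per second: the frequency after 1 s is twice the starting frequency.
octave_ratio = f0_closed[int(fs)] / f0_closed[0]
print(octave_ratio)
```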

Create the sweep signal:


In [3]:
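t = numpy.linspace(0, T, int(T*fs), endpoint=False)  # sample times in seconds
# The phase must be the running integral of the instantaneous frequency;
# sin(2*pi*f0*t) would sweep far past 880 Hz.
x = 0.01*numpy.sin(2*numpy.pi*numpy.cumsum(f0)/fs)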
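Plugging a time-varying frequency straight into sin(2π f0(t) t) does not produce a signal at f0(t): the instantaneous frequency is the derivative of the phase divided by 2π, which for that phase is f0(t) + t·f0'(t), about 2.7 kHz at the end of this sweep. The sketch below checks this by differencing the phase. Integrating f0 with a cumulative sum (one simple discretization, assumed here) keeps the sweep on the intended 110 to 880 Hz path; `scipy.signal.chirp` with `method='logarithmic'` is another way to generate such a sweep.

```python
import numpy

T, fs = 3.0, 44100.0
N = int(T * fs)
t = numpy.arange(N) / fs
f0 = 110.0 * 2.0 ** (3.0 * t / T)  # target frequency: 110 Hz rising to 880 Hz

def inst_freq(phase, fs):
    """Estimate the instantaneous frequency in Hz from a sampled phase in radians."""
    return numpy.diff(phase) * fs / (2 * numpy.pi)

naive_phase = 2 * numpy.pi * f0 * t                      # phase = 2*pi*f0(t)*t
integrated_phase = 2 * numpy.pi * numpy.cumsum(f0) / fs  # phase = 2*pi * running integral of f0

naive_end = inst_freq(naive_phase, fs)[-1]          # roughly 2700 Hz
integrated_end = inst_freq(integrated_phase, fs)[-1]  # roughly 880 Hz
print(naive_end, integrated_end)
```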

Listen to the signal:


In [4]:
from IPython.display import Audio
Audio(x, rate=fs)



Segmentation Using Python List Comprehensions

In Python, a standard list comprehension can segment a signal and compute the zero-crossing rate of each frame in a single expression:


In [5]:
import essentia
from essentia.standard import ZeroCrossingRate
zcr = ZeroCrossingRate()
frame_sz = 1024
hop_sz = 512
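# Only start frames that fit entirely inside the signal, so every frame has frame_sz samples.
plt.semilogy([zcr(essentia.array(x[i:i+frame_sz])) for i in range(0, len(x) - frame_sz + 1, hop_sz)])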


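To see what the zero-crossing rate measures, here is a NumPy-only sketch: count the adjacent sample pairs whose signs differ and divide by the frame length. The helper `zero_crossing_rate` is hypothetical; it approximates the idea behind Essentia's `ZeroCrossingRate` but ignores any thresholding Essentia may apply and may differ from it at exact zeros. A sinusoid at f Hz crosses zero about 2f times per second, which gives an easy sanity check.

```python
import numpy

def zero_crossing_rate(frame):
    """Fraction of samples at which the sign changes (hypothetical helper)."""
    negative = numpy.signbit(frame)  # True where a sample is negative
    crossings = numpy.count_nonzero(negative[1:] != negative[:-1])
    return crossings / float(len(frame))

fs = 44100
t = numpy.arange(fs) / fs
tone = numpy.sin(2 * numpy.pi * 441.0 * t)  # 441 Hz tone, one second long

frame_sz, hop_sz = 1024, 512
# Segment and compute the ZCR in one comprehension, keeping only complete frames.
zcr = [zero_crossing_rate(tone[i:i + frame_sz])
       for i in range(0, len(tone) - frame_sz + 1, hop_sz)]

# Expected rate for a sinusoid: 2*f/fs crossings per sample.
print(zero_crossing_rate(tone), 2 * 441.0 / fs)
```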

librosa.util.frame

Given a signal, librosa.util.frame returns a 2-D array of uniformly sized frames, one frame per column. The result is a strided view of the signal rather than a copy:


In [6]:
F = librosa.util.frame(x, frame_length=frame_sz, hop_length=hop_sz)  # keyword arguments are required in recent librosa
print(F.shape)


(1024, 257)
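The same kind of framed view can be sketched with NumPy's `sliding_window_view` (NumPy 1.20 or later): take every window of length `frame_sz`, keep every `hop_sz`-th one, and transpose so each column is a frame. This illustrates the idea and is not librosa's implementation; a random signal of the same length stands in for the sweep.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

fs = 44100
x = numpy.random.default_rng(0).standard_normal(int(3.0 * fs))  # stand-in for the sweep
frame_sz, hop_sz = 1024, 512

# One window per starting sample, then keep every hop_sz-th window.
windows = sliding_window_view(x, frame_sz)[::hop_sz]  # shape (n_frames, frame_sz)
F = windows.T                                          # one frame per column, like librosa

print(F.shape)
# Column j holds x[j*hop_sz : j*hop_sz + frame_sz].
print(numpy.array_equal(F[:, 10], x[10 * hop_sz:10 * hop_sz + frame_sz]))
```

Because no samples are copied, framing a long signal this way costs almost no extra memory; the view is read-only.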

(That being said, in librosa, manual segmentation of a signal is often unnecessary, because the feature extraction methods themselves do segmentation for you.)

essentia.standard.FrameGenerator

We can also use essentia.standard.FrameGenerator to segment our audio signal.

For each frame, compute the zero-crossing rate and plot it:


In [7]:
import essentia
from essentia.standard import FrameGenerator
plt.semilogy([zcr(frame) for frame in FrameGenerator(essentia.array(x), frameSize=frame_sz, hopSize=hop_sz)])


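FrameGenerator yields frames one at a time instead of building the whole frame matrix up front. The same lazy pattern can be sketched with a plain Python generator; `iter_frames` is a hypothetical name. Unlike Essentia's FrameGenerator, which by default centers its first frame on sample 0 and zero-pads frames that extend past the signal (consistent with the 260 frames reported below rather than 257), this sketch keeps only complete frames.

```python
import numpy

def iter_frames(x, frame_size, hop_size):
    """Lazily yield complete frames of x (hypothetical helper, no padding)."""
    for start in range(0, len(x) - frame_size + 1, hop_size):
        yield x[start:start + frame_size]

fs = 44100
t = numpy.arange(int(3.0 * fs)) / fs
x = numpy.sin(2 * numpy.pi * 440.0 * t)

# Frames are produced on demand; here each one is reduced to its energy.
energies = [float(numpy.sum(frame ** 2)) for frame in iter_frames(x, 1024, 512)]
print(len(energies))
```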

Example: Spectrogram

Let's create a spectrogram. For each frame in the signal, we will window it by applying a Hamming window, and then compute its spectrum.


In [8]:
from essentia.standard import Spectrum, Windowing, FrameGenerator
hamming_window = Windowing(type='hamming')
spectrum = Spectrum()  # we just want the magnitude spectrum

spectrogram = numpy.array([spectrum(hamming_window(frame))
                     for frame in FrameGenerator(essentia.array(x), frameSize=frame_sz, hopSize=hop_sz)])

This spectrogram has 260 frames, each containing 513 frequency bins.


In [9]:
print(spectrogram.shape)


(260, 513)
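The same window-then-magnitude-spectrum loop can be sketched with NumPy alone. For a real frame of N = 1024 samples the FFT has N/2 + 1 = 513 non-redundant bins, and bin k corresponds to k·fs/N Hz (about 43.07 Hz per bin at 44.1 kHz). This sketch uses `numpy.hamming` and `numpy.fft.rfft` and keeps only complete frames, so it is not expected to match Essentia's Windowing (which may scale the window differently) or its frame padding exactly. A tone placed exactly on bin 20 should peak at bin 20.

```python
import numpy

fs, N, hop = 44100, 1024, 512
t = numpy.arange(int(3.0 * fs)) / fs
k0 = 20
x = numpy.sin(2 * numpy.pi * (k0 * fs / N) * t)  # tone exactly on bin k0 (about 861 Hz)

window = numpy.hamming(N)
spectrogram = numpy.array([
    numpy.abs(numpy.fft.rfft(window * x[i:i + N]))  # magnitude spectrum of one windowed frame
    for i in range(0, len(x) - N + 1, hop)
])

freqs = numpy.fft.rfftfreq(N, d=1.0 / fs)  # bin k -> k*fs/N Hz
print(spectrogram.shape)                   # (frames, N//2 + 1)
print(freqs[1], spectrogram.mean(axis=0).argmax())
```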

Finally, plot the spectrogram. The array is stored as (frames, bins), so we transpose it to put time on the horizontal axis and frequency on the vertical axis.


In [10]:
plt.imshow(spectrogram.T, origin='lower', aspect='auto', interpolation='nearest')
plt.ylabel('Spectral Bin Index')
plt.xlabel('Frame Index')


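Linear magnitudes are often dominated by a few strong bins. Converting to decibels and passing `extent` to `imshow` labels the axes in seconds and hertz instead of indices. A sketch with a random stand-in array of the same shape; the -80 dB floor and the choice of the maximum as the 0 dB reference are arbitrary, and the time axis is approximate (frame count times hop duration).

```python
import numpy
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line inside a notebook
import matplotlib.pyplot as plt

fs, frame_sz, hop_sz = 44100.0, 1024, 512
rng = numpy.random.default_rng(0)
spectrogram = rng.random((260, 513)) + 1e-3  # stand-in: (frames, bins), like the array above

# Magnitude in dB relative to the largest value, floored at -80 dB.
S_db = 20 * numpy.log10(spectrogram / spectrogram.max())
S_db = numpy.maximum(S_db, -80.0)

n_frames, n_bins = spectrogram.shape
extent = [0, n_frames * hop_sz / fs,  # time axis in seconds (approximate)
          0, fs / 2]                  # frequency axis in Hz, 0 to Nyquist

plt.imshow(S_db.T, origin='lower', aspect='auto', interpolation='nearest', extent=extent)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.colorbar(label='dB')
```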

(There are easier ways to display a spectrogram, e.g. matplotlib.pyplot.specgram or librosa.display.specshow. This example only illustrates segmentation in Essentia.)
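A sketch of the Matplotlib route: `plt.specgram` frames, windows, and transforms the signal in one call. With `NFFT` as the frame size and `noverlap` as frame size minus hop size, it uses the same 1024/512 segmentation as above. Note that `specgram` computes a power spectral density and plots it in dB by default, so its scaling differs from the magnitude spectrogram above; a 440 Hz tone stands in for the sweep.

```python
import numpy
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside a notebook
import matplotlib.pyplot as plt

fs = 44100.0
t = numpy.arange(int(3.0 * fs)) / fs
x = 0.01 * numpy.sin(2 * numpy.pi * 440.0 * t)  # stand-in signal

frame_sz, hop_sz = 1024, 512
spec, freqs, times, im = plt.specgram(
    x, NFFT=frame_sz, Fs=fs, noverlap=frame_sz - hop_sz,
    window=numpy.hamming(frame_sz))  # Hamming window, as in the Essentia example
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
print(spec.shape)  # (frequency bins, frames)
```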