In [1]:
import numpy, scipy, matplotlib.pyplot as plt, pandas, librosa
In audio processing, it is common to operate on one frame at a time using a constant frame size and hop size (i.e. increment). Frames are typically chosen to be 10 to 100 ms in duration.
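To make the duration guideline concrete, here is a small sketch that converts a frame size and hop size from samples to milliseconds. The values (1024 and 512 samples at 44100 Hz) are the ones used later in this notebook; the helper name `samples_to_ms` is ours and purely illustrative.

```python
def samples_to_ms(n_samples, sample_rate):
    """Convert a length in samples to a duration in milliseconds."""
    return 1000.0 * n_samples / sample_rate

sample_rate = 44100.0
frame_ms = samples_to_ms(1024, sample_rate)  # duration of one frame
hop_ms = samples_to_ms(512, sample_rate)     # time between frame starts
print(round(frame_ms, 1), round(hop_ms, 1))
```

A 1024-sample frame at 44.1 kHz lasts roughly 23 ms, which sits inside the typical 10 to 100 ms range.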
Let's create an audio sweep signal that is frequency modulated from 110 Hz to 880 Hz. Then, we will segment the signal and compute the zero crossing rate for each frame.
First, set our parameters:
In [2]:
T = 3.0 # duration in seconds
fs = 44100.0 # sampling rate in Hertz
f0 = 440*numpy.logspace(-2, 1, int(T*fs), endpoint=False, base=2.0) # time-varying frequency; the sample count must be an integer
print(f0.min(), f0.max()) # starts at 110 Hz, ends just below 880 Hz
Create the sweep signal:
In [3]:
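t = numpy.linspace(0, T, int(T*fs), endpoint=False) # time axis in seconds
phase = 2*numpy.pi*numpy.cumsum(f0)/fs # phase is the running integral of the frequency, so the sweep really spans 110 Hz to 880 Hz
x = 0.01*numpy.sin(phase)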
Listen to the signal:
In [4]:
from IPython.display import Audio
Audio(x, rate=fs)
Out[4]:
In Python, you can use a standard list comprehension to perform segmentation of a signal and compute zero-crossing rate at the same time:
In [5]:
import essentia
from essentia.standard import ZeroCrossingRate
zcr = ZeroCrossingRate()
frame_sz = 1024
hop_sz = 512
plt.semilogy([zcr(essentia.array(x[i:i+frame_sz])) for i in range(0, len(x), hop_sz)])
Out[5]:
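For readers without Essentia installed, the same idea can be sketched with NumPy alone. This is a minimal illustration, not Essentia's implementation: the function `zero_crossing_rate` and the 100 Hz test tone are our own assumptions, and Essentia's normalization and threshold handling may differ.

```python
import numpy

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = numpy.signbit(frame)
    return numpy.mean(signs[1:] != signs[:-1])

# Hypothetical test tone: a 100 Hz sine, one second long, sampled at 8 kHz.
sr = 8000
tone = numpy.sin(2 * numpy.pi * 100 * numpy.arange(sr) / sr)

frame_len, hop_len = 1024, 512
# Only full-length frames are used here, so every rate is comparable.
rates = [zero_crossing_rate(tone[i:i + frame_len])
         for i in range(0, len(tone) - frame_len + 1, hop_len)]
```

A sine at frequency f crosses zero about 2f times per second, so each rate here should be close to 2 * 100 / 8000 = 0.025. This is why the zero-crossing rate rises as the sweep frequency rises.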
Given a signal, librosa.util.frame will produce an array of uniformly sized frames:
In [6]:
F = librosa.util.frame(x, frame_length=frame_sz, hop_length=hop_sz) # keyword arguments work across librosa versions
print(F.shape)
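To see what framing does under the hood, here is a rough NumPy-only sketch. It assumes NumPy 1.20 or later (for `sliding_window_view`); the helper `frame_signal` is a hypothetical name, not part of librosa. Note that it stores one frame per row, whereas librosa, to our understanding, places frames along the last axis by default.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

def frame_signal(signal, frame_length, hop_length):
    """Return overlapping frames as rows, as a view on the input data."""
    windows = sliding_window_view(signal, frame_length)  # one window per possible start
    return windows[::hop_length]                         # keep every hop_length-th start

signal = numpy.arange(10)
frames = frame_signal(signal, frame_length=4, hop_length=2)

# Number of full frames that fit in the signal.
n_expected = 1 + (len(signal) - 4) // 2
print(frames.shape, n_expected)
```

Only full frames are kept, so trailing samples that do not fill a frame are dropped.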
(That being said, in librosa, manual segmentation of a signal is often unnecessary, because the feature extraction methods themselves do segmentation for you.)
We can also use essentia.standard.FrameGenerator to segment our audio signal.
For each frame, compute the zero crossing rate, and display:
In [7]:
import essentia
from essentia.standard import FrameGenerator
plt.semilogy([zcr(frame) for frame in FrameGenerator(essentia.array(x), frameSize=frame_sz, hopSize=hop_sz)])
Out[7]:
Let's create a spectrogram. For each frame in the signal, we will window it by applying a Hamming window, and then compute its spectrum.
In [8]:
from essentia.standard import Spectrum, Windowing, FrameGenerator
hamming_window = Windowing(type='hamming')
spectrum = Spectrum() # we just want the magnitude spectrum
spectrogram = numpy.array([spectrum(hamming_window(frame))
                           for frame in FrameGenerator(essentia.array(x), frameSize=frame_sz, hopSize=hop_sz)])
This spectrogram has 260 frames, each containing 513 frequency bins.
In [9]:
print(spectrogram.shape)
Finally, plot the spectrogram. We must transpose the spectrogram array such that time is displayed along the horizontal axis, and frequency is along the vertical axis.
In [10]:
plt.imshow(spectrogram.T, origin='lower', aspect='auto', interpolation='nearest')
plt.ylabel('Spectral Bin Index')
plt.xlabel('Frame Index')
Out[10]:
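The axes above are labeled by index. As a small sketch, the indices can be mapped to physical units as follows, assuming the same sampling rate, frame size, and hop size used in this notebook. The helper `frame_to_seconds` is our own illustrative name, and the exact time offset of each frame depends on how the frame generator pads the start of the signal.

```python
import numpy

fs = 44100.0     # sampling rate in Hz
frame_sz = 1024  # frame size in samples
hop_sz = 512     # hop size in samples

# Frequency in Hz of each bin of a real FFT of length frame_sz.
bin_freqs = numpy.fft.rfftfreq(frame_sz, d=1.0 / fs)

def frame_to_seconds(frame_index):
    """Approximate time offset of a frame, assuming frames advance by hop_sz."""
    return frame_index * hop_sz / fs

print(len(bin_freqs), bin_freqs[1], bin_freqs[-1])
```

A real FFT of length 1024 yields 1024/2 + 1 = 513 bins, spaced fs/1024 (about 43 Hz) apart and ending at the Nyquist frequency, fs/2.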
(There are easier ways to display a spectrogram, e.g. using Matplotlib or librosa. This example was just used to illustrate segmentation in Essentia.)