In [1]:
# first, we need to import our essentia module. It is aptly named 'essentia'!
import essentia
# Essentia has 2 operating modes that offer the same algorithms,
# so the algorithms are split across 2 submodules:
import essentia.standard
import essentia.streaming
# let's have a look at what is in there
print(dir(essentia.standard))
# you can also do it by using autocompletion in IPython, typing "essentia.standard." and pressing Tab
Let's start doing some useful things now!
Before you can use algorithms in Essentia, you first need to instantiate (create) them. When doing so, you can give them parameters which they may need to work properly, such as the filename of the audio file in the case of an audio loader.
Once you have instantiated an algorithm, nothing has happened yet, but your algorithm is ready to be used and works like a function, that is, you have to call it to make stuff happen (technically, it is a function object).
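The instantiate-then-call pattern can be sketched in plain Python. The `Gain` class below is a hypothetical stand-in, not an Essentia algorithm; it only illustrates how an object can store its parameters at creation time and do its work when called.

```python
class Gain(object):
    """Hypothetical algorithm (not part of Essentia): scales a signal."""

    def __init__(self, factor=1.0):
        # Instantiation only stores the parameters; nothing is computed yet.
        self.factor = factor

    def __call__(self, signal):
        # Calling the instance is what actually does the work.
        return [self.factor * x for x in signal]


double = Gain(factor=2.0)      # instantiate with a parameter
result = double([0.5, -1.0])   # call it like a function
print(result)
```

Essentia's algorithms are used the same way: parameters go to the constructor, data goes to the call.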
Essentia has a selection of audio loaders:
In [3]:
# we start by instantiating the audio loader:
loader = essentia.standard.MonoLoader(filename='../../../test/audio/recorded/musicbox.wav')
# and then we actually perform the loading:
audio = loader()
By default, the MonoLoader outputs audio with a 44100 Hz sample rate. To make sure that this actually worked, let's plot a 1-second slice of audio, from t = 1 s to t = 2 s:
In [4]:
# pylab contains the plot() function, as well as figure, etc... (same names as Matlab)
from pylab import plot, show, figure, imshow
plot(audio[1*44100:2*44100])
show() # unnecessary if you started "ipython --pylab"
Note that if you have started IPython with the --pylab option, the call to show() is not necessary, and you don't have to close the plot to regain control of your terminal.
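The slice indices used in the plot are just times in seconds multiplied by the sample rate. A minimal sketch of that arithmetic follows; `seconds_to_slice` is a hypothetical helper introduced here for illustration, not an Essentia function.

```python
SAMPLE_RATE = 44100  # Hz, the MonoLoader default stated above


def seconds_to_slice(start_sec, end_sec, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: map a time range in seconds to a slice of sample indices."""
    start = int(round(start_sec * sample_rate))
    stop = int(round(end_sec * sample_rate))
    return slice(start, stop)


one_second = seconds_to_slice(1, 2)
print(one_second)
```

With a loaded signal, `audio[seconds_to_slice(1, 2)]` would select the same samples as `audio[1*44100:2*44100]`.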
So let's say that we want to compute the MFCCs for the frames in our audio.
We will need the following algorithms: Windowing, Spectrum, MFCC:
In [5]:
from essentia.standard import *
w = Windowing(type = 'hann')
spectrum = Spectrum() # FFT() would return the complex FFT, here we just want the magnitude spectrum
mfcc = MFCC()
Let's have a look at the inline help using the help command (you can also see it by typing "MFCC?" in IPython):
In [5]:
help(MFCC)
Once algorithms have been instantiated, they work like normal functions:
In [6]:
frame = audio[5*44100 : 5*44100 + 1024]
spec = spectrum(w(frame))
plot(spec)
show() # unnecessary if you started "ipython --pylab"
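To make the window-then-spectrum step concrete, here is a pure-Python sketch of what such a chain computes: a Hann window applied to a frame, followed by the magnitudes of the non-negative-frequency DFT bins. It is an illustration only. The helper names are made up, and Essentia's own Windowing and Spectrum may normalize or arrange the frame differently.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (no normalization applied)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the DFT bins 0 .. n/2, i.e. n // 2 + 1 values."""
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        bins.append(abs(acc))
    return bins


n = 64
# Test tone that sits exactly on DFT bin 8.
tone = [math.sin(2.0 * math.pi * 8 * t / n) for t in range(n)]
windowed = [s * w for s, w in zip(tone, hann(n))]
magnitudes = magnitude_spectrum(windowed)
peak_bin = magnitudes.index(max(magnitudes))
print("%d bins, peak at bin %d" % (len(magnitudes), peak_bin))
```

A frame of 1024 samples gives 513 magnitude values by the same rule, which is the length of the spectrum plotted above.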
In [7]:
mfccs = []
frameSize = 1024
hopSize = 512
for fstart in range(0, len(audio)-frameSize, hopSize):
    frame = audio[fstart:fstart+frameSize]
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))
    mfccs.append(mfcc_coeffs)
# and plot them...
# as this is a 2D array, we need to use imshow() instead of plot()
imshow(mfccs, aspect = 'auto')
show() # unnecessary if you started "ipython --pylab"
Note also that the MFCC algorithm returns 2 values, the band energies and the coefficients, and that you unpack them the same way as in Matlab.
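The manual framing loop above can be wrapped in a small generator. The sketch below is a hypothetical helper written for illustration and is not how Essentia implements framing. It differs from the loop in one detail: the `+ 1` in the range keeps a final frame that ends exactly on the last sample, which `range(0, len(audio)-frameSize, hopSize)` would skip.

```python
def frames(signal, frame_size, hop_size):
    """Hypothetical helper: yield every full frame, advancing hop_size samples each step."""
    # "+ 1" keeps the final frame when it ends exactly on the last sample.
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        yield signal[start:start + frame_size]


signal = list(range(10))
chunks = list(frames(signal, frame_size=4, hop_size=2))
print(chunks)
```

Consecutive frames overlap by `frame_size - hop_size` samples, so a hop of half the frame size gives 50% overlap, as in the tutorial's 1024/512 setting.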
Let's see if we can write this in a nicer way, though.
Essentia has been designed to do audio processing, and as such it has lots of readily available related algorithms; you don't have to chase around lots of toolboxes to be able to achieve what you want. For more details, it is recommended to have a look either at the algorithms_overview or at the complete reference.
In particular, we will use the FrameGenerator here:
In [8]:
mfccs = []
for frame in FrameGenerator(audio, frameSize = 1024, hopSize = 512):
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))
    mfccs.append(mfcc_coeffs)
# transpose to have it in a better shape
# we need to convert the list to an essentia.array first (== numpy.array of floats)
mfccs = essentia.array(mfccs).T
# and plot
imshow(mfccs[1:,:], aspect = 'auto')
show() # unnecessary if you started "ipython --pylab"
We ignored the first MFCC coefficient to disregard the power of the signal and only plot its spectral shape.
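Why the first coefficient carries the signal power can be seen from the textbook definition of MFCCs as a DCT of log band energies. The sketch below uses an unnormalized DCT-II, a natural logarithm, and made-up band energies. Essentia's MFCC may use a different log and scaling, so treat this as an illustration of the principle rather than of its exact output.

```python
import math


def dct2(values):
    """Unnormalized DCT-II: c[k] = sum_i x[i] * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n) for i, x in enumerate(values))
            for k in range(n)]


band_energies = [4.0, 2.0, 1.0, 0.5]          # made-up values, for illustration only
louder = [100.0 * e for e in band_energies]   # same spectral shape, 100x the energy

quiet_coeffs = dct2([math.log(e) for e in band_energies])
loud_coeffs = dct2([math.log(e) for e in louder])

# Only coefficient 0 moves; the others agree up to rounding error.
differences = [abs(a - b) for a, b in zip(quiet_coeffs, loud_coeffs)]
print(differences)
```

Scaling every band by the same gain adds a constant to each log energy, and a constant projects only onto the zeroth DCT basis function. Dropping that coefficient therefore leaves a description of spectral shape that does not depend on level.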
A Pool is a container similar to a C++ map or Python dict which can contain any type of values (easy in Python, not as much in C++...). Values are stored in there using a name which represents the full path to these values; dot ('.') characters are used as separators. You can think of it as a directory tree, or as namespace(s) + local name.
Examples of valid names are: "bpm", "lowlevel.mfcc", "highlevel.genre.rock.probability", etc.
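The directory-tree reading of these names can be sketched with nested dicts. `to_tree` is a hypothetical helper and the values are made up; this is not how the Pool stores its data internally, only a way to picture the naming scheme.

```python
def to_tree(flat):
    """Hypothetical helper: expand dot-separated names into nested dicts."""
    tree = {}
    for name, value in flat.items():
        node = tree
        parts = name.split('.')
        for part in parts[:-1]:
            # Each leading part is a "directory" (namespace).
            node = node.setdefault(part, {})
        # The last part is the local name holding the value.
        node[parts[-1]] = value
    return tree


flat = {
    'bpm': 120.0,                                  # made-up values
    'lowlevel.mfcc': [1.0, 2.0],
    'highlevel.genre.rock.probability': 0.9,
}
tree = to_tree(flat)
print(tree['highlevel']['genre']['rock']['probability'])
```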
So let's redo the previous computations using a pool:
In [9]:
pool = essentia.Pool()
for frame in FrameGenerator(audio, frameSize = 1024, hopSize = 512):
    mfcc_bands, mfcc_coeffs = mfcc(spectrum(w(frame)))
    pool.add('lowlevel.mfcc', mfcc_coeffs)
    pool.add('lowlevel.mfcc_bands', mfcc_bands)
imshow(pool['lowlevel.mfcc'].T[1:,:], aspect = 'auto')
show() # unnecessary if you started "ipython --pylab"
figure()
imshow(pool['lowlevel.mfcc_bands'].T, aspect = 'auto', interpolation = 'nearest')
The pool also has the nice advantage that the data you get out of it is already in an essentia.array format (which is equal to numpy.array of floats), so you can call transpose (.T) directly on it.
Let's finish this tutorial by writing our results to a file. As we are using such a nice language as Python, we could use its facilities for writing data to a file, but for the sake of this tutorial let's do it using the YamlOutput algorithm, which writes a pool to a file in the YAML or JSON format.
In [10]:
output = YamlOutput(filename = 'mfcc.sig') # use "format = 'json'" for JSON output
output(pool)
# or as a one-liner:
YamlOutput(filename = 'mfcc.sig')(pool)
This can take a while, as we write the MFCCs for every frame; the resulting file can be quite large, depending on the duration of your audio file.
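For comparison, the plain-Python route mentioned earlier could look like the sketch below, which uses only the standard `json` module. The `descriptors` dict is a made-up stand-in for a pool, and the file layout is not claimed to match what YamlOutput produces.

```python
import json
import os
import tempfile

# Made-up stand-in for a pool: descriptor name -> one list of values per frame.
descriptors = {
    'lowlevel.mfcc': [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]],
}

path = os.path.join(tempfile.mkdtemp(), 'mfcc.json')
with open(path, 'w') as handle:
    json.dump(descriptors, handle, indent=2)

# Read the file back to confirm that the round trip preserves the values.
with open(path) as handle:
    restored = json.load(handle)
print(restored == descriptors)
```

Values that come out of a real pool as arrays would first need converting to plain lists (for example with the array's `tolist()` method), since `json` does not serialize arrays directly.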
Now let's assume we do not want all the frames but only the mean and variance of those frames. We can do this using the PoolAggregator algorithm and use it on the pool to get a new pool with the aggregated descriptors:
In [12]:
# compute mean and variance of the frames
aggrPool = PoolAggregator(defaultStats = [ 'mean', 'var' ])(pool)
print('Original pool descriptor names:')
print(pool.descriptorNames())
print('')
print('Aggregated pool descriptor names:')
print(aggrPool.descriptorNames())
# and output those results to a file
YamlOutput(filename = 'mfccaggr.sig')(aggrPool)
In [13]:
!cat mfccaggr.sig
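What this kind of aggregation computes can be sketched in plain Python: one mean and one variance per coefficient, taken across all frames. The helper, the data, and the `.mean` / `.var` key names below are assumptions made for illustration, and the sketch uses the population variance (dividing by the number of frames), which may not match PoolAggregator's exact definition.

```python
def mean_and_var(frames):
    """Per-coefficient mean and population variance over equal-length frames."""
    count = float(len(frames))
    columns = list(zip(*frames))  # one tuple per coefficient
    means = [sum(col) / count for col in columns]
    variances = [sum((x - m) ** 2 for x in col) / count
                 for col, m in zip(columns, means)]
    return means, variances


frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]  # made-up: 3 frames, 2 coefficients
means, variances = mean_and_var(frames)
aggregated = {
    'lowlevel.mfcc.mean': means,    # key naming assumed for illustration
    'lowlevel.mfcc.var': variances,
}
print(aggregated)
```

However many frames go in, the result holds a fixed number of values per descriptor, which is why the aggregated file is much smaller than the per-frame one.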
And this closes the tutorial!
There is not much more to know about using Essentia in a Python environment; the basics are:
The big strength of Essentia is that it provides a considerably large collection of algorithms, from low-level to high-level descriptors, which have been thoroughly optimized and tested and which you can rely on to build your own signal analysis.