Predominant Mask - MusicBricks Tutorial

Introduction

This tutorial walks through some tools for spectral analysis and synthesis using the Essentia library (http://www.essentia.upf.edu). Here we combine an STFT analysis/synthesis workflow with predominant pitch estimation, with the goal of removing or soloing the predominant source. The algorithm uses a binary masking technique: it modifies the magnitude values of the spectrum bins that correspond to the harmonic series of the predominant pitch. It can be seen as a very primitive approach to 'source separation'.
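To make the binary masking idea concrete, here is a minimal NumPy sketch. It is not Essentia's HarmonicMask implementation; the helpers `harmonic_bins` and `apply_binary_mask` are hypothetical names introduced only for illustration, and the sketch assumes that the attenuation in dB is applied to amplitudes (20*log10 convention).

```python
import numpy as np

def harmonic_bins(f0, sample_rate, fft_size, bin_width):
    """Hypothetical helper: indices of the bins lying within
    `bin_width` bins of each harmonic of f0, up to Nyquist."""
    n_bins = fft_size // 2 + 1
    if f0 <= 0:
        # no pitch detected: nothing to mask
        return np.array([], dtype=int)
    bin_hz = sample_rate / fft_size
    nyquist = sample_rate / 2.0
    bins = set()
    h = 1
    while h * f0 <= nyquist:
        center = int(round(h * f0 / bin_hz))
        lo = max(center - bin_width, 0)
        hi = min(center + bin_width, n_bins - 1)
        bins.update(range(lo, hi + 1))
        h += 1
    return np.array(sorted(bins), dtype=int)

def apply_binary_mask(spectrum, bins, attenuation_db):
    """Hypothetical helper: scale the selected bins by a fixed gain."""
    gain = 10.0 ** (-attenuation_db / 20.0)  # assumed amplitude convention
    out = spectrum.copy()
    out[bins] *= gain
    return out

# toy example: a flat spectrum and a 220 Hz predominant pitch
spectrum = np.ones(2048 // 2 + 1, dtype=np.complex64)
bins = harmonic_bins(220.0, 44100.0, 2048, 2)
masked = apply_binary_mask(spectrum, bins, 100)
```

Bins around each harmonic of 220 Hz are strongly attenuated, while all other bins are left untouched. Soloing the source would amount to attenuating the complementary set of bins instead.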

You should first install the Essentia library with its Python bindings. Installation instructions are available at http://essentia.upf.edu/documentation/installing.html

Processing steps


In [1]:
# import essentia in standard mode
import essentia
import essentia.standard
from essentia.standard import *

After importing the Essentia library, let's import other numerical and plotting tools


In [2]:
# import matplotlib for plotting
import matplotlib.pyplot as plt
import numpy as np

Define the parameters of the STFT workflow


In [3]:
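# algorithm parameters
framesize = 2048        # STFT frame size in samples
hopsize = 128           # PredominantPitchMelodia requires a hopsize of 128
samplerate = 44100.0    # sampling rate in Hz
attenuation_dB = 100    # attenuation in dB applied to the masked bins (per the Essentia docs, a negative value solos the pitched source instead)
maskbinwidth = 2        # width of the mask around each harmonic, in bins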
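As a quick sanity check on these values, the sketch below derives a few quantities that follow directly from them. The variable names on the left are introduced here for illustration, and the dB-to-linear conversion assumes the attenuation applies to amplitudes (20*log10 convention), which the tutorial does not state explicitly.

```python
# parameters copied from the cell above
framesize = 2048
hopsize = 128
samplerate = 44100.0
attenuation_dB = 100

# spacing between adjacent FFT bins, in Hz
bin_spacing_hz = samplerate / framesize

# time advance between consecutive frames, in milliseconds
hop_duration_ms = 1000.0 * hopsize / samplerate

# number of frames that overlap at any given sample
overlap_factor = framesize // hopsize

# linear amplitude gain for the attenuation (assumed 20*log10 convention)
linear_gain = 10.0 ** (-attenuation_dB / 20.0)

print(bin_spacing_hz, hop_duration_ms, overlap_factor, linear_gain)
```

With a bin spacing of roughly 21.5 Hz, a mask width of 2 bins covers about 43 Hz on each side of a harmonic.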

Specify input and output audio filenames


In [4]:
inputFilename = 'flamenco.wav'
outputFilename = 'flamenco_stft.wav'

In [5]:
# create an audio loader and import the audio file
loader = essentia.standard.MonoLoader(filename = inputFilename, sampleRate = samplerate)
audio = loader()
print("Duration of the audio sample [sec]:")
print(len(audio)/samplerate)


Duration of the audio sample [sec]:
14.22859410430839

Define the algorithm chain for frame-by-frame processing: FrameCutter -> Windowing -> FFT -> HarmonicMask -> IFFT -> OverlapAdd -> MonoWriter

Predominant pitch extraction


In [6]:
# extract predominant pitch
# PredominantPitchMelodia takes the entire audio signal as input - no frame-wise processing is required here.
pExt = PredominantPitchMelodia(frameSize = framesize, hopSize = hopsize, sampleRate = samplerate)
pitch, pitchConf = pExt(audio)
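The pitch track holds one value per hop, so it can be mapped to a time axis and indexed frame by frame later on. The sketch below uses a small made-up `pitch` array as a stand-in for the real output, and it assumes that unvoiced frames are reported as 0 Hz.

```python
import numpy as np

hopsize = 128
samplerate = 44100.0

# stand-in for the pitch track returned by the extractor (made-up values, in Hz)
pitch = np.array([0.0, 0.0, 220.0, 221.5, 0.0, 440.0], dtype=np.float32)

# time stamp of each pitch frame, in seconds
times = np.arange(len(pitch)) * hopsize / samplerate

# assumption: a pitch of 0 Hz marks an unvoiced frame
voiced = pitch > 0
voiced_fraction = voiced.mean()

print(times)
print(voiced_fraction)
```

Frames flagged as unvoiced carry no harmonic series to mask, so only the voiced frames are expected to be altered by the masking step.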

In [7]:
# algorithm workflow for harmonic mask using the STFT frame-by-frame
fcut = FrameCutter(frameSize = framesize, hopSize = hopsize)  # the loop below iterates with FrameGenerator instead
w = Windowing(type = "hann")
fft = FFT(size = framesize)
hmask = HarmonicMask(sampleRate = samplerate, binWidth = maskbinwidth, attenuation = attenuation_dB)
ifft = IFFT(size = framesize)
overl = OverlapAdd(frameSize = framesize, hopSize = hopsize)
awrite = MonoWriter(filename = outputFilename, sampleRate = samplerate)

Now we loop over all audio frames and store the processed audio samples in the output array


In [8]:
audioout = np.array([], dtype = np.float32) # initialize an empty output array

for idx, frame in enumerate(FrameGenerator(audio, frameSize = framesize, hopSize = hopsize)):
    # STFT analysis
    infft = fft(w(frame))
    # get pitch of current frame
    curpitch = pitch[idx]

    # here we apply the harmonic mask spectral transformation
    outfft = hmask(infft, curpitch)

    # STFT synthesis
    out = overl(ifft(outfft))
    audioout = np.append(audioout, out)
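The analysis/synthesis loop above follows the usual overlap-add pattern. As a rough illustration of that pattern without Essentia, here is a NumPy-only sketch; `stft_roundtrip` is a hypothetical function, and its normalization (dividing by the summed window) is a choice made for this sketch, not necessarily what Essentia's OverlapAdd does.

```python
import numpy as np

def stft_roundtrip(signal, frame_size, hop_size):
    """Hypothetical helper: window, FFT, inverse FFT and overlap-add,
    normalized by the sum of the overlapping windows."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for i in range(n_frames):
        start = i * hop_size
        frame = signal[start:start + frame_size] * window
        spectrum = np.fft.rfft(frame)
        # a spectral transformation (such as a harmonic mask) would go here
        resynth = np.fft.irfft(spectrum, n=frame_size)
        out[start:start + frame_size] += resynth
        norm[start:start + frame_size] += window
    # undo the window weighting wherever at least one frame contributed
    valid = norm > 1e-8
    out[valid] /= norm[valid]
    return out

# with no spectral modification, the interior of the signal is recovered
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = stft_roundtrip(x, 512, 64)
print(np.max(np.abs(x[512:-512] - y[512:-512])))
```

When the spectrum is left unmodified the round trip reproduces the input, which is a useful check before inserting any masking step.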

Finally we write the processed audio array as a WAV file


In [9]:
# write audio output
awrite(audioout.astype(np.float32))