STFT Analysis/Synthesis - MusicBricks Tutorial

Introduction

This tutorial will guide you through some tools for performing spectral analysis and synthesis using the Essentia library (http://www.essentia.upf.edu). STFT stands for Short-Time Fourier Transform; it processes an input audio signal as a sequence of spectral frames. Spectral frames are complex-valued arrays that contain the frequency representation of the windowed input signal.
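To make the idea of a spectral frame concrete, here is a minimal sketch that uses NumPy's `np.fft.rfft` as a stand-in for Essentia's FFT algorithm (an assumption made so the snippet runs without Essentia installed). The 1 kHz test tone and the variable names are illustrative only. For a real frame of N samples, the result holds N/2 + 1 complex bins.

```python
import numpy as np

frame_size = 2048
sample_rate = 44100.0

# synthetic test frame: a 1 kHz sine (illustrative input, not from the tutorial)
t = np.arange(frame_size) / sample_rate
frame = np.sin(2 * np.pi * 1000.0 * t)

# window the frame and compute the FFT of the real-valued signal
spectrum = np.fft.rfft(frame * np.hanning(frame_size))

num_bins = spectrum.shape[0]                  # frame_size // 2 + 1
peak_bin = int(np.argmax(np.abs(spectrum)))   # bin with the largest magnitude
peak_hz = peak_bin * sample_rate / frame_size # center frequency of that bin

print(num_bins, spectrum.dtype, peak_hz)
```

The strongest bin lands within one bin width of 1 kHz, which is as close as this frame size can resolve it.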

This tutorial shows how to analyze the input signal and resynthesize it, which makes it possible to apply transformations directly in the spectral domain.

You should first install the Essentia library with Python bindings. Installation instructions are available at http://essentia.upf.edu/documentation/installing.html.

Processing steps



In [10]:

    
# import essentia in standard mode
import essentia
import essentia.standard
from essentia.standard import *

After importing the Essentia library, let's import other numerical and plotting tools.



In [11]:

    
# import matplotlib for plotting and numpy for array handling
import matplotlib.pyplot as plt
import numpy as np

Define the parameters of the STFT workflow



In [16]:

    
# algorithm parameters
framesize = 2048
hopsize = 256
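As a rough guide to what these two values imply, the sketch below works out the frequency resolution, frame and hop durations, and overlap. It assumes the 44100 Hz sample rate that the loader uses in this tutorial; the derived variable names are illustrative.

```python
framesize = 2048
hopsize = 256
sample_rate = 44100  # assumed: same rate the loader is configured with

# spacing between adjacent FFT bins, in Hz
bin_spacing_hz = sample_rate / framesize

# duration of one analysis frame and of one hop, in milliseconds
frame_duration_ms = 1000.0 * framesize / sample_rate
hop_duration_ms = 1000.0 * hopsize / sample_rate

# fraction of each frame shared with the next one
overlap = 1.0 - hopsize / framesize

print("bin spacing [Hz]:", round(bin_spacing_hz, 2))
print("frame duration [ms]:", round(frame_duration_ms, 2))
print("hop duration [ms]:", round(hop_duration_ms, 2))
print("overlap:", overlap)
```

A larger frame size narrows the bin spacing at the cost of time resolution, while a smaller hop size increases the overlap and the number of frames to process.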

Specify input and output audio filenames



In [17]:

    
inputFilename = 'predom.wav'
outputFilename = 'predom_stft.wav'



In [18]:

    
# create an audio loader and import audio file
loader = essentia.standard.MonoLoader(filename = inputFilename, sampleRate = 44100)
audio = loader()
print("Duration of the audio sample [sec]:")
print(len(audio)/44100.0)

Duration of the audio sample [sec]:
14.2285941043

Define the algorithm chain for frame-by-frame processing: FrameCutter -> Windowing -> FFT -> IFFT -> OverlapAdd -> MonoWriter



In [19]:

    
# algorithm instantiation
fcut = FrameCutter(frameSize = framesize, hopSize = hopsize)  # not used below: the loop uses FrameGenerator instead
w = Windowing(type = "hann")
fft = FFT(size = framesize)
ifft = IFFT(size = framesize)
overl = OverlapAdd(frameSize = framesize, hopSize = hopsize)
awrite = MonoWriter(filename = outputFilename, sampleRate = 44100)
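Overlap-add resynthesis relies on the shifted analysis windows summing to a (nearly) constant value, so that the windowing does not leave an amplitude modulation in the output. The NumPy sketch below checks this for a periodic Hann window with the frame and hop sizes used here. It is an illustration under stated assumptions: the exact window shape and normalization applied by Essentia's Windowing and OverlapAdd may differ.

```python
import numpy as np

framesize = 2048
hopsize = 256

# periodic Hann window (assumption: Essentia's window scaling may differ)
n = np.arange(framesize)
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / framesize)

# overlap-add shifted copies of the window
num_frames = 64
total = np.zeros(hopsize * (num_frames - 1) + framesize)
for k in range(num_frames):
    start = k * hopsize
    total[start:start + framesize] += window

# ignore the fade-in and fade-out regions at both ends
steady = total[framesize:-framesize]
ripple = steady.max() - steady.min()

print("steady-state sum:", steady.mean(), "ripple:", ripple)
```

With a frame size of 8 hops, the steady-state sum is constant up to floating-point error, so dividing by that constant would recover the original amplitude in this simplified setting.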

Now we loop over all audio frames and store the processed audio samples in the output array



In [20]:

    
audioout = np.array([], dtype=np.float32) # initialize an empty output array (no spurious leading sample)

for frame in FrameGenerator(audio, frameSize = framesize, hopSize = hopsize):
    # STFT analysis
    infft = fft(w(frame))

    # here we could apply spectral transformations
    outfft = infft

    # STFT synthesis
    out = overl(ifft(outfft))
    audioout = np.append(audioout, out)
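As one possible spectral transformation, the sketch below zeroes every bin above a cutoff frequency, which acts as a crude low-pass filter. The helper `low_pass` and its parameters are hypothetical and not part of Essentia; the input frame is generated with NumPy so the snippet runs on its own, and it assumes the spectrum has framesize/2 + 1 bins.

```python
import numpy as np

def low_pass(spectrum, cutoff_hz, sample_rate=44100.0):
    """Return a copy of a half-spectrum with bins above cutoff_hz set to zero.

    Hypothetical helper for illustration; assumes len(spectrum) == N/2 + 1.
    """
    num_bins = len(spectrum)
    frame_size = 2 * (num_bins - 1)
    bin_hz = sample_rate / frame_size      # spacing between bins in Hz
    cutoff_bin = int(cutoff_hz / bin_hz)   # last bin that is kept
    out = np.array(spectrum, copy=True)    # leave the input untouched
    out[cutoff_bin + 1:] = 0
    return out

# stand-in spectral frame (NumPy noise instead of a real audio frame)
rng = np.random.default_rng(0)
frame = rng.standard_normal(2048)
infft = np.fft.rfft(frame)

outfft = low_pass(infft, cutoff_hz=2000.0)
print(np.count_nonzero(outfft), "of", len(outfft), "bins kept")
```

In the loop above, this would replace `outfft = infft` with something like `outfft = low_pass(infft, 2000.0)`. Abruptly zeroing bins can introduce audible artifacts, so smoother gain curves are often preferred in practice.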

Finally we write the processed audio array as a WAV file



In [21]:

    
# write the processed audio (not the input) as 32-bit float samples
awrite(audioout.astype(np.float32))