In this IPython notebook, we present a real-time Python implementation of the GCC-NMF speech enhancement algorithm introduced in:
that built on previous work:
This is essentially a real-time implementation of the Online Speech Enhancement notebook presented previously, with the addition of soft GCC-NMF mask generation.
In addition to numpy and scipy, two further dependencies are required: theano and pyaudio.
To run the graphical user interface, we also require pyqt5 and pyqtgraph.
All dependencies can be installed with pip, either on the command line:
$ pip install numpy scipy theano pyaudio
$ pip install pyqt5 pyqtgraph
or programmatically from within this notebook or a Python interpreter:
In [1]:
import subprocess
import sys
In [2]:
# required dependencies
# (pip.main() was removed in pip 10, so pip is invoked as a subprocess instead)
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'numpy', 'scipy', 'theano', 'pyaudio'])
print('\nFinished installing required dependencies')
In [3]:
# dependencies for GUI
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'pyqt5', 'pyqtgraph'])
print('\nFinished installing GUI dependencies')
In [4]:
import logging
logging.getLogger().setLevel(logging.INFO)
In [5]:
from gccNMF.realtime.runRealtimeGCCNMF import RealtimeGCCNMF
RealtimeGCCNMF()
print('Done!')
Startup may take a little while the first time it runs, as we pre-learn several NMF dictionaries of varying size. The dictionaries are saved to data/pretrainedW, so subsequent launches will be much quicker.
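The load-or-train pattern described above can be sketched as follows. This is only an illustration of the caching idea: the function names, the `W_<numAtoms>.npy` file naming, and the random stand-in for NMF training are all assumptions, not the package's actual code or file layout.

```python
import os
import numpy as np

def load_or_train_dictionary(num_atoms, num_freq, cache_dir, train_fn):
    """Return a (num_freq x num_atoms) dictionary, training only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hypothetical naming scheme: one cached file per dictionary size.
    path = os.path.join(cache_dir, 'W_%d.npy' % num_atoms)
    if os.path.exists(path):
        return np.load(path)          # fast path: reuse the saved dictionary
    W = train_fn(num_freq, num_atoms)  # slow path: learn it once
    np.save(path, W)
    return W

def random_dictionary(num_freq, num_atoms):
    """Stand-in for NMF training: random non-negative atoms, each summing to 1."""
    rng = np.random.default_rng(0)
    W = rng.random((num_freq, num_atoms))
    return W / W.sum(axis=0, keepdims=True)
```

The first call for a given size pays the training cost; later calls with the same `cache_dir` only read from disk, which is why only the first launch is slow.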
Once the window opens, click the Play button (or hit space) and you should see something like this:
The images roll with the input audio in a waterfall style. At the left, we see the input spectrogram on top and the output spectrogram on the bottom. At the right, we have the GCC-PHAT angular spectrogram of the input on top (numTDOA x time) and the NMF mask on the bottom (numAtoms x time). In the center-bottom, we see the currently selected NMF dictionary (numAtoms x frequency). Finally, at the center-top, we have the controls for the GCC-NMF masking function, the dictionary size, and the number of coefficient inference updates per frame, as well as buttons to control playback, enable/disable enhancement, and toggle the info strings above the images.
In the example above, you have full control over all masking function parameters, including the estimated target location via the Center slider in the GCC-NMF Masking Function panel. You can instead enable real-time localization by clicking the Enable Localization button and selecting a sliding window size for the GCC-PHAT localization. Smaller window sizes track faster changes in source position but may switch to background noise during short pauses in the speech. Larger window sizes give more stable tracking of more slowly moving speakers.
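The window-size trade-off can be made concrete with a small NumPy sketch of sliding-window GCC-PHAT localization. It is a simplified illustration under stated assumptions, not the notebook's implementation: the function and class names are hypothetical, the right channel is assumed to lag the left by the TDOA, and the estimate is simply the peak of the angular spectrum averaged over the last few frames.

```python
from collections import deque
import numpy as np

def gcc_phat_frame(X_left, X_right, freqs_hz, tdoas_s):
    """GCC-PHAT angular spectrum of one STFT frame at the candidate TDOAs."""
    cross = X_left * np.conj(X_right)
    cross = cross / (np.abs(cross) + 1e-12)   # PHAT weighting: keep phase only
    # Steering matrix (numTDOA x numFreq); peaks where the right channel
    # lags the left by the candidate delay (assumed sign convention).
    steering = np.exp(-2j * np.pi * np.outer(tdoas_s, freqs_hz))
    return np.real(steering @ cross)

class SlidingWindowLocalizer:
    """Average the last `window_frames` angular spectra and pick the peak."""

    def __init__(self, tdoas_s, window_frames):
        self.tdoas_s = np.asarray(tdoas_s)
        self.history = deque(maxlen=window_frames)  # old frames drop out automatically

    def update(self, angular_spectrum):
        self.history.append(np.asarray(angular_spectrum))
        mean_spectrum = np.mean(np.stack(list(self.history)), axis=0)
        return self.tdoas_s[np.argmax(mean_spectrum)]
```

With a short window a single noisy frame can move the peak, while a long window lets the accumulated evidence from earlier frames outvote it, at the cost of reacting more slowly when the speaker actually moves.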
We can use the -i flag to start GCC-NMF with an example input containing a moving speaker:
$ python gccNMF/realtime/runRealtimeGCCNMF.py -i data/dev_A_1_2_3_4_mix.wav
Then, by clicking Enable Localization, you should see something like the screenshot below. The red trace in the GCC-PHAT Angular Spectrogram panel shows the history of the estimated source location. Also note that the Center parameter of the masking function is now disabled, as it is controlled automatically by the online localization algorithm.
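To illustrate how a Center value, whether set by hand or supplied by a localizer, might shape a soft mask, here is a minimal sketch. Only the Center parameter comes from the text above; the `width_s`, `shape`, and `floor` parameters, the generalized-Gaussian window, and both function names are assumptions made for illustration and may not match the package's masking function.

```python
import numpy as np

def soft_atom_mask(atom_tdoas_s, center_s, width_s, shape=2.0, floor=0.0):
    """Per-atom weight in [floor, 1]: 1 at the target TDOA, decaying away from it.

    width_s, shape and floor are hypothetical parameters for this sketch.
    """
    distance = np.abs(np.asarray(atom_tdoas_s) - center_s) / width_s
    window = np.exp(-distance ** shape)       # generalized Gaussian window
    return floor + (1.0 - floor) * window

def frequency_mask(W, h, atom_mask, eps=1e-12):
    """Wiener-like gain per frequency: target share of the NMF reconstruction.

    W is (numFreq x numAtoms), h holds the non-negative atom activations of
    the current frame, and atom_mask comes from soft_atom_mask.
    """
    target = W @ (atom_mask * h)
    total = W @ h
    return target / (total + eps)
```

Here each atom is assumed to have already been assigned a TDOA (for example, the peak of its own GCC-PHAT-weighted angular spectrum); atoms near the center pass through almost unchanged, while distant atoms are attenuated towards `floor`.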
Real-time GCC-NMF can also be run without the GUI, by passing the --no-gui flag on the command line:
$ python gccNMF/realtime/runRealtimeGCCNMF.py --no-gui
or programmatically by instantiating the RealtimeGCCNMFNoGUI class instead of the RealtimeGCCNMF class we used above:
In [ ]:
from gccNMF.realtime.runRealtimeGCCNMF import RealtimeGCCNMFNoGUI
RealtimeGCCNMFNoGUI()
print('Done!')
GCC-NMF parameters can be set via a configuration file, with a default config file generated as gccNMF.config at first launch. This config file can either be modified, or a new config file may be specified. The input wav file may also be specified in a similar manner.
The config and input file paths can either be specified with the following optional command line arguments,
$ python gccNMF/realtime/runRealtimeGCCNMF.py --input <path/to/wav/file> --config <path/to/config/file>
or programmatically as,
RealtimeGCCNMF(audioPath='<path/to/wav/file>', configPath='<path/to/config/file>')
$ python gccNMF/realtime/runRealtimeGCCNMF.py --help
usage: runRealtimeGCCNMF.py [-h] [-i INPUT] [-c CONFIG] [--no-gui]

Real-time GCC-NMF Speech Enhancement

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input wav file path
  -c CONFIG, --config CONFIG
                        config file path
  --no-gui              no user interface mode
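For reference, an argparse parser that produces a usage line and options like those shown above might look like the following. It is reconstructed from the help text only: the script's actual parser, its defaults, and the `build_parser` name are assumptions.

```python
import argparse

def build_parser():
    """Parser mirroring the options listed in the --help output (a sketch)."""
    parser = argparse.ArgumentParser(
        prog='runRealtimeGCCNMF.py',
        description='Real-time GCC-NMF Speech Enhancement')
    parser.add_argument('-i', '--input', help='input wav file path')
    parser.add_argument('-c', '--config', help='config file path')
    # store_true makes --no-gui a flag that takes no value.
    parser.add_argument('--no-gui', action='store_true',
                        help='no user interface mode')
    return parser

# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(['-i', 'data/dev_A_1_2_3_4_mix.wav', '--no-gui'])
```

In this sketch, omitted options default to `None` (or `False` for `--no-gui`), so the caller can fall back to the default config file and live audio input when they are not given.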