Real-time Speech Enhancement with GCC-NMF

Sean UN Wood, August 2017

In this IPython notebook, we present a real-time Python implementation of the GCC-NMF speech enhancement algorithm introduced in:

that built on previous work:

This is essentially a real-time implementation of the Online Speech Enhancement notebook presented previously, with the addition of soft GCC-NMF mask generation.

Notebook Overview

  1. Preliminary setup: Python dependencies
  2. Real-time GCC-NMF: Graphical User Interface (GUI)
  3. Real-time GCC-NMF: Command-line Interface (CLI)
  4. Configuration parameters

1. Preliminary setup

Dependencies

In addition to numpy and scipy, the following dependencies are required:

  1. Theano for its GPU acceleration and optimizing compiler.
  2. PyAudio for real-time audio playback.

To run the graphical user interface, we also require:

  1. PyQt Python bindings for the Qt application framework.
  2. pyqtgraph scientific graphics and GUI library.

Installing dependencies

All dependencies can be installed with pip, either on the command line:

$ pip install numpy scipy theano pyaudio

$ pip install pyqt5 pyqtgraph

or programmatically from within this notebook or a Python interpreter:


In [1]:
import subprocess
import sys

def pip_install(package):
    # pip.main() was removed in pip 10, so run pip through the current interpreter instead
    result = subprocess.run([sys.executable, '-m', 'pip', 'install', package],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    print(result.stdout, end='')
    result.check_returncode()  # raise if the installation failed

In [2]:
# required dependencies
pip_install('numpy')
pip_install('scipy')
pip_install('theano')
pip_install('pyaudio')
print('\nFinished installing required dependencies')


Requirement already satisfied: numpy in /usr/local/lib/python3.6/site-packages
Requirement already satisfied: scipy in /usr/local/lib/python3.6/site-packages
Requirement already satisfied: numpy>=1.8.2 in /usr/local/lib/python3.6/site-packages (from scipy)
Requirement already satisfied: theano in /usr/local/lib/python3.6/site-packages
Requirement already satisfied: numpy>=1.9.1 in /usr/local/lib/python3.6/site-packages (from theano)
Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.6/site-packages (from theano)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/site-packages (from theano)
Requirement already satisfied: pyaudio in /usr/local/lib/python3.6/site-packages

Finished installing required dependencies

In [3]:
# dependencies for GUI
pip_install('pyqt5')
pip_install('pyqtgraph')
print('\nFinished installing GUI dependencies')


Requirement already satisfied: pyqt5 in /usr/local/lib/python3.6/site-packages
Requirement already satisfied: sip<4.20,>=4.19.3 in /usr/local/lib/python3.6/site-packages (from pyqt5)
Requirement already satisfied: pyqtgraph in /usr/local/lib/python3.6/site-packages
Requirement already satisfied: numpy in /usr/local/lib/python3.6/site-packages (from pyqtgraph)

Finished installing GUI dependencies
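
To confirm that the installation worked, it can help to check that each package is importable. The following is a minimal sketch, not part of the gccNMF code base; the helper name missing_modules is hypothetical, and the lists use import names, which can differ from the pip names (pyqt5 is imported as PyQt5).

```python
import importlib.util

# Import names of the packages listed above
REQUIRED = ['numpy', 'scipy', 'theano', 'pyaudio']
GUI_ONLY = ['PyQt5', 'pyqtgraph']

def missing_modules(names):
    """Return the module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print('Missing required modules:', missing_modules(REQUIRED))
print('Missing GUI modules:', missing_modules(GUI_ONLY))
```

An empty list for the required modules means the command-line version should be able to start; the GUI additionally needs the second list to be empty.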

2. Real-time GCC-NMF: with GUI

Preliminary imports for logging


In [4]:
import logging
logging.getLogger().setLevel(logging.INFO)

Running real-time GCC-NMF

The real-time GCC-NMF executable can be found at gccNMF/realtime/runRealtimeGCCNMF.py. From the root directory of the repository, we can start the program as follows:

$ python gccNMF/realtime/runRealtimeGCCNMF.py

We can also start it programmatically:


In [5]:
from gccNMF.realtime.runRealtimeGCCNMF import RealtimeGCCNMF
RealtimeGCCNMF()
print('Done!')


GCCNMFConfig: loading configuration params...
TDOA
    numTDOAs: 64
    numTDOAHistory: 128
    numSpectrogramHistory: 128
    gccPHATNLAlpha: 2.0
    gccPHATNLEnabled: False
    microphoneSeparationInMetres: 0.1
    targetTDOAEpsilon: 5.0
    targetTDOABeta: 2.0
    targetTDOANoiseFloor: 0.0
Audio
    numChannels: 2
    sampleRate: 16000
    deviceIndex: None
STFT
    windowSize: 1024
    hopSize: 512
    blockSize: 512
NMF
    dictionarySize: 64
    dictionarySizes: [64, 128, 256, 512, 1024]
    dictionaryType: Pretrained
    numHUpdates: 0
GCCNMFPretraining: Loading pretrained W (size 64): /gcc-nmf/data/pretrainedW/W_64.npy
GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_64.npy, creating...
GCCNMFPretraining: Loading pretrained W (size 128): /gcc-nmf/data/pretrainedW/W_128.npy
GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_128.npy, creating...
GCCNMFPretraining: Loading pretrained W (size 256): /gcc-nmf/data/pretrainedW/W_256.npy
GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_256.npy, creating...
GCCNMFPretraining: Loading pretrained W (size 512): /gcc-nmf/data/pretrainedW/W_512.npy
GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_512.npy, creating...
GCCNMFPretraining: Loading pretrained W (size 1024): /gcc-nmf/data/pretrainedW/W_1024.npy
GCCNMFPretraining: Pretrained W not found at /gcc-nmf/data/pretrainedW/W_1024.npy, creating...
RealtimeGCCNMF: Starting with audio path: /gcc-nmf/data/dev_Sq1_Co_A_mix.wav
Loading interface with audio path: /gcc-nmf/data/dev_Sq1_Co_A_mix.wav
GCCNMFInterface: setting dictionarySize: 64
RealtimeGCCNMFInterfaceWindow: closing...
Window closed
GCCNMFProcessor: received terminate
Audio process joined
GCCNMF process joined
Done!

The first launch may take a while, since NMF dictionaries of five sizes (64, 128, 256, 512, and 1024 atoms) are pre-learned. The dictionaries are saved to data/pretrainedW, so subsequent launches are much quicker.
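
The load-or-create behaviour visible in the log above can be sketched as follows. This is an illustration only: load_or_create_dictionary and random_dictionary are hypothetical names, the random matrix merely stands in for actual NMF pretraining, and the (frequency x atoms) shape with 513 frequency bins is an assumption based on a window size of 1024.

```python
import os
import tempfile
import numpy as np

def load_or_create_dictionary(path, create_fn):
    """Load a dictionary from path, or create it with create_fn and cache it."""
    if os.path.exists(path):
        return np.load(path)
    W = create_fn()
    os.makedirs(os.path.dirname(path), exist_ok=True)
    np.save(path, W)
    return W

def random_dictionary(num_freq=513, num_atoms=64):
    # Stand-in for NMF pretraining: random non-negative atoms
    return np.abs(np.random.randn(num_freq, num_atoms))

# Use a temporary directory so the sketch does not touch data/pretrainedW
cache_dir = tempfile.mkdtemp()
path = os.path.join(cache_dir, 'pretrainedW', 'W_64.npy')
W_first = load_or_create_dictionary(path, random_dictionary)   # created and saved
W_second = load_or_create_dictionary(path, random_dictionary)  # loaded from disk
print('Same dictionary on second call:', np.array_equal(W_first, W_second))
```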

Once the window opens, click the Play button (or hit space) and you should see something like this:

The images scroll with the input audio, waterfall-style. On the left, the input spectrogram is shown on top and the output spectrogram on the bottom. On the right, the GCC-PHAT angular spectrogram of the input is shown on top (numTDOAs x time) and the NMF mask on the bottom (numAtoms x time). At the bottom center is the currently selected NMF dictionary (numAtoms x frequency). Finally, at the top center are the controls for the GCC-NMF masking function, the dictionary size, and the number of coefficient inference updates per frame, as well as buttons to control playback, enable or disable enhancement, and toggle the info strings above the images.
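
For readers unfamiliar with the GCC-PHAT angular spectrogram, one column of it can be computed roughly as sketched below. This is a generic textbook formulation rather than the code used by gccNMF: the function name, the Hann window, the speed of sound (343 m/s), and the sign convention (positive TDOA means the right channel lags the left) are all assumptions. The sample rate, microphone separation, and number of TDOAs are taken from the configuration printed above.

```python
import numpy as np

def gcc_phat_angular_spectrum(left, right, sample_rate, tdoas):
    """GCC-PHAT value for each candidate TDOA (in seconds) of one stereo frame."""
    window = np.hanning(len(left))
    X_left = np.fft.rfft(window * left)
    X_right = np.fft.rfft(window * right)
    cross = X_left * np.conj(X_right)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    freqs = np.fft.rfftfreq(len(left), d=1.0 / sample_rate)
    # Steering matrix: one row per candidate TDOA, one column per frequency
    steering = np.exp(-2j * np.pi * np.outer(tdoas, freqs))
    return np.real(steering @ cross)

sample_rate = 16000
max_tdoa = 0.1 / 343.0  # microphone separation / assumed speed of sound
tdoas = np.linspace(-max_tdoa, max_tdoa, 64)

# Synthetic test: white noise, with the right channel delayed by 3 samples
rng = np.random.RandomState(0)
source = rng.randn(1024 + 3)
left, right = source[3:], source[:1024]

spectrum = gcc_phat_angular_spectrum(left, right, sample_rate, tdoas)
estimated_delay = tdoas[np.argmax(spectrum)] * sample_rate
print('Estimated delay (samples): %.2f' % estimated_delay)
```

Stacking such columns over successive frames gives a numTDOAs x time image like the one in the upper-right panel.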

2.1 Real-time localization

In the example above, you have full control over all masking function parameters, including the estimated target location, which is set via the Center slider in the GCC-NMF Masking Function panel. You can instead enable real-time localization by clicking the Enable Localization button and selecting a sliding window size for the GCC-PHAT localization. Smaller window sizes track faster changes in source position but may switch to background noise during short pauses in the speech. Larger window sizes give more stable tracking for slowly moving speakers.
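
The window-size trade-off can be illustrated with a small sketch that averages the most recent GCC-PHAT frames and picks the TDOA index with the largest mean. The class SlidingWindowLocalizer and the helper first_switch_frame are hypothetical and only approximate the idea; the localization actually performed by gccNMF may differ.

```python
from collections import deque
import numpy as np

class SlidingWindowLocalizer:
    """Estimate the target TDOA index from the most recent GCC-PHAT frames."""

    def __init__(self, window_size):
        self.history = deque(maxlen=window_size)

    def update(self, gcc_phat_frame):
        self.history.append(np.asarray(gcc_phat_frame, dtype=float))
        mean_spectrum = np.mean(list(self.history), axis=0)
        return int(np.argmax(mean_spectrum))

def first_switch_frame(window_size, frames, new_index):
    """Index of the first frame at which the estimate equals new_index."""
    localizer = SlidingWindowLocalizer(window_size)
    for t, frame in enumerate(frames):
        if localizer.update(frame) == new_index:
            return t
    return None

def peak_frame(index, num_tdoas=64):
    frame = np.zeros(num_tdoas)
    frame[index] = 1.0
    return frame

# Idealized input: the source sits at TDOA index 20, then jumps to index 40
frames = [peak_frame(20)] * 10 + [peak_frame(40)] * 10
fast_switch = first_switch_frame(2, frames, new_index=40)
slow_switch = first_switch_frame(8, frames, new_index=40)
print('Window of 2 frames follows the jump at frame', fast_switch)
print('Window of 8 frames follows the jump at frame', slow_switch)
```

The shorter window follows the jump sooner, which is also why it is more easily distracted by noise when the speaker pauses.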

We can use the -i flag to start GCC-NMF with an input example that features a moving speaker:

$ python gccNMF/realtime/runRealtimeGCCNMF.py -i data/dev_A_1_2_3_4_mix.wav

Then, by clicking Enable Localization, you should see something like the screenshot below. The red trace in the GCC-PHAT Angular Spectrogram panel shows the history of the estimated source location. Also note that the Center parameter of the masking function is now disabled as it is controlled automatically by the online localization algorithm.

3. Real-time GCC-NMF: no GUI

Real-time GCC-NMF can also be run without the GUI by passing the --no-gui flag on the command line:

$ python gccNMF/realtime/runRealtimeGCCNMF.py --no-gui

or programmatically by instantiating the RealtimeGCCNMFNoGUI class instead of the RealtimeGCCNMF class we used above:


In [ ]:
from gccNMF.realtime.runRealtimeGCCNMF import RealtimeGCCNMFNoGUI
RealtimeGCCNMFNoGUI()
print('Done!')

4. Configuration parameters

Input and Config files

GCC-NMF parameters can be set via a configuration file, with a default config file generated as gccNMF.config at first launch. This config file can either be modified, or a new config file may be specified. The input wav file may also be specified in a similar manner.

The config and input file paths can either be specified with the following optional command line arguments,

$ python gccNMF/realtime/runRealtimeGCCNMF.py --input <path/to/wav/file> --config <path/to/config/file>

or programmatically as,

RealtimeGCCNMF(audioPath='<path/to/wav/file>', configPath='<path/to/config/file>')
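
As a rough illustration of what a sectioned configuration might contain, the sketch below parses INI-style text with Python's configparser and derives the frame timing from the STFT settings. The layout of the real gccNMF.config file is not shown in this notebook, so the INI format is an assumption; only the section names, keys, and values are taken from the startup log printed earlier.

```python
import configparser

# Hypothetical INI-style text mirroring the startup log;
# the real gccNMF.config layout may differ.
EXAMPLE_CONFIG = """
[Audio]
numChannels = 2
sampleRate = 16000

[STFT]
windowSize = 1024
hopSize = 512
blockSize = 512

[NMF]
dictionarySize = 64
numHUpdates = 0
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # preserve the camelCase option names
parser.read_string(EXAMPLE_CONFIG)

sample_rate = parser.getint('Audio', 'sampleRate')
window_size = parser.getint('STFT', 'windowSize')
hop_size = parser.getint('STFT', 'hopSize')

# Frame timing implied by the STFT settings
window_ms = 1000.0 * window_size / sample_rate
hop_ms = 1000.0 * hop_size / sample_rate
print('Window: %.0f ms, hop: %.0f ms' % (window_ms, hop_ms))
```

With the default values, each analysis window spans 64 ms and a new frame is processed every 32 ms.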

Help

$ python gccNMF/realtime/runRealtimeGCCNMF.py --help

usage: runRealtimeGCCNMF.py [-h] [-i INPUT] [-c CONFIG] [--no-gui]

Real-time GCC-NMF Speech Enhancement

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input wav file path
  -c CONFIG, --config CONFIG
                        config file path
  --no-gui              no user interface mode
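
A command-line interface matching the help text above can be sketched with argparse. The flags, descriptions, and program name are copied from the help output; the function build_parser is hypothetical, and the actual parser in runRealtimeGCCNMF.py may be set up differently.

```python
import argparse

def build_parser():
    """Build a parser with the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog='runRealtimeGCCNMF.py',
        description='Real-time GCC-NMF Speech Enhancement')
    parser.add_argument('-i', '--input', help='input wav file path')
    parser.add_argument('-c', '--config', help='config file path')
    parser.add_argument('--no-gui', action='store_true',
                        help='no user interface mode')
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(['-i', 'data/dev_A_1_2_3_4_mix.wav', '--no-gui'])
print(args.input, args.config, args.no_gui)
```

Options that are not given default to None (or False for --no-gui), which a launcher could use to fall back to the default input file and the generated gccNMF.config.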

