# NMF Source Separation

Piano notes consist of a very stable stack of harmonics for each note, with a smooth decay. This kind of sound event can be well captured by separable time and frequency components, i.e., a single spectral profile modulated in time by the decaying envelope of each note event. NMF (Nonnegative Matrix Factorization) tries to find the best decomposition of a matrix into a fixed number of spectrum/envelope pairs. Under the right circumstances, these components can end up being individual piano notes - although controlling it can be tricky. This practical uses the NMF implementation from `sklearn` (which is wrapped by `librosa`) to decompose a brief excerpt of piano.

``````

In [1]:

%pylab inline
from __future__ import print_function
import os
import time
import IPython

import numpy as np

import librosa

``````
``````

Populating the interactive namespace from numpy and matplotlib

``````
``````

In [2]:

def my_specshow(S, sr, t_max=None, f_max=None, **kwargs):
"""Call librosa.specshow but limit the time and frequency ranges."""
librosa.display.specshow(S, sr=sr, cmap='jet', x_axis='time', y_axis='linear', **kwargs)
if t_max == None:
t_max = librosa.frames_to_time(S.shape[1])
tick_frames = librosa.time_to_frames(np.arange(round(t_max) + 1.0))
plt.xlim([0, tick_frames[-1]])
if f_max == None:
f_max = S.shape[0] * sr / n_fft
freqs = librosa.fft_frequencies()
max_bin = np.flatnonzero(freqs >= f_max)[0]
plt.ylim([0, max_bin])
librosa.display.time_ticks(tick_frames, librosa.frames_to_time(tick_frames), n_ticks=len(tick_frames))
plt.xlabel('Time')
librosa.display.frequency_ticks(freqs[:max_bin], axis='y')
plt.ylabel('Hz');

``````
``````

In [3]:

print(sr, y.shape)

``````
``````

11025 (330750,)

``````
``````

In [4]:

# STFT analysis of the piano excerpt.
n_fft = 1024
S = librosa.stft(y, n_fft=n_fft)
plt.figure(figsize=(10, 4))
# Zoom in on the first few notes, and the lowest few harmonics.
t_max = 5.0
f_max = 2000.0
my_specshow(librosa.logamplitude(S), sr=sr, t_max=t_max, f_max=f_max)
plt.colorbar()
IPython.display.Audio(y, rate=sr)

``````
``````

Out[4]:

Your browser does not support the audio element.

``````
``````

In [5]:

# Filter out the first three notes by a low-pass filter, applied directly
# on the spectrogram.  It looks like cutting at 450 Hz should do it, so
# zero out all the bins above that.
cutoff_bin = int(round(450.0 / sr * n_fft))
S_lpf = S.copy()
S_lpf[cutoff_bin:] = 0
# Convert spectrogram back to audio.
y_lpf = librosa.istft(S_lpf)
plt.figure(figsize=(10, 4))
# Re-analyze back into STFT
my_specshow(librosa.logamplitude(librosa.stft(y_lpf, n_fft=n_fft)), sr=sr, t_max=t_max, f_max=f_max)
plt.colorbar()
plt.clim([-56, 12])
IPython.display.Audio(y_lpf, rate=sr)

``````
``````

Out[5]:

``````