In [1]:
import numpy, scipy, matplotlib.pyplot as plt, librosa, librosa.display, sklearn, stanford_mir, IPython.display
%matplotlib inline
plt.rcParams['figure.figsize'] = (14, 5)

Harmonic-Percussive Source Separation

Harmonic-percussive source separation (HPSS) decomposes a spectrogram into a harmonic component, whose energy varies smoothly in time (horizontal ridges), and a percussive component, whose energy is concentrated in short transients spread across frequency (vertical lines). librosa's implementation is based on median filtering (Fitzgerald, 2010).

Load two files: one harmonic and one percussive.


In [2]:
yh, fs = librosa.load('prelude_cmaj_10s.wav')

In [3]:
yp, fs = librosa.load('125_bounce.wav')
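
Both signals are decoded at librosa's default sampling rate and downmixed to mono, so they can be mixed sample-by-sample. A quick check (librosa.load resamples to sr=22050 by default):

# Both loads returned the same rate; 22050 Hz is librosa's default.
print(fs)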

Add the two signals together, and rescale:


In [4]:
# Truncate to the shorter signal, scale each source by its peak, and mix.
N = min(len(yh), len(yp))
x = yh[:N]/yh.max() + yp[:N]/yp.max()
# Rescale so the mix peaks at 0.5, leaving headroom for playback.
x = 0.5 * x/x.max()

In [5]:
x.max()


Out[5]:
0.5

Listen to the combined audio signal:


In [6]:
IPython.display.Audio(x, rate=fs)


Out[6]:
[audio player: combined signal]

Compute the STFT:


In [7]:
X = librosa.stft(x)
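
X is a complex-valued matrix with one row per frequency bin and one column per frame. A quick sanity check, assuming librosa's default STFT parameters (n_fft=2048, hop_length=512):

# 1 + n_fft/2 = 1025 frequency bins; the frame count depends on signal length.
print(X.shape, X.dtype)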

Take the log-amplitude for display purposes:


In [8]:
Xmag = librosa.amplitude_to_db(numpy.abs(X))

Display the log-magnitude spectrogram:


In [9]:
librosa.display.specshow(Xmag, sr=fs, x_axis='time', y_axis='log')


Out[9]:
[figure: log-frequency spectrogram of the combined signal]
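
Adding a decibel colorbar makes the dynamic range of the display explicit; a minimal sketch:

librosa.display.specshow(Xmag, sr=fs, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')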

Perform harmonic-percussive source separation:


In [10]:
H, P = librosa.decompose.hpss(X)
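
Under the default settings the two outputs are complementary soft-masked copies of X, so they sum back to the original STFT. Newer librosa releases (0.5 and later) also accept a margin parameter for more aggressive separation; a sketch, where values above 1 leave a residual that belongs to neither component:

# Sanity check: with the default margin, the components reconstruct X.
print(numpy.allclose(H + P, X))

# Optional (librosa >= 0.5): larger margins give cleaner, lossier separation.
H_strict, P_strict = librosa.decompose.hpss(X, margin=2.0)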

Compute the log-amplitudes of the outputs:


In [11]:
Hmag = librosa.amplitude_to_db(numpy.abs(H))
Pmag = librosa.amplitude_to_db(numpy.abs(P))

Display each output:


In [12]:
librosa.display.specshow(Hmag, sr=fs, x_axis='time', y_axis='log')


Out[12]:
[figure: log-frequency spectrogram of the harmonic component]

In [13]:
librosa.display.specshow(Pmag, sr=fs, x_axis='time', y_axis='log')


Out[13]:
[figure: log-frequency spectrogram of the percussive component]
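
The two components are easier to compare side by side; a minimal sketch using plt.subplot:

plt.subplot(1, 2, 1)
librosa.display.specshow(Hmag, sr=fs, x_axis='time', y_axis='log')
plt.title('Harmonic')
plt.subplot(1, 2, 2)
librosa.display.specshow(Pmag, sr=fs, x_axis='time', y_axis='log')
plt.title('Percussive')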

Transform the harmonic output back to the time domain:


In [14]:
h = librosa.istft(H)

Listen to the harmonic output:


In [15]:
IPython.display.Audio(h, rate=fs)


Out[15]:
[audio player: harmonic component]

Transform the percussive output back to the time domain:


In [16]:
p = librosa.istft(P)

Listen to the percussive output:


In [17]:
IPython.display.Audio(p, rate=fs)


Out[17]:
[audio player: percussive component]
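
For convenience, the whole round trip (STFT, HPSS, inverse STFT) is also available as a single call on the time-domain signal; a minimal sketch using librosa.effects.hpss with default settings:

# One-step alternative to the manual stft/hpss/istft pipeline above.
h2, p2 = librosa.effects.hpss(x)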