Sparse Coding of Natural Audio Scenes

Prof. Michael A. Casey, Dartmouth College

License:
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
http://creativecommons.org/licenses/by-nc/4.0/



In [1]:

    
from bregman.suite import *
import sparseapprox as S

Constant-Q Time-Frequency Analysis

Analyze the first 10 s of a field recording of a natural scene on a log-frequency scale (constant-Q transform), then plot the resulting time-frequency representation and its time-averaged constant-Q spectrum.



In [2]:

    
x,sr,fmt = wavread('chernobyl.wav', last=10*44100) # load the first 10 s of the sound file
x = x.mean(1) # take the average of the channels
F = LogFrequencySpectrum(x, nhop=4096, nfft=16384, wfft=8192, npo=24) # constant-Q transform
F.feature_plot(dbscale=1) # plot the transform
figure(); title('Average Constant-Q Spectrum')
plot((F.X**2).sum(1)) # and the time-averaged constant-Q spectrum









    Out[2]:





[<matplotlib.lines.Line2D at 0xaf016e0c>]
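As a rough illustration of what a log-frequency (constant-Q) axis means, the sketch below computes geometrically spaced band centers at 24 bands per octave, the value passed as npo in the cell above. The lowest frequency `f_lo`, the number of octaves, and the helper name `cq_center_freqs` are assumptions made for this example; they are not taken from the bregman source.

```python
import numpy as np

def cq_center_freqs(f_lo, n_octaves, bands_per_octave):
    """Geometrically spaced center frequencies (hypothetical helper)."""
    k = np.arange(n_octaves * bands_per_octave + 1)
    return f_lo * 2.0 ** (k / float(bands_per_octave))

bpo = 24  # bands per octave, as in the npo argument above
freqs = cq_center_freqs(f_lo=55.0, n_octaves=8, bands_per_octave=bpo)  # f_lo is assumed

# On a geometric grid, center frequency divided by the spacing to the
# next band is the same for every band: this is the "constant Q".
q_factor = 1.0 / (2.0 ** (1.0 / bpo) - 1.0)
ratios = freqs[:-1] / np.diff(freqs)

print(freqs[:3], freqs[bpo], q_factor)
```

Each step of 24 bands doubles the frequency, so low bands are narrow in Hz and high bands are wide, which is why a long FFT window is needed to resolve the lowest bands.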

Inverting the analysis back to audio

Invert the constant-Q transform back to an audio signal by applying the inverse constant-Q transform followed by the inverse short-time Fourier transform.



In [3]:

    
xh = F.inverse()
play(balance_signal(xh))









    



Period size is 64 , Buffer size is 22050
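The inverse short-time Fourier transform step can be sketched with plain NumPy as windowed overlap-add. This is a minimal illustration under stated assumptions, not bregman's implementation: the function names `stft` and `istft`, the Hann window, and the small frame sizes are chosen for this example only, and the inverse constant-Q mapping from log-frequency bands back to linear FFT bins is not shown.

```python
import numpy as np

def stft(x, nfft=1024, hop=256):
    """Hann-windowed STFT; returns an array of shape (nfft//2+1, n_frames)."""
    win = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop:i * hop + nfft] * win
                       for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def istft(S, nfft=1024, hop=256):
    """Inverse STFT by weighted overlap-add with the same Hann window."""
    win = np.hanning(nfft)
    n_frames = S.shape[1]
    out = np.zeros(nfft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(S, n=nfft, axis=0)
    for i in range(n_frames):
        out[i * hop:i * hop + nfft] += frames[:, i] * win
        norm[i * hop:i * hop + nfft] += win ** 2
    covered = norm > 1e-8          # skip samples the window never reaches
    out[covered] /= norm[covered]
    return out

# Round trip on a test tone (440 Hz at 44.1 kHz)
t = np.arange(8192) / 44100.0
x = np.sin(2 * np.pi * 440.0 * t)
x_rec = istft(stft(x))
```

Dividing by the accumulated squared window undoes the analysis and synthesis windowing, so the interior of the signal is recovered up to floating-point error when the spectrum is left unmodified.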

Sparse Approximation

Learn sparse codes from data using dictionary learning on 16x16 patches of the constant-Q transform.



In [4]:

    
SS = S.SparseApproxSpectrum(n_components=9, patch_size=(16,16)) # learn 9 components from 16x16 patches
SS.extract_codes(F.X, standardize=1) # Use standardized patches
SS.plot_codes(cbar=1,cmap=cm.hot) # show the learned codes









    



(7440, 256)
Dictionary learning from data...
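The printed shape (7440, 256) is consistent with taking every overlapping 16x16 patch from a 95x108 array, since (95-15)*(108-15) = 80*93 = 7440 and 16*16 = 256. The sketch below reproduces that arithmetic on random data. The 95 frequency bands are inferred from the printed shape rather than stated in the notebook, and the per-feature standardization is an assumption about what standardize=1 does, not a description of the library's code.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(95, 108)          # stand-in for F.X (95 bands is an inference)
ph, pw = 16, 16                # patch height and width

H, W = X.shape
# Every overlapping patch, flattened to one row of length ph*pw
patches = np.array([X[i:i + ph, j:j + pw].ravel()
                    for i in range(H - ph + 1)
                    for j in range(W - pw + 1)])

# Assumed standardization: zero mean and unit variance per patch position
data = patches - patches.mean(axis=0)
data /= patches.std(axis=0)

print(data.shape)
```

Each row of `data` is one training example for dictionary learning, so the dictionary atoms have the same 256-element layout and can be reshaped to 16x16 for plotting.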

Reconstruct the constant-Q spectrogram from the learned patch bases, one basis at a time, so that each dictionary component yields its own partial spectrogram.



In [5]:

    
# Reconstruct the spectra from sparse dictionary
SS.reconstruct_individual_spectra(plotting=1)









    Out[5]:





<sparseapprox.SparseApproxSpectrum at 0xaee058ac>
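The idea of reconstructing with one dictionary component at a time can be sketched in the patch domain: if the sparse codes are the rows of A and the dictionary atoms are the rows of D, the full approximation is A times D, and the contribution of atom j is the outer product of column j of A with row j of D. The arrays below are random stand-ins, and the reassembly of patches into a spectrogram that the library performs is not shown.

```python
import numpy as np

rng = np.random.RandomState(0)
n_patches, n_atoms, patch_dim = 100, 9, 256   # 9 atoms of 16x16, as above

D = rng.randn(n_atoms, patch_dim)             # stand-in dictionary
A = rng.randn(n_patches, n_atoms)             # stand-in codes
A[rng.rand(*A.shape) < 0.7] = 0.0             # zero most entries: sparse codes

# Full approximation of all patches
full = np.dot(A, D)

# Contribution of each atom on its own
per_atom = [np.outer(A[:, j], D[j]) for j in range(n_atoms)]

# The separate contributions add up to the full approximation
total = np.sum(per_atom, axis=0)
```

Listening to or plotting a single element of `per_atom` (after reassembly) isolates the part of the scene explained by that atom, which is the purpose of the per-component list used in the following cells.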



In [10]:

    
# Reconstruct a signal from the spectrum approximated by one dictionary atom
x_hat = F.inverse(SS.X_hat_l[0]) # <- change the index to select a different atom
feature_plot(SS.X_hat,dbscale=0,normalize=1) # approximated spectrum
feature_plot(F.X,dbscale=0,normalize=1) # original spectrum, for comparison
play(balance_signal(F.x_hat))









    



Period size is 64 , Buffer size is 22050



In [7]:

    
# Make a random phase map, uniform in [-pi, pi), for signal reconstruction
Phi_hat = rand(*F.STFT.shape)*2*pi - pi
print(Phi_hat.shape)









    



(8193, 108)
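The printed shape matches the analysis parameters used earlier: nfft=16384 gives 16384/2 + 1 = 8193 frequency bins, and 441000 samples at a hop of 4096 give 108 frames when the last partial frame is counted. The sketch below rebuilds a random phase map of that size and attaches it to a magnitude array. The frame-count convention and the random magnitude are assumptions for illustration.

```python
import numpy as np

nfft, nhop, n_samples = 16384, 4096, 10 * 44100

n_bins = nfft // 2 + 1                               # non-negative frequencies
n_frames = int(np.ceil(n_samples / float(nhop)))     # assumed: count partial frame

rng = np.random.RandomState(0)
phi = rng.uniform(-np.pi, np.pi, size=(n_bins, n_frames))  # random phases
mag = rng.rand(n_bins, n_frames)                     # stand-in magnitude spectrum

# Complex STFT estimate: keep the magnitudes, replace the phases
stft_hat = mag * np.exp(1j * phi)

print(stft_hat.shape)
```

Because only the phase is randomized, the magnitude spectrum of every frame is unchanged, but the frames no longer overlap coherently.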



In [9]:

    
# Reconstruct a signal from random phases and the spectrum approximated by one dictionary atom
x_hat = F.inverse(SS.X_hat_l[0], Phi_hat=Phi_hat, pvoc=1) # <- change the index to select a different atom
play(balance_signal(F.x_hat))









    



Phase Vocoder Resynthesis... 16384 8192 4096 1
Period size is 64 , Buffer size is 22050


