Sparse Coding of Natural Audio Scenes

Prof. Michael A. Casey, Dartmouth College

License:
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
http://creativecommons.org/licenses/by-nc/4.0/


In [1]:
from bregman.suite import *
import sparseapprox as S

Constant-Q Time-Frequency Analysis

Analyze the first 10 s of a natural scene (a field recording) on a log-frequency scale (constant-Q transform, 24 bands per octave), then plot the resulting constant-Q spectrogram and its time-averaged spectrum.


In [2]:
x,sr,fmt = wavread('chernobyl.wav', last=10*44100) # load the first 10 s of the sound file
x = x.mean(1) # take the average of the channels
F = LogFrequencySpectrum(x, nhop=4096, nfft=16384, wfft=8192, npo=24) # constant-Q transform
F.feature_plot(dbscale=1) # plot the transform
figure(); title('Average Constant-Q Spectrum')
plot((F.X**2).sum(1)) # power summed over time, proportional to the time-averaged constant-Q spectrum


Out[2]:
[<matplotlib.lines.Line2D at 0xaf016e0c>]
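
The log-frequency spacing behind `npo=24` can be sketched in a few lines. This is a minimal illustration, not bregman's implementation: the lowest band center `f_lo` and the number of bands are hypothetical values chosen only for the example.

```python
import numpy as np

npo = 24            # bands per octave, as in the LogFrequencySpectrum call above
f_lo = 62.5         # ASSUMPTION: lowest band center in Hz (hypothetical value)
n_bands = 4 * npo   # ASSUMPTION: four octaves, chosen only for illustration

k = np.arange(n_bands)
centers = f_lo * 2.0 ** (k / float(npo))   # geometrically spaced band centers
ratios = centers[1:] / centers[:-1]        # ratio between neighbouring centers
Q = 1.0 / (2.0 ** (1.0 / npo) - 1.0)       # center frequency divided by bandwidth

print(centers[:3])
print(Q)
```

Every neighbouring pair of centers has the same ratio, 2**(1/24), so the ratio of center frequency to bandwidth (Q) is the same for every band. That is what "constant-Q" refers to.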

Inverting the analysis back to audio

Invert the constant-Q transform back to an audio signal, using the inverse constant-Q transform followed by the inverse short-time Fourier transform.


In [3]:
xh = F.inverse()
play(balance_signal(xh))


Period size is 64 , Buffer size is 22050

Sparse Approximation

Learn sparse codes from data using dictionary learning on 16x16 patches of the constant-Q transform.


In [4]:
SS = S.SparseApproxSpectrum(n_components=9, patch_size=(16,16)) # learn 9 components from 16x16 patches
SS.extract_codes(F.X, standardize=1) # Use standardized patches
SS.plot_codes(cbar=1,cmap=cm.hot) # show the learned codes


(7440, 256)
Dictionary learning from data...

Reconstruct the constant-Q spectrogram from each learned dictionary component (patch basis) separately, giving one reconstructed spectrogram per component.


In [5]:
# Reconstruct the spectra from sparse dictionary
SS.reconstruct_individual_spectra(plotting=1)


Out[5]:
<sparseapprox.SparseApproxSpectrum at 0xaee058ac>
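
The idea of reconstructing from one component at a time can be sketched with plain matrices. This is an illustration under assumed names and shapes, not the internals of `reconstruct_individual_spectra`: `D` is a hypothetical dictionary with one atom per row, `C` holds hypothetical codes with one row per patch, and random data replaces the learned values.

```python
import numpy as np

rng = np.random.RandomState(0)
n_patches, n_components, patch_dim = 50, 9, 256   # illustrative sizes

D = rng.randn(n_components, patch_dim)   # ASSUMPTION: dictionary, one atom per row
C = rng.randn(n_patches, n_components)   # ASSUMPTION: codes, one row per patch
C[np.abs(C) < 1.0] = 0.0                 # zero the small entries to mimic sparsity

# Contribution of each atom on its own: a rank-one matrix per component
parts = [np.outer(C[:, k], D[k]) for k in range(n_components)]

# Reconstruction from all atoms together
full = C.dot(D)

print(np.allclose(sum(parts), full))
```

The per-component reconstructions add up to the full reconstruction, so each one can be inspected or inverted to audio on its own.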

In [10]:
# Reconstruct the signal from the spectrum approximated by a single dictionary atom
x_hat = F.inverse(SS.X_hat_l[0]) # <- change the component index here to audition another atom
feature_plot(SS.X_hat,dbscale=0,normalize=1)
feature_plot(F.X,dbscale=0,normalize=1)
play(balance_signal(F.x_hat))


Period size is 64 , Buffer size is 22050

In [7]:
# Make random phase map for signal reconstruction
Phi_hat=rand(*F.STFT.shape)*2*pi-pi
print(Phi_hat.shape)


(8193, 108)
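
The shape `(8193, 108)` can be checked by hand from the analysis parameters used earlier. In the sketch below, the bin count follows from `nfft` for a real-valued signal. The frame count assumes that a partial final frame is kept (rounding up), which is a guess about the padding behaviour that happens to match the printed value.

```python
import math

sr = 44100            # sample rate in Hz
n_samples = 10 * sr   # 10 s of audio
nfft = 16384          # FFT length
nhop = 4096           # hop size in samples

n_bins = nfft // 2 + 1                               # non-negative frequency bins
n_frames = int(math.ceil(n_samples / float(nhop)))   # ASSUMPTION: last partial frame kept
print((n_bins, n_frames))   # (8193, 108)
```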

In [9]:
# Reconstruct the signal from random phases and the spectrum approximated by a single dictionary atom
x_hat = F.inverse(SS.X_hat_l[0], Phi_hat=Phi_hat, pvoc=1) # <- change the component index here
play(balance_signal(F.x_hat))


Phase Vocoder Resynthesis... 16384 8192 4096 1
Period size is 64 , Buffer size is 22050
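
Pairing a magnitude spectrogram with a random phase map amounts to building a complex spectrogram in polar form. The sketch below shows only that step, with a small random array standing in for the magnitudes. How `F.inverse` uses `Phi_hat` internally, and what the phase vocoder option adds, is not shown in the source and is not modelled here.

```python
import numpy as np

rng = np.random.RandomState(0)
mag = np.abs(rng.randn(5, 4))                    # stand-in magnitude spectrogram
phi = rng.rand(*mag.shape) * 2 * np.pi - np.pi   # uniform random phases in [-pi, pi)
Z = mag * np.exp(1j * phi)                       # complex spectrogram: magnitude and phase

print(np.allclose(np.abs(Z), mag))
```

The magnitudes are unchanged by the choice of phase. Only the temporal fine structure of the resynthesized signal depends on it.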
