Dynamic Programming (DP) is a general algorithm for finding the optimal solutions to a limited class of exponential problems in linear time. It is commonly used for aligning two sequences that may have slightly inconsistent time bases - this is known as Dynamic Time Warping (DTW) and was an early technique used in speech recognition; it is now widely applied in analogous problems in matching nucleotide sequences in bioinformatics.
The core algorithm is a implemented via nested loops, which typically defeat automatic vectorization. Thus, naive implementations in high-level languages such as Python and Matlab tend to be very inefficient. The code below shows 500x speedup for the core DP operation by replacing the Python with C code.
First, we set up the problem by loading two soundfiles and calculating their spectrograms. We then calculate the cosine similarity between each spectrum (columns of the spectrogram) as a local cost matrix, then use DP to find the lowest-cost path through the cost matrix, giving the optimal DTW alignment.
In [3]:
%matplotlib inline
In [7]:
cd /Users/dpwe/projects/dtw/dp_python
In [57]:
import numpy as np
import matplotlib.pyplot as plt
import librosa # soundfile access library - see https://github.com/bmcfee/librosa
import dpcore # the current library
In [93]:
# Mirror matlab example from http://www.ee.columbia.edu/ln/rosa/matlab/dtw/
# Two different speakers saying "Cottage cheese with chives is delicious"
d1, sr = librosa.load('sm1_cln.wav', sr=None)
d2, sr = librosa.load('sm2_cln.wav', sr=None)
# Calculate their short-time Fourier transforms
D1C = librosa.stft(d1, n_fft=512, hop_length=128)
D2C = librosa.stft(d2, n_fft=512, hop_length=128)
# We'll use the magnitudes to calculate similarity (ignore phase)
D1 = np.abs(D1C)
D2 = np.abs(D2C)
In [94]:
# Plot the spectrograms. You see a similar sequence of sounds, but different timing details
plt.subplot(2,1,1)
librosa.display.specshow(20*np.log10(D1), sr=sr, hop_length=128, y_axis="linear", x_axis="time", cmap='jet')
plt.subplot(2,1,2)
librosa.display.specshow(20*np.log10(D2), sr=sr, hop_length=128, y_axis="linear", x_axis="time", cmap='jet')
Out[94]:
In [95]:
# Cosine similarity matrix (slow one-liner)
# Each cell SM[i,j] is the cosine similarity between D1[:,i] and D2[:,j]
# i.e. the inner product of the two unit-normalized feature vectors
SM = np.array([[np.sum(a*b)/np.sqrt(np.sum(a**2)*np.sum(b**2)) for b in D2.T] for a in D1.T])
In [96]:
# Find the minimum-cost path. We use 1-SM so that cosine similarity == 1 -> cost = 0
# penalty is the additional cost incurred by non-diagonal steps (promotes diagonality)
localcost = np.array(1.0-SM, order='C', dtype=float)
p, q, C, phi = dpcore.dp(localcost, penalty=0.1)
# p and q are vectors giving the row and column indices along the best path
# C returns the full minimum-cost matrix, and phi is the full traceback matrix
In [97]:
# Plot the best path on top of local similarity matrix
fig, ax = plt.subplots()
ax.imshow(SM, interpolation='none', cmap='binary')
ax.hold(True)
ax.plot(q,p,'-r')
ax.hold(False)
ax.axis([0, np.size(D2, axis=1), 0, np.size(D1, axis=1)])
Out[97]:
In [98]:
# We can use the best-aligned columns to plot aligned spectrograms
# Notice how the features in the two spectrograms are now lined up horizontally
plt.subplot(2,1,1)
librosa.display.specshow(20*np.log10(D1[:, p]), sr=sr, hop_length=128, y_axis="linear", x_axis="time", cmap='jet')
plt.subplot(2,1,2)
librosa.display.specshow(20*np.log10(D2[:, q]), sr=sr, hop_length=128, y_axis="linear", x_axis="time", cmap='jet')
Out[98]:
In [90]:
# We can compare the timings of the C external implementation
# to the pure Python version of the core DP routine
# with the use_extension argument:
%timeit C, phi = dpcore.dpcore(localcost, 0.1, use_extension=True)
%timeit C, phi = dpcore.dpcore(localcost, 0.1, use_extension=False)
In [101]:
# Just to check, they do give the same result (within numerical precision)
Cext, phiext = dpcore.dpcore(localcost, 0.1, use_extension=True)
Cpyt, phipyt = dpcore.dpcore(localcost, 0.1, use_extension=False)
print np.max(np.abs(Cext-Cpyt))
In [ ]: