Speech recognition

Simple sppeech recognition system can be implemented using DTW + MFCC.

We will use a simple database composed of 12 french words pronounced about 25 times by different speakers.



In [1]:

    
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib



In [2]:

    
with open('sounds/wavToTag.txt') as f:
    labels = array([l.replace('\n', '') for l in f.readlines()])
    
print set(labels)









    



set(['chaussure', 'manette', 'stade', 'jeuvideo', 'beckham', 'zidane', 'ballon', 'gants', 'sofoot', 'girondins', 'cocacola', 'biere'])

Precompute all MFCCs



In [3]:

    
import librosa

mfccs = {}

for i in range(len(labels)):
    y, sr = librosa.load('sounds/{}.wav'.format(i))
    mfcc = librosa.feature.mfcc(y, sr, n_mfcc=13)
    mfccs[i] = mfcc.T

Leave P Out Cross Validation



In [4]:

    
def generate_train_test_set(P):
    train = []
    test = []

    for s in set(labels):
        all = find(labels == s)
        shuffle(all)
        train += all[:-P].tolist()
        test += all[-P:].tolist()
        
    return train, test



In [5]:

    
from dtw import dtw

# We use DP to speed up multiple tests
D = ones((len(labels), len(labels))) * -1

def cross_validation(train, test):
    score = 0.0

    for i in test:
        x = mfccs[i]

        dmin, jmin = inf, -1
        for j in train:
            y = mfccs[j]
            
            d = D[i, j]
            if d == -1:
                d, _, _, _ = dtw(x, y, dist=lambda x, y: norm(x - y, ord=1))
                D[i, j] = d                

            if d < dmin:
                dmin = d
                jmin = j

        score += 1.0 if (labels[i] == labels[jmin]) else 0.0
        
    return score / len(test)



In [6]:

    
train, test = generate_train_test_set(P=1)
rec_rate = cross_validation(train, test)
print 'Recognition rate {}%'.format(100. * rec_rate)









    



Recognition rate 83.3333333333%

The next plot may take a while to compute!



In [12]:

    
P = arange(1, 10)
N = 5

rec = []

for p in P:
    r = [cross_validation(*generate_train_test_set(p)) for _ in range(N)]
    rec.append(r)
    
rec = array(rec)
rec = rec.reshape((N, -1))

errorbar(P - 0.5, mean(rec, axis=0), yerr=std(rec, axis=0))
xticks(P - 0.5, P)
ylim(0, 1)









    Out[12]:





(0, 1)



In [ ]: