Acoustic Unit Discovery

Settings

Besides the basic imports, we use Bokeh for plotting and ipyparallel to parallelize the training. Bokeh can be replaced with your favorite plotting tool, but ipyparallel is a required dependency of AMDTK.


In [1]:
# Basic imports
import glob
import os
import numpy as np

from bokeh.plotting import show
from bokeh.io import output_notebook
from bokeh.plotting import figure
from bokeh.layouts import gridplot

from ipyparallel import Client

import amdtk

output_notebook()



We expect the features to be stored in HTK format with the ".fea" extension. Sometimes it is convenient to load only a portion of the features (for instance, to remove the silence at the beginning and end of an utterance). This can be achieved by appending "[start_frame:end_frame]" to the path of the file, as in this example:

fname = '/path/to/features.fea'
fname_vad = fname + '[10:100]'
data = amdtk.read_htk(fname_vad)

In [2]:
train_fea_mask = '/export/b01/londel/data/timit/train/mfcc_d_dd/*fea'
train_fea = glob.glob(train_fea_mask)

The next cell is a bash command that starts the ipyparallel cluster. Please refer to the ipyparallel documentation on how to set up the cluster for your environment, and do not forget to shut down the cluster once your experiment is done (see the last cell). The sleep 10 command just gives the server some time to start; if the next command fails, it may be because the server has not finished initializing, though 10 seconds should be enough for most configurations.


In [3]:
%%bash
ipcluster start --profile default -n 4 --daemonize
sleep 10

Connect a client to the ipyparallel cluster.


In [4]:
profile = 'default'
rc = Client(profile=profile)
dview = rc[:]
print('Connected to', len(dview), 'jobs.')


Connected to 4 jobs.

Estimate the mean and the variance (per dimension) of the database. We need these statistics to perform mean/variance normalization during the training.


In [5]:
def collect_data_stats(filename):
    """Job to collect the statistics."""
    # We re-import this module here because this code will run
    # remotely on the engines.
    import amdtk
    data = amdtk.read_htk(filename)
    stats_0 = data.shape[0]
    stats_1 = data.sum(axis=0)
    stats_2 = (data**2).sum(axis=0)
    retval = (
        stats_0,
        stats_1,
        stats_2
    )
    return retval

data_stats = dview.map_sync(collect_data_stats, train_fea)

# Accumulate the statistics over all the utterances.
n_frames = data_stats[0][0]
mean = data_stats[0][1]
var = data_stats[0][2]
for stats_0, stats_1, stats_2 in data_stats[1:]:
    n_frames += stats_0
    mean += stats_1
    var += stats_2
mean /= n_frames
var = (var / n_frames) - mean**2

data_stats = {
    'count': n_frames,
    'mean': mean,
    'var': var
}
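
These statistics make mean/variance normalization a simple per-dimension affine transform: subtract data_stats['mean'] and divide by the square root of data_stats['var']. The next cell is only a sanity check of that arithmetic (AMDTK applies the normalization internally during training, so this snippet is not part of its API):


In [ ]:
# Sanity check (not part of the AMDTK API): normalization is just a
# per-dimension affine transform using the corpus statistics.
data = amdtk.read_htk(train_fea[0])
normalized = (data - data_stats['mean']) / np.sqrt(data_stats['var'])
print(normalized.mean(axis=0))  # close to 0 over the whole corpus
print(normalized.var(axis=0))   # close to 1 over the whole corpus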

Training

Now everything is ready for the training. First we need to create the phone-loop model. Currently, AMDTK also supports a Bayesian GMM, though this model is usually less accurate.


In [12]:
model = amdtk.PhoneLoop.create(
    50,  # number of acoustic units
    3,   # number of states per unit
    4,   # number of Gaussians per emission state
    np.zeros_like(data_stats['mean']),  # mean of the (normalized) data
    np.ones_like(data_stats['var'])     # variance of the (normalized) data
)

#model = amdtk.Mixture.create(
#    200, # Number of Gaussians in the mixture.
#    np.zeros_like(data_stats['mean']), 
#    np.ones_like(data_stats['var'])
#)

For both the phone-loop and the GMM model, the optimization is done by natural gradient descent.
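
For an exponential-family variational posterior, the natural gradient of the ELBO has a simple closed form: the prior's natural parameters, plus the batch sufficient statistics rescaled to the size of the corpus, minus the current parameters. The cell below is a schematic sketch of that standard stochastic variational inference update; the names are illustrative and this is not AMDTK's internal code:


In [ ]:
# Schematic sketch of one stochastic natural-gradient update for an
# exponential-family variational posterior. Illustrative only; this
# is not AMDTK's implementation.
def natural_gradient_step(nat_params, prior_params, batch_suff_stats,
                          n_total, n_batch, lrate):
    # Rescale the batch sufficient statistics as if the whole corpus
    # had been seen.
    rescaled_stats = (n_total / n_batch) * batch_suff_stats
    # Natural gradient of the ELBO w.r.t. the natural parameters.
    nat_grad = prior_params + rescaled_stats - nat_params
    return nat_params + lrate * nat_grad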


In [13]:
elbo = []
time = []
def callback(args):
    elbo.append(args['objective'])
    time.append(args['time'])
    print('elbo=' + str(elbo[-1]), 'time=' + str(time[-1]))
    
optimizer = amdtk.StochasticVBOptimizer(
    dview, 
    data_stats, 
    {'epochs': 2,
     'batch_size': 400,
     'lrate': 0.01},
    model
)
optimizer.run(train_fea, callback)

fig1 = figure(
    x_axis_label='time (s)', 
    y_axis_label='ELBO',
    width=400, 
    height=400
)
fig1.line(time, elbo)

show(fig1)


importing numpy on engine(s)
importing read_htk from amdtk on engine(s)
elbo=-56.5566328365 time=8.593787908554077
elbo=-14.1031220954 time=20.582109212875366
elbo=-12.5753157747 time=30.459877490997314
elbo=-11.5505482394 time=39.285534620285034
elbo=-11.0342283998 time=48.01591348648071
elbo=-10.8456383645 time=56.67512345314026
elbo=-10.4921731025 time=66.60939908027649
elbo=-10.2476414589 time=78.52076363563538
elbo=-10.129937982 time=89.22719550132751
elbo=-9.86354920331 time=98.42179369926453
elbo=-9.89226565155 time=107.38587403297424
elbo=-9.90614775256 time=116.20546579360962
elbo=-9.81106819924 time=124.36762809753418
elbo=-9.66167876416 time=133.76361846923828
elbo=-9.47948858995 time=142.89569425582886
elbo=-9.83214248345 time=152.61892127990723
elbo=-9.61092792308 time=162.17830276489258
elbo=-9.51989204334 time=171.72391533851624
elbo=-9.67699826227 time=181.61188173294067
elbo=-9.34195037291 time=190.10213232040405
elbo=-9.71515063454 time=198.66300630569458
elbo=-9.49702321922 time=207.46203780174255
elbo=-9.30862369391 time=218.7180643081665
elbo=-9.62885212511 time=227.0126006603241

Decoding

Once the model is trained, the most likely sequence of units can be found as follows:


In [20]:
data = amdtk.read_htk(train_fea[6])
print(model.decode(data))


[21, 8, 21, 14, 35, 8, 17, 10, 44, 35, 21, 24, 21, 6, 35]
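
To transcribe the whole corpus rather than a single utterance, the parallel map used for the statistics collection can be reused. The following is only a sketch; it assumes the trained model pickles cleanly so that it can be pushed to the engines:


In [ ]:
# Sketch: decode every utterance in parallel, reusing the parallel-map
# pattern from the statistics collection. Assumes the trained model
# can be pushed (pickled) to the engines.
dview.push({'model': model})

def decode_file(filename):
    # Re-import amdtk as this function runs remotely.
    import amdtk
    data = amdtk.read_htk(filename)
    return filename, model.decode(data)

transcriptions = dict(dview.map_sync(decode_file, train_fea))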

It is also possible to output the most likely sequence of states of the phone-loop:


In [21]:
print(model.decode(data, state_path=True))


[60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 61, 62, 21, 22, 22, 22, 22, 22, 23, 60, 60, 60, 60, 60, 61, 62, 39, 39, 39, 39, 39, 39, 39, 39, 40, 41, 41, 41, 41, 41, 41, 41, 102, 102, 102, 102, 102, 102, 102, 102, 102, 103, 104, 21, 22, 22, 22, 22, 22, 22, 22, 23, 48, 49, 50, 50, 50, 50, 50, 50, 50, 50, 50, 27, 28, 29, 29, 129, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 130, 131, 102, 103, 104, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 61, 62, 69, 69, 69, 69, 70, 71, 60, 60, 60, 60, 60, 60, 61, 62, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 102, 103, 104, 104, 104, 104, 104, 104, 104, 104]
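
The two outputs are consistent: with 3 states per unit, the emitting states appear to be numbered consecutively, so state s belongs to unit s // 3 + 1 (states 60, 61 and 62 map to unit 21, for instance). Under that assumption, the state path collapses back to the unit sequence shown above:


In [ ]:
# Collapse the state path into a unit sequence, assuming the indexing
# observed above: with 3 states per unit, state s belongs to unit
# s // 3 + 1 (e.g. states 60, 61, 62 -> unit 21).
state_path = model.decode(data, state_path=True)
units = [s // 3 + 1 for s in state_path]
# Merge consecutive duplicates. Caveat: two back-to-back occurrences
# of the same unit would also be merged; separating them requires
# inspecting the within-unit state transitions.
sequence = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
print(sequence)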

Once everything is finished, we shut down the ipyparallel cluster.


In [ ]:
%%bash
ipcluster stop --profile default