Besides the basic imports, we use Bokeh for plotting and ipyparallel for parallelizing the training. Bokeh can be replaced with your favorite plotting tool, but ipyparallel is a required dependency for AMDTK.
In [1]:
# Basic imports
import glob
import os
import numpy as np
from bokeh.plotting import show
from bokeh.io import output_notebook
from bokeh.plotting import figure
from bokeh.layouts import gridplot
from ipyparallel import Client
import amdtk
output_notebook()
We expect the features to be stored in HTK format with the ".fea" extension. Sometimes it is convenient to load only a portion of the features (for instance, to remove the silence at the beginning and at the end of an utterance). This can be achieved by appending "[start_frame:end_frame]" to the path of the file, as in this example:
fname = '/path/to/features.fea'
fname_vad = fname + '[10:100]'
data = amdtk.read_htk(fname_vad)
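To make the convention concrete, here is a small sketch of how such a suffix could be parsed. Note that `parse_feature_path` is a hypothetical helper written for illustration only; AMDTK handles the suffix internally in `amdtk.read_htk`.

```python
import re

def parse_feature_path(path):
    """Split a feature path with an optional '[start:end]' suffix.

    Hypothetical helper, for illustration only: returns the plain
    file name plus the start/end frames, or (path, None, None) when
    no suffix is present.
    """
    match = re.match(r'^(.*)\[(\d+):(\d+)\]$', path)
    if match:
        fname, start, end = match.groups()
        return fname, int(start), int(end)
    return path, None, None

print(parse_feature_path('/path/to/features.fea[10:100]'))
# -> ('/path/to/features.fea', 10, 100)
```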
In [2]:
train_fea_mask = '/export/b01/londel/data/timit/train/mfcc_d_dd/*fea'
train_fea = list(glob.glob(train_fea_mask))
The next cell is a bash command to start the ipyparallel cluster. Please refer to the ipyparallel documentation on how to set up the cluster for your environment. Also, do not forget to shut down the cluster once your experiment is done (see the last cell). The sleep 10
command simply gives the server some time to start. If the next command fails, it may be because the server did not finish its initialization, though 10 seconds should be enough for most configurations.
In [3]:
%%bash
ipcluster start --profile default -n 4 --daemonize
sleep 10
Connect a client to the ipyparallel cluster.
In [4]:
profile = 'default'
rc = Client(profile=profile)
dview = rc[:]
print('Connected to', len(dview), 'jobs.')
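Instead of relying on a fixed `sleep 10`, you can poll the client until the expected number of engines is registered. The helper below is a hypothetical sketch, not part of AMDTK; it only assumes the client exposes an `.ids` list of engine ids, as ipyparallel's `Client` does.

```python
import time

def wait_for_engines(client, n_expected, timeout=30.0, poll=0.5):
    """Poll until the cluster reports at least `n_expected` engines.

    Hypothetical helper: returns True once enough engines are up,
    False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if len(client.ids) >= n_expected:
            return True
        time.sleep(poll)
    return False
```

Usage would be `wait_for_engines(rc, 4)` right after creating the `Client`.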
Estimate the mean and the variance (per dimension) of the database. We need these statistics to perform mean/variance normalization during the training.
In [5]:
def collect_data_stats(filename):
    """Job to collect the statistics."""
    # We re-import this module here because this code will run
    # remotely.
    import amdtk
    data = amdtk.read_htk(filename)
    stats_0 = data.shape[0]
    stats_1 = data.sum(axis=0)
    stats_2 = (data**2).sum(axis=0)
    retval = (
        stats_0,
        stats_1,
        stats_2
    )
    return retval
data_stats = dview.map_sync(collect_data_stats, train_fea)
# Accumulate the statistics over all the utterances.
n_frames = data_stats[0][0]
mean = data_stats[0][1]
var = data_stats[0][2]
for stats_0, stats_1, stats_2 in data_stats[1:]:
    n_frames += stats_0
    mean += stats_1
    var += stats_2
mean /= n_frames
var = (var / n_frames) - mean**2
data_stats = {
    'count': n_frames,
    'mean': mean,
    'var': var
}
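The accumulation above works because the global mean and variance are fully determined by three per-utterance sufficient statistics: the frame count, the sum, and the sum of squares. A quick standalone NumPy check (using fake data, no AMDTK involved):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two fake "utterances", each a (frames x 3) feature matrix.
utts = [rng.normal(size=(50, 3)), rng.normal(size=(30, 3))]

# Accumulate the same statistics as collect_data_stats.
n_frames = sum(u.shape[0] for u in utts)
total = sum(u.sum(axis=0) for u in utts)
total_sq = sum((u**2).sum(axis=0) for u in utts)

mean = total / n_frames
var = total_sq / n_frames - mean**2

# They match the direct computation on the concatenated data.
all_data = np.concatenate(utts)
assert np.allclose(mean, all_data.mean(axis=0))
assert np.allclose(var, all_data.var(axis=0))
```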
In [12]:
model = amdtk.PhoneLoop.create(
    50,  # number of acoustic units
    3,   # number of states per unit
    4,   # number of Gaussians per emission
    np.zeros_like(data_stats['mean']),
    np.ones_like(data_stats['var'])
)

#model = amdtk.Mixture.create(
#    200,  # number of Gaussians in the mixture
#    np.zeros_like(data_stats['mean']),
#    np.ones_like(data_stats['var'])
#)
For both the phone-loop and the GMM model, optimization is done with natural gradient descent.
In [13]:
elbo = []
time = []

def callback(args):
    elbo.append(args['objective'])
    time.append(args['time'])
    print('elbo=' + str(elbo[-1]), 'time=' + str(time[-1]))

optimizer = amdtk.StochasticVBOptimizer(
    dview,
    data_stats,
    {'epochs': 2,
     'batch_size': 400,
     'lrate': 0.01},
    model
)
optimizer.run(train_fea, callback)
fig1 = figure(
    x_axis_label='time (s)',
    y_axis_label='ELBO',
    width=400,
    height=400
)
x = np.arange(0, len(elbo), 1)
fig1.line(x, elbo)
show(fig1)
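The internals of `amdtk.StochasticVBOptimizer` are not shown here, but as a rough illustration, a stochastic natural-gradient (stochastic variational Bayes) step interpolates the current natural parameters with a noisy estimate computed from one mini-batch, weighted by the learning rate. The function below is a schematic sketch of that generic update rule, not AMDTK's actual implementation:

```python
import numpy as np

def stochastic_vb_update(nat_params, batch_estimate, lrate):
    """One schematic stochastic natural-gradient step.

    `batch_estimate` stands for the natural parameters that would be
    optimal given only the current mini-batch (rescaled to the full
    data set). With lrate in (0, 1], the update moves the parameters
    a fraction of the way toward that noisy target.
    """
    return (1.0 - lrate) * nat_params + lrate * batch_estimate
```

With a small `lrate` such as the 0.01 used above, each mini-batch only nudges the parameters, which smooths out the noise of the batch estimates.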
In [20]:
data = amdtk.read_htk(train_fea[6])
print(model.decode(data))
It is also possible to output the most likely sequence of states of the phone-loop:
In [21]:
print(model.decode(data, state_path=True))
Once everything is finished, we shut down the ipyparallel cluster.
In [ ]:
%%bash
ipcluster stop --profile default