This notebook is a tutorial that explains how to design ANNs using the nnet3 library included in the Kaldi project.
Kaldi is a very elaborate speech recognition toolkit with a long history and plenty of tools and models apart from ANNs.
nnet3 is the part of the project that deals with the latest implementation of various DNN architectures. Its main advantage is the easy configurability of the C++-based library, without having to do any actual C++ programming.
As with most Kaldi setups, the usage is based around many small binaries that perform simple, atomic operations. These can be run using shell scripts or from other programming languages with the help of "exec" or "popen" calls.
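For instance, calling one of these binaries from Python might look like this (a minimal sketch, not part of the original setup; it assumes the compiled binaries are already on your PATH, and nnet.init is a model file we only create later in this notebook):
In [ ]:
# A minimal sketch of invoking a Kaldi binary via subprocess; 'nnet.init'
# is created later in this notebook and is used here only for illustration.
import subprocess
p = subprocess.Popen(['nnet3-info', 'nnet.init'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print(out)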
As a convenience, a Python library is included with the project that has a few methods to make life easier. As of this writing, everything there is still very much a work in progress, so there are no guarantees this notebook will work out-of-the-box in a few months' time...
Formal documentation for the toolkit and its ANN portion can be found here: http://kaldi-asr.org/doc/dnn3.html
If you don't have it, you need to download Kaldi. A detailed explanation can be found on http://kaldi-asr.org, but the TL;DR is as follows:
1. Clone the repository:
git clone https://github.com/kaldi-asr/kaldi
2. Go into kaldi/tools and simply do make in there - if the script mentions you need to install anything, simply install the missing tools/libraries.
3. Go into kaldi/src and do ./configure in there - it would be good to have CUDA installed and recognized in this step, as you can benefit from GPU acceleration a lot.
4. Run make and wait - if you have a very new version of gcc (>6), it may be a good idea to install something like 4.9 and simply modify the kaldi.mk file.

One thing about Kaldi is that it is completely contained in a single directory, which makes it easy to move and locate. That is why most scripts utilized with Kaldi rely on the following path. You need to set this to wherever you stored Kaldi on your computer:
In [1]:
KALDI_ROOT='~/apps/kaldi'
Following that, we create a small work directory as a subdirectory of this notebook and add some files from the Kaldi project to it.
The path.sh file is also modified to include the actual path to Kaldi.
In [2]:
import os
from shutil import copyfile
import fileinput
import stat
KALDI_ROOT=os.path.expanduser(KALDI_ROOT)
KALDI_ROOT=os.path.abspath(KALDI_ROOT)
if not os.getcwd().endswith('/work'):
    if not os.path.exists('work'):
        os.mkdir('work')
    os.chdir('work')
if not os.path.exists('steps'):
    os.symlink(KALDI_ROOT+'/egs/wsj/s5/steps','steps')
if not os.path.exists('path.sh'):
    copyfile(KALDI_ROOT+'/egs/wsj/s5/path.sh','path.sh')
for line in fileinput.input('path.sh',inplace=True):
    if line.startswith('export KALDI_ROOT='):
        print 'export KALDI_ROOT='+KALDI_ROOT
    else:
        print line[:-1]
os.chmod('path.sh',0755)
The python library is located in the steps/nnet3 subfolder:
In [3]:
import sys
sys.path.append('steps/nnet3')
import nnet3_train_lib as ntl
Once we load the library, we have access to all sorts of methods, but RunKaldiCommand should be sufficient for now.
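As a trivial demonstration (an extra cell, not part of the original flow), note that RunKaldiCommand returns a (stdout, stderr) tuple - this is why the cells below index its result with [0] or [1]:
In [ ]:
# RunKaldiCommand runs a shell command and returns its (stdout, stderr):
out, err = ntl.RunKaldiCommand('echo hello from the kaldi helper')
print(out)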
The next problem is that while the path.sh script is useful if you run programs locally from your terminal, it doesn't work in the notebook environment: the notebook process is already running, and we can't modify its environment by running the script in any way. This simple hack prints out the whole PATH string after running the script and modifies the environment manually. Make sure you don't run this cell more than once:
In [4]:
path = ntl.RunKaldiCommand('source ./path.sh ; printenv | grep ^PATH=')[0]
print 'Setting '+path
os.environ['PATH']=path.split('=')[1]
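To verify the hack worked, we can check that one of the Kaldi binaries is now findable (a small optional check, not part of the original setup):
In [ ]:
# This should print the full path to the nnet3-init binary if the PATH
# was set up correctly, or None otherwise.
from distutils.spawn import find_executable
print(find_executable('nnet3-init'))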
Linear regression is probably the simplest problem you can implement using an ANN library. The problem is stated like this: say we define a function using a linear equation, for example:
\begin{equation} y = 0.3 x_1 + 0.1 x_2 + 0.2 \end{equation}
Now let's ask our friend to guess the formula of the function, but all we can tell him is what the output is for any given input. To him, it's essentially a black box.
To make his life easier, we tell him that the function is linear, that it takes two inputs, and that it gives one output. He may decide to describe his problem as follows:
\begin{equation} y = w_1 \cdot x_1 + w_2 \cdot x_2 + b \end{equation}
His task is to find $w_1$, $w_2$, and $b$ such that, for any $x_1$ and $x_2$ given to the formula above, the computed $y$ is as close as possible to the one produced by the unknown, black-box function.
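One way to make "as close as possible" precise is the mean squared error over the $N$ known examples, which (up to scaling) is what the quadratic objective used later in this notebook corresponds to:
\begin{equation} E = \frac{1}{N} \sum_{i=1}^{N} \left( w_1 \cdot x_{1,i} + w_2 \cdot x_{2,i} + b - y_i \right)^2 \end{equation}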
To help him solve the problem, we will provide our buddy with 100 examples of random input-output pairs. First, let's define our function:
In [5]:
def problem(x):
return x[0]*0.3+x[1]*0.1+0.2
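A quick sanity check of the definition:
In [ ]:
# For x = (1.0, 2.0) we expect 0.3*1.0 + 0.1*2.0 + 0.2 = 0.7
print(problem([1.0, 2.0]))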
Now let's generate 100 inputs and outputs. You can modify these two cells to see if the rest of the notebook will adjust accordingly.
In [6]:
import numpy as np
np.random.seed(1234)
input_dim=2
data_num=100
inputs=np.random.random((data_num,input_dim))
outputs=np.array([problem(x) for x in inputs])
if outputs.ndim==1:
    outputs=outputs.reshape(outputs.shape[0],1)
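We can peek at the first few generated pairs to make sure the shapes are right:
In [ ]:
# Show the first three input rows and their corresponding outputs:
for x, y in zip(inputs[:3], outputs[:3]):
    print('{} -> {}'.format(x, y))
Next, the examples have to be stored in Kaldi's nnet3 example (egs) format, which the following function does: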
In [7]:
def write_simple_egs(filename,inputs,outputs):
    with open(filename,'w') as f:
        for i,l in enumerate(inputs):
            # each example is a key ('data-N') followed by an <Nnet3Eg> block
            f.write('data-{} '.format(i))
            f.write('<Nnet3Eg> ')
            # <NumIo> is the number of <NnetIo> blocks that follow - here
            # always two: input and output
            f.write('<NumIo> {} '.format(2))
            # the input values, with a single (n,t,x)=(0,0,0) index
            f.write('<NnetIo> input ')
            f.write('<I1V> 1 <I1> 0 0 0 ')
            f.write('[\n ')
            for d in l:
                f.write('{} '.format(d))
            f.write(']\n')
            f.write('</NnetIo> ')
            # the expected output values
            f.write('<NnetIo> output ')
            f.write('<I1V> 1 <I1> 0 0 0 ')
            f.write('[\n ')
            for d in outputs[i]:
                f.write('{} '.format(d))
            f.write(']\n')
            f.write('</NnetIo> ')
            f.write('</Nnet3Eg> ')
write_simple_egs('nnet.egs',inputs,outputs)
We can check out a few example lines:
In [8]:
!head -n 10 nnet.egs
It may look complicated, but it's not that bad:
- each data point is stored in an xml-like <Nnet3Eg> block - this example creates one value per input sequence, but normally you would create many values per sequence, where one sequence corresponds to one file or utterance
- <Nnet3Eg> is followed by a sequence of observations (in this case there is only one per file), which is opened by the <NumIo> block informing us how many data blocks are stored in this observation
- then come the <NnetIo> blocks with the actual data - in this case we have two such blocks per observation: input and output
- the value right after the <NnetIo> token is the name of the data point
- <I1V> is the number of index values and <I1> holds the actual index values - you can read more about indexes here: http://kaldi-asr.org/doc/dnn3_code_data_types.html#dnn3_dt_datastruct_index
- data can also be stored in a sparse form, marked by the SM token followed by row= and dim= - it won't be described in this tutorial

The whole nnet3 system is based around configuration files, which allow creating almost any topology (feed-forward or recurrent) in a graph-like manner. Such a file consists of two sets of information: the component definitions and the nodes that connect them into a graph.
In our simple example we create a single component describing the dense weights of the model in an affine operation (in other words, a weighted sum with a bias). It has two dimension parameters (input and output) and you can also set the learning rate here. Note that the name of the component is later used in the graph description.
Other types of components include different types of fixed (i.e. constant) and dynamic weight containers and various activation functions. You can also define your own components if you need to.
The second portion of the file includes the connections between the individual components. These connections are represented by graph nodes. Two nodes are defined for any model: input-node and output-node. They are linked to the input and output values in our EGS file above (note that the names of these nodes match the names of the matrices in the EGS file).
The input-node also needs to define the dimension of the input data (for graph construction purposes). Note that you can have more than one input node per model.
All other nodes apart from the input node have an input parameter denoting what they are connected to. Component nodes also have a component attribute linking them to a component from the list above the nodes. The output node can also define an objective function - currently only quadratic (for MSE) and linear (for cross-entropy, but computed on likelihoods) are available.
In [9]:
%%writefile nnet.config
# First the components
component name=wts type=AffineComponent input-dim=2 output-dim=1 learning-rate=0.6
# Next the nodes
input-node name=input dim=2
component-node name=wts_node component=wts input=input
output-node name=output input=wts_node objective=quadratic
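The same mechanism scales to deeper topologies. As a purely hypothetical illustration (this file is not used anywhere in this notebook), a two-layer network with a sigmoid non-linearity could be described like this:
In [ ]:
%%writefile nnet_bigger.config
# A hypothetical deeper model: affine -> sigmoid -> affine (not used below)
component name=affine1 type=AffineComponent input-dim=2 output-dim=10 learning-rate=0.1
component name=sigmoid1 type=SigmoidComponent dim=10
component name=affine2 type=AffineComponent input-dim=10 output-dim=1 learning-rate=0.1
input-node name=input dim=2
component-node name=affine1_node component=affine1 input=input
component-node name=sigmoid1_node component=sigmoid1 input=affine1_node
component-node name=affine2_node component=affine2 input=sigmoid1_node
output-node name=output input=affine2_node objective=quadratic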
The configuration is turned into an actual model file using the nnet3-init program:
In [10]:
print ntl.RunKaldiCommand('nnet3-init {} {}'.format('nnet.config','nnet.init'))[1]
We can see a description of the network with the nnet3-info program:
In [11]:
print ntl.RunKaldiCommand('nnet3-info {}'.format('nnet.init'))[0]
Or even draw it to a DOT file (this looks far more impressive for bigger models):
In [12]:
from IPython.display import SVG, Image, display
from subprocess import check_call
ntl.RunKaldiCommand('nnet3-info {} | python steps/nnet3/dot/nnet3_to_dot.py {}'.format('nnet.init','nnet.dot'))
#my installation of dot doesn't support PNG, so I have to resort to SVG
check_call(['dot','-Tsvg','nnet.dot','-o','nnet.svg'])
#SVG can't be scaled in notebook, but I can use imagemagick to convert to PNG
check_call(['convert','nnet.svg','nnet.png'])
display(Image('nnet.png'))
The networks are stored in a binary (compressed) format, but we can see their contents (e.g. weights) using the nnet3-copy program:
In [13]:
print ntl.RunKaldiCommand('nnet3-copy --binary=false {} {}'.format('nnet.init','-'))[0]
We can now train the model on our examples with the nnet3-train program, which reads the initial model and the examples and writes the updated model:
In [14]:
print ntl.RunKaldiCommand('nnet3-train {} ark,t:{} {}'.format('nnet.init','nnet.egs','nnet.out'))[1]
Obviously, to get a better result, it's a good idea to run this several times, for a couple of epochs. With such a simple problem and a high learning rate, only a few steps are enough to reach equilibrium - the weights should converge toward the true values $w_1=0.3$, $w_2=0.1$, and $b=0.2$:
In [15]:
print ntl.RunKaldiCommand('nnet3-copy --binary=false {} {}'.format('nnet.out','-'))[0]
for i in range(3):
    ntl.RunKaldiCommand('nnet3-train {} ark,t:{} {}'.format('nnet.out','nnet.egs','nnet.out'))
print ntl.RunKaldiCommand('nnet3-copy --binary=false {} {}'.format('nnet.out','-'))[0]
Now let's generate a few random test inputs to see how well the trained model does:
In [16]:
test_num=10
test=np.random.random((test_num,input_dim))
print test
These are the values we should get:
In [17]:
for x in test:
print problem(x)
Let's store the data in a matrix file so that Kaldi can process it:
In [18]:
with open('test.mat','w') as f:
    f.write('test [')
    for row in test:
        f.write('\n ')
        f.write(' '.join([repr(num) for num in row]))
    f.write(' ]\n')
%cat test.mat
Now let's use the trained net to compute the results:
In [19]:
print ntl.RunKaldiCommand('nnet3-compute {} ark,t:{} ark,t:{}'.format('nnet.out','test.mat','-'))[0]
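As a rough final check (an extra sketch, assuming the text-archive layout shown above: the key test followed by one bracketed value per row), we can parse that output and compare it against the true function:
In [ ]:
# Re-run nnet3-compute, strip the key and the brackets, and compare the
# predicted values with the true function on each test point.
out = ntl.RunKaldiCommand('nnet3-compute {} ark,t:{} ark,t:{}'.format(
    'nnet.out','test.mat','-'))[0]
vals = [float(t) for t in out.replace('[',' ').replace(']',' ').split()[1:]]
for x, y in zip(test, vals):
    print('true: {:.6f}  predicted: {:.6f}'.format(problem(x), y))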
In [ ]: