Theano implementation of HMM algorithms

This notebook implements some of the HMM algorithms using Theano, which makes it easy to incorporate them into the computation graphs of other Theano models. The algorithms are implemented in the same manner as in the other notebook about HMMs.

Discrete HMM

We start with a toy HMM example from Rabiner's paper.

Many improvements can be added to this (and may be added in the future):

  • everything should be moved to the log domain for numerical stability. This is especially important on GPUs, which usually have lower floating-point precision. As written, this example works only for very short sequences (a few observations); see the short demo after this list.

  • continuous density models - this discrete example cannot be applied to most practical problems.

  • custom topologies - this is a simple ergodic example. In practice, we'd want to design a specific transition graph for a particular application.

  • combining models - real applications rely on combining several models and training them jointly for specific problems, e.g. combining many tri-phone models for speech recognition. This should also include the ability to do state tying.

  • better training algorithms - only Baum-Welch and the simplest gradient descent are demonstrated here. Other methods could work better or faster.
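To illustrate the first point, here is a quick NumPy demo (illustrative only, not part of the model code): multiplying even a modest number of likelihoods underflows in float32, while summing their logarithms stays perfectly representable.

import numpy as np
probs = np.full(100, 0.01, dtype=np.float32)  # 100 made-up observation likelihoods
print probs.prod()          # 0.0 - the product underflows
print np.log(probs).sum()   # approx -460.5 - the log-sum is fine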

The example below demonstrates the code using notation and equations similar to those in Rabiner's paper. First we import the required modules:


In [1]:
import numpy as np
import theano
import theano.tensor as T
from theano.ifelse import ifelse
import pickle
from collections import OrderedDict


Using gpu device 0: GeForce GTX TITAN (CNMeM is enabled)

HMM class code

Here we create the HMM class. Everything is compiled in the constructor, and we provide methods for all the individual algorithms:


In [2]:
class DiscreteHMM:
    
    def __init__(self, N=3, M=4):        
        
        updates={}
                
        pi = theano.shared((np.ones(N)/N).astype(theano.config.floatX))
        a = theano.shared((np.ones((N,N))/N).astype(theano.config.floatX))
        b = theano.shared((np.ones((N,M))/M).astype(theano.config.floatX))
        N = theano.shared(N)
        M = theano.shared(M)
        
        self.pi=pi
        self.a=a
        self.b=b
        self.N=N
        self.M=M
        
        O = T.ivector()
        TT = O.shape[0]
        
        
        #forward algorithm:
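        # alpha_t(j) = P(O_1..O_t, q_t = j | model), computed recursively:
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * a(i,j)] * b(j, O_t)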
        
        alpha0=pi*b[:,O[0]]
        
        alpha_scan,upd = theano.scan(fn=lambda O,alpha_p: T.dot(alpha_p,a)*b[:,O],
                               sequences=O[1:],
                               outputs_info=alpha0)
        
        updates.update(upd)
        
        alpha=T.concatenate((alpha0.reshape((1,N)),alpha_scan))                
        
        #backward algorithm:
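        # beta_t(i) = P(O_{t+1}..O_T | q_t = i, model), computed recursively:
        # beta_t(i) = sum_j a(i,j) * b(j, O_{t+1}) * beta_{t+1}(j)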
        
        beta0=T.ones(N).astype(theano.config.floatX)
        
        beta_scan,upd = theano.scan(fn=lambda O,beta_p: T.dot(beta_p*b[:,O],a.T),
                               sequences=O[1:],
                               outputs_info=beta0,
                               go_backwards=True)
        updates.update(upd)
        
        beta=T.concatenate((beta_scan[::-1],beta0.reshape((1,N))))        
        
        #full model probability:
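        # P(O | model) = sum_i alpha_T(i)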
        
        full_prob = alpha_scan[-1].sum()        
        
        #forward-backward probabilities:
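        # gamma_t(i) = P(q_t = i | O, model) = alpha_t(i) * beta_t(i) / P(O | model)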
        
        gamma=alpha*beta/full_prob        
        
        #viterbi algorithm:
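        # delta_t(j) = max_i [delta_{t-1}(i) * a(i,j)] * b(j, O_t): probability of
        # the best state sequence ending in state j at time t; phi stores the
        # argmax backpointers used to backtrace the best sequence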
        
        def viterbi_rec_step(O, delta_p, phi_p):
            m=delta_p*a.T
            phi=m.argmax(axis=1)
            delta=m[T.arange(N),phi]*b[:,O]
            return delta,phi
        
        phi0=T.zeros(N).astype('int64')

        [delta_scan, phi_scan], upd = theano.scan(fn=viterbi_rec_step,
                                                  sequences=O[1:],
                                                  outputs_info=[alpha0,phi0])        
        
        updates.update(upd)
        
        QT=delta_scan[-1].argmax()
        vite_prob = delta_scan[-1,QT]
        
        Q_scan, upd = theano.scan(fn=lambda phi, Q: phi[Q],
                             sequences=phi_scan,
                             outputs_info=QT,
                             go_backwards=True)
        
        updates.update(upd)
                                                  
        Q=T.concatenate((Q_scan[::-1],QT.reshape((1,))))
        
        #transition probabilities
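        # xi_t(i,j) = P(q_t = i, q_{t+1} = j | O, model)
        #           = alpha_t(i) * a(i,j) * b(j, O_{t+1}) * beta_{t+1}(j) / P(O | model)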
        
        xi=alpha[:-1].reshape((TT-1,N,1))*a.reshape((1,N,N))*b[:,O[1:]].T.reshape((TT-1,1,N))*beta[1:].reshape((TT-1,1,N))/full_prob
        
        #expected values
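        # Baum-Welch reestimation formulas:
        # exp_pi(i) = gamma_1(i)
        # exp_a(i,j) = sum_t xi_t(i,j) / sum_t gamma_t(i)
        # exp_b(j,k) = sum over t with O_t = k of gamma_t(j) / sum_t gamma_t(j)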
        
        exp_pi=gamma[0]
        
        exp_a=xi.sum(axis=0)/gamma[:-1].sum(axis=0).reshape((N,1))
        
        exp_b_map, upd = theano.map(fn=lambda k: T.sum(gamma[T.eq(O,k).nonzero()],axis=0)/T.sum(gamma,axis=0), 
                         sequences=T.arange(M))
        
        updates.update(upd)
        
        exp_b = exp_b_map.T
        
        exp_err = T.concatenate(((pi-exp_pi).ravel(),(a-exp_a).ravel(),(b-exp_b).ravel()))
        
        exp_mean_err = T.mean(exp_err**2)
        
        #Baum-Welch updates:
        
        baum_welch_updates=OrderedDict()
        exp_updates={pi:exp_pi,a:exp_a,b:exp_b}
        baum_welch_updates.update(updates)
        baum_welch_updates.update(exp_updates)
        
        #Gradient descent:
        
        pi_grad=T.grad(cost=full_prob,wrt=pi)
        a_grad=T.grad(cost=full_prob,wrt=a)
        b_grad=T.grad(cost=full_prob,wrt=b)
                
        lr=T.scalar()
        
        pi_upd=pi*(pi_grad**lr)
        norm_pi_upd=pi_upd/pi_upd.sum()
        
        a_upd=a*(a_grad**lr)
        norm_a_upd=(a_upd.T/a_upd.sum(axis=1)).T
        
        b_upd=b*(b_grad**lr)
        norm_b_upd=b_upd/b_upd.sum(axis=0)
        
        gd_updates=OrderedDict()
        grad_updates={pi:norm_pi_upd,
                      a:norm_a_upd,
                      b:norm_b_upd}
        gd_updates.update(updates)
        gd_updates.update(grad_updates)            
        
        #function definitions
        
        self.forward_fun = theano.function(inputs=[O], outputs=alpha, updates=updates)
        
        self.backward_fun = theano.function(inputs=[O], outputs=beta, updates=updates)
        
        self.full_prob_fun = theano.function(inputs=[O], outputs=full_prob, updates=updates)
        
        self.gamma_fun = theano.function(inputs=[O], outputs=gamma, updates=updates)
        
        self.viterbi_fun = theano.function(inputs=[O], outputs=[Q,vite_prob], updates=updates)
    
        self.xi_fun = theano.function(inputs=[O], outputs=xi, updates=updates)            
        
        self.exp_fun = theano.function(inputs=[O], outputs=[exp_pi,exp_a,exp_b], updates=updates)
        
        self.baum_welch_fun = theano.function(inputs=[O], outputs=[full_prob,exp_mean_err], updates=baum_welch_updates)
        
        self.gd_fun = theano.function(inputs=[O,lr], outputs=full_prob, updates=gd_updates)        
    
    def setModel(self,pi,a,b,N,M):
        
        self.pi.set_value(pi.astype(theano.config.floatX))
        self.a.set_value(a.astype(theano.config.floatX))
        self.b.set_value(b.astype(theano.config.floatX))
        self.N.set_value(N)
        self.M.set_value(M)
        
    def getModel(self):
        
        return self.pi.get_value(),self.a.get_value(),self.b.get_value(),self.N.get_value(),self.M.get_value()
    
    def forward(self, O):
        
        return self.forward_fun(O.astype('int32')) 
    
    
    def backward(self, O):
        
        return self.backward_fun(O.astype('int32'))        
    
    def full_prob(self, O):
        
        return self.full_prob_fun(O.astype('int32'))
    
    def gamma(self, O):
        
        return self.gamma_fun(O.astype('int32'))
    
    def viterbi(self, O):
        
        return self.viterbi_fun(O.astype('int32'))
    
    def xi(self, O):
        
        return self.xi_fun(O.astype('int32'))
    
    def exp_values(self, O):
        
        return self.exp_fun(O.astype('int32'))
    
    def baum_welch(self,O):
        
        return self.baum_welch_fun(O.astype('int32'))
    
    def gradient_descent(self,O,lr=0.01):
        
        return self.gd_fun(O.astype('int32'),lr)

Model creation

We can either use the default parameters (all equally probable) or start with random values. Here we will read the model parameters from a file created in my other notebook, which lets us make sure all the calculations match the ones there:


In [3]:
with open('../data/hmm.pkl') as f:
    O,pi,a,b,N,M,Time=pickle.load(f)
    
print 'Number of states: {}'.format(N)
print 'Number of observation classes: {}'.format(M)
print 'Number of time steps: {}'.format(Time) #T is taken by theano.tensor
print 'Observation sequence: {}'.format(O)
print 'Priors: {}'.format(pi)
print 'Transition matrix:\n{}'.format(a)
print 'Observation probability matrix:\n{}'.format(b)


Number of states: 3
Number of observation classes: 4
Number of time steps: 5
Observation sequence: [0 1 2 1 0]
Priors: [ 0.13060479  0.574861    0.29453421]
Transition matrix:
[[ 0.44056368  0.08175695  0.47767937]
 [ 0.4364632   0.40292622  0.16061058]
 [ 0.00281623  0.88340507  0.1137787 ]]
Observation probability matrix:
[[ 0.53375578  0.10345127  0.33152859  0.66017725]
 [ 0.26331038  0.65987904  0.40656356  0.11079022]
 [ 0.20293384  0.23666969  0.26190785  0.22903253]]

Here we will construct the HMM object. The constructor needs to compile everything, and since we have several functions, it may take a little while:


In [4]:
%time hmm=DiscreteHMM()

#we can also set the model parameters
hmm.setModel(pi,a,b,N,M)


CPU times: user 20.9 s, sys: 272 ms, total: 21.2 s
Wall time: 21.2 s

Algorithms

Let's test the methods now. You can compare the values with the ones from my other notebook:


In [5]:
print 'Forward probabilities:\n{}'.format(hmm.forward(O))
print 'Backward probabilities:\n{}'.format(hmm.backward(O))
print 'Full model probability: {}'.format(hmm.full_prob(O))
print 'Complete state probability:\n{}'.format(hmm.gamma(O))
seq,vite_prob=hmm.viterbi(O)
print 'Viterbi sequence: {} its probability {}'.format(seq,vite_prob)
print 'State transition probability:\n{}'.format(hmm.xi(O))


Forward probabilities:
[[ 0.06971106  0.15136686  0.05977096]
 [ 0.01002924  0.07884961  0.01524421]
 [ 0.01288864  0.01872524  0.00502583]
 [ 0.00143438  0.00860381  0.0023042 ]
 [ 0.00234515  0.00147968  0.00047267]]
Backward probabilities:
[[ 0.00983429  0.01427416  0.02428114]
 [ 0.04165408  0.03925478  0.05146331]
 [ 0.06524467  0.12455948  0.22368039]
 [ 0.35361814  0.37165272  0.25720245]
 [ 1.          1.          1.        ]]
Full model probability: 0.0042975009419
Complete state probability:
[[ 0.15952496  0.50276548  0.33770952]
 [ 0.09720974  0.72023821  0.18255197]
 [ 0.19567539  0.5427354   0.2615892 ]
 [ 0.11802761  0.74406749  0.1379049 ]
 [ 0.54570061  0.34431106  0.10998834]]
Viterbi sequence: [1 1 1 1 0] its probability 0.000408370891819
State transition probability:
[[[  3.07955407e-02   3.43532078e-02   9.43762064e-02]
  [  6.62454143e-02   3.67618442e-01   6.89015985e-02]
  [  1.68785889e-04   3.18266600e-01   1.92741621e-02]]

 [[  2.22395975e-02   9.66233574e-03   6.53078109e-02]
  [  1.73219696e-01   3.74381483e-01   1.72637105e-01]
  [  2.16084343e-04   1.58691600e-01   2.36442853e-02]]

 [[  4.83359434e-02   6.01336285e-02   8.72057900e-02]
  [  6.95711821e-02   4.30564851e-01   4.25993763e-02]
  [  1.20484372e-04   2.53368974e-01   8.09973013e-03]]

 [[  7.84874782e-02   7.18524726e-03   3.23548988e-02]
  [  4.66407150e-01   2.12406844e-01   6.52534664e-02]
  [  8.05964053e-04   1.24718964e-01   1.23799816e-02]]]

Expected values


In [6]:
exp_pi,exp_a,exp_b=hmm.exp_values(O)
print 'Expected priors: {}'.format(exp_pi)
print 'Expected transitions:\n{}'.format(exp_a)
print 'Expected observations:\n{}'.format(exp_b)


Expected priors: [ 0.15952496  0.50276548  0.33770952]
Expected transitions:
[[ 0.31529921  0.19517367  0.48952708]
 [ 0.30896547  0.55182409  0.13921055]
 [ 0.00142573  0.929645    0.06892936]]
Expected observations:
[[ 0.63184422  0.1928411   0.17531464  0.        ]
 [ 0.29679105  0.5130502   0.19015874  0.        ]
 [ 0.4347662   0.31120053  0.25403327  0.        ]]

Baum-Welch

We will run 15 iterations of the Baum-Welch EM reestimation here. We will also output the model probability (which should increase with each iteration) and the mean squared difference between the model parameters and their expected values (which decreases towards 0 as the model converges to an optimum).


In [7]:
hmm.setModel(pi,a,b,N,M)
for i in range(15):    
    prob,exp_err=hmm.baum_welch(O)
    print 'Iteration #{} P={} delta_exp={}'.format(i+1,prob,exp_err)


Iteration #1 P=0.0042975009419 delta_exp=0.0312950722873
Iteration #2 P=0.00509988935664 delta_exp=0.00250484491698
Iteration #3 P=0.00619259942323 delta_exp=0.00354749825783
Iteration #4 P=0.00862175133079 delta_exp=0.00476954458281
Iteration #5 P=0.0136389173567 delta_exp=0.00249158055522
Iteration #6 P=0.0189215112478 delta_exp=0.00156689342111
Iteration #7 P=0.0234132837504 delta_exp=0.00485259993002
Iteration #8 P=0.0374571271241 delta_exp=0.00896292272955
Iteration #9 P=0.109155744314 delta_exp=0.00389885343611
Iteration #10 P=0.218594044447 delta_exp=0.000297478662105
Iteration #11 P=0.238835394382 delta_exp=7.89283149061e-05
Iteration #12 P=0.244890481234 delta_exp=1.83135434781e-05
Iteration #13 P=0.247751742601 delta_exp=3.72286672246e-06
Iteration #14 P=0.24902997911 delta_exp=7.08700042651e-07
Iteration #15 P=0.249585151672 delta_exp=1.30926537167e-07

Gradient Descent

Since this is Theano, we can easily implement gradient descent using the built-in grad method. Each parameter is multiplied by its gradient raised to the power of the learning rate, and the updated values are then renormalized to keep the parameters stochastic (each distribution summing to one).
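Below is a minimal NumPy sketch of this multiplicative update (illustrative only; the class performs the same computation symbolically through the Theano updates defined above):

def multiplicative_update(p, grad, lr):
    # scale each entry by its gradient raised to the learning-rate power,
    # then renormalize so the distribution sums to one again
    p_new = p * grad**lr
    return p_new / p_new.sum()

print multiplicative_update(np.array([0.2, 0.5, 0.3]),
                            np.array([0.1, 2.0, 0.5]),  # made-up gradient values
                            0.2)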


In [8]:
hmm.setModel(pi,a,b,N,M)

for i in range(20):
    prob=hmm.gradient_descent(O,0.2)
    print 'Iteration #{} P={}'.format(i+1,prob)

print hmm.full_prob(O)

pi_n,a_n,b_n,N_n,M_n=hmm.getModel()

np.set_printoptions(suppress=True)
print 'PI: {}'.format(pi_n)
print 'A:\n{}'.format(a_n)
print 'B:\n{}'.format(b_n)
np.set_printoptions(suppress=False)


Iteration #1 P=0.0042975009419
Iteration #2 P=0.0051160780713
Iteration #3 P=0.00623863283545
Iteration #4 P=0.0079468395561
Iteration #5 P=0.0107951052487
Iteration #6 P=0.0159269031137
Iteration #7 P=0.0257520414889
Iteration #8 P=0.0452238433063
Iteration #9 P=0.0835462510586
Iteration #10 P=0.154380306602
Iteration #11 P=0.270117342472
Iteration #12 P=0.428860872984
Iteration #13 P=0.605754315853
Iteration #14 P=0.764089465141
Iteration #15 P=0.87856990099
Iteration #16 P=0.946424365044
Iteration #17 P=0.979877769947
Iteration #18 P=0.993670403957
Iteration #19 P=0.998380482197
Iteration #20 P=0.999676942825
0.999952375889
PI: [ 0.00000006  0.99997818  0.00002175]
A:
[[ 0.          0.99999964  0.00000029]
 [ 0.00000001  0.99999952  0.00000041]
 [ 0.          1.          0.        ]]
B:
[[ 0.00000005  0.          0.00000001         nan]
 [ 0.99998808  1.          0.99999994         nan]
 [ 0.00001184  0.00000004  0.00000008         nan]]

This method quickly converges to the optimum, although in this example the optimum is not a very useful model, because it stays mostly in one state all the time. Having several different sequences would probably serve as a better test for this method; a sketch of such a loop is shown below.
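As a hypothetical sketch (the sequences below are made up), training on several sequences could simply cycle through them, applying one update per sequence. For gradient descent this amounts to stochastic updates; a proper multi-sequence Baum-Welch would instead accumulate the statistics over all sequences before reestimating:

sequences = [np.array([0, 1, 2, 1, 0]),
             np.array([3, 1, 0, 2]),
             np.array([2, 2, 1, 3, 0, 1])]

hmm.setModel(pi, a, b, N, M)
for epoch in range(10):
    for seq in sequences:
        prob = hmm.gradient_descent(seq, 0.1)
    print 'Epoch #{} P(last sequence)={}'.format(epoch + 1, prob)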

Log model

This is the same class as above, but moved into the log domain. All the parameters and calculations are kept in the log domain.

Some log arithmetic hints can be found here.

In the log domain, multiplication and division become trivial, but addition, subtraction and sums become a nuisance: they have to be reimplemented by mapping the values back into the linear domain, performing the operation, and taking the logarithm of the result. So add becomes LogAddExp, sum becomes LogSumExp, and so on. To keep these operations numerically stable, the helpers below use the standard max trick.


In [9]:
from pylearn2.expr.basic import log_sum_exp


def LogDot(a, b):
    # log-domain vector-matrix product: for each column j,
    # log(sum_i exp(a_i + log b_ij))
    return log_sum_exp(a + b.T, axis=1)

def LogSum(a, axis=None):
    return log_sum_exp(a, axis)

def LogAdd(a, b):
    # numerically stable log(exp(a) + exp(b)) using the max trick
    m = T.maximum(a, b)
    return m + T.log1p(T.exp(-abs(a - b)))

def LogSub(a, b):
    # log(exp(a) - exp(b)), assuming a >= b elementwise
    return a + T.log1p(-T.exp(b - a))
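A quick NumPy check of why the max trick used in LogAdd matters (illustrative only): the naive formulation underflows for large negative log-probabilities, while the stabilized one survives.

a32, b32 = np.float32(-110.0), np.float32(-111.0)
print np.log(np.exp(a32) + np.exp(b32))                  # -inf: exp() underflows in float32
print max(a32, b32) + np.log1p(np.exp(-abs(a32 - b32)))  # approx -109.69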

Here is the actual class in the log domain:


In [10]:
class LogDiscreteHMM:
    
    def __init__(self, N=3, M=4):        
        
        updates={}
                
        pi = theano.shared(np.log(np.ones(N)/N).astype(theano.config.floatX))
        a = theano.shared(np.log(np.ones((N,N))/N).astype(theano.config.floatX))
        b = theano.shared(np.log(np.ones((N,M))/M).astype(theano.config.floatX))
        N = theano.shared(N)
        M = theano.shared(M)
        
        self.pi=pi
        self.a=a
        self.b=b
        self.N=N
        self.M=M
        
        O = T.ivector()
        TT = O.shape[0]
        
        
        #forward algorithm:
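        # same recursions as in the linear-domain class above: products become
        # sums of logs and sums become LogSumExp (via LogDot/LogSum)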
        
        alpha0=pi+b[:,O[0]]
        
        alpha_scan,upd = theano.scan(fn=lambda O,alpha_p: LogDot(alpha_p,a)+b[:,O],
                               sequences=O[1:],
                               outputs_info=alpha0)
        
        updates.update(upd)
        
        alpha=T.concatenate((alpha0.reshape((1,N)),alpha_scan))                
        
        #backward algorithm:
        
        beta0=T.zeros(N).astype(theano.config.floatX)
        
        beta_scan,upd = theano.scan(fn=lambda O,beta_p: LogDot(beta_p+b[:,O],a.T),
                               sequences=O[1:],
                               outputs_info=beta0,
                               go_backwards=True)
        updates.update(upd)
        
        beta=T.concatenate((beta_scan[::-1],beta0.reshape((1,N))))        
        
        #full model probability:
        
        full_prob = LogSum(alpha_scan[-1])
        
        #forward-backward probabilities:
        
        gamma=alpha+beta-full_prob        
        
        #viterbi algorithm:
        
        def viterbi_rec_step(O, delta_p, phi_p):
            m=delta_p+a.T
            phi=m.argmax(axis=1)
            delta=m[T.arange(N),phi]+b[:,O]
            return delta,phi
        
        phi0=T.zeros(N).astype('int64')

        [delta_scan, phi_scan], upd = theano.scan(fn=viterbi_rec_step,
                                                  sequences=O[1:],
                                                  outputs_info=[alpha0,phi0])        
        
        updates.update(upd)
        
        QT=delta_scan[-1].argmax()
        vite_prob = delta_scan[-1,QT]
        
        Q_scan, upd = theano.scan(fn=lambda phi, Q: phi[Q],
                             sequences=phi_scan,
                             outputs_info=QT,
                             go_backwards=True)
        
        updates.update(upd)
                                                  
        Q=T.concatenate((Q_scan[::-1],QT.reshape((1,))))
        
        #transition probabilities
        
        xi=alpha[:-1].reshape((TT-1,N,1))+a.reshape((1,N,N))+b[:,O[1:]].T.reshape((TT-1,1,N))+beta[1:].reshape((TT-1,1,N))-full_prob
        
        #expected values
        
        exp_pi=gamma[0]
        
        exp_a=LogSum(xi,axis=0)-LogSum(gamma[:-1],axis=0).reshape((N,1))
        
        def exp_b_fun(k):
            return ifelse(T.eq(gamma[T.eq(O,k).nonzero()].shape[0],0),
                          T.ones((a.shape[1],))*(-99),
                          LogSum(gamma[T.eq(O,k).nonzero()],axis=0)-LogSum(gamma,axis=0))
        
        exp_b_map, upd = theano.map(fn=exp_b_fun, sequences=T.arange(M))
        
        updates.update(upd)
        
        exp_b = exp_b_map.T
        
        exp_err = T.concatenate(((T.exp(pi)-T.exp(exp_pi)).ravel(),
                                 (T.exp(a)-T.exp(exp_a)).ravel(),
                                 (T.exp(b)-T.exp(exp_b)).ravel()))
        
        exp_mean_err = T.mean(exp_err**2)
        
        #Baum-Welch updates:
        
        baum_welch_updates=OrderedDict()
        exp_updates={pi:exp_pi,a:exp_a,b:exp_b}
        baum_welch_updates.update(updates)
        baum_welch_updates.update(exp_updates)
        
        #Gradient descent:
        
        pi_grad=T.grad(cost=full_prob,wrt=pi)
        a_grad=T.grad(cost=full_prob,wrt=a)
        b_grad=T.grad(cost=full_prob,wrt=b)
                
        lr=T.scalar()
        
        pi_upd=pi+(pi_grad*lr)
        norm_pi_upd=pi_upd-LogSum(pi_upd)
        
        a_upd=a+(a_grad*lr)
        norm_a_upd=(a_upd.T-LogSum(a_upd,axis=1)).T
        
        b_upd=b+(b_grad*lr)
        norm_b_upd=b_upd-LogSum(b_upd,axis=0)
        
        gd_updates=OrderedDict()
        grad_updates={pi:norm_pi_upd,
                      a:norm_a_upd,
                      b:norm_b_upd}
        gd_updates.update(updates)
        gd_updates.update(grad_updates)            
        
        #function definitions
        
        self.forward_fun = theano.function(inputs=[O], outputs=alpha, updates=updates)
        
        self.backward_fun = theano.function(inputs=[O], outputs=beta, updates=updates)
        
        self.full_prob_fun = theano.function(inputs=[O], outputs=full_prob, updates=updates)
        
        self.gamma_fun = theano.function(inputs=[O], outputs=gamma, updates=updates)
        
        self.viterbi_fun = theano.function(inputs=[O], outputs=[Q,vite_prob], updates=updates)
    
        self.xi_fun = theano.function(inputs=[O], outputs=xi, updates=updates)          
        
        self.exp_fun = theano.function(inputs=[O], outputs=[exp_pi,exp_a,exp_b], updates=updates)
        
        self.baum_welch_fun = theano.function(inputs=[O], outputs=[full_prob,exp_mean_err], updates=baum_welch_updates)
        
        self.gd_fun = theano.function(inputs=[O,lr], outputs=full_prob, updates=gd_updates)
    
    def setModel(self,pi,a,b,N,M):
        
        self.pi.set_value(pi.astype(theano.config.floatX))
        self.a.set_value(a.astype(theano.config.floatX))
        self.b.set_value(b.astype(theano.config.floatX))
        self.N.set_value(N)
        self.M.set_value(M)
    
    def getModel(self):
        
        return self.pi.get_value(),self.a.get_value(),self.b.get_value(),self.N.get_value(),self.M.get_value()
    
    
    def forward(self, O):
        
        return self.forward_fun(O.astype('int32')) 
    
    
    def backward(self, O):
        
        return self.backward_fun(O.astype('int32'))        
    
    def full_prob(self, O):
        
        return self.full_prob_fun(O.astype('int32'))
    
    def gamma(self, O):
        
        return self.gamma_fun(O.astype('int32'))
    
    def viterbi(self, O):
        
        return self.viterbi_fun(O.astype('int32'))
    
    def xi(self, O):
        
        return self.xi_fun(O.astype('int32'))
    
    def exp_values(self, O):
        
        return self.exp_fun(O.astype('int32'))
    
    def baum_welch(self,O):
        
        return self.baum_welch_fun(O.astype('int32'))
    
    def gradient_descent(self,O,lr=0.01):
        
        return self.gd_fun(O.astype('int32'),lr)

Here we construct the object. It is not much more complicated than the one above:


In [11]:
%time loghmm=LogDiscreteHMM()


CPU times: user 23.3 s, sys: 232 ms, total: 23.5 s
Wall time: 23.5 s

Since all the parameters are in the log domain, we have to take logarithms of all the values that were used above:


In [12]:
loghmm.setModel(np.log(pi),np.log(a),np.log(b),N,M)

We also have to exponentiate the results to get back into the linear domain. The results are then the same as above:


In [13]:
print 'Forward probabilities:\n{}'.format(np.exp(loghmm.forward(O)))
print 'Backward probabilities:\n{}'.format(np.exp(loghmm.backward(O)))
print 'Full model probability: {}'.format(np.exp(loghmm.full_prob(O)))
print 'Complete state probability:\n{}'.format(np.exp(loghmm.gamma(O)))
seq,vite_prob=loghmm.viterbi(O)
print 'Viterbi sequence: {} its probability {}'.format(seq,np.exp(vite_prob))
print 'State transition probability:\n{}'.format(np.exp(loghmm.xi(O)))


Forward probabilities:
[[ 0.06971105  0.15136687  0.05977095]
 [ 0.01002924  0.07884961  0.01524421]
 [ 0.01288864  0.01872524  0.00502583]
 [ 0.00143438  0.00860382  0.0023042 ]
 [ 0.00234515  0.00147968  0.00047268]]
Backward probabilities:
[[ 0.00983429  0.01427416  0.02428114]
 [ 0.04165408  0.03925479  0.0514633 ]
 [ 0.06524467  0.12455946  0.22368038]
 [ 0.35361817  0.37165272  0.25720248]
 [ 1.          1.          1.        ]]
Full model probability: 0.00429750420153
Complete state probability:
[[ 0.15952486  0.50276518  0.33770928]
 [ 0.09720964  0.72023761  0.1825518 ]
 [ 0.19567522  0.54273504  0.26158893]
 [ 0.11802752  0.74406731  0.13790481]
 [ 0.54570061  0.34431088  0.1099883 ]]
Viterbi sequence: [1 1 1 1 0] its probability 0.000408371095546
State transition probability:
[[[  3.07955146e-02   3.43531743e-02   9.43760946e-02]
  [  6.62453920e-02   3.67618322e-01   6.89015239e-02]
  [  1.68785802e-04   3.18266422e-01   1.92741342e-02]]

 [[  2.22395957e-02   9.66233294e-03   6.53077289e-02]
  [  1.73219576e-01   3.74381095e-01   1.72636926e-01]
  [  2.16084241e-04   1.58691511e-01   2.36442778e-02]]

 [[  4.83359061e-02   6.01336136e-02   8.72057825e-02]
  [  6.95711821e-02   4.30564761e-01   4.25993316e-02]
  [  1.20484357e-04   2.53368825e-01   8.09971802e-03]]

 [[  7.84873813e-02   7.18523795e-03   3.23548727e-02]
  [  4.66406941e-01   2.12406784e-01   6.52534440e-02]
  [  8.05963005e-04   1.24718860e-01   1.23799695e-02]]]

The expected values for Baum-Welch are also correct:


In [14]:
exp_pi,exp_a,exp_b=loghmm.exp_values(O)
print 'Expected priors: {}'.format(np.exp(exp_pi))
print 'Expected transitions:\n{}'.format(np.exp(exp_a))
print 'Expected observations:\n{}'.format(np.exp(exp_b))


Expected priors: [ 0.15952486  0.50276518  0.33770928]
Expected transitions:
[[ 0.31529918  0.19517373  0.48952711]
 [ 0.30896547  0.55182409  0.13921049]
 [ 0.00142572  0.92964518  0.06892936]]
Expected observations:
[[  6.31844342e-01   1.92840993e-01   1.75314546e-01   1.00893489e-43]
 [  2.96791017e-01   5.13050258e-01   1.90158710e-01   1.00893489e-43]
 [  4.34766293e-01   3.11200529e-01   2.54033178e-01   1.00893489e-43]]

And the Baum-Welch procedure works the same as well. The only exception is that the exp_err value is returned in the linear domain, since it's more convenient this way:


In [15]:
loghmm.setModel(np.log(pi),np.log(a),np.log(b),N,M)
for i in range(15):    
    prob,exp_err=loghmm.baum_welch(O)
    print 'Iteration #{} P={} delta_exp={}'.format(i+1,np.exp(prob),exp_err)


Iteration #1 P=0.00429750420153 delta_exp=0.0312950797379
Iteration #2 P=0.00509988516569 delta_exp=0.0025048465468
Iteration #3 P=0.00619260035455 delta_exp=0.00354749895632
Iteration #4 P=0.00862175505608 delta_exp=0.00476953713223
Iteration #5 P=0.0136389117688 delta_exp=0.00249158591032
Iteration #6 P=0.0189214926213 delta_exp=0.00156688096467
Iteration #7 P=0.0234132502228 delta_exp=0.00485255988315
Iteration #8 P=0.0374569296837 delta_exp=0.00896290969104
Iteration #9 P=0.109154902399 delta_exp=0.00389891047962
Iteration #10 P=0.218593791127 delta_exp=0.000297478283755
Iteration #11 P=0.238835379481 delta_exp=7.89276600699e-05
Iteration #12 P=0.244890540838 delta_exp=1.83132779057e-05
Iteration #13 P=0.247751682997 delta_exp=3.72287468053e-06
Iteration #14 P=0.249030068517 delta_exp=7.08665595539e-07
Iteration #15 P=0.249585136771 delta_exp=1.30945835508e-07

Finally, gradient descent works similarly to the one above:


In [16]:
loghmm.setModel(np.log(pi),np.log(a),np.log(b),N,M)

for i in range(20):
    prob=loghmm.gradient_descent(O,0.2)
    print 'Iteration #{} P={}'.format(i+1,np.exp(prob))


Iteration #1 P=0.00429750420153
Iteration #2 P=0.0048121935688
Iteration #3 P=0.00581054808572
Iteration #4 P=0.0079429903999
Iteration #5 P=0.0128070320934
Iteration #6 P=0.0240163747221
Iteration #7 P=0.0483203493059
Iteration #8 P=0.0939218401909
Iteration #9 P=0.164299041033
Iteration #10 P=0.254913568497
Iteration #11 P=0.356227010489
Iteration #12 P=0.458135515451
Iteration #13 P=0.552913546562
Iteration #14 P=0.636195361614
Iteration #15 P=0.706535339355
Iteration #16 P=0.764397203922
Iteration #17 P=0.811213552952
Iteration #18 P=0.84873342514
Iteration #19 P=0.878661632538
Iteration #20 P=0.902498006821
