$$ \LaTeX \text{ command declarations here.} \newcommand{\R}{\mathbb{R}} \renewcommand{\vec}[1]{\mathbf{#1}} \newcommand{\X}{\mathcal{X}} \newcommand{\D}{\mathcal{D}} \newcommand{\G}{\mathcal{G}} \newcommand{\L}{\mathcal{L}} \newcommand{\X}{\mathcal{X}} \newcommand{\Parents}{\mathrm{Parents}} \newcommand{\NonDesc}{\mathrm{NonDesc}} \newcommand{\I}{\mathcal{I}} \newcommand{\dsep}{\text{d-sep}} \newcommand{\Cat}{\mathrm{Categorical}} \newcommand{\Bin}{\mathrm{Binomial}} $$

HMMs and the Baum-Welch Algorithm

As covered in lecture, the Baum-Welch algorithm is the EM algorithm specialised to HMMs: we learn the parameters A, B and $\pi$ from a set of observations.

In this hands-on exercise we will build upon the forward and backward algorithms from the last exercise, which can be used for the E-step, and implement Baum-Welch ourselves!

Like last time, we'll work with an example where we observe a sequence of words backed by a latent part of speech variable.

$X$: discrete distribution over bag of words

$Z$: discrete distribution over parts of speech

$A$: the probability of a part of speech given the previous part of speech, e.g., what do we expect to see after a noun?

$B$: the distribution of words given a particular part of speech, e.g., what words are we likely to see if we know it is a verb?

$x_i$s: a sequence of observed words (a sentence). Note: for both variables we have a special "end" outcome that signals the end of a sentence. This makes sense because a part-of-speech tagger needs a sense of sentence boundaries.
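To make the generative story concrete, here is a minimal sketch of sampling a sentence from an HMM of this shape. The parameters `toy_pi`, `toy_A`, `toy_B` and the helper `sample_sentence` are hypothetical and are not the values used in this notebook; the sketch assumes emissions are indexed as `toy_B[word, state]`, with each column summing to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: states 0 and 1 are ordinary tags, state 2 is "end".
toy_pi = np.array([0.7, 0.3, 0.0])
toy_A = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.2, 0.3],
                  [0.0, 0.0, 1.0]])   # each row: p(next state | state)
toy_B = np.array([[0.6, 0.1, 0.0],
                  [0.3, 0.2, 0.0],
                  [0.1, 0.7, 0.0],
                  [0.0, 0.0, 1.0]])   # each column: p(word | state); word 3 is "end"
END_STATE, END_WORD = 2, 3

def sample_sentence(pi, A, B, max_len=20):
    """Sample (states, words) until the end state is emitted or max_len is reached."""
    states, words = [], []
    z = rng.choice(len(pi), p=pi)              # first part of speech
    for _ in range(max_len):
        x = rng.choice(B.shape[0], p=B[:, z])  # word given part of speech
        states.append(int(z))
        words.append(int(x))
        if z == END_STATE:
            break
        z = rng.choice(len(pi), p=A[z])        # next part of speech
    return states, words

toy_states, toy_words = sample_sentence(toy_pi, toy_A, toy_B)
print(toy_states)
print(toy_words)
```

Only the words would be visible to a learner; Baum-Welch has to recover plausible parameters without ever seeing the sampled states.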


In [4]:
import numpy as np
np.set_printoptions(suppress=True)

parts_of_speech = DETERMINER, NOUN, VERB, END = 0, 1, 2, 3
# note: END is rebound on the next line, so from here on END == 7 (the "end" word), not state 3
words = THE, DOG, CAT, WALKED, RAN, IN, PARK, END = 0, 1, 2, 3, 4, 5, 6, 7

# transition probabilities
A = np.array([
        # D     N   V   E
        [0.1, 0.8, 0.1, 0.0],  # D: determiner most likely to go to noun
        [0.1, 0.1, 0.6, 0.2],  # N: noun most likely to go to verb
        [0.4, 0.3, 0.2, 0.1],  # V 
        [0.0, 0.0, 0.0, 1.0]]) # E: end always goes to end

# distribution of parts of speech for the first word of a sentence
pi = np.array([0.4, 0.3, 0.3, 0.0])

# emission probabilities
B = np.array([
        # D     N     V     E
        [ 0.8,  0.1,  0.1,  0. ],  # the
        [ 0.1,  0.8,  0.1,  0. ],  # dog
        [ 0.1,  0.8,  0.1,  0. ],  # cat
        [ 0. ,  0. ,  1. ,  0. ],  # walked
        [ 0. ,  0.2 , 0.8 ,  0. ], # ran
        [ 1. ,  0. ,  0. ,  0. ],  # in
        [ 0. ,  0.1,  0.9,  0. ],  # park
        [ 0. ,  0. ,  0. ,  1. ]]) # end

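# normalise each column so that the emission probabilities of each part of speech sum to 1
# (note: the outputs printed by the later cells were produced with the unnormalised B above)
B = B / np.sum(B, axis=0)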


[[ 0.4         0.05        0.03333333  0.        ]
 [ 0.05        0.4         0.03333333  0.        ]
 [ 0.05        0.4         0.03333333  0.        ]
 [ 0.          0.          0.33333333  0.        ]
 [ 0.          0.1         0.26666667  0.        ]
 [ 0.5         0.          0.          0.        ]
 [ 0.          0.05        0.3         0.        ]
 [ 0.          0.          0.          1.        ]]

In [2]:
# utilities for printing out the parameters of the HMM

import pandas as pd

pos_labels = ["D", "N", "V", "E"]
word_labels = ["the", "dog", "cat", "walked", "ran", "in", "park", "end"]

def print_B(B):
    print(pd.DataFrame(B, columns=pos_labels, index=word_labels))
    
def print_A(A):
    print(pd.DataFrame(A, columns=pos_labels, index=pos_labels))
        
print_A(A)
print_B(B)


     D    N    V    E
D  0.1  0.8  0.1  0.0
N  0.1  0.1  0.6  0.2
V  0.4  0.3  0.2  0.1
E  0.0  0.0  0.0  1.0
          D    N    V    E
the     0.8  0.1  0.1  0.0
dog     0.1  0.8  0.1  0.0
cat     0.1  0.8  0.1  0.0
walked  0.0  0.0  1.0  0.0
ran     0.0  0.2  0.8  0.0
in      1.0  0.0  0.0  0.0
park    0.0  0.1  0.9  0.0
end     0.0  0.0  0.0  1.0

Review: Forward / Backward

Here are solutions to the coding problems from the last hands-on lecture, along with example uses with predefined A and B matrices.

$\alpha_t(z_t) = B_{z_t,x_t} \sum_{z_{t-1}} \alpha_{t-1}(z_{t-1}) A_{z_{t-1}, z_t} $

$\beta_t(z_t) = \sum_{z_{t+1}} A_{z_t, z_{t+1}} B_{z_{t+1}, x_{t+1}} \beta_{t+1}(z_{t+1})$


In [3]:
def forward(params, observations):
    pi, A, B = params
    N = len(observations)
    S = pi.shape[0]
    
    alpha = np.zeros((N, S))
    
    # base case
    alpha[0, :] = pi * B[observations[0], :]
    
    # recursive case
    for i in range(1, N):
        for s2 in range(S):
            for s1 in range(S):
                alpha[i, s2] += alpha[i-1, s1] * A[s1, s2] * B[observations[i], s2]    
    
    return (alpha, np.sum(alpha[N-1,:]))

def print_forward(params, observations):
    alpha, za = forward(params, observations)
    print(pd.DataFrame(
            alpha, 
            columns=pos_labels, 
            index=[word_labels[i] for i in observations]))

print_forward((pi, A, B), [THE, DOG, WALKED, IN, THE, PARK, END])
print_forward((pi, A, B), [THE, CAT, RAN, IN, THE, PARK, END])


               D         N         V        E
the     0.320000  0.030000  0.030000  0.00000
dog     0.004700  0.214400  0.005600  0.00000
walked  0.000000  0.000000  0.130230  0.00000
in      0.052092  0.000000  0.000000  0.00000
the     0.004167  0.004167  0.000521  0.00000
park    0.000000  0.000391  0.002719  0.00000
end     0.000000  0.000000  0.000000  0.00035
             D         N         V         E
the   0.320000  0.030000  0.030000  0.000000
cat   0.004700  0.214400  0.005600  0.000000
ran   0.000000  0.005376  0.104184  0.000000
in    0.042211  0.000000  0.000000  0.000000
the   0.003377  0.003377  0.000422  0.000000
park  0.000000  0.000317  0.002203  0.000000
end   0.000000  0.000000  0.000000  0.000284

In [4]:
def backward(params, observations):
    pi, A, B = params
    N = len(observations)
    S = pi.shape[0]
    
    beta = np.zeros((N, S))
    
    # base case
    beta[N-1, :] = 1
    
    # recursive case
    for i in range(N-2, -1, -1):
        for s1 in range(S):
            for s2 in range(S):
                beta[i, s1] += beta[i+1, s2] * A[s1, s2] * B[observations[i+1], s2]
    
    return (beta, np.sum(pi * B[observations[0], :] * beta[0,:]))

backward((pi, A, B), [THE, DOG, WALKED, IN, THE, PARK, END])


Out[4]:
(array([[ 0.00104026,  0.00016397,  0.00040858,  0.        ],
        [ 0.0002688 ,  0.0016128 ,  0.0005376 ,  0.        ],
        [ 0.000672  ,  0.000672  ,  0.002688  ,  0.        ],
        [ 0.00672   ,  0.004     ,  0.01016   ,  0.        ],
        [ 0.025     ,  0.056     ,  0.024     ,  0.        ],
        [ 0.        ,  0.2       ,  0.1       ,  1.        ],
        [ 1.        ,  1.        ,  1.        ,  1.        ]]),
 0.00035005824000000022)
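Once both tables are available, the posterior over the hidden state at each position follows from their product. The sketch below illustrates this on a hypothetical two-state, two-word model; `forward_backward`, `toy_pi`, `toy_A`, `toy_B` and `toy_obs` are assumptions made for the example and are not part of the notebook. It uses vectorised forms of the same recursions, with emissions indexed as `toy_B[word, state]`.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Vectorised alpha/beta recursions; B is indexed as B[word, state]."""
    N, S = len(obs), len(pi)
    alpha = np.zeros((N, S))
    beta = np.ones((N, S))                 # base case: beta at the last step is 1
    alpha[0] = pi * B[obs[0]]
    for t in range(1, N):
        alpha[t] = B[obs[t]] * (alpha[t - 1] @ A)
    for t in range(N - 2, -1, -1):
        beta[t] = A @ (B[obs[t + 1]] * beta[t + 1])
    return alpha, beta

# Hypothetical toy model (not the notebook's parameters).
toy_pi = np.array([0.6, 0.4])
toy_A = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
toy_B = np.array([[0.9, 0.2],    # word 0
                  [0.1, 0.8]])   # word 1
toy_obs = [0, 1, 1, 0]

toy_alpha, toy_beta = forward_backward(toy_pi, toy_A, toy_B, toy_obs)
toy_likelihood = toy_alpha[-1].sum()

# gamma[t, s] = p(z_t = s | observations); each row sums to 1
toy_gamma = toy_alpha * toy_beta / toy_likelihood
print(toy_gamma)
```

The sum of `alpha[t] * beta[t]` over states gives the same sequence likelihood at every position `t`, which is why the forward and backward marginals are expected to agree.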

Implementing Baum-Welch

With the forward and backward algorithm implementations ready, let's use them to implement Baum-Welch, EM for HMMs.

In the M-step, the parameters are updated from expected counts, which are built from the posterior over pairs of consecutive states:

$ p(z_{t-1}, z_t | \X, \theta) = \frac{\alpha_{t-1}(z_{t-1}) \beta_t(z_t) A_{z_{t-1}, z_t} B_{z_t, x_t}}{\sum_k \alpha_t(k)\beta_t(k)} $
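As an illustration of the formula above, the following sketch evaluates the pairwise posterior on a hypothetical two-state, two-word model and turns the summed posteriors into a row-normalised transition estimate. The names `pairwise_posteriors`, `toy_pi`, `toy_A`, `toy_B` and `toy_obs` are assumptions made for the example; emissions are indexed as `toy_B[word, state]`.

```python
import numpy as np

def pairwise_posteriors(pi, A, B, obs):
    """Return xi with xi[t-1, i, j] = p(z_{t-1}=i, z_t=j | observations)."""
    N, S = len(obs), len(pi)
    alpha = np.zeros((N, S))
    beta = np.ones((N, S))
    alpha[0] = pi * B[obs[0]]
    for t in range(1, N):
        alpha[t] = B[obs[t]] * (alpha[t - 1] @ A)
    for t in range(N - 2, -1, -1):
        beta[t] = A @ (B[obs[t + 1]] * beta[t + 1])

    xi = np.zeros((N - 1, S, S))
    for t in range(1, N):
        norm = np.sum(alpha[t] * beta[t])  # denominator: sum_k alpha_t(k) beta_t(k)
        # numerator: alpha_{t-1}(i) * A[i, j] * B[x_t, j] * beta_t(j)
        xi[t - 1] = alpha[t - 1][:, None] * A * (B[obs[t]] * beta[t])[None, :] / norm
    return xi

# Hypothetical toy model (not the notebook's parameters).
toy_pi = np.array([0.6, 0.4])
toy_A = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
toy_B = np.array([[0.9, 0.2],    # word 0
                  [0.1, 0.8]])   # word 1
toy_obs = [0, 1, 1, 0]

toy_xi = pairwise_posteriors(toy_pi, toy_A, toy_B, toy_obs)
toy_counts = toy_xi.sum(axis=0)                                # expected transition counts
toy_A_new = toy_counts / toy_counts.sum(axis=1, keepdims=True)  # each row sums to 1
print(toy_A_new)
```

Each slice `toy_xi[t]` is a joint distribution over a pair of states, so it sums to 1; summing the slices over time gives the expected number of times each transition was used.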

First, let's look at an implementation of this below and see how it works when applied to some training data.


In [5]:
# Some utilities for tracing our implementation below

def left_pad(i, s):
    return "\n".join(["{}{}".format(' '*i, l) for l in s.split("\n")])

def pad_print(i, s):
    print(left_pad(i, s))
    
def pad_print_args(i, **kwargs):
    pad_print(i, "\n".join(["{}:\n{}".format(k, kwargs[k]) for k in sorted(kwargs.keys())]))

In [6]:
def baum_welch(training, pi, A, B, iterations, trace=False):
    pi, A, B = np.copy(pi), np.copy(A), np.copy(B)  # take copies, as we modify them
    S = pi.shape[0]

    # iterations of EM
    for it in range(iterations):
        if trace:
            pad_print(0, "for it={} in range(iterations)".format(it))
            pad_print_args(2, A=A, B=B, pi=pi, S=S)
        pi1 = np.zeros_like(pi)
        A1 = np.zeros_like(A)
        B1 = np.zeros_like(B)

        for observations in training:
            if trace:
                pad_print(2, "for observations={} in training".format(observations))

            # 
            # E-Step: compute forward-backward matrices
            # 
                
            alpha, za = forward((pi, A, B), observations)
            beta, zb = backward((pi, A, B), observations)
            if trace:
                pad_print(4, """alpha, za = forward((pi, A, B), observations)\nbeta, zb = backward((pi, A, B), observations)""")
                pad_print_args(4, alpha=alpha, beta=beta, za=za, zb=zb)

            assert abs(za - zb) < 1e-6, "it's badness 10000 if the marginals don't agree ({} vs {})".format(za, zb)

            #
            # M-step: calculating the frequency of starting state, transitions and (state, obs) pairs
            #
            
            # Update PI: 
            pi1 += alpha[0, :] * beta[0, :] / za

            if trace:
                pad_print(4, "pi1 += alpha[0, :] * beta[0, :] / za")
                pad_print_args(4, pi1=pi1)
                pad_print(4, "for i in range(0, len(observations)):")
            
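            # Update B (emission) matrix: expected (observation, state) counts
            for i in range(0, len(observations)):
                # B1 is updated like pi1, but for the row of the word observed at step i
                B1[observations[i], :] += alpha[i, :] * beta[i, :] / za
                if trace:
                    pad_print(6, "B1[observations[{0}], :] += alpha[{0}, :] * beta[{0}, :] / za".format(i))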
            if trace:
                pad_print_args(4, B1=B1)
                pad_print(4, "for i in range(1, len(observations)):")
                
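            # Update A (transition) matrix: expected transition counts
            for i in range(1, len(observations)):
                if trace: 
                    pad_print(6, "for s1 in range(S={})".format(S))
                for s1 in range(S):
                    if trace: pad_print(8, "for s2 in range(S={})".format(S))
                    for s2 in range(S):
                        # posterior of moving from s1 at step i-1 to s2 at step i
                        A1[s1, s2] += alpha[i-1, s1] * A[s1, s2] * B[observations[i], s2] * beta[i, s2] / za
                        if trace:
                            pad_print(10, "A1[{0}, {1}] += alpha[{2}, {0}] * A[{0}, {1}] * B[observations[{3}], {1}] * beta[{3}, {1}] / za".format(s1, s2, i-1, i))
            if trace: pad_print_args(4, A1=A1)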

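        # normalise pi1, A1, B1 so that each is a valid distribution
        # (this step is elided in the original; the axes below are an assumption that
        # follows the definitions of pi, A and B given at the top of the notebook)
        pi = pi1 / np.sum(pi1)
        A = A1 / np.sum(A1, axis=1, keepdims=True)  # each row: p(next state | state)
        B = B1 / np.sum(B1, axis=0, keepdims=True)  # each column: p(word | state)
        
    return pi, A, B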

Training with examples

Let's try producing updated parameters for our HMM using a few examples. How did the A and B matrices get updated with the data? Was any confidence gained in the emission probabilities of nouns? Of verbs?


In [7]:
pi2, A2, B2 = baum_welch([
        [THE, DOG, WALKED, IN, THE, PARK, END, END], # END -> END needs at least one transition example
        [THE, DOG, RAN, IN, THE, PARK, END],
        [THE, CAT, WALKED, IN, THE, PARK, END],
        [THE, DOG, RAN, IN, THE, PARK, END]], pi, A, B, 10, trace=False)

print("original A")
print_A(A)

print("updated A")
print_A(A2)

print("\noriginal B")
print_B(B)

print("updated B")
print_B(B2)

print("\nForward probabilities of sample using updated params:")
print_forward((pi2, A2, B2), [THE, DOG, WALKED, IN, THE, PARK, END])


original A
     D    N    V    E
D  0.1  0.8  0.1  0.0
N  0.1  0.1  0.6  0.2
V  0.4  0.3  0.2  0.1
E  0.0  0.0  0.0  1.0
updated A
     D    N    V    E
D  0.0  1.0  0.0  0.0
N  0.0  0.0  1.0  0.0
V  0.5  0.0  0.0  0.5
E  0.0  0.0  0.0  1.0

original B
          D    N    V    E
the     0.8  0.1  0.1  0.0
dog     0.1  0.8  0.1  0.0
cat     0.1  0.8  0.1  0.0
walked  0.0  0.0  1.0  0.0
ran     0.0  0.2  0.8  0.0
in      1.0  0.0  0.0  0.0
park    0.0  0.1  0.9  0.0
end     0.0  0.0  0.0  1.0
updated B
          D    N    V    E
the     0.5  0.5  0.0  0.0
dog     0.0  1.0  0.0  0.0
cat     0.0  1.0  0.0  0.0
walked  0.0  0.0  1.0  0.0
ran     0.0  0.2  0.8  0.0
in      1.0  0.0  0.0  0.0
park    0.0  0.1  0.9  0.0
end     0.0  0.0  0.0  1.0

Forward probabilities of sample using updated params:
           D      N       V        E
the     0.50  0.000  0.0000  0.00000
dog     0.00  0.500  0.0000  0.00000
walked  0.00  0.000  0.5000  0.00000
in      0.25  0.000  0.0000  0.00000
the     0.00  0.125  0.0000  0.00000
park    0.00  0.000  0.1125  0.00000
end     0.00  0.000  0.0000  0.05625

Tracing through the implementation

Let's look at a trace of one iteration. Study the steps carefully and make sure you understand how we are updating the parameters, corresponding to these updates:

$ p(z_{t-1}, z_t | \X, \theta) = \frac{\alpha_{t-1}(z_{t-1}) \beta_t(z_t) A_{z_{t-1}, z_t} B_{z_t, x_t}}{\sum_k \alpha_t(k)\beta_t(k)} $


In [8]:
pi3, A3, B3 = baum_welch([
        [THE, DOG, WALKED, IN, THE, PARK, END, END], 
        [THE, CAT, RAN, IN, THE, PARK, END, END]], pi, A, B, 1, trace=True)

print("\n\n")

print_A(A3)

print_B(B3)


for it=0 in range(iterations)
  A:
  [[ 0.1  0.8  0.1  0. ]
   [ 0.1  0.1  0.6  0.2]
   [ 0.4  0.3  0.2  0.1]
   [ 0.   0.   0.   1. ]]
  B:
  [[ 0.8  0.1  0.1  0. ]
   [ 0.1  0.8  0.1  0. ]
   [ 0.1  0.8  0.1  0. ]
   [ 0.   0.   1.   0. ]
   [ 0.   0.2  0.8  0. ]
   [ 1.   0.   0.   0. ]
   [ 0.   0.1  0.9  0. ]
   [ 0.   0.   0.   1. ]]
  S:
  4
  pi:
  [ 0.4  0.3  0.3  0. ]
  for observations=[0, 1, 3, 5, 0, 6, 7, 7] in training
    alpha, za = forward((pi, A, B), observations)
    beta, zb = backward((pi, A, B), observations)
    alpha:
    [[ 0.32        0.03        0.03        0.        ]
     [ 0.0047      0.2144      0.0056      0.        ]
     [ 0.          0.          0.13023     0.        ]
     [ 0.052092    0.          0.          0.        ]
     [ 0.00416736  0.00416736  0.00052092  0.        ]
     [ 0.          0.00039069  0.0027192   0.        ]
     [ 0.          0.          0.          0.00035006]
     [ 0.          0.          0.          0.00035006]]
    beta:
    [[ 0.00104026  0.00016397  0.00040858  0.        ]
     [ 0.0002688   0.0016128   0.0005376   0.        ]
     [ 0.000672    0.000672    0.002688    0.        ]
     [ 0.00672     0.004       0.01016     0.        ]
     [ 0.025       0.056       0.024       0.        ]
     [ 0.          0.2         0.1         1.        ]
     [ 0.          0.2         0.1         1.        ]
     [ 1.          1.          1.          1.        ]]
    za:
    0.0003500582400000003
    zb:
    0.0003500582400000002
    pi1 += alpha[0, :] * beta[0, :] / za
    pi1:
    [ 0.95093296  0.01405206  0.03501497  0.        ]
    for i in range(0, len(observations)):
      B1[observations[0], :] += alpha[0, :] * beta[0, :] / za
      B1[observations[1], :] += alpha[1, :] * beta[1, :] / za
      B1[observations[2], :] += alpha[2, :] * beta[2, :] / za
      B1[observations[3], :] += alpha[3, :] * beta[3, :] / za
      B1[observations[4], :] += alpha[4, :] * beta[4, :] / za
      B1[observations[5], :] += alpha[5, :] * beta[5, :] / za
      B1[observations[6], :] += alpha[6, :] * beta[6, :] / za
      B1[observations[7], :] += alpha[7, :] * beta[7, :] / za
    B1:
    [[ 1.24855201  0.68071873  0.07072926  0.        ]
     [ 0.003609    0.98779083  0.00860017  0.        ]
     [ 0.          0.          0.          0.        ]
     [ 0.          0.          1.          0.        ]
     [ 0.          0.          0.          0.        ]
     [ 1.          0.          0.          0.        ]
     [ 0.          0.22321429  0.77678571  0.        ]
     [ 0.          0.          0.          2.        ]]
    for i in range(1, len(observations)):
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[0, 0] * A[0, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[0, 1] += alpha[0, 0] * A[0, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[0, 2] += alpha[0, 0] * A[0, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[0, 3] += alpha[0, 0] * A[0, 3] * B[observations[1], 3] * beta[1, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[0, 1] * A[1, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[1, 1] += alpha[0, 1] * A[1, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[1, 2] += alpha[0, 1] * A[1, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[1, 3] += alpha[0, 1] * A[1, 3] * B[observations[1], 3] * beta[1, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[0, 2] * A[2, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[2, 1] += alpha[0, 2] * A[2, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[2, 2] += alpha[0, 2] * A[2, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[2, 3] += alpha[0, 2] * A[2, 3] * B[observations[1], 3] * beta[1, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[0, 3] * A[3, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[3, 1] += alpha[0, 3] * A[3, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[3, 2] += alpha[0, 3] * A[3, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[3, 3] += alpha[0, 3] * A[3, 3] * B[observations[1], 3] * beta[1, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[1, 0] * A[0, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[0, 1] += alpha[1, 0] * A[0, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[0, 2] += alpha[1, 0] * A[0, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[0, 3] += alpha[1, 0] * A[0, 3] * B[observations[2], 3] * beta[2, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[1, 1] * A[1, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[1, 1] += alpha[1, 1] * A[1, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[1, 2] += alpha[1, 1] * A[1, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[1, 3] += alpha[1, 1] * A[1, 3] * B[observations[2], 3] * beta[2, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[1, 2] * A[2, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[2, 1] += alpha[1, 2] * A[2, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[2, 2] += alpha[1, 2] * A[2, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[2, 3] += alpha[1, 2] * A[2, 3] * B[observations[2], 3] * beta[2, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[1, 3] * A[3, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[3, 1] += alpha[1, 3] * A[3, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[3, 2] += alpha[1, 3] * A[3, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[3, 3] += alpha[1, 3] * A[3, 3] * B[observations[2], 3] * beta[2, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[2, 0] * A[0, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[0, 1] += alpha[2, 0] * A[0, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[0, 2] += alpha[2, 0] * A[0, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[0, 3] += alpha[2, 0] * A[0, 3] * B[observations[3], 3] * beta[3, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[2, 1] * A[1, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[1, 1] += alpha[2, 1] * A[1, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[1, 2] += alpha[2, 1] * A[1, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[1, 3] += alpha[2, 1] * A[1, 3] * B[observations[3], 3] * beta[3, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[2, 2] * A[2, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[2, 1] += alpha[2, 2] * A[2, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[2, 2] += alpha[2, 2] * A[2, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[2, 3] += alpha[2, 2] * A[2, 3] * B[observations[3], 3] * beta[3, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[2, 3] * A[3, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[3, 1] += alpha[2, 3] * A[3, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[3, 2] += alpha[2, 3] * A[3, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[3, 3] += alpha[2, 3] * A[3, 3] * B[observations[3], 3] * beta[3, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[3, 0] * A[0, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[0, 1] += alpha[3, 0] * A[0, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[0, 2] += alpha[3, 0] * A[0, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[0, 3] += alpha[3, 0] * A[0, 3] * B[observations[4], 3] * beta[4, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[3, 1] * A[1, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[1, 1] += alpha[3, 1] * A[1, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[1, 2] += alpha[3, 1] * A[1, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[1, 3] += alpha[3, 1] * A[1, 3] * B[observations[4], 3] * beta[4, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[3, 2] * A[2, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[2, 1] += alpha[3, 2] * A[2, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[2, 2] += alpha[3, 2] * A[2, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[2, 3] += alpha[3, 2] * A[2, 3] * B[observations[4], 3] * beta[4, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[3, 3] * A[3, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[3, 1] += alpha[3, 3] * A[3, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[3, 2] += alpha[3, 3] * A[3, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[3, 3] += alpha[3, 3] * A[3, 3] * B[observations[4], 3] * beta[4, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[4, 0] * A[0, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[0, 1] += alpha[4, 0] * A[0, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[0, 2] += alpha[4, 0] * A[0, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[0, 3] += alpha[4, 0] * A[0, 3] * B[observations[5], 3] * beta[5, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[4, 1] * A[1, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[1, 1] += alpha[4, 1] * A[1, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[1, 2] += alpha[4, 1] * A[1, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[1, 3] += alpha[4, 1] * A[1, 3] * B[observations[5], 3] * beta[5, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[4, 2] * A[2, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[2, 1] += alpha[4, 2] * A[2, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[2, 2] += alpha[4, 2] * A[2, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[2, 3] += alpha[4, 2] * A[2, 3] * B[observations[5], 3] * beta[5, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[4, 3] * A[3, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[3, 1] += alpha[4, 3] * A[3, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[3, 2] += alpha[4, 3] * A[3, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[3, 3] += alpha[4, 3] * A[3, 3] * B[observations[5], 3] * beta[5, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[5, 0] * A[0, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[0, 1] += alpha[5, 0] * A[0, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[0, 2] += alpha[5, 0] * A[0, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[0, 3] += alpha[5, 0] * A[0, 3] * B[observations[6], 3] * beta[6, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[5, 1] * A[1, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[1, 1] += alpha[5, 1] * A[1, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[1, 2] += alpha[5, 1] * A[1, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[1, 3] += alpha[5, 1] * A[1, 3] * B[observations[6], 3] * beta[6, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[5, 2] * A[2, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[2, 1] += alpha[5, 2] * A[2, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[2, 2] += alpha[5, 2] * A[2, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[2, 3] += alpha[5, 2] * A[2, 3] * B[observations[6], 3] * beta[6, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[5, 3] * A[3, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[3, 1] += alpha[5, 3] * A[3, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[3, 2] += alpha[5, 3] * A[3, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[3, 3] += alpha[5, 3] * A[3, 3] * B[observations[6], 3] * beta[6, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[6, 0] * A[0, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[0, 1] += alpha[6, 0] * A[0, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[0, 2] += alpha[6, 0] * A[0, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[0, 3] += alpha[6, 0] * A[0, 3] * B[observations[7], 3] * beta[7, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[6, 1] * A[1, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[1, 1] += alpha[6, 1] * A[1, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[1, 2] += alpha[6, 1] * A[1, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[1, 3] += alpha[6, 1] * A[1, 3] * B[observations[7], 3] * beta[7, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[6, 2] * A[2, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[2, 1] += alpha[6, 2] * A[2, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[2, 2] += alpha[6, 2] * A[2, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[2, 3] += alpha[6, 2] * A[2, 3] * B[observations[7], 3] * beta[7, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[6, 3] * A[3, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[3, 1] += alpha[6, 3] * A[3, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[3, 2] += alpha[6, 3] * A[3, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[3, 3] += alpha[6, 3] * A[3, 3] * B[observations[7], 3] * beta[7, 3] / za
    A1:
    [[ 0.30007624  1.80070425  0.15138052  0.        ]
     [ 0.00023036  0.03486688  1.63341231  0.22321429]
     [ 1.00092145  0.04210065  0.03630733  0.77678571]
     [ 0.          0.          0.          1.        ]]
  for observations=[0, 2, 4, 5, 0, 6, 7, 7] in training
    alpha, za = forward((pi, A, B), observations)
    beta, zb = backward((pi, A, B), observations)
    alpha:
    [[ 0.32        0.03        0.03        0.        ]
     [ 0.0047      0.2144      0.0056      0.        ]
     [ 0.          0.005376    0.104184    0.        ]
     [ 0.0422112   0.          0.          0.        ]
     [ 0.0033769   0.0033769   0.00042211  0.        ]
     [ 0.          0.00031658  0.00220342  0.        ]
     [ 0.          0.          0.          0.00028366]
     [ 0.          0.          0.          0.00028366]]
    beta:
    [[ 0.00084228  0.00013574  0.00033519  0.        ]
     [ 0.00032256  0.00130368  0.0004704   0.        ]
     [ 0.000672    0.000672    0.002688    0.        ]
     [ 0.00672     0.004       0.01016     0.        ]
     [ 0.025       0.056       0.024       0.        ]
     [ 0.          0.2         0.1         1.        ]
     [ 0.          0.2         0.1         1.        ]
     [ 1.          1.          1.          1.        ]]
    za:
    0.0002836592640000002
    zb:
    0.00028365926400000015
    pi1 += alpha[0, :] * beta[0, :] / za
    pi1:
    [ 1.90112628  0.02840844  0.07046528  0.        ]
    for i in range(0, len(observations)):
      B1[observations[0], :] += alpha[0, :] * beta[0, :] / za
      B1[observations[1], :] += alpha[1, :] * beta[1, :] / za
      B1[observations[2], :] += alpha[2, :] * beta[2, :] / za
      B1[observations[3], :] += alpha[3, :] * beta[3, :] / za
      B1[observations[4], :] += alpha[4, :] * beta[4, :] / za
      B1[observations[5], :] += alpha[5, :] * beta[5, :] / za
      B1[observations[6], :] += alpha[6, :] * beta[6, :] / za
      B1[observations[7], :] += alpha[7, :] * beta[7, :] / za
    B1:
    [[ 2.49636437  1.36174177  0.14189385  0.        ]
     [ 0.003609    0.98779083  0.00860017  0.        ]
     [ 0.00534455  0.98536881  0.00928663  0.        ]
     [ 0.          0.          1.          0.        ]
     [ 0.          0.01273596  0.98726404  0.        ]
     [ 2.          0.          0.          0.        ]
     [ 0.          0.44642857  1.55357143  0.        ]
     [ 0.          0.          0.          4.        ]]
    for i in range(1, len(observations)):
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[0, 0] * A[0, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[0, 1] += alpha[0, 0] * A[0, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[0, 2] += alpha[0, 0] * A[0, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[0, 3] += alpha[0, 0] * A[0, 3] * B[observations[1], 3] * beta[1, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[0, 1] * A[1, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[1, 1] += alpha[0, 1] * A[1, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[1, 2] += alpha[0, 1] * A[1, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[1, 3] += alpha[0, 1] * A[1, 3] * B[observations[1], 3] * beta[1, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[0, 2] * A[2, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[2, 1] += alpha[0, 2] * A[2, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[2, 2] += alpha[0, 2] * A[2, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[2, 3] += alpha[0, 2] * A[2, 3] * B[observations[1], 3] * beta[1, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[0, 3] * A[3, 0] * B[observations[1], 0] * beta[1, 0] / za
          A1[3, 1] += alpha[0, 3] * A[3, 1] * B[observations[1], 1] * beta[1, 1] / za
          A1[3, 2] += alpha[0, 3] * A[3, 2] * B[observations[1], 2] * beta[1, 2] / za
          A1[3, 3] += alpha[0, 3] * A[3, 3] * B[observations[1], 3] * beta[1, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[1, 0] * A[0, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[0, 1] += alpha[1, 0] * A[0, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[0, 2] += alpha[1, 0] * A[0, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[0, 3] += alpha[1, 0] * A[0, 3] * B[observations[2], 3] * beta[2, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[1, 1] * A[1, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[1, 1] += alpha[1, 1] * A[1, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[1, 2] += alpha[1, 1] * A[1, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[1, 3] += alpha[1, 1] * A[1, 3] * B[observations[2], 3] * beta[2, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[1, 2] * A[2, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[2, 1] += alpha[1, 2] * A[2, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[2, 2] += alpha[1, 2] * A[2, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[2, 3] += alpha[1, 2] * A[2, 3] * B[observations[2], 3] * beta[2, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[1, 3] * A[3, 0] * B[observations[2], 0] * beta[2, 0] / za
          A1[3, 1] += alpha[1, 3] * A[3, 1] * B[observations[2], 1] * beta[2, 1] / za
          A1[3, 2] += alpha[1, 3] * A[3, 2] * B[observations[2], 2] * beta[2, 2] / za
          A1[3, 3] += alpha[1, 3] * A[3, 3] * B[observations[2], 3] * beta[2, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[2, 0] * A[0, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[0, 1] += alpha[2, 0] * A[0, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[0, 2] += alpha[2, 0] * A[0, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[0, 3] += alpha[2, 0] * A[0, 3] * B[observations[3], 3] * beta[3, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[2, 1] * A[1, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[1, 1] += alpha[2, 1] * A[1, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[1, 2] += alpha[2, 1] * A[1, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[1, 3] += alpha[2, 1] * A[1, 3] * B[observations[3], 3] * beta[3, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[2, 2] * A[2, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[2, 1] += alpha[2, 2] * A[2, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[2, 2] += alpha[2, 2] * A[2, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[2, 3] += alpha[2, 2] * A[2, 3] * B[observations[3], 3] * beta[3, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[2, 3] * A[3, 0] * B[observations[3], 0] * beta[3, 0] / za
          A1[3, 1] += alpha[2, 3] * A[3, 1] * B[observations[3], 1] * beta[3, 1] / za
          A1[3, 2] += alpha[2, 3] * A[3, 2] * B[observations[3], 2] * beta[3, 2] / za
          A1[3, 3] += alpha[2, 3] * A[3, 3] * B[observations[3], 3] * beta[3, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[3, 0] * A[0, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[0, 1] += alpha[3, 0] * A[0, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[0, 2] += alpha[3, 0] * A[0, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[0, 3] += alpha[3, 0] * A[0, 3] * B[observations[4], 3] * beta[4, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[3, 1] * A[1, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[1, 1] += alpha[3, 1] * A[1, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[1, 2] += alpha[3, 1] * A[1, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[1, 3] += alpha[3, 1] * A[1, 3] * B[observations[4], 3] * beta[4, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[3, 2] * A[2, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[2, 1] += alpha[3, 2] * A[2, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[2, 2] += alpha[3, 2] * A[2, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[2, 3] += alpha[3, 2] * A[2, 3] * B[observations[4], 3] * beta[4, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[3, 3] * A[3, 0] * B[observations[4], 0] * beta[4, 0] / za
          A1[3, 1] += alpha[3, 3] * A[3, 1] * B[observations[4], 1] * beta[4, 1] / za
          A1[3, 2] += alpha[3, 3] * A[3, 2] * B[observations[4], 2] * beta[4, 2] / za
          A1[3, 3] += alpha[3, 3] * A[3, 3] * B[observations[4], 3] * beta[4, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[4, 0] * A[0, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[0, 1] += alpha[4, 0] * A[0, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[0, 2] += alpha[4, 0] * A[0, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[0, 3] += alpha[4, 0] * A[0, 3] * B[observations[5], 3] * beta[5, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[4, 1] * A[1, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[1, 1] += alpha[4, 1] * A[1, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[1, 2] += alpha[4, 1] * A[1, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[1, 3] += alpha[4, 1] * A[1, 3] * B[observations[5], 3] * beta[5, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[4, 2] * A[2, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[2, 1] += alpha[4, 2] * A[2, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[2, 2] += alpha[4, 2] * A[2, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[2, 3] += alpha[4, 2] * A[2, 3] * B[observations[5], 3] * beta[5, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[4, 3] * A[3, 0] * B[observations[5], 0] * beta[5, 0] / za
          A1[3, 1] += alpha[4, 3] * A[3, 1] * B[observations[5], 1] * beta[5, 1] / za
          A1[3, 2] += alpha[4, 3] * A[3, 2] * B[observations[5], 2] * beta[5, 2] / za
          A1[3, 3] += alpha[4, 3] * A[3, 3] * B[observations[5], 3] * beta[5, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[5, 0] * A[0, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[0, 1] += alpha[5, 0] * A[0, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[0, 2] += alpha[5, 0] * A[0, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[0, 3] += alpha[5, 0] * A[0, 3] * B[observations[6], 3] * beta[6, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[5, 1] * A[1, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[1, 1] += alpha[5, 1] * A[1, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[1, 2] += alpha[5, 1] * A[1, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[1, 3] += alpha[5, 1] * A[1, 3] * B[observations[6], 3] * beta[6, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[5, 2] * A[2, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[2, 1] += alpha[5, 2] * A[2, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[2, 2] += alpha[5, 2] * A[2, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[2, 3] += alpha[5, 2] * A[2, 3] * B[observations[6], 3] * beta[6, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[5, 3] * A[3, 0] * B[observations[6], 0] * beta[6, 0] / za
          A1[3, 1] += alpha[5, 3] * A[3, 1] * B[observations[6], 1] * beta[6, 1] / za
          A1[3, 2] += alpha[5, 3] * A[3, 2] * B[observations[6], 2] * beta[6, 2] / za
          A1[3, 3] += alpha[5, 3] * A[3, 3] * B[observations[6], 3] * beta[6, 3] / za
      for s1 in range(S=4)
        for s2 in range(S=4)
          A1[0, 0] += alpha[6, 0] * A[0, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[0, 1] += alpha[6, 0] * A[0, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[0, 2] += alpha[6, 0] * A[0, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[0, 3] += alpha[6, 0] * A[0, 3] * B[observations[7], 3] * beta[7, 3] / za
        for s2 in range(S=4)
          A1[1, 0] += alpha[6, 1] * A[1, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[1, 1] += alpha[6, 1] * A[1, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[1, 2] += alpha[6, 1] * A[1, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[1, 3] += alpha[6, 1] * A[1, 3] * B[observations[7], 3] * beta[7, 3] / za
        for s2 in range(S=4)
          A1[2, 0] += alpha[6, 2] * A[2, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[2, 1] += alpha[6, 2] * A[2, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[2, 2] += alpha[6, 2] * A[2, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[2, 3] += alpha[6, 2] * A[2, 3] * B[observations[7], 3] * beta[7, 3] / za
        for s2 in range(S=4)
          A1[3, 0] += alpha[6, 3] * A[3, 0] * B[observations[7], 0] * beta[7, 0] / za
          A1[3, 1] += alpha[6, 3] * A[3, 1] * B[observations[7], 1] * beta[7, 1] / za
          A1[3, 2] += alpha[6, 3] * A[3, 2] * B[observations[7], 2] * beta[7, 2] / za
          A1[3, 3] += alpha[6, 3] * A[3, 3] * B[observations[7], 3] * beta[7, 3] / za
    A1:
    [[ 0.60133413  3.60087644  0.30310735  0.        ]
     [ 0.01330746  0.0798651   3.25446482  0.44642857]
     [ 1.98955006  0.08491596  0.07257868  1.55357143]
     [ 0.          0.          0.          2.        ]]
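The unrolled trace above adds one term per time step and per state pair to `A1`. A compact way to write the same accumulation is sketched below. It is a minimal sketch, not the notebook's own implementation: it assumes `B` is indexed as `B[word, state]` (as in the trace), that `alpha` and `beta` are unscaled, and that `za` is the likelihood of the observations, taken here as `alpha[-1].sum()`. The `toy_*` model and the helper names are hypothetical and exist only to make the block runnable on its own.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Unscaled forward pass: alpha[t, s] = p(x_0..x_t, z_t = s)."""
    T, S = len(obs), A.shape[0]
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[obs[t]]
    return alpha

def backward(A, B, obs):
    """Unscaled backward pass: beta[t, s] = p(x_{t+1}..x_{T-1} | z_t = s)."""
    T, S = len(obs), A.shape[0]
    beta = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[obs[t + 1]] * beta[t + 1])
    return beta

def expected_transitions(alpha, beta, A, B, obs):
    """Sum over t of alpha[t, s1] * A[s1, s2] * B[obs[t+1], s2] * beta[t+1, s2] / za."""
    za = alpha[-1].sum()  # assumption: za is the likelihood of the observations
    A1 = np.zeros_like(A, dtype=float)
    for t in range(len(obs) - 1):
        # the outer product pairs every source state s1 with every target state s2
        A1 += np.outer(alpha[t], B[obs[t + 1]] * beta[t + 1]) * A / za
    return A1

# Hypothetical toy model: 2 states, 3 words; columns of toy_B sum to 1.
toy_pi = np.array([0.6, 0.4])
toy_A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
toy_B = np.array([[0.5, 0.1],
                  [0.4, 0.3],
                  [0.1, 0.6]])
toy_obs = [0, 1, 2, 1]

toy_alpha = forward(toy_pi, toy_A, toy_B, toy_obs)
toy_beta = backward(toy_A, toy_B, toy_obs)
toy_A1 = expected_transitions(toy_alpha, toy_beta, toy_A, toy_B, toy_obs)
print(toy_A1)
```

Under these assumptions each time step contributes a total of exactly one expected transition, so the entries of `toy_A1` sum to `len(toy_obs) - 1`. This is a handy sanity check for a hand-written E-step.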



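Transition table (rows: previous part of speech, columns: next part of speech; each row is the matching row of A1 normalised to sum to 1):

          D         N         V         E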
D  0.133472  0.799250  0.067278  0.000000
N  0.003507  0.021050  0.857778  0.117665
V  0.537627  0.022946  0.019613  0.419814
E  0.000000  0.000000  0.000000  1.000000

Word table (rows: words, columns: parts of speech D, N, V, E):

               D         N         V    E
the     0.624091  0.340435  0.035473  0.0
dog     0.003609  0.987791  0.008600  0.0
cat     0.005345  0.985369  0.009287  0.0
walked  0.000000  0.000000  1.000000  0.0
ran     0.000000  0.200000  0.800000  0.0
in      1.000000  0.000000  0.000000  0.0
park    0.000000  0.100000  0.900000  0.0
end     0.000000  0.000000  0.000000  1.0
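The D/N/V/E table above can be reproduced from the printed `A1` counts by dividing each row by its sum, which is the usual M-step update for the transition matrix. The sketch below copies the `A1` values from the printed output. The name `A_new` is an assumption used for illustration, not a name taken from the notebook.

```python
import numpy as np

# Expected transition counts, copied from the printed A1 above.
A1 = np.array([
    [0.60133413, 3.60087644, 0.30310735, 0.0],
    [0.01330746, 0.0798651, 3.25446482, 0.44642857],
    [1.98955006, 0.08491596, 0.07257868, 1.55357143],
    [0.0, 0.0, 0.0, 2.0],
])

# M-step for A: normalise each row so it is a distribution over next states.
A_new = A1 / A1.sum(axis=1, keepdims=True)
print(np.round(A_new, 6))
```

A row whose counts are all zero would divide by zero here. That does not occur in this example, but a general implementation would need to handle it, for instance by keeping the previous row.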