Artificial Intelligence Engineer Nanodegree - Probabilistic Models

Project: Sign Language Recognition System

Introduction
Part 1 Feature Selection
Part 2 Train the models
Part 3 Build a Recognizer
Part 4 (OPTIONAL) Improve the WER with Language Models

Introduction

The overall goal of this project is to build a word recognizer for American Sign Language video sequences, demonstrating the power of probabalistic models. In particular, this project employs hidden Markov models (HMM's) to analyze a series of measurements taken from videos of American Sign Language (ASL) collected for research (see the RWTH-BOSTON-104 Database). In this video, the right-hand x and y locations are plotted as the speaker signs the sentence.

The raw data, train, and test sets are pre-defined. You will derive a variety of feature sets (explored in Part 1), as well as implement three different model selection criterion to determine the optimal number of hidden states for each word model (explored in Part 2). Finally, in Part 3 you will implement the recognizer and compare the effects the different combinations of feature sets and model selection criteria.

At the end of each Part, complete the submission cells with implementations, answer all questions, and pass the unit tests. Then submit the completed notebook for review!

PART 1: Data

Features Tutorial

Load the initial database

A data handler designed for this database is provided in the student codebase as the AslDb class in the asl_data module. This handler creates the initial pandas dataframe from the corpus of data included in the data directory as well as dictionaries suitable for extracting data in a format friendly to the hmmlearn library. We'll use those to create models in Part 2.

To start, let's set up the initial database and select an example set of features for the training set. At the end of Part 1, you will create additional feature sets for experimentation.



In [53]:

    
import numpy as np
import pandas as pd
from asl_data import AslDb


asl = AslDb() # initializes the database
asl.df.head() # displays the first five rows of the asl database, indexed by video and frame



In [54]:

    
asl.df.ix[98,1]  # look at the data available for an individual frame









    Out[54]:





left-x         149
left-y         181
right-x        170
right-y        175
nose-x         161
nose-y          62
speaker    woman-1
Name: (98, 1), dtype: object

The frame represented by video 98, frame 1 is shown here:

Feature selection for training the model

The objective of feature selection when training a model is to choose the most relevant variables while keeping the model as simple as possible, thus reducing training time. We can use the raw features already provided or derive our own and add columns to the pandas dataframe asl.df for selection. As an example, in the next cell a feature named 'grnd-ry' is added. This feature is the difference between the right-hand y value and the nose y value, which serves as the "ground" right y value.



In [55]:

    
asl.df['grnd-ry'] = asl.df['right-y'] - asl.df['nose-y']
asl.df.head()  # the new feature 'grnd-ry' is now in the frames dictionary

Try it!



In [56]:

    
from asl_utils import test_features_tryit
# TODO add df columns for 'grnd-rx', 'grnd-ly', 'grnd-lx' representing differences between hand and nose locations
asl.df['grnd-rx'] = asl.df['right-x'] - asl.df['nose-x']
asl.df['grnd-ly'] = asl.df['left-y'] - asl.df['nose-y']
asl.df['grnd-lx'] = asl.df['left-x'] - asl.df['nose-x']

# test the code
test_features_tryit(asl)









    



asl.df sample






    






  
    
      
      
      left-x
      left-y
      right-x
      right-y
      nose-x
      nose-y
      speaker
      grnd-ry
      grnd-rx
      grnd-ly
      grnd-lx
    
    
      video
      frame
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      98
      0
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
    
    
      1
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
    
    
      2
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
    
    
      3
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
    
    
      4
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
    
  








    Out[56]:




Correct!



In [57]:

    
# collect the features into a list
features_ground = ['grnd-rx','grnd-ry','grnd-lx','grnd-ly']
 #show a single set of features for a given (video, frame) tuple
[asl.df.ix[98,1][v] for v in features_ground]









    Out[57]:





[9, 113, -12, 119]

Build the training set

Now that we have a feature list defined, we can pass that list to the build_training method to collect the features for all the words in the training set. Each word in the training set has multiple examples from various videos. Below we can see the unique words that have been loaded into the training set:



In [58]:

    
training = asl.build_training(features_ground)
print("Training words: {}".format(training.words))









    



Training words: ['JOHN', 'WRITE', 'HOMEWORK', 'IX-1P', 'SEE', 'YESTERDAY', 'IX', 'LOVE', 'MARY', 'CAN', 'GO', 'GO1', 'FUTURE', 'GO2', 'PARTY', 'FUTURE1', 'HIT', 'BLAME', 'FRED', 'FISH', 'WONT', 'EAT', 'BUT', 'CHICKEN', 'VEGETABLE', 'CHINA', 'PEOPLE', 'PREFER', 'BROCCOLI', 'LIKE', 'LEAVE', 'SAY', 'BUY', 'HOUSE', 'KNOW', 'CORN', 'CORN1', 'THINK', 'NOT', 'PAST', 'LIVE', 'CHICAGO', 'CAR', 'SHOULD', 'DECIDE', 'VISIT', 'MOVIE', 'WANT', 'SELL', 'TOMORROW', 'NEXT-WEEK', 'NEW-YORK', 'LAST-WEEK', 'WILL', 'FINISH', 'ANN', 'READ', 'BOOK', 'CHOCOLATE', 'FIND', 'SOMETHING-ONE', 'POSS', 'BROTHER', 'ARRIVE', 'HERE', 'GIVE', 'MAN', 'NEW', 'COAT', 'WOMAN', 'GIVE1', 'HAVE', 'FRANK', 'BREAK-DOWN', 'SEARCH-FOR', 'WHO', 'WHAT', 'LEG', 'FRIEND', 'CANDY', 'BLUE', 'SUE', 'BUY1', 'STOLEN', 'OLD', 'STUDENT', 'VIDEOTAPE', 'BORROW', 'MOTHER', 'POTATO', 'TELL', 'BILL', 'THROW', 'APPLE', 'NAME', 'SHOOT', 'SAY-1P', 'SELF', 'GROUP', 'JANA', 'TOY1', 'MANY', 'TOY', 'ALL', 'BOY', 'TEACHER', 'GIRL', 'BOX', 'GIVE2', 'GIVE3', 'GET', 'PUTASIDE']

The training data in training is an object of class WordsData defined in the asl_data module. in addition to the words list, data can be accessed with the get_all_sequences, get_all_Xlengths, get_word_sequences, and get_word_Xlengths methods. We need the get_word_Xlengths method to train multiple sequences with the hmmlearn library. In the following example, notice that there are two lists; the first is a concatenation of all the sequences(the X portion) and the second is a list of the sequence lengths(the Lengths portion).



In [59]:

    
training.get_word_Xlengths('CHOCOLATE')









    Out[59]:





(array([[-11,  48,   7, 120],
        [-11,  48,   8, 109],
        [ -8,  49,  11,  98],
        [ -7,  50,   7,  87],
        [ -4,  54,   7,  77],
        [ -4,  54,   6,  69],
        [ -4,  54,   6,  69],
        [-13,  52,   6,  69],
        [-13,  52,   6,  69],
        [ -8,  51,   6,  69],
        [ -8,  51,   6,  69],
        [ -8,  51,   6,  69],
        [ -8,  51,   6,  69],
        [ -8,  51,   6,  69],
        [-10,  59,   7,  71],
        [-15,  64,   9,  77],
        [-17,  75,  13,  81],
        [ -4,  48,  -4, 113],
        [ -2,  53,  -4, 113],
        [ -4,  55,   2,  98],
        [ -4,  58,   2,  98],
        [ -1,  59,   2,  89],
        [ -1,  59,  -1,  84],
        [ -1,  59,  -1,  84],
        [ -7,  63,  -1,  84],
        [ -7,  63,  -1,  84],
        [ -7,  63,   3,  83],
        [ -7,  63,   3,  83],
        [ -7,  63,   3,  83],
        [ -7,  63,   3,  83],
        [ -7,  63,   3,  83],
        [ -7,  63,   3,  83],
        [ -7,  63,   3,  83],
        [ -4,  70,   3,  83],
        [ -4,  70,   3,  83],
        [ -2,  73,   5,  90],
        [ -3,  79,  -4,  96],
        [-15,  98,  13, 135],
        [ -6,  93,  12, 128],
        [ -2,  89,  14, 118],
        [  5,  90,  10, 108],
        [  4,  86,   7, 105],
        [  4,  86,   7, 105],
        [  4,  86,  13, 100],
        [ -3,  82,  14,  96],
        [ -3,  82,  14,  96],
        [  6,  89,  16, 100],
        [  6,  89,  16, 100],
        [  7,  85,  17, 111]]), [17, 20, 12])

More feature sets

So far we have a simple feature set that is enough to get started modeling. However, we might get better results if we manipulate the raw values a bit more, so we will go ahead and set up some other options now for experimentation later. For example, we could normalize each speaker's range of motion with grouped statistics using Pandas stats functions and pandas groupby. Below is an example for finding the means of all speaker subgroups.



In [60]:

    
df_means = asl.df.groupby('speaker').mean()
df_means

To select a mean that matches by speaker, use the pandas map method:



In [61]:

    
asl.df['left-x-mean']= asl.df['speaker'].map(df_means['left-x'])
asl.df.head()









    Out[61]:






  
    
      
      
      left-x
      left-y
      right-x
      right-y
      nose-x
      nose-y
      speaker
      grnd-ry
      grnd-rx
      grnd-ly
      grnd-lx
      left-x-mean
    
    
      video
      frame
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      98
      0
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
      164.661438
    
    
      1
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
      164.661438
    
    
      2
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
      164.661438
    
    
      3
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
      164.661438
    
    
      4
      149
      181
      170
      175
      161
      62
      woman-1
      113
      9
      119
      -12
      164.661438

Try it!



In [62]:

    
from asl_utils import test_std_tryit
# TODO Create a dataframe named `df_std` with standard deviations grouped by speaker
df_std = asl.df.groupby('speaker').std()

# test the code
test_std_tryit(df_std)









    



df_std






    






  
    
      
      left-x
      left-y
      right-x
      right-y
      nose-x
      nose-y
      grnd-ry
      grnd-rx
      grnd-ly
      grnd-lx
      left-x-mean
    
    
      speaker
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      man-1
      15.154425
      36.328485
      18.901917
      54.902340
      6.654573
      5.520045
      53.487999
      20.269032
      36.572749
      15.080360
      0.0
    
    
      woman-1
      17.573442
      26.594521
      16.459943
      34.667787
      3.549392
      3.538330
      33.972660
      16.764706
      27.117393
      17.328941
      0.0
    
    
      woman-2
      15.388711
      28.825025
      14.890288
      39.649111
      4.099760
      3.416167
      39.128572
      16.191324
      29.320655
      15.050938
      0.0
    
  








    Out[62]:




Correct!

Features Implementation Submission

Implement four feature sets and answer the question that follows.

normalized Cartesian coordinates
- use mean and standard deviation statistics and the standard score equation to account for speakers with different heights and arm length
polar coordinates
- calculate polar coordinates with Cartesian to polar equations
- use the np.arctan2 function and swap the x and y axes to move the $0$ to $2\pi$ discontinuity to 12 o'clock instead of 3 o'clock; in other words, the normal break in radians value from $0$ to $2\pi$ occurs directly to the left of the speaker's nose, which may be in the signing area and interfere with results. By swapping the x and y axes, that discontinuity move to directly above the speaker's head, an area not generally used in signing.
delta difference
- as described in Thad's lecture, use the difference in values between one frame and the next frames as features
- pandas diff method and fillna method will be helpful for this one
custom features
- These are your own design; combine techniques used above or come up with something else entirely. We look forward to seeing what you come up with! Some ideas to get you started:
  - normalize using a feature scaling equation
  - normalize the polar coordinates
  - adding additional deltas



In [91]:

    
# TODO add features for normalized by speaker values of left, right, x, y
# Name these 'norm-rx', 'norm-ry', 'norm-lx', and 'norm-ly'
# using Z-score scaling (X-Xmean)/Xstd

features_norm = ['norm-rx', 'norm-ry', 'norm-lx','norm-ly']

# Mean matched by speaker
asl.df['right-x-mean']= asl.df['speaker'].map(df_means['right-x'])
asl.df['right-y-mean']= asl.df['speaker'].map(df_means['right-y'])
asl.df['left-x-mean']= asl.df['speaker'].map(df_means['left-x'])
asl.df['left-y-mean']= asl.df['speaker'].map(df_means['left-y'])

# Std dev matched by speaker
asl.df['right-x-std']= asl.df['speaker'].map(df_std['right-x'])
asl.df['right-y-std']= asl.df['speaker'].map(df_std['right-y'])
asl.df['left-x-std']= asl.df['speaker'].map(df_std['left-x'])
asl.df['left-y-std']= asl.df['speaker'].map(df_std['left-y'])

# Add the actual normalized scores
asl.df['norm-rx'] = (asl.df['right-x'] - asl.df['right-x-mean']) / asl.df['right-x-std']
asl.df['norm-ry'] = (asl.df['right-y'] - asl.df['right-y-mean']) / asl.df['right-y-std']
asl.df['norm-lx'] = (asl.df['left-x'] - asl.df['left-x-mean']) / asl.df['left-x-std']
asl.df['norm-ly'] = (asl.df['left-y'] - asl.df['left-y-mean']) / asl.df['left-y-std']



In [92]:

    
# TODO add features for polar coordinate values where the nose is the origin
# Name these 'polar-rr', 'polar-rtheta', 'polar-lr', and 'polar-ltheta'
# Note that 'polar-rr' and 'polar-rtheta' refer to the radius and angle

'''
calculate polar coordinates with Cartesian to polar equations
use the np.arctan2 function and swap the x and y axes to move the  00  to  2π2π  discontinuity 
to 12 o'clock instead of 3 o'clock; in other words, the normal break in radians value from  00  to  2π2π  
occurs directly to the left of the speaker's nose, which may be in the signing area and interfere with results. 
By swapping the x and y axes, that discontinuity move to directly above the speaker's head, an area not generally
used in signing.
'''

features_polar = ['polar-rr', 'polar-rtheta', 'polar-lr', 'polar-ltheta']

asl.df['polar-rr'] = np.sqrt(asl.df['grnd-rx']**2 + asl.df['grnd-ry']**2)
asl.df['polar-rtheta'] = np.arctan2(asl.df['grnd-rx'], asl.df['grnd-ry'])
asl.df['polar-lr'] = np.sqrt(asl.df['grnd-lx']**2 + asl.df['grnd-ly']**2)
asl.df['polar-ltheta'] = np.arctan2(asl.df['grnd-lx'], asl.df['grnd-ly'])



In [93]:

    
# TODO add features for left, right, x, y differences by one time step, i.e. the "delta" values discussed in the lecture
# Name these 'delta-rx', 'delta-ry', 'delta-lx', and 'delta-ly'

features_delta = ['delta-rx', 'delta-ry', 'delta-lx', 'delta-ly']

asl.df['delta-rx'] = asl.df['grnd-rx'].diff()
asl.df['delta-ry'] = asl.df['grnd-ry'].diff()
asl.df['delta-lx'] = asl.df['grnd-lx'].diff()
asl.df['delta-ly'] = asl.df['grnd-ly'].diff()
# Fill with 0 values
asl.df = asl.df.fillna(0)



In [94]:

    
# TODO add features of your own design, which may be a combination of the above or something else
# Name these whatever you would like

# TODO define a list named 'features_custom' for building the training set

# Normalize polar coordinates
features_polar_norm = ['pnorm-rx', 'pnorm-ry', 'pnorm-lx','pnorm-ly']

df_means = asl.df.groupby('speaker').mean()
df_std = asl.df.groupby('speaker').std()

# Mean matched by speaker
asl.df['polar-rr-mean']= asl.df['speaker'].map(df_means['polar-rr'])
asl.df['polar-rtheta-mean']= asl.df['speaker'].map(df_means['polar-rtheta'])
asl.df['polar-lr-mean']= asl.df['speaker'].map(df_means['polar-lr'])
asl.df['polar-ltheta-mean']= asl.df['speaker'].map(df_means['polar-ltheta'])

# Std dev matched by speaker
asl.df['polar-rr-std']= asl.df['speaker'].map(df_std['polar-rr'])
asl.df['polar-rtheta-std']= asl.df['speaker'].map(df_std['polar-rtheta'])
asl.df['polar-lr-std']= asl.df['speaker'].map(df_std['polar-lr'])
asl.df['polar-ltheta-std']= asl.df['speaker'].map(df_std['polar-ltheta'])

# Add the actual normalized scores
asl.df['pnorm-rx'] = (asl.df['polar-rr'] - asl.df['polar-rr-mean']) / asl.df['polar-rr-std']
asl.df['pnorm-ry'] = (asl.df['polar-rtheta'] - asl.df['polar-rtheta-mean']) / asl.df['polar-rtheta-std']
asl.df['pnorm-lx'] = (asl.df['polar-lr'] - asl.df['polar-lr-mean']) / asl.df['polar-lr-std']
asl.df['pnorm-ly'] = (asl.df['polar-ltheta'] - asl.df['polar-ltheta-mean']) / asl.df['polar-ltheta-std']

Question 1: What custom features did you choose for the features_custom set and why?

Answer 1: I chose to normalize the polar coordinate values where the nose is the origin. This ensures that the training data is independent from the speaker's height or body proportions.

Features Unit Testing

Run the following unit tests as a sanity check on the defined "ground", "norm", "polar", and 'delta" feature sets. The test simply looks for some valid values but is not exhaustive. However, the project should not be submitted if these tests don't pass.



In [67]:

    
import unittest
# import numpy as np

class TestFeatures(unittest.TestCase):

    def test_features_ground(self):
        sample = (asl.df.ix[98, 1][features_ground]).tolist()
        self.assertEqual(sample, [9, 113, -12, 119])

    def test_features_norm(self):
        sample = (asl.df.ix[98, 1][features_norm]).tolist()
        np.testing.assert_almost_equal(sample, [ 1.153,  1.663, -0.891,  0.742], 3)

    def test_features_polar(self):
        sample = (asl.df.ix[98,1][features_polar]).tolist()
        np.testing.assert_almost_equal(sample, [113.3578, 0.0794, 119.603, -0.1005], 3)

    def test_features_delta(self):
        sample = (asl.df.ix[98, 0][features_delta]).tolist()
        self.assertEqual(sample, [0, 0, 0, 0])
        sample = (asl.df.ix[98, 18][features_delta]).tolist()
        self.assertTrue(sample in [[-16, -5, -2, 4], [-14, -9, 0, 0]], "Sample value found was {}".format(sample))
                         
suite = unittest.TestLoader().loadTestsFromModule(TestFeatures())
unittest.TextTestRunner().run(suite)









    



....
----------------------------------------------------------------------
Ran 4 tests in 0.016s

OK






    Out[67]:





<unittest.runner.TextTestResult run=4 errors=0 failures=0>

PART 2: Model Selection

Model Selection Tutorial

The objective of Model Selection is to tune the number of states for each word HMM prior to testing on unseen data. In this section you will explore three methods:

Log likelihood using cross-validation folds (CV)
Bayesian Information Criterion (BIC)
Discriminative Information Criterion (DIC)

Train a single word

Now that we have built a training set with sequence data, we can "train" models for each word. As a simple starting example, we train a single word using Gaussian hidden Markov models (HMM). By using the fit method during training, the Baum-Welch Expectation-Maximization (EM) algorithm is invoked iteratively to find the best estimate for the model for the number of hidden states specified from a group of sample seequences. For this example, we assume the correct number of hidden states is 3, but that is just a guess. How do we know what the "best" number of states for training is? We will need to find some model selection technique to choose the best parameter.



In [68]:

    
import warnings
from hmmlearn.hmm import GaussianHMM

def train_a_word(word, num_hidden_states, features):
    
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    training = asl.build_training(features)  
    X, lengths = training.get_word_Xlengths(word)
    model = GaussianHMM(n_components=num_hidden_states, n_iter=1000).fit(X, lengths)
    logL = model.score(X, lengths)
    return model, logL

demoword = 'BOOK'
model, logL = train_a_word(demoword, 3, features_ground)
print("Number of states trained in model for {} is {}".format(demoword, model.n_components))
print("logL = {}".format(logL))









    



Number of states trained in model for BOOK is 3
logL = -2331.1138127433196

The HMM model has been trained and information can be pulled from the model, including means and variances for each feature and hidden state. The log likelihood for any individual sample or group of samples can also be calculated with the score method.



In [69]:

    
def show_model_stats(word, model):
    print("Number of states trained in model for {} is {}".format(word, model.n_components))    
    variance=np.array([np.diag(model.covars_[i]) for i in range(model.n_components)])    
    for i in range(model.n_components):  # for each hidden state
        print("hidden state #{}".format(i))
        print("mean = ", model.means_[i])
        print("variance = ", variance[i])
        print()
    
show_model_stats(demoword, model)









    



Number of states trained in model for BOOK is 3
hidden state #0
mean =  [ -1.12415027  69.44164191  17.02866283  77.7231196 ]
variance =  [ 19.70434594  16.83041492  30.51552305  11.03678246]

hidden state #1
mean =  [ -11.45300909   94.109178     19.03512475  102.2030162 ]
variance =  [  77.403668    203.35441965   26.68898447  156.12444034]

hidden state #2
mean =  [ -3.46504869  50.66686933  14.02391587  52.04731066]
variance =  [ 49.12346305  43.04799144  39.35109609  47.24195772]

Try it!

Experiment by changing the feature set, word, and/or num_hidden_states values in the next cell to see changes in values.



In [70]:

    
my_testword = 'CHOCOLATE'
model, logL = train_a_word(my_testword, 3, features_ground) # Experiment here with different parameters
show_model_stats(my_testword, model)
print("logL = {}".format(logL))









    



Number of states trained in model for CHOCOLATE is 3
hidden state #0
mean =  [   0.58333333   87.91666667   12.75        108.5       ]
variance =  [  39.41055556   18.74388889    9.855       144.4175    ]

hidden state #1
mean =  [ -9.30211403  55.32333876   6.92259936  71.24057775]
variance =  [ 16.16920957  46.50917372   3.81388185  15.79446427]

hidden state #2
mean =  [ -5.40587658  60.1652424    2.32479599  91.3095432 ]
variance =  [   7.95073876   64.13103127   13.68077479  129.5912395 ]

logL = -601.3291470028619

Visualize the hidden states

We can plot the means and variances for each state and feature. Try varying the number of states trained for the HMM model and examine the variances. Are there some models that are "better" than others? How can you tell? We would like to hear what you think in the classroom online.



In [71]:

    
%matplotlib inline



In [72]:

    
import math
from matplotlib import (cm, pyplot as plt, mlab)

def visualize(word, model):
    """ visualize the input model for a particular word """
    variance=np.array([np.diag(model.covars_[i]) for i in range(model.n_components)])
    figures = []
    for parm_idx in range(len(model.means_[0])):
        xmin = int(min(model.means_[:,parm_idx]) - max(variance[:,parm_idx]))
        xmax = int(max(model.means_[:,parm_idx]) + max(variance[:,parm_idx]))
        fig, axs = plt.subplots(model.n_components, sharex=True, sharey=False)
        colours = cm.rainbow(np.linspace(0, 1, model.n_components))
        for i, (ax, colour) in enumerate(zip(axs, colours)):
            x = np.linspace(xmin, xmax, 100)
            mu = model.means_[i,parm_idx]
            sigma = math.sqrt(np.diag(model.covars_[i])[parm_idx])
            ax.plot(x, mlab.normpdf(x, mu, sigma), c=colour)
            ax.set_title("{} feature {} hidden state #{}".format(word, parm_idx, i))

            ax.grid(True)
        figures.append(plt)
    for p in figures:
        p.show()
        
visualize(my_testword, model)

ModelSelector class

Review the ModelSelector class from the codebase found in the my_model_selectors.py module. It is designed to be a strategy pattern for choosing different model selectors. For the project submission in this section, subclass SelectorModel to implement the following model selectors. In other words, you will write your own classes/functions in the my_model_selectors.py module and run them from this notebook:

SelectorCV: Log likelihood with CV
SelectorBIC: BIC
SelectorDIC: DIC

You will train each word in the training set with a range of values for the number of hidden states, and then score these alternatives with the model selector, choosing the "best" according to each strategy. The simple case of training with a constant value for n_components can be called using the provided SelectorConstant subclass as follow:



In [73]:

    
from my_model_selectors import SelectorConstant

training = asl.build_training(features_ground)  # Experiment here with different feature sets defined in part 1
word = 'VEGETABLE' # Experiment here with different words
model = SelectorConstant(training.get_all_sequences(), training.get_all_Xlengths(), word, n_constant=3).select()
print("Number of states trained in model for {} is {}".format(word, model.n_components))









    



Number of states trained in model for VEGETABLE is 3

Cross-validation folds

If we simply score the model with the Log Likelihood calculated from the feature sequences it has been trained on, we should expect that more complex models will have higher likelihoods. However, that doesn't tell us which would have a better likelihood score on unseen data. The model will likely be overfit as complexity is added. To estimate which topology model is better using only the training data, we can compare scores using cross-validation. One technique for cross-validation is to break the training set into "folds" and rotate which fold is left out of training. The "left out" fold scored. This gives us a proxy method of finding the best model to use on "unseen data". In the following example, a set of word sequences is broken into three folds using the scikit-learn Kfold class object. When you implement SelectorCV, you will use this technique.



In [74]:

    
from sklearn.model_selection import KFold

training = asl.build_training(features_ground) # Experiment here with different feature sets
word = 'VEGETABLE' # Experiment here with different words
word_sequences = training.get_word_sequences(word)
split_method = KFold()
for cv_train_idx, cv_test_idx in split_method.split(word_sequences):
    print("Train fold indices:{} Test fold indices:{}".format(cv_train_idx, cv_test_idx))  # view indices of the folds









    



Train fold indices:[2 3 4 5] Test fold indices:[0 1]
Train fold indices:[0 1 4 5] Test fold indices:[2 3]
Train fold indices:[0 1 2 3] Test fold indices:[4 5]

Tip: In order to run hmmlearn training using the X,lengths tuples on the new folds, subsets must be combined based on the indices given for the folds. A helper utility has been provided in the asl_utils module named combine_sequences for this purpose.

Scoring models with other criterion

Scoring model topologies with BIC balances fit and complexity within the training set for each word. In the BIC equation, a penalty term penalizes complexity to avoid overfitting, so that it is not necessary to also use cross-validation in the selection process. There are a number of references on the internet for this criterion. These slides include a formula you may find helpful for your implementation.

The advantages of scoring model topologies with DIC over BIC are presented by Alain Biem in this reference (also found here). DIC scores the discriminant ability of a training set for one word against competing words. Instead of a penalty term for complexity, it provides a penalty if model liklihoods for non-matching words are too similar to model likelihoods for the correct word in the word set.

Model Selection Implementation Submission

Implement SelectorCV, SelectorBIC, and SelectorDIC classes in the my_model_selectors.py module. Run the selectors on the following five words. Then answer the questions about your results.

Tip: The hmmlearn library may not be able to train or score all models. Implement try/except contructs as necessary to eliminate non-viable models from consideration.



In [75]:

    
words_to_train = ['FISH', 'BOOK', 'VEGETABLE', 'FUTURE', 'JOHN']
import timeit



In [76]:

    
# TODO: Implement SelectorCV in my_model_selector.py
%load_ext autoreload
%autoreload 2
from my_model_selectors import SelectorCV

training = asl.build_training(features_ground)  # Experiment here with different feature sets defined in part 1
sequences = training.get_all_sequences()
Xlengths = training.get_all_Xlengths()
for word in words_to_train:
    start = timeit.default_timer()
    model = SelectorCV(sequences, Xlengths, word, 
                    min_n_components=2, max_n_components=15, random_state = 14).select()
    end = timeit.default_timer()-start
    if model is not None:
        print("Training complete for {} with {} states with time {} seconds".format(word, model.n_components, end))
    else:
        print("Training failed for {}".format(word))









    



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Training complete for FISH with 11 states with time 0.3672332529677078 seconds
Training complete for BOOK with 6 states with time 3.8963487760629505 seconds
Training complete for VEGETABLE with 2 states with time 1.7080269129946828 seconds
Training complete for FUTURE with 2 states with time 3.7040580259636045 seconds
Training complete for JOHN with 12 states with time 40.28217240597587 seconds



In [77]:

    
# TODO: Implement SelectorBIC in module my_model_selectors.py
%load_ext autoreload
%autoreload 2
from my_model_selectors import SelectorBIC

training = asl.build_training(features_ground)  # Experiment here with different feature sets defined in part 1
sequences = training.get_all_sequences()
Xlengths = training.get_all_Xlengths()
for word in words_to_train:
    start = timeit.default_timer()
    model = SelectorBIC(sequences, Xlengths, word, 
                    min_n_components=2, max_n_components=15, random_state = 14).select()
    end = timeit.default_timer()-start
    if model is not None:
        print("Training complete for {} with {} states with time {} seconds".format(word, model.n_components, end))
    else:
        print("Training failed for {}".format(word))









    



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Training complete for FISH with 5 states with time 0.34091994003392756 seconds
Training complete for BOOK with 8 states with time 2.053617767058313 seconds
Training complete for VEGETABLE with 9 states with time 0.6795851830393076 seconds
Training complete for FUTURE with 9 states with time 2.0689358439994976 seconds
Training complete for JOHN with 13 states with time 18.892435212037526 seconds



In [78]:

    
# TODO: Implement SelectorDIC in module my_model_selectors.py
%load_ext autoreload
%autoreload 2
from my_model_selectors import SelectorDIC

training = asl.build_training(features_ground)  # Experiment here with different feature sets defined in part 1
sequences = training.get_all_sequences()
Xlengths = training.get_all_Xlengths()
for word in words_to_train:
    start = timeit.default_timer()
    model = SelectorDIC(sequences, Xlengths, word, 
                    min_n_components=2, max_n_components=15, random_state = 14).select()
    end = timeit.default_timer()-start
    if model is not None:
        print("Training complete for {} with {} states with time {} seconds".format(word, model.n_components, end))
    else:
        print("Training failed for {}".format(word))









    



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Training complete for FISH with 3 states with time 0.8208568809786811 seconds
Training complete for BOOK with 15 states with time 3.841317162034102 seconds
Training complete for VEGETABLE with 15 states with time 3.0880453570280224 seconds
Training complete for FUTURE with 15 states with time 3.9322286419337615 seconds
Training complete for JOHN with 15 states with time 21.35386147990357 seconds

Question 2: Compare and contrast the possible advantages and disadvantages of the various model selectors implemented.

Answer 2: Cross validation advantages: it creates models that generalize well to an unknown dataset, thus reducing overfitting. This is done by splitting the data and using each fold as validation while the remaining folds form the training set. Cross validation disadvatages: it needs a big enough dataset.

BIC advantages: penalizes complexity (big number of free parameters) in an effort to combat overfitting.

DIC advantages: DIC discriminates more efficiently between the given words because it makes sure that the there is a big difference between the log likelihood of a word model and the log likelihood of all the other words (using the same model). DIC is better at solving the classification problem.
DIC disadvantages: If the number of words drastically increases, the execution time will also increase due to calculating the log likelihoods combinations for all of the words.

Model Selector Unit Testing

Run the following unit tests as a sanity check on the implemented model selectors. The test simply looks for valid interfaces but is not exhaustive. However, the project should not be submitted if these tests don't pass.



In [79]:

    
from asl_test_model_selectors import TestSelectors
suite = unittest.TestLoader().loadTestsFromModule(TestSelectors())
unittest.TextTestRunner().run(suite)









    



....
----------------------------------------------------------------------
Ran 4 tests in 45.085s

OK






    Out[79]:





<unittest.runner.TextTestResult run=4 errors=0 failures=0>

PART 3: Recognizer

The objective of this section is to "put it all together". Using the four feature sets created and the three model selectors, you will experiment with the models and present your results. Instead of training only five specific words as in the previous section, train the entire set with a feature set and model selector strategy.

Recognizer Tutorial

Train the full training set

The following example trains the entire set with the example features_ground and SelectorConstant features and model selector. Use this pattern for you experimentation and final submission cells.



In [80]:

    
# autoreload for automatically reloading changes made in my_model_selectors and my_recognizer
%load_ext autoreload
%autoreload 2

from my_model_selectors import SelectorConstant

def train_all_words(features, model_selector):
    training = asl.build_training(features)  # Experiment here with different feature sets defined in part 1
    sequences = training.get_all_sequences()
    Xlengths = training.get_all_Xlengths()
    model_dict = {}
    for word in training.words:
        model = model_selector(sequences, Xlengths, word, 
                        n_constant=3).select()
        model_dict[word]=model
    return model_dict

models = train_all_words(features_ground, SelectorConstant)
print("Number of word models returned = {}".format(len(models)))









    



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Number of word models returned = 112

Load the test set

The build_test method in ASLdb is similar to the build_training method already presented, but there are a few differences:

the object is type SinglesData
the internal dictionary keys are the index of the test word rather than the word itself
the getter methods are get_all_sequences, get_all_Xlengths, get_item_sequences and get_item_Xlengths



In [81]:

    
test_set = asl.build_test(features_ground)
print("Number of test set items: {}".format(test_set.num_items))
print("Number of test set sentences: {}".format(len(test_set.sentences_index)))









    



Number of test set items: 178
Number of test set sentences: 40

Recognizer Implementation Submission

For the final project submission, students must implement a recognizer following guidance in the my_recognizer.py module. Experiment with the four feature sets and the three model selection methods (that's 12 possible combinations). You can add and remove cells for experimentation or run the recognizers locally in some other way during your experiments, but retain the results for your discussion. For submission, you will provide code cells of only three interesting combinations for your discussion (see questions below). At least one of these should produce a word error rate of less than 60%, i.e. WER < 0.60 .

Tip: The hmmlearn library may not be able to train or score all models. Implement try/except contructs as necessary to eliminate non-viable models from consideration.



In [82]:

    
# TODO implement the recognize method in my_recognizer
%load_ext autoreload
%autoreload 2

from my_recognizer import recognize
from asl_utils import show_errors









    



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload



In [88]:

    
# TODO Choose a feature set and model selector
features = features_ground
model_selector = SelectorCV

# TODO Recognize the test set and display the result with the show_errors method
models = train_all_words(features, model_selector)
test_set = asl.build_test(features)
probabilities, guesses = recognize(models, test_set)
show_errors(guesses, test_set)









    



**** WER = 0.5337078651685393
Total correct: 83 out of 178
Video  Recognized                                                    Correct
=====================================================================================================
    2: JOHN WRITE *ARRIVE                                            JOHN WRITE HOMEWORK
    7: JOHN *GO1 *HAVE *WHAT                                         JOHN CAN GO CAN
   12: *IX CAN *CAN CAN                                              JOHN CAN GO CAN
   21: JOHN *HOMEWORK *JOHN *TELL *CAR *CAR *GO *BROTHER             JOHN FISH WONT EAT BUT CAN EAT CHICKEN
   25: JOHN *TELL IX *TELL IX                                        JOHN LIKE IX IX IX
   28: JOHN *TELL IX IX IX                                           JOHN LIKE IX IX IX
   30: JOHN LIKE IX *MARY IX                                         JOHN LIKE IX IX IX
   36: *WHO VEGETABLE *IX *GIVE *BILL *MARY                          MARY VEGETABLE KNOW IX LIKE CORN1
   40: *JANA *BILL *FUTURE1 *JANA *IX                                JOHN IX THINK MARY LOVE
   43: JOHN *SHOULD BUY HOUSE                                        JOHN MUST BUY HOUSE
   50: *JOHN *SEE BUY CAR *ARRIVE                                    FUTURE JOHN BUY CAR SHOULD
   54: JOHN SHOULD *FINISH BUY HOUSE                                 JOHN SHOULD NOT BUY HOUSE
   57: *MARY *MARY *MARY *IX                                         JOHN DECIDE VISIT MARY
   67: JOHN *MOTHER NOT BUY HOUSE                                    JOHN FUTURE NOT BUY HOUSE
   71: JOHN *FINISH *GIVE1 MARY                                      JOHN WILL VISIT MARY
   74: *IX *BILL *MARY MARY                                          JOHN NOT VISIT MARY
   77: *JOHN BLAME *LOVE                                             ANN BLAME MARY
   84: *LOVE *ARRIVE *HOMEWORK *COAT                                 IX-1P FIND SOMETHING-ONE BOOK
   89: *GIVE *GIVE GIVE *IX IX *ARRIVE COAT                          JOHN IX GIVE MAN IX NEW COAT
   90: JOHN *GIVE1 IX *IX WOMAN BOOK                                 JOHN GIVE IX SOMETHING-ONE WOMAN BOOK
   92: JOHN GIVE IX *IX WOMAN BOOK                                   JOHN GIVE IX SOMETHING-ONE WOMAN BOOK
  100: POSS NEW CAR BREAK-DOWN                                       POSS NEW CAR BREAK-DOWN
  105: JOHN *SEE                                                     JOHN LEG
  107: *LIKE *IX FRIEND *VISIT *JOHN                                 JOHN POSS FRIEND HAVE CANDY
  108: *GIVE *LOVE                                                   WOMAN ARRIVE
  113: IX CAR BLUE SUE *ARRIVE                                       IX CAR BLUE SUE BUY
  119: *VEGETABLE *BUY1 IX CAR *SUE                                  SUE BUY IX CAR BLUE
  122: JOHN *GIVE1 BOOK                                              JOHN READ BOOK
  139: JOHN *BUY1 *CAN *VISIT BOOK                                   JOHN BUY WHAT YESTERDAY BOOK
  142: JOHN *VIDEOTAPE YESTERDAY *TEACHER BOOK                       JOHN BUY YESTERDAY WHAT BOOK
  158: LOVE JOHN *VEGETABLE                                          LOVE JOHN WHO
  167: JOHN *SUE *BILL LOVE *LOVE                                    JOHN IX SAY LOVE MARY
  171: JOHN *JOHN BLAME                                              JOHN MARY BLAME
  174: *WHAT *GIVE1 GIVE1 *APPLE *WHAT                               PEOPLE GROUP GIVE1 JANA TOY
  181: JOHN ARRIVE                                                   JOHN ARRIVE
  184: *GIVE1 *IX *GIVE1 TEACHER APPLE                               ALL BOY GIVE TEACHER APPLE
  189: JOHN *IX *APPLE *CAN                                          JOHN GIVE GIRL BOX
  193: JOHN *GIVE1 *GIVE1 BOX                                        JOHN GIVE GIRL BOX
  199: *LOVE CHOCOLATE *TELL                                         LIKE CHOCOLATE WHO
  201: JOHN *SHOULD *GIVE *JOHN *ARRIVE HOUSE                        JOHN TELL MARY IX-1P BUY HOUSE



In [89]:

    
# TODO Choose a feature set and model selector
features = features_polar
model_selector = SelectorBIC

# TODO Recognize the test set and display the result with the show_errors method
models = train_all_words(features, model_selector)
test_set = asl.build_test(features)
probabilities, guesses = recognize(models, test_set)
show_errors(guesses, test_set)









    



**** WER = 0.5449438202247191
Total correct: 81 out of 178
Video  Recognized                                                    Correct
=====================================================================================================
    2: *GO WRITE *NEW                                                JOHN WRITE HOMEWORK
    7: JOHN *PEOPLE GO *ARRIVE                                       JOHN CAN GO CAN
   12: JOHN *WHAT *GO1 CAN                                           JOHN CAN GO CAN
   21: JOHN *NEW WONT *NOT *GIVE1 *TEACHER *FUTURE *WHO              JOHN FISH WONT EAT BUT CAN EAT CHICKEN
   25: JOHN LIKE *LOVE *WHO IX                                       JOHN LIKE IX IX IX
   28: JOHN *WHO *FUTURE *WHO IX                                     JOHN LIKE IX IX IX
   30: JOHN LIKE *MARY *MARY *MARY                                   JOHN LIKE IX IX IX
   36: *VISIT VEGETABLE *GIRL *GIVE *MARY *MARY                      MARY VEGETABLE KNOW IX LIKE CORN1
   40: JOHN *VISIT *FUTURE1 *JOHN *MARY                              JOHN IX THINK MARY LOVE
   43: JOHN *FUTURE BUY HOUSE                                        JOHN MUST BUY HOUSE
   50: *JOHN *SEE *STUDENT CAR *JOHN                                 FUTURE JOHN BUY CAR SHOULD
   54: JOHN SHOULD *WHO BUY HOUSE                                    JOHN SHOULD NOT BUY HOUSE
   57: *MARY *VISIT VISIT MARY                                       JOHN DECIDE VISIT MARY
   67: *SHOULD FUTURE *MARY BUY HOUSE                                JOHN FUTURE NOT BUY HOUSE
   71: JOHN *FINISH *GIVE1 MARY                                      JOHN WILL VISIT MARY
   74: *IX *VISIT *GIVE MARY                                         JOHN NOT VISIT MARY
   77: *JOHN BLAME *LOVE                                             ANN BLAME MARY
   84: *HOMEWORK *GIVE1 *GIVE1 BOOK                                  IX-1P FIND SOMETHING-ONE BOOK
   89: *GIVE *GIVE *WOMAN *WOMAN IX *ARRIVE *BREAK-DOWN              JOHN IX GIVE MAN IX NEW COAT
   90: JOHN *HAVE IX SOMETHING-ONE *VISIT *BREAK-DOWN                JOHN GIVE IX SOMETHING-ONE WOMAN BOOK
   92: JOHN *WOMAN IX *WOMAN WOMAN BOOK                              JOHN GIVE IX SOMETHING-ONE WOMAN BOOK
  100: POSS NEW CAR BREAK-DOWN                                       POSS NEW CAR BREAK-DOWN
  105: JOHN *VEGETABLE                                               JOHN LEG
  107: JOHN *IX *HAVE *GO *JANA                                      JOHN POSS FRIEND HAVE CANDY
  108: *JOHN *HOMEWORK                                               WOMAN ARRIVE
  113: IX CAR *IX *IX *BUY1                                          IX CAR BLUE SUE BUY
  119: *PREFER *BUY1 *CAR CAR *GO                                    SUE BUY IX CAR BLUE
  122: JOHN *GIVE1 BOOK                                              JOHN READ BOOK
  139: JOHN *BUY1 WHAT *BLAME *CHOCOLATE                             JOHN BUY WHAT YESTERDAY BOOK
  142: JOHN BUY YESTERDAY WHAT BOOK                                  JOHN BUY YESTERDAY WHAT BOOK
  158: LOVE JOHN WHO                                                 LOVE JOHN WHO
  167: JOHN IX *VISIT LOVE MARY                                      JOHN IX SAY LOVE MARY
  171: JOHN *IX BLAME                                                JOHN MARY BLAME
  174: *JOHN *GIVE3 GIVE1 *YESTERDAY *JOHN                           PEOPLE GROUP GIVE1 JANA TOY
  181: *EAT ARRIVE                                                   JOHN ARRIVE
  184: ALL BOY *GIVE1 TEACHER APPLE                                  ALL BOY GIVE TEACHER APPLE
  189: *MARY *VISIT *VISIT BOX                                       JOHN GIVE GIRL BOX
  193: JOHN *POSS *VISIT BOX                                         JOHN GIVE GIRL BOX
  199: *HOMEWORK *VIDEOTAPE *JOHN                                    LIKE CHOCOLATE WHO
  201: JOHN *MAN *MAN *LIKE BUY HOUSE                                JOHN TELL MARY IX-1P BUY HOUSE



In [90]:

    
# TODO Choose a feature set and model selector
features = features_polar
model_selector = SelectorDIC

# TODO Recognize the test set and display the result with the show_errors method
models = train_all_words(features, model_selector)
test_set = asl.build_test(features)
probabilities, guesses = recognize(models, test_set)
show_errors(guesses, test_set)









    



**** WER = 0.5449438202247191
Total correct: 81 out of 178
Video  Recognized                                                    Correct
=====================================================================================================
    2: JOHN *NEW *GIVE1                                              JOHN WRITE HOMEWORK
    7: JOHN CAN GO CAN                                               JOHN CAN GO CAN
   12: JOHN *WHAT *JOHN CAN                                          JOHN CAN GO CAN
   21: JOHN *NEW *JOHN *PREFER *GIVE1 *WHAT *FUTURE *WHO             JOHN FISH WONT EAT BUT CAN EAT CHICKEN
   25: JOHN *IX IX *WHO IX                                           JOHN LIKE IX IX IX
   28: JOHN *FUTURE IX *FUTURE *LOVE                                 JOHN LIKE IX IX IX
   30: JOHN LIKE *MARY *MARY *MARY                                   JOHN LIKE IX IX IX
   36: *IX *VISIT *GIVE *GIVE *MARY *MARY                            MARY VEGETABLE KNOW IX LIKE CORN1
   40: JOHN *GO *GIVE *JOHN *MARY                                    JOHN IX THINK MARY LOVE
   43: JOHN *IX BUY HOUSE                                            JOHN MUST BUY HOUSE
   50: *JOHN *SEE BUY CAR *JOHN                                      FUTURE JOHN BUY CAR SHOULD
   54: JOHN SHOULD NOT BUY HOUSE                                     JOHN SHOULD NOT BUY HOUSE
   57: *MARY *GO *GO MARY                                            JOHN DECIDE VISIT MARY
   67: *SHOULD FUTURE *MARY BUY HOUSE                                JOHN FUTURE NOT BUY HOUSE
   71: JOHN *FUTURE *GIVE1 MARY                                      JOHN WILL VISIT MARY
   74: *IX *GO *GO *VISIT                                            JOHN NOT VISIT MARY
   77: *JOHN *GIVE1 MARY                                             ANN BLAME MARY
   84: *HOMEWORK *GIVE1 *GIVE1 *COAT                                 IX-1P FIND SOMETHING-ONE BOOK
   89: *GIVE *GIVE *WOMAN *WOMAN IX *ARRIVE *BOOK                    JOHN IX GIVE MAN IX NEW COAT
   90: JOHN GIVE IX SOMETHING-ONE WOMAN *ARRIVE                      JOHN GIVE IX SOMETHING-ONE WOMAN BOOK
   92: JOHN *WOMAN IX *WOMAN WOMAN BOOK                              JOHN GIVE IX SOMETHING-ONE WOMAN BOOK
  100: POSS NEW CAR BREAK-DOWN                                       POSS NEW CAR BREAK-DOWN
  105: JOHN *SEE                                                     JOHN LEG
  107: JOHN POSS *HAVE HAVE *MARY                                    JOHN POSS FRIEND HAVE CANDY
  108: *LOVE *LOVE                                                   WOMAN ARRIVE
  113: IX CAR *IX *MARY *JOHN                                        IX CAR BLUE SUE BUY
  119: *MARY *BUY1 IX *BLAME *IX                                     SUE BUY IX CAR BLUE
  122: JOHN *GIVE1 BOOK                                              JOHN READ BOOK
  139: JOHN *ARRIVE WHAT *MARY *ARRIVE                               JOHN BUY WHAT YESTERDAY BOOK
  142: JOHN BUY YESTERDAY WHAT BOOK                                  JOHN BUY YESTERDAY WHAT BOOK
  158: LOVE JOHN WHO                                                 LOVE JOHN WHO
  167: JOHN *MARY *VISIT LOVE MARY                                   JOHN IX SAY LOVE MARY
  171: *IX MARY BLAME                                                JOHN MARY BLAME
  174: *JOHN *JOHN GIVE1 *YESTERDAY *JOHN                            PEOPLE GROUP GIVE1 JANA TOY
  181: *EAT ARRIVE                                                   JOHN ARRIVE
  184: *GO BOY *GIVE1 TEACHER *YESTERDAY                             ALL BOY GIVE TEACHER APPLE
  189: *MARY *GO *YESTERDAY BOX                                      JOHN GIVE GIRL BOX
  193: JOHN *GO *YESTERDAY BOX                                       JOHN GIVE GIRL BOX
  199: *JOHN *STUDENT *GO                                            LIKE CHOCOLATE WHO
  201: JOHN *MAN *LOVE *JOHN BUY HOUSE                               JOHN TELL MARY IX-1P BUY HOUSE

Question 3: Summarize the error results from three combinations of features and model selectors. What was the "best" combination and why? What additional information might we use to improve our WER? For more insight on improving WER, take a look at the introduction to Part 4.

Answer 3:

combination	WER	correct	correct %
ground_CV	0.534	83	46.63
ground_BIC	0.551	80	44.94
ground_DIC	0.573	76	42.70
norm_CV	0.607	70	39.33
norm_BIC	0.612	69	38.76
norm_DIC	0.596	72	40.45
polar_CV	0.562	78	43.82
polar_BIC	0.545	81	45.51
polar_DIC	0.545	81	45.51
delta_CV	0.601	71	39.89
delta_BIC	0.601	71	39.89
delta_DIC	0.624	67	37.64
polar_norm_CV	0.629	66	37.08
polar_norm_BIC	0.596	72	40.45
polar_norm_DIC	0.573	76	42.70

It can be seen from the above figure that the best results were obtained by using ground features in combination with the log likelihood and cross validation model selector. WER was 0.534, with a 46.63% rate of correct guesses.

A good performance was also obtained by using polar coordinate values where the nose is the origin in combination with BIC (WER = 0.545, 45.51% correct guesses) and DIC (WER = 0.545, 45.51% correct guesses).

I would have expected BIC or DIC to generally perform better than CV, due to the fact that we have a small dataset. This was the case only for the polar coordinate features and normalized polar coordinates. However, cross validation has the advantage of creating models that generalize well to an unknown dataset, thus reducing overfitting.

Improving WER can be done by using Language Models. The basic idea is that each word has some probability of occurrence within the set, and some probability that it is adjacent to specific other words. We can use that additional information to make better choices.

Recognizer Unit Tests

Run the following unit tests as a sanity check on the defined recognizer. The test simply looks for some valid values but is not exhaustive. However, the project should not be submitted if these tests don't pass.



In [86]:

    
from asl_test_recognizer import TestRecognize
suite = unittest.TestLoader().loadTestsFromModule(TestRecognize())
unittest.TextTestRunner().run(suite)









    



..
----------------------------------------------------------------------
Ran 2 tests in 27.227s

OK






    Out[86]:





<unittest.runner.TextTestResult run=2 errors=0 failures=0>

PART 4: (OPTIONAL) Improve the WER with Language Models

We've squeezed just about as much as we can out of the model and still only get about 50% of the words right! Surely we can do better than that. Probability to the rescue again in the form of statistical language models (SLM). The basic idea is that each word has some probability of occurrence within the set, and some probability that it is adjacent to specific other words. We can use that additional information to make better choices.

Additional reading and resources

Introduction to N-grams (Stanford Jurafsky slides)
Speech Recognition Techniques for a Sign Language Recognition System, Philippe Dreuw et al see the improved results of applying LM on this data!
SLM data for this ASL dataset

Optional challenge

The recognizer you implemented in Part 3 is equivalent to a "0-gram" SLM. Improve the WER with the SLM data provided with the data set in the link above using "1-gram", "2-gram", and/or "3-gram" statistics. The probabilities data you've already calculated will be useful and can be turned into a pandas DataFrame if desired (see next cell).
Good luck! Share your results with the class!



In [87]:

    
# create a DataFrame of log likelihoods for the test word items
df_probs = pd.DataFrame(data=probabilities)
df_probs.head()









    Out[87]:






  
    
      
      ALL
      ANN
      APPLE
      ARRIVE
      BILL
      BLAME
      BLUE
      BOOK
      BORROW
      BOX
      ...
      VIDEOTAPE
      VISIT
      WANT
      WHAT
      WHO
      WILL
      WOMAN
      WONT
      WRITE
      YESTERDAY
    
  
  
    
      0
      -693.405558
      -338.294312
      -979.781878
      -430.812656
      -941.055774
      -318.238598
      -1884.794491
      -682.990874
      -4866.802613
      -482.169554
      ...
      -2675.430976
      -83.655181
      -4344.867156
      -502.618765
      -104.711329
      -646.662179
      -206.439041
      -747.947983
      -701.536318
      -202.613672
    
    
      1
      -5531.242556
      -3814.394279
      -2363.793192
      -27.762983
      -7492.019871
      -36.317406
      -2594.451195
      -22.156978
      -2349.415295
      -69.539959
      ...
      -150.135572
      -128.928661
      -2293.694544
      -243.808758
      -270.237336
      -3862.926554
      -270.683059
      -1682.842124
      1.801576
      -331.304835
    
    
      2
      -6399.677184
      -3935.206313
      -4155.638323
      -158.309482
      -8697.287164
      -370.715853
      -3222.084415
      -655.206050
      -4782.450338
      -276.098683
      ...
      -890.794568
      -335.336057
      -4056.722207
      -673.227315
      -952.947111
      -4876.798459
      -919.867201
      -1823.365399
      -661.980684
      -627.748928
    
    
      3
      -838.181917
      -1035.163788
      -1295.688812
      -227.511579
      -644.723220
      -315.781516
      -982.838734
      -772.193536
      -8231.938850
      -569.347528
      ...
      -3741.368670
      -123.747136
      -3825.822413
      -261.947442
      -52.680631
      -785.774356
      -297.913105
      -269.265750
      -1268.259147
      -426.377717
    
    
      4
      -1552.638889
      -1481.527991
      -523.065429
      -24.473713
      -2863.011296
      -43.865201
      -618.346576
      -120.053695
      -1553.931904
      -55.569892
      ...
      -148.472515
      -15.100647
      -480.609607
      -56.537677
      -229.614280
      -1451.306928
      -134.362640
      -389.042577
      -90.813368
      -131.896009
    
  

5 rows × 112 columns

	left-x	left-y	right-x	right-y	nose-x	nose-y	grnd-ry	grnd-rx	grnd-ly	grnd-lx
speaker
man-1	206.248203	218.679449	155.464350	150.371031	175.031756	61.642600	88.728430	-19.567406	157.036848	31.216447
woman-1	164.661438	161.271242	151.017865	117.332462	162.655120	57.245098	60.087364	-11.637255	104.026144	2.006318
woman-2	183.214509	176.527232	156.866295	119.835714	170.318973	58.022098	61.813616	-13.452679	118.505134	12.895536

		left-x	left-y	right-x	right-y	nose-x	nose-y	speaker
video	frame
98	0	149	181	170	175	161	62	woman-1
	1	149	181	170	175	161	62	woman-1
	2	149	181	170	175	161	62	woman-1
	3	149	181	170	175	161	62	woman-1
	4	149	181	170	175	161	62	woman-1

	left-x	left-y	right-x	right-y	nose-x	nose-y	grnd-ry	grnd-rx	grnd-ly	grnd-lx	left-x-mean
speaker
man-1	15.154425	36.328485	18.901917	54.902340	6.654573	5.520045	53.487999	20.269032	36.572749	15.080360	0.0
woman-1	17.573442	26.594521	16.459943	34.667787	3.549392	3.538330	33.972660	16.764706	27.117393	17.328941	0.0
woman-2	15.388711	28.825025	14.890288	39.649111	4.099760	3.416167	39.128572	16.191324	29.320655	15.050938	0.0

	ALL	ANN	APPLE	ARRIVE	BILL	BLAME	BLUE	BOOK	BORROW	BOX	...	VIDEOTAPE	VISIT	WANT	WHAT	WHO	WILL	WOMAN	WONT	WRITE	YESTERDAY
0	-693.405558	-338.294312	-979.781878	-430.812656	-941.055774	-318.238598	-1884.794491	-682.990874	-4866.802613	-482.169554	...	-2675.430976	-83.655181	-4344.867156	-502.618765	-104.711329	-646.662179	-206.439041	-747.947983	-701.536318	-202.613672
1	-5531.242556	-3814.394279	-2363.793192	-27.762983	-7492.019871	-36.317406	-2594.451195	-22.156978	-2349.415295	-69.539959	...	-150.135572	-128.928661	-2293.694544	-243.808758	-270.237336	-3862.926554	-270.683059	-1682.842124	1.801576	-331.304835
2	-6399.677184	-3935.206313	-4155.638323	-158.309482	-8697.287164	-370.715853	-3222.084415	-655.206050	-4782.450338	-276.098683	...	-890.794568	-335.336057	-4056.722207	-673.227315	-952.947111	-4876.798459	-919.867201	-1823.365399	-661.980684	-627.748928
3	-838.181917	-1035.163788	-1295.688812	-227.511579	-644.723220	-315.781516	-982.838734	-772.193536	-8231.938850	-569.347528	...	-3741.368670	-123.747136	-3825.822413	-261.947442	-52.680631	-785.774356	-297.913105	-269.265750	-1268.259147	-426.377717
4	-1552.638889	-1481.527991	-523.065429	-24.473713	-2863.011296	-43.865201	-618.346576	-120.053695	-1553.931904	-55.569892	...	-148.472515	-15.100647	-480.609607	-56.537677	-229.614280	-1451.306928	-134.362640	-389.042577	-90.813368	-131.896009