Part 1: The Swiss Alp Horn (60 pts)

Given a model of a system's transition and emission probabilities and a series of evidence observations, what is the most likely sequence of states that generated the observed evidence?

On a recent trip to Switzerland, you discover the Swiss Alp Horn, used by remote villages to communicate across the Alps. This large horn is difficult for beginners to play, and you find you can produce about four notes on it - badly.

You don't feel so bad - even an expert alp horn player isn't very reliable. However, different progressions of tones convey different messages across the Swiss valleys. For example, a wavering series of high and low notes indicates danger of avalanche, while a more drawn-out tune that starts high and ends low indicates danger of invasion. For simplicity, assume that the horn's pitch is always in one of three registers: high, mid, and low.

You befriend a villager whose job is to stay up all night listening for these faint sounds and to alert the village whenever there is danger. He is not very reliable, however, because he keeps falling asleep. You decide to write a computer program that listens for the alerts and automatically decides which alarm to raise depending on which alert is heard.

After much trial and error, you determine that the HMMs below are appropriate to the task. The output distributions are discrete over the four separate notes that can be played. We assume the prior probabilities of avalanche and invasion are identical.

The models for the two possible messages look like this:

<img src="avalanche.png" width="400" height="300">

<img src="invasion.png" width="400" height="300">

The following is the transition table for an avalanche warning:

| Old State | High | Low | End |
|-----------|------|-----|-----|
| High      | 0.6  | 0.4 | 0   |
| Low       | 0.4  | 0.5 | 0.1 |

The following is the emission table for an avalanche warning:

| System State | C   | C#  | D   | D#  |
|--------------|-----|-----|-----|-----|
| High         | 0.1 | 0.1 | 0.4 | 0.4 |
| Low          | 0.4 | 0.4 | 0.1 | 0.1 |

The following is the transition table for an invasion warning:

| Old State | High | Mid | Low | End |
|-----------|------|-----|-----|-----|
| High      | 0.6  | 0.4 | 0   | 0   |
| Mid       | 0    | 0.7 | 0.3 | 0   |
| Low       | 0    | 0   | 0.6 | 0.4 |

The following is the emission table for an invasion warning:

| System State | C   | C#  | D   | D#  |
|--------------|-----|-----|-----|-----|
| High         | 0.1 | 0.1 | 0.2 | 0.6 |
| Mid          | 0.1 | 0.1 | 0.7 | 0.1 |
| Low          | 0.6 | 0.2 | 0.1 | 0.1 |

1a) By hand, determine the state sequence with the highest probability of generating the note sequence D# D C# C C# D D# D C# C for each of the HMMs. This will require filling out a probability trellis for each event. According to your calculation, was the alert for avalanche or invasion?
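
To make the trellis mechanics concrete, here is the initialization and first recursion step for the avalanche model (using the state prior from the code cells below, which puts all of its mass on High):

$$v_0(\text{High}) = P(\text{High}) \cdot P(D\sharp \mid \text{High}) = 1 \times 0.4 = 0.4$$

$$v_1(\text{High}) = \max\big(v_0(\text{High}) \cdot 0.6,\ v_0(\text{Low}) \cdot 0.4\big) \cdot P(D \mid \text{High}) = 0.24 \times 0.4 = 0.096$$

These are exactly the first two High entries in the avalanche trellis below.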


In [133]:
def part_1_a(): #(20 pts)
    # TODO: Fill in below !
    
    # Fill in the matrix below with state probabilities at each time step, P(High) being the value at the 0th index 
    part_a_avalanche_trellis = [[0.4,0], [0.096,0.016], [0.00576,0.01536], [0.0006,0.003], [0.00012,0.0006], [0.000096,0.00003], [0.000023,0.0000038], [0.0000055,0.00000092], [0.00000033,0.00000077], [0.00000003,0.00000015]]

    # Fill in the matrix below with state probabilities at each time step, P(High) being the value at the 0th index,
    # P(Mid) being the value at the 1st index and P(Low) being the value at the 2nd index
    part_a_invasion_trellis = [[0.6,0,0], [0.072,0.168,0], [0.0042,0.012,0.01], [0.00025,0.00084,0.0036], [0.000015,0.000059,0.00043], [0.0000018,0.000029,0.000026], [0.00000065,0.0000020,0.0000016], [0.000000078,0.00000098,0.000000096], [0.0000000047,0.000000069,0.000000059], [0.00000000028,0.0000000048,0.000000021]]

    part_a_most_likely_state_sequence = [0,0,1,1,1,0,0,0,1,1]

    part_a_warning = 'avalanche'
    
    return part_a_invasion_trellis, part_a_avalanche_trellis, part_a_most_likely_state_sequence, part_a_warning

1b) By hand, determine the state sequence with the highest probability of generating the note sequence D# D# D# D D D C# C C for each of the HMMs. According to your calculation, was the alert for avalanche or invasion?


In [134]:
def part_1_b(): #(20 pts)
    #TODO: Fill in below !
    
    # Fill in the matrix below with state probabilities at each time step, P(High) being the value at the 0th index 
    part_b_avalanche_trellis = [[0.4,0], [0.096,0.016], [0.023,0.0038], [0.0055,0.00092], [0.0013,0.00022], [0.00032,0.000053], [0.000019,0.00005], [0.000002,0.00001], [0.0000004,0.000002]]

    # Fill in the matrix below with state probabilities at each time step, P(High) being the value at the 0th index,
    # P(Mid) being the value at the 1st index and P(Low) being the value at the 2nd index
    part_b_invasion_trellis = [[0.6,0,0], [0.216,0.024,0], [0.078,0.0086,0.00072], [0.0094,0.019,0.00026], [0.0011,0.0093,0.00057], [0.00013,0.0046,0.00028], [0.0000078,0.00032,0.00055], [0.00000047,0.000022,0.00055], [0.000000028,0.0000015,0.000071]]

    part_b_most_likely_state_sequence = [0,0,0,1,1,1,2,2,2]

    part_b_warning = 'invasion' 
    
    return part_b_invasion_trellis, part_b_avalanche_trellis, part_b_most_likely_state_sequence, part_b_warning

1c) Implement the Viterbi algorithm, making sure that the conclusions from your hand calculations match your program output. The following cell contains examples of the format for your inputs - you'll need to create tables for both the avalanche and invasion probabilities in order to test your hand calculations. We will provide tests for you - stay tuned!

Hint: In order to reconstruct your most likely path after running Viterbi, you'll need to keep a back-pointer at each state that points to that state's most likely predecessor.
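
As a small illustration of that bookkeeping (a sketch with made-up trellis values, not part of the assignment):

In [ ]:
# toy trellis and back-pointer tables for a 3-step, 2-state problem (hypothetical values)
toy_states = ('A', 'B')
toy_trellis = [{'A': 0.6, 'B': 0.4}, {'A': 0.12, 'B': 0.2}, {'A': 0.03, 'B': 0.08}]
toy_backpointer = [None, {'A': 'A', 'B': 'A'}, {'A': 'B', 'B': 'B'}]  # most likely predecessor of each state

# start from the most likely final state, then walk the pointers backwards
best = max(toy_states, key=lambda s: toy_trellis[-1][s])
toy_path = [best]
for t in range(len(toy_trellis) - 1, 0, -1):
    best = toy_backpointer[t][best]
    toy_path.insert(0, best)
print(toy_path)  # ['A', 'B', 'B']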


In [135]:
avalanche_states = ('High', 'Low', 'End')
avalanche_evidence_variables = ['Note']

avalanche_transition_probs = {
    'High': {'High': 0.6, 'Low': 0.4, 'End':0},
    'Low':  {'High': 0.4, 'Low': 0.5, 'End':0.1},
    'End':  {'High': 0,   'Low': 0,   'End':1}
}

avalanche_emission_probs = {
    'High' : {'Note':[0.1, 0.1, 0.4, 0.4]},
    'Low'  : {'Note':[0.4, 0.4, 0.1, 0.1]},
    'End'  : {'Note':[1e-4, 1e-4, 1e-4, 1e-4]}
}

prior = {'High':1, 'Low':0.0, 'End': 0.0}

In [136]:
def viterbi(evidence_vector, prior, states, evidence_variables, transition_probs, emission_probs):
    """
        20 points
        
        This method takes as input the following:
    
        evidence_vector: A list of dictionaries mapping evidence variables to their values
        
        prior: A dictionary corresponding to the prior distribution over states
        
        states: A list of all possible system states
        
        evidence_variables: A list of all valid evidence variables
        
        transition_probs: A dictionary mapping states onto dictionaries mapping states onto probabilities
        
        emission_probs: A dictionary mapping states onto dictionaries mapping evidence variables onto 
                    probabilities for their possible values
                    
                    
        This method returns:
            A list of states that is the most likely sequence of states explaining the evidence
        
    """
    path = [{}]
    v_t_k = [{}]

    # initialize the trellis at t=0: prior times emission for the first observation
    for state_n in states:
        v_t_k[0][state_n] = prior[state_n]
        for evidence in evidence_variables:
            v_t_k[0][state_n] *= emission_probs[state_n][evidence][evidence_vector[0][evidence]]
        path[0][state_n] = state_n

    # recursion: for each later observation, keep the best predecessor of each state
    for t in range(1, len(evidence_vector)):
        v_t_k.append({})
        path.append({})

        for state_n in states:
            # start below zero so a back-pointer is always recorded, even when all
            # incoming probabilities are zero (e.g. an unreachable state)
            v_t_k[t][state_n] = -1.0
            for each_state in states:
                temp = v_t_k[t - 1][each_state] * transition_probs[each_state][state_n]
                for evidence in evidence_variables:
                    temp *= emission_probs[state_n][evidence][evidence_vector[t][evidence]]

                if temp > v_t_k[t][state_n]:
                    v_t_k[t][state_n] = temp
                    path[t][state_n] = each_state

    # termination: pick the most likely final state...
    (_, final_state) = max((v_t_k[len(evidence_vector) - 1][state_n], state_n) for state_n in states)

    # ...then follow the back-pointers to reconstruct the full path
    most_likely_path = [final_state]
    for index in reversed(range(1, len(evidence_vector))):
        most_likely_path.insert(0, path[index][final_state])
        final_state = path[index][final_state]
    return most_likely_path

In [137]:
invasion_states = ('High', 'Mid', 'Low', 'End')
invasion_evidence_variables = ['Note']


invasion_transition_probs = {
    'High': {'High': 0.6, 'Mid':0.4 , 'Low': 0, 'End':0},
    'Mid' : {'High': 0,  'Mid':0.7, 'Low': 0.3, 'End': 0  },
    'Low':  {'High': 0,   'Mid': 0,    'Low': 0.6, 'End':0.4},
    'End':  {'High': 0,   'Mid': 0,    'Low': 0,   'End':1}
}

invasion_emission_probs = {
    'High' : {'Note':[0.1, 0.1, 0.2, 0.6]},
    'Mid'  : {'Note':[0.1, 0.1, 0.7, 0.1]},
    'Low'  : {'Note':[0.6, 0.2, 0.1, 0.1]},
    'End'  : {'Note':[1e-4, 1e-4, 1e-4, 1e-4]}
}

invasion_prior = {'High':1, 'Mid':0.0, 'Low':0.0, 'End': 0.0}


# D# D C# C C# D D# D C# C
evidence_sequence_1 = [{'Note':3}, {'Note':2}, {'Note':1},{'Note':0},{'Note':1},{'Note':2},{'Note':3},{'Note':2},{'Note':1},{'Note':0}]

invasion_state_sequence_1 = viterbi(evidence_sequence_1, invasion_prior, invasion_states, invasion_evidence_variables, invasion_transition_probs, invasion_emission_probs)
avalanche_state_sequence_1 = viterbi(evidence_sequence_1, prior, avalanche_states, avalanche_evidence_variables, avalanche_transition_probs, avalanche_emission_probs )

# D# D# D# D D D C# C C
evidence_sequence_2 = [{'Note':3}, {'Note':3}, {'Note':3},{'Note':2},{'Note':2},{'Note':2},{'Note':1},{'Note':0},{'Note':0}]

invasion_state_sequence_2 = viterbi(evidence_sequence_2, invasion_prior, invasion_states, invasion_evidence_variables, invasion_transition_probs, invasion_emission_probs)
avalanche_state_sequence_2 = viterbi(evidence_sequence_2, prior, avalanche_states, avalanche_evidence_variables, avalanche_transition_probs, avalanche_emission_probs )

print(invasion_state_sequence_1)
print(avalanche_state_sequence_1)
print(invasion_state_sequence_2)
print(avalanche_state_sequence_2)


['High', 'Mid', 'Mid', 'Mid', 'Mid', 'Mid', 'Mid', 'Mid', 'Low', 'Low']
['High', 'High', 'Low', 'Low', 'Low', 'High', 'High', 'High', 'Low', 'Low']
['High', 'High', 'High', 'Mid', 'Mid', 'Mid', 'Low', 'Low', 'Low']
['High', 'High', 'High', 'High', 'High', 'High', 'Low', 'Low', 'Low']
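
As a quick consistency check, the Viterbi output for the avalanche model can be compared against the hand-filled sequence from part 1a (using the 0 = High, 1 = Low index convention from that cell):

In [ ]:
# the state names returned by viterbi() should match the indices from part_1_a()
index_of = {'High': 0, 'Low': 1}
_, _, hand_sequence, _ = part_1_a()
assert [index_of[s] for s in avalanche_state_sequence_1] == hand_sequence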

Part 2: Umbrella World & Forward-Backward (40 pts)

Given a model of a system's transition and emission probabilities, and a series of evidence observations, what is the probability that the system is in a given state at an instant in time?

Consider the following variation on the Umbrella World system from Russell & Norvig Ch. 15. To refresh your memory: suppose we work in a windowless building all day and want to know what the weather is like outside from one day to the next. We cannot directly observe whether it is raining, but we can see whether our coworker is carrying an umbrella and whether she is wearing a coat. For simplicity, assume that there are two kinds of weather: rainy and sunny.

The normalized state transition probabilities are specified as follows. These are the probabilities of moving from one weather state to the next.

| Old State | RAIN          | SUNNY         |
|-----------|---------------|---------------|
| RAIN      | $\frac{2}{3}$ | $\frac{1}{3}$ |
| SUNNY     | $\frac{1}{2}$ | $\frac{1}{2}$ |

The emission probabilities are specified as follows. These are the probabilities of observing a particular evidence variable, given the state of the system.

| System State | Variable | Value | Emission Probability |
|--------------|----------|-------|----------------------|
| RAIN         | Umbrella | True  | $\frac{3}{4}$        |
| RAIN         | Umbrella | False | $\frac{1}{4}$        |
| RAIN         | Coat     | True  | $\frac{2}{3}$        |
| RAIN         | Coat     | False | $\frac{1}{3}$        |
| SUNNY        | Umbrella | True  | $\frac{1}{4}$        |
| SUNNY        | Umbrella | False | $\frac{3}{4}$        |
| SUNNY        | Coat     | True  | $\frac{1}{2}$        |
| SUNNY        | Coat     | False | $\frac{1}{2}$        |

Below is a representation of the system in Python, in the same format as the Swiss Horn data. Note the [F,T] order of the emission probabilities, and that the fractions above appear as decimal approximations.


In [138]:
states = ('Rainy', 'Sunny')
evidence_variables = ('umbrella', 'coat')

transition_probs = {
    'Rainy': {'Rainy': .66 , 'Sunny': .34},
    'Sunny': {'Rainy': .5, 'Sunny': .5}
}

emission_probs = {
    'Rainy' : {'umbrella':[0.25, 0.75], 'coat':[0.34, 0.66]},
    'Sunny' : {'umbrella':[0.75, 0.25], 'coat':[0.5,  0.5]}
}
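
A quick reminder of how to read these lists, given the [F,T] ordering:

In [ ]:
# index 0 is the probability for False, index 1 for True
print(emission_probs['Rainy']['umbrella'][1])  # P(umbrella=True | Rainy) = 0.75, i.e. 3/4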

We can use the forward-backward algorithm to compute the posterior probabilities of the state variables given a sequence of observations, and use these probabilities in turn to figure out the most likely state at any point in time.

The forward-backward algorithm is specified in figure 15.4 of Russell & Norvig, with the Forward operator given as equation 15.5, and the Backward operator given as 15.9.
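
In the notation used below, with $\alpha$ for the forward message and $\beta$ for the backward message, those two recursions are

$$\alpha_t(j) = P(e_t \mid X_t = j) \sum_i \alpha_{t-1}(i)\, P(X_t = j \mid X_{t-1} = i)$$

$$\beta_t(i) = \sum_j P(X_{t+1} = j \mid X_t = i)\, P(e_{t+1} \mid X_{t+1} = j)\, \beta_{t+1}(j)$$

and the smoothed posterior is $P(X_t = i \mid e_{1:T}) \propto \alpha_t(i)\, \beta_t(i)$. With several evidence variables, as here, their emission probabilities simply multiply, since they are conditionally independent given the state.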

Fill out forward_backward() to return the distribution of posterior probabilities of all states at each time step. We will provide tests for you - stay tuned!


In [139]:
def forward_backward(evidence_vector, prior, states, evidence_variables, transition_probs, emission_probs ):
    
    """
        This method takes as input the following:
    
        evidence_vector: A list of dictionaries mapping evidence variables to their values at each time step
        
        prior: A dictionary corresponding to the prior distribution over states
        
        states: A list of all possible system states
        
        evidence_variables: A list of all valid evidence variables
        
        transition_probs: A dictionary mapping states onto dictionaries mapping states onto probabilities
        
        emission_probs: A dictionary mapping states onto dictionaries mapping evidence variables onto 
                    a list of probabilities for their possible values
                    
                    
        This method returns:
            A list of dictionaries giving the distribution of possible states at each time step
        
    """
    
    alpha = [{}]   # forward messages
    beta = [{}]    # backward messages, built back to front

    # initialization: alpha_0 = prior * emission for the first observation
    for each_state in states:
        alpha[0][each_state] = prior[each_state]
        for evidence in evidence_variables:
            alpha[0][each_state] *= emission_probs[each_state][evidence][evidence_vector[0][evidence]]

    # beta at the final time step is uniformly 1
    for each_state in states:
        beta[0][each_state] = 1

    # forward pass
    for t in range(1, len(evidence_vector)):
        alpha.append({})

        for curr_state in states:
            alpha[t][curr_state] = 0
            for prev_state in states:
                prob = alpha[t - 1][prev_state] * transition_probs[prev_state][curr_state]
                for evidence in evidence_variables:
                    prob *= emission_probs[curr_state][evidence][evidence_vector[t][evidence]]

                alpha[t][curr_state] += prob

    # backward pass: beta[0] always holds the most recently computed message
    for rev_index in reversed(range(1, len(evidence_vector))):
        temp_back = {}

        for curr_state in states:
            temp_back[curr_state] = 0

            for each_state in states:
                prob = beta[0][each_state] * transition_probs[curr_state][each_state]
                for evidence in evidence_variables:
                    prob *= emission_probs[each_state][evidence][evidence_vector[rev_index][evidence]]

                temp_back[curr_state] += prob

        beta.insert(0, temp_back)

    # smoothing: P(X_t | e_{1:T}) = alpha_t * beta_t / P(e_{1:T}),
    # where P(e_{1:T}) is the sum of the final forward messages
    evidence_likelihood = sum(alpha[-1][k] for k in states)
    posterior = []
    for i in range(len(evidence_vector)):
        posterior.append({each_state: alpha[i][each_state] * beta[i][each_state] / evidence_likelihood
                          for each_state in states})

    return posterior

In [140]:
#The following is an example input:

evidence_vector = [{'umbrella':0, 'coat':1},
                   {'umbrella':0, 'coat':0},
                   {'umbrella':1, 'coat':0},
                   {'umbrella':1, 'coat':1},
                   {'umbrella':0, 'coat':1} ]

prior = {'Rainy': 0.5, 'Sunny': 0.5}

warmup_output = forward_backward(evidence_vector, 
                                 prior, 
                                 states, 
                                 evidence_variables, 
                                 transition_probs, 
                                 emission_probs)
print(warmup_output)


[{'Rainy': 0.2629, 'Sunny': 0.7371}, {'Rainy': 0.2379, 'Sunny': 0.7621}, {'Rainy': 0.7339, 'Sunny': 0.2661}, {'Rainy': 0.8455, 'Sunny': 0.1545}, {'Rainy': 0.4367, 'Sunny': 0.5633}]
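
Each smoothed distribution should sum to one; a quick sanity check:

In [ ]:
for dist in warmup_output:
    assert abs(sum(dist.values()) - 1.0) < 1e-9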

Now, for the purposes of comparison, compute the most likely sequence of states for the Swiss Horn examples using the Viterbi algorithm, then compute a second path by choosing the most likely state at each instant, as given by the forward-backward algorithm. How do the total probabilities of the two state sequences compare? A scoring sketch follows this paragraph.
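
One way to make the comparison concrete is to score a candidate state sequence under a model. The helper below is a hypothetical sketch (not part of the assignment API) that computes the joint probability of a state sequence together with the evidence; by construction, the Viterbi path maximizes this quantity, so the forward-backward path can at best tie it.

In [ ]:
def sequence_probability(state_sequence, evidence_vector, prior,
                         evidence_variables, transition_probs, emission_probs):
    # joint probability P(states, evidence) for one particular sequence
    prob = prior[state_sequence[0]]
    for t, state in enumerate(state_sequence):
        if t > 0:
            prob *= transition_probs[state_sequence[t - 1]][state]
        for evidence in evidence_variables:
            prob *= emission_probs[state][evidence][evidence_vector[t][evidence]]
    return prob

# e.g. score the Viterbi path for the first horn sequence under the invasion model
print(sequence_probability(invasion_state_sequence_1, evidence_sequence_1, invasion_prior,
                           invasion_evidence_variables, invasion_transition_probs,
                           invasion_emission_probs))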


In [ ]:
def part_2_b():  
    
    options = {'a': 'The total probabilities are the same.',
               'b': 'The Viterbi algorithm yields a more likely sequence',
               'c': 'The forward-backward algorithm yields a more likely sequence',
               'd': 'Insufficient data is provided',
               'e': 'No answer'}
    
    return options['b']