This page explains why the LSTM was first proposed and what its core features are, illustrated with some examples.
It is based on the paper: Hochreiter and Schmidhuber. 1997. Long Short-Term Memory.
The core feature of the LSTM unit as first proposed is the constant-error carrousel (CEC), which solves the vanishing gradient problem of standard RNNs.
A CEC is a neural network unit consisting of a single neuron with a self-loop whose weight is fixed to 1.0, which ensures constant error flow during backpropagation.
Now let's see an example of the CEC at work. We will use a CEC for a very simple task: recognizing whether the current character is inside a bracketed expression; for simplicity, the opening bracket itself is considered to be inside and the closing bracket outside. This task is solvable only by a network that can store memory, since to recognize whether a character is inside a bracketed expression, we need to know that there is an opening bracket to its left that does not yet have a matching closing bracket.
The input characters come from the alphabet $\{`a`, `b`, `(`, `)`\}$ with the following 2-dimensional embedding:
$$ \begin{eqnarray} emb(`a`) &=& (1, 1) \nonumber\\ emb(`b`) &=& (-1, -1) \nonumber\\ emb(`(`) &=& (1, 0) \nonumber\\ emb(`)`) &=& (0, 1) \nonumber \end{eqnarray} $$For this task, we define a very simple network with two input units, one CEC unit, and one output unit with sigmoid activation ($\sigma(x) = \frac{1}{1 + e^{-x}}$).
For this task, we define the loss function as the cross-entropy (CE) between the target and the predicted output:
$$ \begin{eqnarray} \mathrm{CE}(x, y) = - (x\log(y) + (1-x)\log(1-y)) \nonumber\\ \mathrm{Loss}(\hat{o}_t, o_t) = \mathrm{CE}(\hat{o}_t, o_t) - \mathrm{CE}(\hat{o}_t, \hat{o}_t) \end{eqnarray} $$where $\hat{o}_t$ and $o_t$ denote the target value (gold standard) and the output value (network prediction) at time step $t$, respectively. The first term is the cross-entropy between the target and the output, and the second term is the entropy of the target itself. Note that the second term is a constant; it only serves to make the minimum achievable loss 0 (perfect output).
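As a quick sanity check of this definition (a standalone sketch, not one of the notebook's cells), the loss is 0 when the output equals the target and positive otherwise:

import math
def CE(x, y):
    return -(x*math.log(y) + (1-x)*math.log(1-y))
def Loss(target, output):
    # cross-entropy minus the entropy of the target, so a perfect output scores 0
    return CE(target, output) - CE(target, target)
sigma = lambda x: 1.0/(1+math.exp(-x))
target = sigma(1)                 # "inside a bracketed expression"
print(Loss(target, target))       # 0.0 (perfect output)
print(Loss(target, 0.5))          # > 0 (imperfect output)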
More specifically, we have:
$$ \begin{equation} o_t = \sigma(w_3*s_t) \end{equation} $$where $s_t$ is the output of the CEC unit (a.k.a. the memory), which depends on the previous value of the memory $s_{t-1}$ and on the inputs $x_{t,1}$ and $x_{t,2}$ (the first and second dimensions of the input at time step $t$):
$$ \begin{equation} s_t = \underbrace{w_s * s_{t-1}}_\text{previous value} + \underbrace{w_1 * x_{t,1} + w_2 * x_{t,2}}_\text{input} \end{equation} $$where $w_s$ is the weight of the self-loop, which is fixed to 1.0. To make it clear why it should be 1.0, the derivation below does not assume $w_s = 1.0$.
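For intuition, here is a hand trace of this recurrence on `'a(a)a'` (a sketch: the weights $w_1 = 1, w_2 = -1$ are one solution under the embedding above, not values the notebook has learned yet). With them, `a`/`b` contribute 0 and `(`/`)` contribute $+1$/$-1$, so the memory tracks the bracket count:

import math
emb = {'a': (1, 1), 'b': (-1, -1), '(': (1, 0), ')': (0, 1)}
w1, w2, w3, ws = 1.0, -1.0, 1.0, 1.0
s = 0.0
for ch in 'a(a)a':
    x1, x2 = emb[ch]
    s = ws*s + w1*x1 + w2*x2              # CEC update
    o = 1.0/(1 + math.exp(-w3*s))         # sigmoid output
    print(ch, s, round(o, 3))             # memory: 0, 1, 1, 0, 0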
In [14]:
import math
from IPython.display import Markdown, display
def printmd(string):
display(Markdown(string))
# Embedding
embedding = {}
embedding['a'] = (1.0, 1)
embedding['b'] = (-1, -1)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)
# embedding['a'] = (-1, 0)
# embedding['b'] = (-0.5, 0)
# embedding['('] = (1, 1)
# embedding[')'] = (1, -1)
# Weights
w1=1.0
w2=1.0
w3=1.0
ws=1.0
memory_history = [0]
output_history = [0]
def sigmoid(x):
return 1.0/(1+math.exp(-x))
def gold(seq):
result = [0]
bracket_count = 0
for char in seq:
if char == '(':
bracket_count += 1
if char == ')':
bracket_count -= 1
result.append(sigmoid(bracket_count))
return result
def activate_memory(x1, x2):
prev_memory = memory_history[-1]
memory_history.append(ws*prev_memory + w1*x1 + w2*x2)
return memory_history[-1]
def activate_output(h):
output_history.append(sigmoid(w3*h))
return output_history[-1]
def predict(seq):
for char in seq:
activate_output(activate_memory(*embedding[char]))
result = output_history[:]
return result
def reset():
global memory_history, output_history
memory_history = [0]
output_history = [0]
def loss(gold_seq, pred_seq):
result = 0.0
per_position_loss = []
for idx, (corr, pred) in enumerate(zip(gold_seq, pred_seq)):
cur_loss = -(corr*math.log(pred) + (1-corr)*math.log(1-pred))
cur_loss -= -(corr*math.log(corr) + (1-corr)*math.log(1-corr))
result += cur_loss
per_position_loss.append(cur_loss)
return result, per_position_loss
def print_list(lst):
'''A convenience method to print a list of real numbers'''
as_str = ['{:+.3f}'.format(num) for num in lst]
print('[{}]'.format(', '.join(as_str)))
In [15]:
# See typical values of sigmoid
for i in range(5):
print('sigmoid({}) = {}'.format(i, sigmoid(i)))
Now let's check the function that calculates the target values. Basically, we want it to output $\sigma(0)$ or $\sigma(1)$ when the current character is outside or inside a bracketed expression, respectively.
In [16]:
gold('a(a)a')[1:] # The first element is dummy
Out[16]:
This is $\sigma(0), \sigma(1), \sigma(1), \sigma(0), \sigma(0)$, which is exactly what we expect. So far so good.
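The same function also handles nesting, which we will need for the last experiment; a quick standalone check:

gold('((a))')[1:]   # bracket counts 1, 2, 2, 1, 0, i.e. sigmoid(1), sigmoid(2), sigmoid(2), sigmoid(1), sigmoid(0)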
In [17]:
test_seq = 'ab(ab)ab'
reset()
w1 = 1.0
w2 = 1.0
w3 = 1.0
result = predict(test_seq)
correct = gold(test_seq)
print('Output: ', end='')
print_list(result[1:])
print('Target: ', end='')
print_list(correct[1:])
print('Loss : {:.3f}'.format(loss(correct[1:], result[1:])[0]))
We see that the loss is non-zero and that some values are predicted incorrectly.
Next we work through the gradient calculation so that we can update the weights to reduce the loss.
To do the weight update, we need to calculate the partial derivative of the loss function with respect to each weight. We have three weight parameters $w_1, w_2$, and $w_3$, so we need to compute three partial derivatives.
For ease of notation, we denote $\mathrm{Loss}_t = \mathrm{Loss}(\hat{o}_t, o_t)$ as the loss at time step $t$ and $\mathrm{Loss} = \sum_t \mathrm{Loss}_t$ as the total loss over one sequence.
Remember that our objective is to reduce the total loss.
$$ \begin{eqnarray} \frac{\partial\mathrm{Loss}}{\partial w_i} & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial w_i} \\ & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial o_t} \cdot \frac{\partial o_t}{\partial w_i} \qquad \text{(by chain rule)} \\ \end{eqnarray} $$for $w_3$, we can already compute the gradient here, which is:
$$ \require{cancel} \begin{eqnarray} \frac{\partial\mathrm{Loss}}{\partial w_3} & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial o_t} \cdot \frac{\partial o_t}{\partial w_3} \\ & = & \sum_t\underbrace{\frac{o_t - \hat{o}_t}{\cancel{o_t(1-o_t)}}}_{=\frac{\partial \mathrm{Loss}_t}{\partial o_t}} \cdot \underbrace{s_t \cdot \cancel{o_t(1-o_t)}}_{=\frac{\partial o_t}{\partial w_3}} \\ & = & \sum_t(o_t-\hat{o}_t)s_t \end{eqnarray} $$for $w_1$ and $w_2$, we have:
$$ \begin{eqnarray} \frac{\partial\mathrm{Loss}}{\partial w_i} & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial o_t} \cdot \frac{\partial o_t}{\partial w_i} \\ & = & \sum_t \frac{o_t - \hat{o}_t}{o_t(1-o_t)} \cdot \frac{\partial o_t}{\partial s_t} \cdot \frac{\partial s_t}{\partial w_i} \\ & = & \sum_t \frac{o_t - \hat{o}_t}{\cancel{o_t(1-o_t)}} \cdot w_3\cdot \cancel{o_t(1-o_t)} \cdot \frac{\partial s_t}{\partial w_i} \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \frac{\partial s_t}{\partial w_i} \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \left(w_s\cdot\frac{\partial s_{t-1}}{\partial w_i} + x_{t,i}\right) \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \left({w_s}^2\cdot\frac{\partial s_{t-2}}{\partial w_i} + w_s\cdot x_{t-1,i} + x_{t,i}\right) \\ & & \ldots \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \left(\sum_{t'\leq t} {w_s}^{t-t'}x_{t',i}\right) \\ \end{eqnarray} $$We see that the gradient with respect to $w_1$ and $w_2$ contains the factor ${w_s}^{t-t'}$, where $t-t'$ can be as large as the input sequence length. So if $w_s \neq 1.0$, the contribution of inputs far in the past either vanishes (when $|w_s| < 1$) or blows up (when $|w_s| > 1$) as the input sequence gets longer; fixing $w_s = 1.0$ keeps the error flow constant.
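To get a feel for how quickly the factor ${w_s}^{t-t'}$ matters, here is a small standalone illustration (not part of the derivation): the contribution of an input 50 steps in the past is scaled by ${w_s}^{50}$.

for ws_demo in (0.9, 1.0, 1.1):
    print('ws = {}: ws**50 = {:.6f}'.format(ws_demo, ws_demo**50))
# 0.9**50 is about 0.005 (the gradient from early steps vanishes)
# 1.0**50 is exactly 1.0  (the CEC: constant error flow)
# 1.1**50 is about 117    (the gradient blows up)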
In [18]:
def dLdw1(test_seq, gold_seq, pred_seq, state_seq, info):
result = 0.0
grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw1 = '
for time_step in range(1, len(gold_seq)):
cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        cur_dell *= sum(ws**(time_step-step)*embedding[test_seq[step-1]][0] for step in range(1, time_step+1))  # ws**(time_step-step) matches the w_s^(t-t') factor in the derivation
if cur_dell < 0:
color = 'red'
else:
color = 'blue'
grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
result += cur_dell
grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
'red' if result < 0 else 'blue', result)
# printmd(grad_str)
info[0] += grad_str
return result
def dLdw2(test_seq, gold_seq, pred_seq, state_seq, info):
result = 0.0
grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw2 = '
for time_step in range(1, len(gold_seq)):
cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        cur_dell *= sum(ws**(time_step-step)*embedding[test_seq[step-1]][1] for step in range(1, time_step+1))  # ws**(time_step-step) matches the w_s^(t-t') factor in the derivation
if cur_dell < 0:
color = 'red'
else:
color = 'blue'
grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
result += cur_dell
grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
'red' if result < 0 else 'blue', result)
# printmd(grad_str)
info[0] += grad_str
return result
def dLdw3(test_seq, gold_seq, pred_seq, state_seq, info):
result = 0.0
grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw3 = '
for time_step in range(1, len(gold_seq)):
cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * state_seq[time_step]
if cur_dell < 0:
color = 'red'
else:
color = 'blue'
grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
result += cur_dell
grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
'red' if result < 0 else 'blue', result)
# printmd(grad_str)
info[0] += grad_str
return result
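Before using these gradients for learning, we can sanity-check one of them numerically (a sketch that reuses predict, gold, loss, and dLdw1 defined above, together with the module-level weights they read); the analytic value and the central finite-difference estimate should agree closely:

def total_loss(seq):
    reset()
    pred = predict(seq)
    corr = gold(seq)
    value, _ = loss(corr[1:], pred[1:])
    reset()
    return value

check_seq = 'ab(ab)ab'
w1, w2, w3, ws = 1.0, 1.0, 1.0, 1.0

# Analytic gradient from dLdw1 (the last argument is just its HTML scratch buffer)
reset()
pred = predict(check_seq)
corr = gold(check_seq)
analytic = dLdw1(check_seq, corr, pred, memory_history, [''])
reset()

# Central finite difference with respect to w1
eps = 1e-5
w1 += eps
loss_plus = total_loss(check_seq)
w1 -= 2*eps
loss_minus = total_loss(check_seq)
w1 += eps    # restore w1
numeric = (loss_plus - loss_minus) / (2*eps)
print('analytic dL/dw1 = {:+.6f}, numeric dL/dw1 = {:+.6f}'.format(analytic, numeric))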
Now we define an experiment function which takes the initial values of all the weights, the learning rate, and the maximum number of iterations. We also want to experiment with fixing the weight $w_3$ (i.e., not learning it).
The code below prints the total loss, the per-time-step loss, the output, target, and memory at each time step, and the per-time-step gradient contribution for each learned parameter.
In [19]:
def experiment(test_seq, _w1=1.0, _w2=1.0, _w3=1.0, alpha=1e-1, max_iter=250, fixed_w3=True):
global w1, w2, w3
reset()
w1 = _w1
w2 = _w2
w3 = _w3
correct = gold(test_seq)
print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}'.format(w1, w2, w3))
for iter_num in range(max_iter):
result = predict(test_seq)
if iter_num < 15 or (iter_num % 50 == 49):
printmd('<div style="font-weight:bold">Iteration {}</div>'.format(iter_num))
print('Output: ', end='')
print_list(result[1:])
print('Target: ', end='')
print_list(correct[1:])
print('Memory: ', end='')
print_list(memory_history[1:])
total_loss, per_position_loss = loss(correct[1:], result[1:])
info = ['', iter_num]
info[0] = ('<div>Loss: <span style="font-weight:bold">{:.5f}</span>' +
'= <span style="font-family:monaco; font-size:12px">').format(total_loss)
for idx, per_pos_loss in enumerate(per_position_loss):
info[0] += '{}{:.3f}'.format(' + ' if idx > 0 else '', per_pos_loss)
info[0] += '</span></div>'
# printmd(loss_str)
w1 -= alpha * dLdw1(test_seq, correct, result, memory_history, info)
w2 -= alpha * dLdw2(test_seq, correct, result, memory_history, info)
if not fixed_w3:
w3 -= alpha * dLdw3(test_seq, correct, result, memory_history, info)
if iter_num < 15 or (iter_num % 50 == 49):
printmd(info[0])
print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}'.format(w1, w2, w3))
print()
reset()
return w1, w2, w3
In [20]:
embedding['a'] = (1.0, 1)
embedding['b'] = (-1, -1)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)
w1, w2, w3 = experiment('ab(ab)bb', _w1=1.0, _w2=1.0, max_iter=250, alpha=1e-1, fixed_w3=True)
printmd('## Test on longer sequence')
experiment('aabba(aba)bab', _w1=w1, _w2=w2, _w3=w3, alpha=1e-2, max_iter=100)
Out[20]:
We saw in the experiment above that there are conflicting updates: at some positions in the sequence the gradient is positive, while at others it is negative. The original paper explains that this is caused by the same weights into the memory cell having to update the memory at some positions (when we see brackets, in this case) and to retain it at others (when we see any other character).
Another core feature of the LSTM, designed to resolve this issue, is its gates: an input gate and an output gate that control the flow of information through the memory cell.
In the following, we try adding an input gate, which the network should learn to open (value = 1) only when it sees an opening or closing bracket. So basically the input gate tells the network which inputs are relevant and which are not.
Note: below we implement two versions of the input gate: a linear gate with sigmoid, and a bilinear gate with a bias. The weights $w_4$ and $w_5$ have different interpretations depending on which gate is chosen. The bilinear gate was added because, with this embedding, the linear gate cannot express the gating we need (see the quick check below).
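As a quick illustration of the difference between the two gate forms (a standalone sketch; $w_4 = 1, w_5 = -1$ are the values later referred to as the correctly learned gate):

emb = {'a': (1, 1), 'b': (-1, -1), '(': (1, 0), ')': (0, 1)}
w4_true, w5_true = 1.0, -1.0
for ch, (x1, x2) in emb.items():
    print('{}: bilinear gate = {:+.1f}'.format(ch, w4_true + w5_true*x1*x2))
# 0 for 'a' and 'b' (block the input), 1 for '(' and ')' (let it through).
# No linear gate sigmoid(w4*x1 + w5*x2) can do the same with this embedding:
# closing on 'a' needs w4 + w5 << 0, while closing on 'b' needs -(w4 + w5) << 0.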
In [21]:
w4 = 1.0
w5 = 1.0
input_history = [0]
gate_history = [0]
def reset_gated():
global memory_history, output_history, input_history, gate_history
memory_history = [0]
output_history = [0]
input_history = [0]
gate_history = [0]
def activate_input(x1, x2):
result = (w1*x1+w2*x2)
input_history.append(result)
return result
def activate_gate(x1, x2, bilinear_gate=True):
if bilinear_gate:
result = w4 + w5*x1*x2 # Bilinear gate
else:
result = sigmoid(w4*x1+w5*x2) # The true linear gate
gate_history.append(result)
return result
def dLdw1_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
result = 0.0
grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw1 = '
for time_step in range(1, len(gold_seq)):
cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
cur_dell *= sum(embedding[test_seq[step-1]][0]*gate_seq[step] for step in range(1, time_step+1))
if cur_dell < 0:
color = 'red'
else:
color = 'blue'
grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
result += cur_dell
grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
'red' if result < 0 else 'blue', result)
# printmd(grad_str)
info[0] += grad_str
return result
def dLdw2_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
result = 0.0
grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw2 = '
for time_step in range(1, len(gold_seq)):
cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
cur_dell *= sum(embedding[test_seq[step-1]][1]*gate_seq[step] for step in range(1, time_step+1))
if cur_dell < 0:
color = 'red'
else:
color = 'blue'
grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
result += cur_dell
grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
'red' if result < 0 else 'blue', result)
# printmd(grad_str)
info[0] += grad_str
return result
def dLdw4_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
result = 0.0
grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw4 = '
for time_step in range(1, len(gold_seq)):
cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
if bilinear_gate:
cur_dell *= sum(input_seq[step] for step in range(1, time_step+1))
else:
cur_dell *= sum(embedding[test_seq[step-1]][0]*gate_seq[step]*input_seq[step]*(1-gate_seq[step])
for step in range(1,time_step+1))
if cur_dell < 0:
color = 'red'
else:
color = 'blue'
grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
result += cur_dell
grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
'red' if result < 0 else 'blue', result)
# printmd(grad_str)
info[0] += grad_str
return result
def dLdw5_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
result = 0.0
grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw5 = '
for time_step in range(1, len(gold_seq)):
cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
if bilinear_gate:
cur_dell *= sum(embedding[test_seq[step-1]][0]*embedding[test_seq[step-1]][1]*input_seq[step]
for step in range(1, time_step+1))
else:
cur_dell *= sum(embedding[test_seq[step-1]][1]*gate_seq[step]*input_seq[step]*(1-gate_seq[step])
for step in range(1,time_step+1))
if cur_dell < 0:
color = 'red'
else:
color = 'blue'
grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
result += cur_dell
grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
'red' if result < 0 else 'blue', result)
# printmd(grad_str)
info[0] += grad_str
return result
def activate_memory_gated():
memory_history.append(ws*memory_history[-1] + input_history[-1]*gate_history[-1])
return memory_history[-1]
def predict_gated(seq):
for char in seq:
activate_input(*embedding[char])
activate_gate(*embedding[char])
activate_output(activate_memory_gated())
result = output_history[:]
return result
def experiment_gated(test_seq, _w1=1.0, _w2=1.0, _w3=1.0, _w4=1.0, _w5=1.0, alpha=1e-1, max_iter=750,
bilinear_gate=True, fixed_w3=True, fixed_w4=False, fixed_w5=False):
global w1, w2, w3, w4, w5
reset_gated()
w1 = _w1
w2 = _w2
w3 = _w3
w4 = _w4
w5 = _w5
correct = gold(test_seq)
print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}, w4={:+.3f}, w5={:+.3f}'.format(w1, w2, w3, w4, w5))
for iter_num in range(max_iter):
result = predict_gated(test_seq)
if iter_num < 15 or (iter_num % 50 == 49):
printmd('<div style="font-weight:bold">Iteration {}</div>'.format(iter_num))
print('Output: ', end='')
print_list(result[1:])
print('Target: ', end='')
print_list(correct[1:])
print('Memory: ', end='')
print_list(memory_history[1:])
print('Input : ', end='')
print_list(input_history[1:])
print('Gate : ', end='')
print_list(gate_history[1:])
total_loss, per_position_loss = loss(correct[1:], result[1:])
info = ['', iter_num]
info[0] = ('<div>Loss: <span style="font-weight:bold">{:.5f}</span>' +
'= <span style="font-family:monaco">').format(total_loss)
for idx, per_pos_loss in enumerate(per_position_loss):
info[0] += '{}{:.3f}'.format(' + ' if idx > 0 else '', per_pos_loss)
info[0] += '</span></div>'
# printmd(loss_str)
w1 -= alpha * dLdw1_gated(test_seq, correct, result, memory_history, input_history, gate_history,
info, bilinear_gate)
w2 -= alpha * dLdw2_gated(test_seq, correct, result, memory_history, input_history, gate_history,
info, bilinear_gate)
if not fixed_w3:
            w3 -= alpha * dLdw3(test_seq, correct, result, memory_history, info)
if not fixed_w4:
w4 -= alpha * dLdw4_gated(test_seq, correct, result, memory_history, input_history, gate_history,
info, bilinear_gate)
if not fixed_w5:
w5 -= alpha * dLdw5_gated(test_seq, correct, result, memory_history, input_history, gate_history,
info, bilinear_gate)
if iter_num < 15 or (iter_num % 50 == 49):
printmd(info[0])
print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}, w4={:+.3f}, w5={:+.3f}'.format(w1, w2, w3, w4, w5))
print()
reset_gated()
return w1, w2, w3, w4, w5
In [22]:
embedding['a'] = (1.0, 1)
embedding['b'] = (-1, -1)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)
experiment_gated('ab(ab)bb', _w1=1.0, _w2=1.0, _w4=1.0, _w5=1.0, alpha=1e-1, max_iter=250, fixed_w3=True)
Out[22]:
We see that after adding the input gate (assuming the gate can exhibit the same behavior as the true input gate, which is why we use the bilinear gate here), the network reaches the optimum (loss = 0.0) faster (after iteration 199) than the one without an input gate (only after iteration 249), even though there are more parameters to learn with the input gate (two more: $w_4$ and $w_5$) and the initial loss is higher (due to the initially incorrect gate values).
We also see that the gate learned is not the true gate that we want. This is because the input embedding is ideal: the inputs are separable even without an input gate.
Now let's experiment with a noisy embedding, for which the true function cannot be learned without an input gate.
In [23]:
import random
a_1 = 1.0 + 0.2*(random.random()-0.5)
a_2 = 1.0/a_1
b_1 = -1.0 + 0.2*(random.random()-0.5)
b_2 = 1.0/b_1
embedding['a'] = (a_1, a_2)
embedding['b'] = (b_1, b_2)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)
from pprint import pprint
pprint(embedding)
Here we construct the embedding so that the non-bracket characters carry noise that should be ignored.
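To see why a single pair $(w_1, w_2)$ can no longer handle this embedding, here is a quick check (it reuses `a_1`, `a_2`, `b_1`, `b_2` from the cell above): for a character $(c_1, c_2)$ to contribute 0 to the memory we need $w_1 c_1 + w_2 c_2 = 0$, i.e. $w_2/w_1 = -c_1/c_2$, and the noisy embedding forces different ratios for `a` and `b`.

print('w2/w1 needed to zero out a:', -a_1/a_2)
print('w2/w1 needed to zero out b:', -b_1/b_2)
# The two ratios differ, so no single (w1, w2) can ignore both 'a' and 'b'
# while still counting the brackets; the input gate has to do that job.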
Let's see how the two models perform in this case.
In [24]:
embedding['a'] = (a_1, a_2)
embedding['b'] = (b_1, b_2)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)
experiment('ab(ab)bb', _w1=1.0, _w2=1.0, alpha=1e-1, max_iter=250, fixed_w3=True)
Out[24]:
In [25]:
embedding['a'] = (a_1, a_2)
embedding['b'] = (b_1, b_2)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)
experiment_gated('ab(ab)bb', _w1=1.0, _w2=1.0, _w4=1.0, _w5=1.0, alpha=1e-1, max_iter=250, fixed_w3=True)
Out[25]:
Now we see that the input gate is closer to the true gate: it tries to ignore irrelevant inputs by pushing their gate values closer to 0. Although it is still far from the true gate (the irrelevant inputs still get positive gate values), it has a clear impact on the loss, which reaches a value an order of magnitude lower. And if we run more iterations, the gate is eventually learned correctly ($w_4 = 1.0$, $w_5 = -1.0$).
Notice that in the network without an input gate, the overall gradient at the end is zero, but the gradient at each position in the sequence is not, and its magnitude is not small, meaning the network has settled at a non-optimal point where the per-position gradients cancel out. In the gated version, the gradient approaches zero at every position.
In [28]:
# Trying nested brackets
experiment_gated('ab(aaa(bab)b)')
Out[28]: