Long Short-Term Memory (LSTM)

This page explains why the LSTM was first proposed and what its core features are, with some worked examples.

This is based on the paper: Hochreiter and Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9(8).

Constant-Error Carrousel

The core feature of an LSTM unit as first proposed is the constant-error carrousel (CEC), which addresses the vanishing gradient problem of standard RNNs.

A CEC is a neural network unit consisting of a single neuron with a self-loop whose weight is fixed to 1.0, which ensures constant error flow during backpropagation.

Fig 1. Diagram of a single CEC unit

Now let's see an example of a CEC at work. We will use a CEC for a very simple task: recognizing whether the current character is inside a bracketed expression. For simplicity, the opening bracket is considered to be inside and the closing bracket outside. This task is solvable only by a network that can store memory: to recognize whether a character is inside a bracketed expression, we need to know that there is an opening bracket to the left of the current character that has no corresponding closing bracket yet.

The input characters come from the alphabet $\{`a`, `b`, `(`, `)`\}$, with the following 2-dimensional embedding (the same values as in the code below):

$$ \begin{eqnarray} emb(`a`) &=& (1, 1) \nonumber\\ emb(`b`) &=& (-1, -1) \nonumber\\ emb(`(`) &=& (1, 0) \nonumber\\ emb(`)`) &=& (0, 1) \nonumber \end{eqnarray} $$

For this task, we define a very simple network with two input units, one CEC unit, and one output unit with sigmoid activation ($\sigma(x) = \frac{1}{1 + e^{-x}}$), as follows:

Fig 2. Network used for the bracketed expression recognition

For this task, we define the loss as the cross-entropy (CE) between the target value and the predicted output, shifted by the entropy of the target:

$$ \begin{eqnarray} \mathrm{CE}(x, y) = - (x\log(y) + (1-x)\log(1-y)) \nonumber\\ \mathrm{Loss}(\hat{o}_t, o_t) = \mathrm{CE}(\hat{o}_t, o_t) - \mathrm{CE}(\hat{o}_t, \hat{o}_t) \end{eqnarray} $$

where $\hat{o}_t$ and $o_t$ are the target value (gold standard) and the output value (network prediction), respectively, at time step $t$. The first term is the cross-entropy between the target value and the output value, and the second term is the entropy of the target value itself. Note that the second term does not depend on the network output, so it is a constant with respect to the weights; it only shifts the loss so that the minimum achievable value (perfect output) is 0.
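As a quick sanity check of this definition, here is a minimal sketch of the shifted loss for a single time step. The function names `cross_entropy` and `shifted_loss` are illustrative and are not used elsewhere in this notebook. The shifted quantity is the KL divergence between two Bernoulli distributions, so it is zero when the output equals the target and positive otherwise.

```python
import math

def cross_entropy(x, y):
    """CE(x, y) = -(x log y + (1 - x) log(1 - y)), for x and y strictly inside (0, 1)."""
    return -(x * math.log(y) + (1 - x) * math.log(1 - y))

def shifted_loss(target, output):
    """Cross-entropy minus the entropy of the target, so a perfect output scores 0."""
    return cross_entropy(target, output) - cross_entropy(target, target)

# The loss grows as the output moves away from the target.
target = 0.5
for output in (0.5, 0.6, 0.9):
    print('target={:.1f}, output={:.1f}, loss={:.4f}'.format(
        target, output, shifted_loss(target, output)))
```

The first printed loss is 0 (output equals target), and the value increases for 0.6 and again for 0.9.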

More specifically, we have:

$$ \begin{equation} o_t = \sigma(w_3*s_t) \end{equation} $$

where $s_t$ is the output of the CEC unit (a.k.a. the memory), which depends on the previous value of the memory $s_{t-1}$, and the input $x_{t,1}$ and $x_{t,2}$ (representing the first and second dimension of the input at time step $t$):

$$ \begin{equation} s_t = \underbrace{w_s * s_{t-1}}_\text{previous value} + \underbrace{w_1 * x_{t,1} + w_2 * x_{t,2}}_\text{input} \end{equation} $$

where $w_s$ is the weight of the self-loop, which is fixed to 1.0. To make clear why it should be 1.0, the derivation below keeps $w_s$ as a symbol rather than substituting 1.0.
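To get a feel for the recurrence, the sketch below unrolls it on a short string. It is an illustration only: the helper `run_cec` and the hand-picked weights $w_1 = 1$, $w_2 = -1$ are assumptions made for this example, and the embedding values are those used in the notebook code, where `)` maps to `(0, 1)`.

```python
# Embedding values as used in the notebook code.
EMBEDDING = {'a': (1, 1), 'b': (-1, -1), '(': (1, 0), ')': (0, 1)}

def run_cec(seq, w1, w2, ws=1.0):
    """Return the memory value s_t after each character, starting from s_0 = 0."""
    memory = 0.0
    history = []
    for char in seq:
        x1, x2 = EMBEDDING[char]
        # s_t = w_s * s_{t-1} + w_1 * x_{t,1} + w_2 * x_{t,2}
        memory = ws * memory + w1 * x1 + w2 * x2
        history.append(memory)
    return history

# Hand-picked weights (an assumption for illustration, not learned here).
print(run_cec('a(a)a', w1=1.0, w2=-1.0))
```

With these weights, `a` and `b` contribute 0, `(` contributes +1 and `)` contributes -1, so the memory holds the number of currently open brackets: 0, 1, 1, 0, 0 for the string above.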


In [14]:
import math
from IPython.display import Markdown, display

def printmd(string):
    display(Markdown(string))

# Embedding
embedding = {}
embedding['a'] = (1.0, 1)
embedding['b'] = (-1, -1)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)

# embedding['a'] = (-1, 0)
# embedding['b'] = (-0.5, 0)
# embedding['('] = (1, 1)
# embedding[')'] = (1, -1)

# Weights
w1=1.0
w2=1.0
w3=1.0
ws=1.0

memory_history = [0]
output_history = [0]

def sigmoid(x):
    return 1.0/(1+math.exp(-x))

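def gold(seq):
    '''Target per position: sigmoid of the number of currently open brackets (first element is a dummy)'''
    result = [0]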
    bracket_count = 0
    for char in seq:
        if char == '(':
            bracket_count += 1
        if char == ')':
            bracket_count -= 1
        result.append(sigmoid(bracket_count))
    return result

def activate_memory(x1, x2):
    prev_memory = memory_history[-1]
    memory_history.append(ws*prev_memory + w1*x1 + w2*x2)
    return memory_history[-1]

def activate_output(h):
    output_history.append(sigmoid(w3*h))
    return output_history[-1]

def predict(seq):
    for char in seq:
        activate_output(activate_memory(*embedding[char]))
    result = output_history[:]
    return result

def reset():
    global memory_history, output_history
    memory_history = [0]
    output_history = [0]
    
def loss(gold_seq, pred_seq):
    result = 0.0
    per_position_loss = []
    for idx, (corr, pred) in enumerate(zip(gold_seq, pred_seq)):
        cur_loss  = -(corr*math.log(pred) + (1-corr)*math.log(1-pred))
        cur_loss -= -(corr*math.log(corr) + (1-corr)*math.log(1-corr))
        result += cur_loss
        per_position_loss.append(cur_loss)
    return result, per_position_loss


def print_list(lst):
    '''A convenience method to print a list of real numbers'''
    as_str = ['{:+.3f}'.format(num) for num in lst]
    print('[{}]'.format(', '.join(as_str)))

In [15]:
# See typical values of sigmoid
for i in range(5):
    print('sigmoid({}) = {}'.format(i, sigmoid(i)))


sigmoid(0) = 0.5
sigmoid(1) = 0.7310585786300049
sigmoid(2) = 0.8807970779778823
sigmoid(3) = 0.9525741268224334
sigmoid(4) = 0.9820137900379085

Now let's check the function that computes the target values. We want it to output $\sigma(0)$ when the current character is outside a bracketed expression and $\sigma(1)$ when it is inside.


In [16]:
gold('a(a)a')[1:]  # The first element is dummy


Out[16]:
[0.5, 0.7310585786300049, 0.7310585786300049, 0.5, 0.5]

This is $\sigma(0), \sigma(1), \sigma(1), \sigma(0), \sigma(0)$, as expected. So far so good.


Now let's see what our network outputs:


In [17]:
test_seq = 'ab(ab)ab'
reset()
w1 = 1.0
w2 = 1.0
w3 = 1.0
result = predict(test_seq)
correct = gold(test_seq)
print('Output: ', end='')
print_list(result[1:])
print('Target: ', end='')
print_list(correct[1:])
print('Loss  : {:.3f}'.format(loss(correct[1:], result[1:])[0]))


Output: [+0.881, +0.500, +0.731, +0.953, +0.731, +0.881, +0.982, +0.881]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Loss  : 2.900

We see that the loss is still non-zero, and we see that some values are incorrectly predicted.

Next we will see the gradient calculation in progress, so that we can update the weight to reduce the loss.

Calculating Gradients

To do the weight update, we need the partial derivative of the loss function with respect to each weight. We have three weight parameters, $w_1$, $w_2$, and $w_3$, so we need to compute three different partial derivatives.

For ease of notation, we denote $\mathrm{Loss}_t = \mathrm{Loss}(\hat{o}_t, o_t)$ as the loss at time step $t$ and $\mathrm{Loss} = \sum_t \mathrm{Loss}_t$ as the total loss over one sequence.

Remember that our objective is to reduce the total loss.

$$ \begin{eqnarray} \frac{\partial\mathrm{Loss}}{\partial w_i} & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial w_i} \\ & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial o_t} \cdot \frac{\partial o_t}{\partial w_i} \qquad \text{(by chain rule)} \\ \end{eqnarray} $$

For $w_3$, we can already compute the gradient here, which is:

$$ \require{cancel} \begin{eqnarray} \frac{\partial\mathrm{Loss}}{\partial w_3} & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial o_t} \cdot \frac{\partial o_t}{\partial w_3} \\ & = & \sum_t\underbrace{\frac{o_t - \hat{o}_t}{\cancel{o_t(1-o_t)}}}_{=\frac{\partial \mathrm{Loss}_t}{\partial o_t}} \cdot \underbrace{s_t \cdot \cancel{o_t(1-o_t)}}_{=\frac{\partial o_t}{\partial w_3}} \\ & = & \sum_t(o_t-\hat{o}_t)s_t \end{eqnarray} $$
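For completeness, the two factors under the braces can be worked out as follows. The entropy term $\mathrm{CE}(\hat{o}_t, \hat{o}_t)$ does not depend on $o_t$, so only the cross-entropy term contributes, and the second factor uses the standard identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$ with $z = w_3 s_t$:

$$ \begin{eqnarray} \frac{\partial \mathrm{Loss}_t}{\partial o_t} & = & -\left(\frac{\hat{o}_t}{o_t} - \frac{1-\hat{o}_t}{1-o_t}\right) \nonumber\\ & = & \frac{-\hat{o}_t(1-o_t) + (1-\hat{o}_t)o_t}{o_t(1-o_t)} \nonumber\\ & = & \frac{o_t - \hat{o}_t}{o_t(1-o_t)} \nonumber\\ \frac{\partial o_t}{\partial w_3} & = & \sigma'(w_3 s_t) \cdot s_t = s_t \cdot o_t(1-o_t) \nonumber \end{eqnarray} $$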

For $w_1$ and $w_2$, we have:

$$ \begin{eqnarray} \frac{\partial\mathrm{Loss}}{\partial w_i} & = & \sum_t\frac{\partial \mathrm{Loss}_t}{\partial o_t} \cdot \frac{\partial o_t}{\partial w_i} \\ & = & \sum_t \frac{o_t - \hat{o}_t}{o_t(1-o_t)} \cdot \frac{\partial o_t}{\partial s_t} \cdot \frac{\partial s_t}{\partial w_i} \\ & = & \sum_t \frac{o_t - \hat{o}_t}{\cancel{o_t(1-o_t)}} \cdot w_3\cdot \cancel{o_t(1-o_t)} \cdot \frac{\partial s_t}{\partial w_i} \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \frac{\partial s_t}{\partial w_i} \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \left(w_s\cdot\frac{\partial s_{t-1}}{\partial w_i} + x_{t,i}\right) \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \left({w_s}^2\cdot\frac{\partial s_{t-2}}{\partial w_i} + w_s\cdot x_{t-1,i} + x_{t,i}\right) \\ & & \ldots \\ & = & \sum_t (o_t - \hat{o}_t)w_3 \cdot \left(\sum_{t'\leq t} {w_s}^{t-t'}x_{t',i}\right) \\ \end{eqnarray} $$
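One way to gain confidence in this closed form is to compare it against a central finite-difference estimate of the gradient. The sketch below is self-contained and independent of the notebook's global state. All helper names (`forward`, `total_loss`, `analytic_grad`, `numeric_grad`) are illustrative assumptions, and the embedding values are those used in the notebook code.

```python
import math

EMBEDDING = {'a': (1, 1), 'b': (-1, -1), '(': (1, 0), ')': (0, 1)}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def targets(seq):
    """Target per position: sigmoid of the number of currently open brackets."""
    count, result = 0, []
    for char in seq:
        count += (char == '(') - (char == ')')
        result.append(sigmoid(count))
    return result

def forward(seq, w, ws):
    """w = [w1, w2, w3]; returns the output o_t at each time step."""
    s, outs = 0.0, []
    for char in seq:
        x1, x2 = EMBEDDING[char]
        s = ws * s + w[0] * x1 + w[1] * x2
        outs.append(sigmoid(w[2] * s))
    return outs

def total_loss(seq, w, ws):
    """Sum over time of cross-entropy minus the entropy of the target."""
    loss = 0.0
    for gold, out in zip(targets(seq), forward(seq, w, ws)):
        loss += -(gold * math.log(out) + (1 - gold) * math.log(1 - out))
        loss -= -(gold * math.log(gold) + (1 - gold) * math.log(1 - gold))
    return loss

def analytic_grad(seq, w, ws, i):
    """dLoss/dw_i for i in {0, 1} (that is, w1 or w2), using the closed form."""
    outs, gold = forward(seq, w, ws), targets(seq)
    grad = 0.0
    for t in range(len(seq)):
        # Inner sum over t' <= t of ws^(t - t') * x_{t', i}
        inner = sum(ws ** (t - tp) * EMBEDDING[seq[tp]][i] for tp in range(t + 1))
        grad += (outs[t] - gold[t]) * w[2] * inner
    return grad

def numeric_grad(seq, w, ws, i, eps=1e-6):
    """Central finite-difference estimate of dLoss/dw_i."""
    up, down = list(w), list(w)
    up[i] += eps
    down[i] -= eps
    return (total_loss(seq, up, ws) - total_loss(seq, down, ws)) / (2 * eps)

seq = 'ab(ab)bb'
weights = [1.0, 1.0, 1.0]
for ws in (1.0, 0.9):
    for i in (0, 1):
        print('ws={}, dL/dw{}: analytic={:+.6f}, numeric={:+.6f}'.format(
            ws, i + 1, analytic_grad(seq, weights, ws, i), numeric_grad(seq, weights, ws, i)))
```

The two columns should agree to several decimal places, both for $w_s = 1.0$ and for a value such as $w_s = 0.9$, where the exponent $t - t'$ actually matters.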

Important Note on $w_s$!

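We see that the gradient with respect to $w_1$ and $w_2$ contains the factor ${w_s}^{t-t'}$, where $t-t'$ can be as large as the input sequence length. So if $|w_s| < 1$, the contribution of early inputs to the gradient vanishes exponentially as the input sequence gets longer, and if $|w_s| > 1$, it blows up exponentially. Fixing $w_s = 1.0$ keeps this factor equal to 1 for every gap $t-t'$.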
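A few numbers make the effect concrete. The short sketch below (the helper name `decay_factor` is illustrative) prints ${w_s}^{t-t'}$ for three values of $w_s$ and three gaps.

```python
def decay_factor(ws, gap):
    """Factor ws ** gap applied to an input seen `gap` time steps earlier."""
    return ws ** gap

for ws in (0.9, 1.0, 1.1):
    factors = ['{:.3e}'.format(decay_factor(ws, gap)) for gap in (10, 50, 100)]
    print('ws={:.1f}: {}'.format(ws, ', '.join(factors)))
```

At a gap of 100 steps, the factor for $w_s = 0.9$ is on the order of $10^{-5}$, while for $w_s = 1.1$ it exceeds $10^{4}$. Only $w_s = 1.0$ leaves it at exactly 1.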


In [18]:
def dLdw1(test_seq, gold_seq, pred_seq, state_seq, info):
    result = 0.0
    grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw1 = '
    for time_step in range(1, len(gold_seq)):
        cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        # The exponent is the gap (time_step - step), matching w_s^(t-t') in the derivation
        cur_dell *= sum(ws**(time_step-step)*embedding[test_seq[step-1]][0] for step in range(1, time_step+1))
        if cur_dell < 0:
            color = 'red'
        else:
            color = 'blue'
        grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
        result += cur_dell
    grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
                'red' if result < 0 else 'blue', result)
    # printmd(grad_str)
    info[0] += grad_str
    return result

def dLdw2(test_seq, gold_seq, pred_seq, state_seq, info):
    result = 0.0
    grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw2 = '
    for time_step in range(1, len(gold_seq)):
        cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        # The exponent is the gap (time_step - step), matching w_s^(t-t') in the derivation
        cur_dell *= sum(ws**(time_step-step)*embedding[test_seq[step-1]][1] for step in range(1, time_step+1))
        if cur_dell < 0:
            color = 'red'
        else:
            color = 'blue'
        grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
        result += cur_dell
    grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
                'red' if result < 0 else 'blue', result)
    # printmd(grad_str)
    info[0] += grad_str
    return result

def dLdw3(test_seq, gold_seq, pred_seq, state_seq, info):
    result = 0.0
    grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw3 = '
    for time_step in range(1, len(gold_seq)):
        cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * state_seq[time_step]
        if cur_dell < 0:
            color = 'red'
        else:
            color = 'blue'
        grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
        result += cur_dell
    grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
                'red' if result < 0 else 'blue', result)
    # printmd(grad_str)
    info[0] += grad_str
    return result

Experiment

Now we define an experiment which takes in initial values of all the weights, learning rate, and maximum number of iterations. We also want to experiment with fixing the weight $w_3$ (i.e., it is not learned).

The code below will print the total loss, the loss at each time step, the output, target, and memory at each time step, and also the gradient for each learned parameter at each time step.


In [19]:
def experiment(test_seq, _w1=1.0, _w2=1.0, _w3=1.0, alpha=1e-1, max_iter=250, fixed_w3=True):
    global w1, w2, w3
    reset()
    w1 = _w1
    w2 = _w2
    w3 = _w3
    correct = gold(test_seq)
    print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}'.format(w1, w2, w3))
    for iter_num in range(max_iter):
        result = predict(test_seq)
        if iter_num < 15 or (iter_num % 50 == 49):
            printmd('<div style="font-weight:bold">Iteration {}</div>'.format(iter_num))
            print('Output: ', end='')
            print_list(result[1:])
            print('Target: ', end='')
            print_list(correct[1:])
            print('Memory: ', end='')
            print_list(memory_history[1:])
        total_loss, per_position_loss = loss(correct[1:], result[1:])
        info = ['', iter_num]
        info[0] = ('<div>Loss: <span style="font-weight:bold">{:.5f}</span>' +
                   '= <span style="font-family:monaco; font-size:12px">').format(total_loss)
        for idx, per_pos_loss in enumerate(per_position_loss):
            info[0] += '{}{:.3f}'.format(' + ' if idx > 0 else '', per_pos_loss)
        info[0] += '</span></div>'
        # printmd(loss_str)
        w1 -= alpha * dLdw1(test_seq, correct, result, memory_history, info)
        w2 -= alpha * dLdw2(test_seq, correct, result, memory_history, info)
        if not fixed_w3:
            w3 -= alpha * dLdw3(test_seq, correct, result, memory_history, info)
        if iter_num < 15 or (iter_num % 50 == 49):
            printmd(info[0])
            print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}'.format(w1, w2, w3))
            print()
        reset()
    return w1, w2, w3

In [20]:
embedding['a'] = (1.0, 1)
embedding['b'] = (-1, -1)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)

w1, w2, w3 = experiment('ab(ab)bb', _w1=1.0, _w2=1.0, max_iter=250, alpha=1e-1, fixed_w3=True)
printmd('## Test on longer sequence')
experiment('aabba(aba)bab', _w1=w1, _w2=w2, _w3=w3, alpha=1e-2, max_iter=100)


w1=+1.000, w2=+1.000, w3=+1.000
Iteration 0
Output: [+0.881, +0.500, +0.731, +0.953, +0.731, +0.881, +0.500, +0.119]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+2.000, +0.000, +1.000, +3.000, +1.000, +2.000, +0.000, -2.000]
Loss: 1.57455= 0.434 + 0.000 + 0.000 + 0.273 + 0.000 + 0.434 + 0.000 + 0.434
dL/dw1 = +0.381 + +0.000 + +0.000 + +0.443 + +0.000 + +0.381 + +0.000 + +0.381 = +1.585
dL/dw2 = +0.381 + +0.000 + +0.000 + +0.222 + +0.000 + +0.381 + +0.000 + +0.381 = +1.364
w1=+0.841, w2=+0.864, w3=+1.000

Iteration 1
Output: [+0.846, +0.500, +0.699, +0.927, +0.699, +0.846, +0.500, +0.154]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.705, +0.000, +0.841, +2.547, +0.841, +1.705, +0.000, -1.705]
Loss: 1.16233= 0.326 + 0.000 + 0.003 + 0.178 + 0.003 + 0.326 + 0.000 + 0.326
dL/dw1 = +0.346 + +0.000 + -0.032 + +0.393 + -0.032 + +0.346 + +0.000 + +0.346 = +1.367
dL/dw2 = +0.346 + +0.000 + -0.000 + +0.196 + -0.000 + +0.346 + +0.000 + +0.346 = +1.235
w1=+0.705, w2=+0.740, w3=+1.000

Iteration 2
Output: [+0.809, +0.500, +0.669, +0.896, +0.669, +0.809, +0.500, +0.191]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.445, +0.000, +0.705, +2.150, +0.705, +1.445, +0.000, -1.445]
Loss: 0.84706= 0.241 + 0.000 + 0.009 + 0.106 + 0.009 + 0.241 + 0.000 + 0.241
dL/dw1 = +0.309 + +0.000 + -0.062 + +0.329 + -0.062 + +0.309 + +0.000 + +0.309 = +1.133
dL/dw2 = +0.309 + +0.000 + -0.000 + +0.165 + -0.000 + +0.309 + +0.000 + +0.309 = +1.092
w1=+0.591, w2=+0.631, w3=+1.000

Iteration 3
Output: [+0.772, +0.500, +0.644, +0.860, +0.644, +0.772, +0.500, +0.228]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.222, -0.000, +0.591, +1.814, +0.591, +1.222, -0.000, -1.222]
Loss: 0.61998= 0.176 + 0.000 + 0.017 + 0.057 + 0.017 + 0.176 + 0.000 + 0.176
dL/dw1 = +0.272 + +0.000 + -0.087 + +0.258 + -0.087 + +0.272 + +0.000 + +0.272 = +0.900
dL/dw2 = +0.272 + +0.000 + -0.000 + +0.129 + -0.000 + +0.272 + +0.000 + +0.272 = +0.946
w1=+0.501, w2=+0.536, w3=+1.000

Iteration 4
Output: [+0.738, +0.500, +0.623, +0.823, +0.623, +0.738, +0.500, +0.262]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.038, +0.000, +0.501, +1.539, +0.501, +1.038, +0.000, -1.038]
Loss: 0.46541= 0.129 + 0.000 + 0.026 + 0.026 + 0.026 + 0.129 + 0.000 + 0.129
dL/dw1 = +0.238 + +0.000 + -0.108 + +0.185 + -0.108 + +0.238 + +0.000 + +0.238 = +0.683
dL/dw2 = +0.238 + +0.000 + -0.000 + +0.092 + -0.000 + +0.238 + +0.000 + +0.238 = +0.808
w1=+0.433, w2=+0.456, w3=+1.000

Iteration 5
Output: [+0.709, +0.500, +0.607, +0.789, +0.607, +0.709, +0.500, +0.291]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.889, +0.000, +0.433, +1.322, +0.433, +0.889, -0.000, -0.889]
Loss: 0.36481= 0.096 + 0.000 + 0.034 + 0.010 + 0.034 + 0.096 + 0.000 + 0.096
dL/dw1 = +0.209 + +0.000 + -0.124 + +0.117 + -0.124 + +0.209 + +0.000 + +0.209 = +0.494
dL/dw2 = +0.209 + +0.000 + -0.000 + +0.058 + -0.000 + +0.209 + +0.000 + +0.209 = +0.684
w1=+0.384, w2=+0.387, w3=+1.000

Iteration 6
Output: [+0.684, +0.500, +0.595, +0.760, +0.595, +0.684, +0.500, +0.316]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.771, +0.000, +0.384, +1.155, +0.384, +0.771, +0.000, -0.771]
Loss: 0.30095= 0.073 + 0.000 + 0.041 + 0.002 + 0.041 + 0.073 + 0.000 + 0.073
dL/dw1 = +0.184 + +0.000 + -0.136 + +0.059 + -0.136 + +0.184 + +0.000 + +0.184 = +0.337
dL/dw2 = +0.184 + +0.000 + -0.000 + +0.029 + -0.000 + +0.184 + +0.000 + +0.184 = +0.580
w1=+0.350, w2=+0.329, w3=+1.000

Iteration 7
Output: [+0.664, +0.500, +0.587, +0.737, +0.587, +0.664, +0.500, +0.336]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.679, +0.000, +0.350, +1.029, +0.350, +0.679, -0.000, -0.679]
Loss: 0.26040= 0.057 + 0.000 + 0.045 + 0.000 + 0.045 + 0.057 + 0.000 + 0.057
dL/dw1 = +0.164 + +0.000 + -0.144 + +0.011 + -0.144 + +0.164 + +0.000 + +0.164 = +0.213
dL/dw2 = +0.164 + +0.000 + -0.000 + +0.006 + -0.000 + +0.164 + +0.000 + +0.164 = +0.496
w1=+0.329, w2=+0.279, w3=+1.000

Iteration 8
Output: [+0.648, +0.500, +0.581, +0.718, +0.581, +0.648, +0.500, +0.352]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.608, +0.000, +0.329, +0.937, +0.329, +0.608, +0.000, -0.608]
Loss: 0.23388= 0.046 + 0.000 + 0.048 + 0.000 + 0.048 + 0.046 + 0.000 + 0.046
dL/dw1 = +0.148 + +0.000 + -0.150 + -0.025 + -0.150 + +0.148 + +0.000 + +0.148 = +0.118
dL/dw2 = +0.148 + +0.000 + -0.000 + -0.013 + -0.000 + +0.148 + +0.000 + +0.148 = +0.430
w1=+0.317, w2=+0.236, w3=+1.000

Iteration 9
Output: [+0.635, +0.500, +0.579, +0.705, +0.579, +0.635, +0.500, +0.365]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.553, -0.000, +0.317, +0.870, +0.317, +0.553, -0.000, -0.553]
Loss: 0.21552= 0.038 + 0.000 + 0.050 + 0.002 + 0.050 + 0.038 + 0.000 + 0.038
dL/dw1 = +0.135 + +0.000 + -0.152 + -0.053 + -0.152 + +0.135 + +0.000 + +0.135 = +0.047
dL/dw2 = +0.135 + +0.000 + -0.000 + -0.026 + -0.000 + +0.135 + +0.000 + +0.135 = +0.378
w1=+0.312, w2=+0.199, w3=+1.000

Iteration 10
Output: [+0.625, +0.500, +0.577, +0.695, +0.577, +0.625, +0.500, +0.375]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.511, -0.000, +0.312, +0.823, +0.312, +0.511, -0.000, -0.511]
Loss: 0.20184= 0.032 + 0.000 + 0.051 + 0.003 + 0.051 + 0.032 + 0.000 + 0.032
dL/dw1 = +0.125 + +0.000 + -0.154 + -0.072 + -0.154 + +0.125 + +0.000 + +0.125 = -0.005
dL/dw2 = +0.125 + +0.000 + -0.000 + -0.036 + -0.000 + +0.125 + +0.000 + +0.125 = +0.339
w1=+0.313, w2=+0.165, w3=+1.000

Iteration 11
Output: [+0.617, +0.500, +0.578, +0.688, +0.578, +0.617, +0.500, +0.383]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.477, +0.000, +0.313, +0.790, +0.313, +0.477, -0.000, -0.477]
Loss: 0.19087= 0.028 + 0.000 + 0.051 + 0.004 + 0.051 + 0.028 + 0.000 + 0.028
dL/dw1 = +0.117 + +0.000 + -0.154 + -0.086 + -0.154 + +0.117 + +0.000 + +0.117 = -0.042
dL/dw2 = +0.117 + +0.000 + -0.000 + -0.043 + -0.000 + +0.117 + +0.000 + +0.117 = +0.308
w1=+0.317, w2=+0.134, w3=+1.000

Iteration 12
Output: [+0.611, +0.500, +0.579, +0.683, +0.579, +0.611, +0.500, +0.389]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.451, -0.000, +0.317, +0.768, +0.317, +0.451, -0.000, -0.451]
Loss: 0.18151= 0.025 + 0.000 + 0.050 + 0.005 + 0.050 + 0.025 + 0.000 + 0.025
dL/dw1 = +0.111 + +0.000 + -0.152 + -0.096 + -0.152 + +0.111 + +0.000 + +0.111 = -0.069
dL/dw2 = +0.111 + +0.000 + -0.000 + -0.048 + -0.000 + +0.111 + +0.000 + +0.111 = +0.284
w1=+0.324, w2=+0.105, w3=+1.000

Iteration 13
Output: [+0.606, +0.500, +0.580, +0.680, +0.580, +0.606, +0.500, +0.394]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.429, +0.000, +0.324, +0.753, +0.324, +0.429, +0.000, -0.429]
Loss: 0.17315= 0.023 + 0.000 + 0.049 + 0.006 + 0.049 + 0.023 + 0.000 + 0.023
dL/dw1 = +0.106 + +0.000 + -0.151 + -0.102 + -0.151 + +0.106 + +0.000 + +0.106 = -0.087
dL/dw2 = +0.106 + +0.000 + -0.000 + -0.051 + -0.000 + +0.106 + +0.000 + +0.106 = +0.266
w1=+0.332, w2=+0.079, w3=+1.000

Iteration 14
Output: [+0.601, +0.500, +0.582, +0.678, +0.582, +0.601, +0.500, +0.399]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.411, -0.000, +0.332, +0.744, +0.332, +0.411, -0.000, -0.411]
Loss: 0.16547= 0.021 + 0.000 + 0.048 + 0.007 + 0.048 + 0.021 + 0.000 + 0.021
dL/dw1 = +0.101 + +0.000 + -0.149 + -0.106 + -0.149 + +0.101 + +0.000 + +0.101 = -0.100
dL/dw2 = +0.101 + +0.000 + -0.000 + -0.053 + -0.000 + +0.101 + +0.000 + +0.101 = +0.251
w1=+0.342, w2=+0.054, w3=+1.000

Iteration 49
Output: [+0.545, +0.500, +0.661, +0.700, +0.661, +0.545, +0.500, +0.455]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.181, +0.000, +0.667, +0.848, +0.667, +0.181, +0.000, -0.181]
Loss: 0.03748= 0.004 + 0.000 + 0.011 + 0.002 + 0.011 + 0.004 + 0.000 + 0.004
dL/dw1 = +0.045 + +0.000 + -0.070 + -0.062 + -0.070 + +0.045 + +0.000 + +0.045 = -0.067
dL/dw2 = +0.045 + +0.000 + -0.000 + -0.031 + -0.000 + +0.045 + +0.000 + +0.045 = +0.105
w1=+0.674, w2=-0.496, w3=+1.000

Iteration 99
Output: [+0.516, +0.500, +0.706, +0.720, +0.706, +0.516, +0.500, +0.484]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.065, +0.000, +0.878, +0.942, +0.878, +0.065, +0.000, -0.065]
Loss: 0.00490= 0.001 + 0.000 + 0.002 + 0.000 + 0.002 + 0.001 + 0.000 + 0.001
dL/dw1 = +0.016 + +0.000 + -0.025 + -0.023 + -0.025 + +0.016 + +0.000 + +0.016 = -0.024
dL/dw2 = +0.016 + +0.000 + -0.000 + -0.012 + -0.000 + +0.016 + +0.000 + +0.016 = +0.037
w1=+0.880, w2=-0.817, w3=+1.000

Iteration 149
Output: [+0.506, +0.500, +0.722, +0.727, +0.722, +0.506, +0.500, +0.494]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.024, +0.000, +0.954, +0.978, +0.954, +0.024, +0.000, -0.024]
Loss: 0.00067= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.006 + +0.000 + -0.009 + -0.009 + -0.009 + +0.006 + +0.000 + +0.006 = -0.009
dL/dw2 = +0.006 + +0.000 + -0.000 + -0.004 + -0.000 + +0.006 + +0.000 + +0.006 = +0.014
w1=+0.955, w2=-0.932, w3=+1.000

Iteration 199
Output: [+0.502, +0.500, +0.728, +0.729, +0.728, +0.502, +0.500, +0.498]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.009, +0.000, +0.983, +0.992, +0.983, +0.009, +0.000, -0.009]
Loss: 0.00009= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.002 + +0.000 + -0.003 + -0.003 + -0.003 + +0.002 + +0.000 + +0.002 = -0.003
dL/dw2 = +0.002 + +0.000 + -0.000 + -0.002 + -0.000 + +0.002 + +0.000 + +0.002 = +0.005
w1=+0.983, w2=-0.975, w3=+1.000

Iteration 249
Output: [+0.501, +0.500, +0.730, +0.730, +0.730, +0.501, +0.500, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.003, +0.000, +0.994, +0.997, +0.994, +0.003, +0.000, -0.003]
Loss: 0.00001= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.001 + +0.000 + -0.001 + -0.001 + -0.001 + +0.001 + +0.000 + +0.001 = -0.001
dL/dw2 = +0.001 + +0.000 + -0.000 + -0.001 + -0.000 + +0.001 + +0.000 + +0.001 = +0.002
w1=+0.994, w2=-0.990, w3=+1.000

Test on longer sequence

w1=+0.994, w2=-0.990, w3=+1.000
Iteration 0
Output: [+0.501, +0.502, +0.501, +0.500, +0.501, +0.730, +0.731, +0.730, +0.731, +0.502, +0.502, +0.502, +0.502]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.003, +0.007, +0.003, +0.000, +0.003, +0.997, +1.000, +0.997, +1.000, +0.010, +0.007, +0.010, +0.007]
Loss: 0.00005= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.001 + +0.003 + +0.001 + +0.000 + +0.001 + -0.001 + +0.000 + -0.001 + +0.000 + +0.007 + +0.003 + +0.007 + +0.003 = +0.025
dL/dw2 = +0.001 + +0.003 + +0.001 + +0.000 + +0.001 + -0.001 + +0.000 + -0.001 + +0.000 + +0.007 + +0.003 + +0.007 + +0.003 = +0.026
w1=+0.993, w2=-0.991, w3=+1.000

Iteration 1
Output: [+0.501, +0.501, +0.501, +0.500, +0.501, +0.730, +0.731, +0.730, +0.731, +0.502, +0.501, +0.502, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.003, +0.006, +0.003, +0.000, +0.003, +0.996, +0.999, +0.996, +0.999, +0.008, +0.006, +0.008, +0.006]
Loss: 0.00003= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.001 + +0.003 + +0.001 + +0.000 + +0.001 + -0.001 + -0.001 + -0.001 + -0.001 + +0.006 + +0.003 + +0.006 + +0.003 = +0.019
dL/dw2 = +0.001 + +0.003 + +0.001 + +0.000 + +0.001 + -0.001 + -0.000 + -0.001 + -0.000 + +0.006 + +0.003 + +0.006 + +0.003 = +0.020
w1=+0.993, w2=-0.991, w3=+1.000

Iteration 2
Output: [+0.501, +0.501, +0.501, +0.500, +0.501, +0.730, +0.731, +0.730, +0.731, +0.502, +0.501, +0.502, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.002, +0.005, +0.002, +0.000, +0.002, +0.996, +0.998, +0.996, +0.998, +0.007, +0.005, +0.007, +0.005]
Loss: 0.00003= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.001 + +0.002 + +0.001 + +0.000 + +0.001 + -0.002 + -0.001 + -0.002 + -0.001 + +0.005 + +0.002 + +0.005 + +0.002 = +0.014
dL/dw2 = +0.001 + +0.002 + +0.001 + +0.000 + +0.001 + -0.001 + -0.001 + -0.001 + -0.001 + +0.005 + +0.002 + +0.005 + +0.002 = +0.016
w1=+0.993, w2=-0.991, w3=+1.000

Iteration 3
Output: [+0.501, +0.501, +0.501, +0.500, +0.501, +0.730, +0.731, +0.730, +0.731, +0.502, +0.501, +0.502, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.002, +0.004, +0.002, +0.000, +0.002, +0.995, +0.997, +0.995, +0.997, +0.006, +0.004, +0.006, +0.004]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.001 + +0.002 + +0.001 + +0.000 + +0.001 + -0.002 + -0.002 + -0.002 + -0.002 + +0.005 + +0.002 + +0.005 + +0.002 = +0.010
dL/dw2 = +0.001 + +0.002 + +0.001 + +0.000 + +0.001 + -0.001 + -0.001 + -0.001 + -0.001 + +0.005 + +0.002 + +0.005 + +0.002 = +0.013
w1=+0.993, w2=-0.991, w3=+1.000

Iteration 4
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.002, +0.004, +0.002, +0.000, +0.002, +0.995, +0.997, +0.995, +0.997, +0.006, +0.004, +0.006, +0.004]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.002 + +0.000 + +0.000 + +0.000 + -0.002 + -0.002 + -0.002 + -0.002 + +0.004 + +0.002 + +0.004 + +0.002 = +0.007
dL/dw2 = +0.000 + +0.002 + +0.000 + +0.000 + +0.000 + -0.001 + -0.001 + -0.001 + -0.001 + +0.004 + +0.002 + +0.004 + +0.002 = +0.011
w1=+0.993, w2=-0.991, w3=+1.000

Iteration 5
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.002, +0.003, +0.002, +0.000, +0.002, +0.995, +0.996, +0.995, +0.996, +0.005, +0.003, +0.005, +0.003]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.002 + +0.000 + +0.000 + +0.000 + -0.002 + -0.002 + -0.002 + -0.002 + +0.004 + +0.002 + +0.004 + +0.002 = +0.005
dL/dw2 = +0.000 + +0.002 + +0.000 + +0.000 + +0.000 + -0.001 + -0.001 + -0.001 + -0.001 + +0.004 + +0.002 + +0.004 + +0.002 = +0.009
w1=+0.993, w2=-0.991, w3=+1.000

Iteration 6
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.002, +0.003, +0.002, +0.000, +0.002, +0.994, +0.996, +0.994, +0.996, +0.005, +0.003, +0.005, +0.003]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.002 + +0.000 + +0.000 + +0.000 + -0.002 + -0.002 + -0.002 + -0.002 + +0.003 + +0.002 + +0.003 + +0.002 = +0.003
dL/dw2 = +0.000 + +0.002 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.002 + +0.003 + +0.002 = +0.007
w1=+0.993, w2=-0.991, w3=+1.000

Iteration 7
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.003, +0.001, +0.000, +0.001, +0.994, +0.996, +0.994, +0.996, +0.004, +0.003, +0.004, +0.003]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.003 + +0.001 + +0.003 + +0.001 = +0.002
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.001 + +0.003 + +0.001 = +0.006
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 8
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.003, +0.001, +0.000, +0.001, +0.994, +0.996, +0.994, +0.996, +0.004, +0.003, +0.004, +0.003]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.003 + +0.001 + +0.003 + +0.001 = +0.001
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.001 + +0.003 + +0.001 = +0.005
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 9
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.003, +0.001, +0.000, +0.001, +0.994, +0.995, +0.994, +0.995, +0.004, +0.003, +0.004, +0.003]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.003 + +0.001 + +0.003 + +0.001 = +0.000
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.001 + +0.003 + +0.001 = +0.005
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 10
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.002, +0.001, +0.000, +0.001, +0.994, +0.995, +0.994, +0.995, +0.004, +0.002, +0.004, +0.002]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.003 + +0.001 + +0.003 + +0.001 = -0.000
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.001 + +0.003 + +0.001 = +0.004
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 11
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.002, +0.001, +0.000, +0.001, +0.994, +0.995, +0.994, +0.995, +0.004, +0.002, +0.004, +0.002]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.003 + +0.001 + +0.003 + +0.001 = -0.001
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.001 + +0.003 + +0.001 = +0.004
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 12
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.002, +0.001, +0.000, +0.001, +0.994, +0.995, +0.994, +0.995, +0.003, +0.002, +0.003, +0.002]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.003 + +0.001 + +0.003 + +0.001 = -0.001
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.001 + +0.003 + +0.001 = +0.003
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 13
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.002, +0.001, +0.000, +0.001, +0.994, +0.995, +0.994, +0.995, +0.003, +0.002, +0.003, +0.002]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.003 + +0.001 + +0.003 + +0.001 = -0.001
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.003 + +0.001 + +0.003 + +0.001 = +0.003
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 14
Output: [+0.500, +0.501, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.501, +0.501, +0.501]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.002, +0.001, +0.000, +0.001, +0.994, +0.995, +0.994, +0.995, +0.003, +0.002, +0.003, +0.002]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.002 + +0.001 + +0.002 + +0.001 = -0.001
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.002 + +0.001 + +0.002 + +0.001 = +0.003
w1=+0.993, w2=-0.992, w3=+1.000

Iteration 49
Output: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.500, +0.501, +0.500]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.002, +0.001, +0.000, +0.001, +0.994, +0.995, +0.994, +0.995, +0.003, +0.002, +0.003, +0.002]
Loss: 0.00001= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.003 + -0.002 + -0.003 + +0.002 + +0.001 + +0.002 + +0.001 = -0.002
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.002 + +0.001 + +0.002 + +0.001 = +0.002
w1=+0.994, w2=-0.993, w3=+1.000

Iteration 99
Output: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.730, +0.730, +0.730, +0.730, +0.501, +0.500, +0.501, +0.500]
Target: [+0.500, +0.500, +0.500, +0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500, +0.500]
Memory: [+0.001, +0.002, +0.001, +0.000, +0.001, +0.995, +0.996, +0.995, +0.996, +0.002, +0.002, +0.002, +0.002]
Loss: 0.00001= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.002 + -0.002 + -0.002 + -0.002 + +0.002 + +0.001 + +0.002 + +0.001 = -0.002
dL/dw2 = +0.000 + +0.001 + +0.000 + +0.000 + +0.000 + -0.001 + -0.002 + -0.001 + -0.002 + +0.002 + +0.001 + +0.002 + +0.001 = +0.002
w1=+0.994, w2=-0.994, w3=+1.000

Out[20]:
(0.9943962680848304, -0.9935690516603158, 1.0)

Let's Try Adding an Input Gate

In the previous experiment we saw conflicting updates: the gradient is positive at some positions in the sequence and negative at others. The original paper attributes this to the weight into the memory cell having to serve two purposes: updating the memory at some points (here, when we see a bracket) and retaining the stored information at others (when we see any other character).

Another core feature of LSTM, designed to resolve this issue, is the addition of gates: an input gate and an output gate, which control the flow of information through the memory cells.

In the following, we try adding an input gate, which the network should learn to activate (value = 1) only when it sees an opening or closing bracket. In effect, the input gate tells the network which inputs are relevant and which are not.

Note: Below we have two versions of the input gate: linear with sigmoid, and bilinear with bias. The weights $w_4$ and $w_5$ are interpreted differently depending on which gate is chosen. The bilinear gate was added because the input embedding used here does not allow the linear gate to be useful.
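
One way to read the note above, offered as a sketch and not as the author's stated reasoning: in this embedding `a` and `b` are exact negatives of each other, so a sigmoid of a linear function gives them gate values that always sum to 1, and the gate cannot be near 0 for both. The bilinear form depends on the product $x_1 x_2$, which is 1 for both `a` and `b` and 0 for both brackets. The function names and the weight values below are hand-picked for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Embedding used in the gated experiment on this page.
embedding = {'a': (1, 1), 'b': (-1, -1), '(': (1, 0), ')': (0, 1)}

def linear_gate(x1, x2, w4, w5):
    # Linear gate with sigmoid, as in activate_gate(..., bilinear_gate=False).
    return sigmoid(w4 * x1 + w5 * x2)

def bilinear_gate(x1, x2, w4, w5):
    # Bilinear gate with bias, as in activate_gate(..., bilinear_gate=True).
    return w4 + w5 * x1 * x2

# Linear gate: for any weights, gate('a') + gate('b') == 1,
# because sigmoid(z) + sigmoid(-z) == 1 and emb('b') == -emb('a').
linear_sums = []
for w4, w5 in [(1.0, 1.0), (-3.0, 2.0), (0.5, -4.0)]:  # arbitrary weights
    g_a = linear_gate(*embedding['a'], w4, w5)
    g_b = linear_gate(*embedding['b'], w4, w5)
    linear_sums.append(g_a + g_b)

# Bilinear gate: hand-picked w4 = 1, w5 = -1 closes the gate for 'a' and 'b'
# and opens it for both brackets.
bilinear_gates = {ch: bilinear_gate(*embedding[ch], 1.0, -1.0) for ch in 'ab()'}

print(linear_sums)
print(bilinear_gates)
```

With the linear gate, closing the gate for `a` necessarily opens it for `b`, which is why the experiments below default to the bilinear gate.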


In [21]:
w4 = 1.0
w5 = 1.0
input_history = [0]
gate_history = [0]

def reset_gated():
    global memory_history, output_history, input_history, gate_history
    memory_history = [0]
    output_history = [0]
    input_history = [0]
    gate_history = [0]

def activate_input(x1, x2):
    result = (w1*x1+w2*x2)
    input_history.append(result)
    return result

def activate_gate(x1, x2, bilinear_gate=True):
    if bilinear_gate:
        result = w4 + w5*x1*x2  # Bilinear gate
    else:
        result = sigmoid(w4*x1+w5*x2)  # The true linear gate
    gate_history.append(result)
    return result

def dLdw1_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
    result = 0.0
    grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw1 = '
    for time_step in range(1, len(gold_seq)):
        cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        cur_dell *= sum(embedding[test_seq[step-1]][0]*gate_seq[step] for step in range(1, time_step+1))
        if cur_dell < 0:
            color = 'red'
        else:
            color = 'blue'
        grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
        result += cur_dell
    grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
                'red' if result < 0 else 'blue', result)
    # printmd(grad_str)
    info[0] += grad_str
    return result

def dLdw2_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
    result = 0.0
    grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw2 = '
    for time_step in range(1, len(gold_seq)):
        cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        cur_dell *= sum(embedding[test_seq[step-1]][1]*gate_seq[step] for step in range(1, time_step+1))
        if cur_dell < 0:
            color = 'red'
        else:
            color = 'blue'
        grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
        result += cur_dell
    grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
                'red' if result < 0 else 'blue', result)
    # printmd(grad_str)
    info[0] += grad_str
    return result

def dLdw4_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
    result = 0.0
    grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw4 = '
    for time_step in range(1, len(gold_seq)):
        cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        if bilinear_gate:
            cur_dell *= sum(input_seq[step] for step in range(1, time_step+1))
        else:
            cur_dell *= sum(embedding[test_seq[step-1]][0]*gate_seq[step]*input_seq[step]*(1-gate_seq[step])
                            for step in range(1,time_step+1))
        if cur_dell < 0:
            color = 'red'
        else:
            color = 'blue'
        grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
        result += cur_dell
    grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
                    'red' if result < 0 else 'blue', result)
    # printmd(grad_str)
    info[0] += grad_str
    return result

def dLdw5_gated(test_seq, gold_seq, pred_seq, state_seq, input_seq, gate_seq, info, bilinear_gate=True):
    result = 0.0
    grad_str = '<div style="font-family:monaco; font-size:12px">dL/dw5 = '
    for time_step in range(1, len(gold_seq)):
        cur_dell = (pred_seq[time_step] - gold_seq[time_step]) * w3
        if bilinear_gate:
            cur_dell *= sum(embedding[test_seq[step-1]][0]*embedding[test_seq[step-1]][1]*input_seq[step]
                            for step in range(1, time_step+1))
        else:
            cur_dell *= sum(embedding[test_seq[step-1]][1]*gate_seq[step]*input_seq[step]*(1-gate_seq[step])
                            for step in range(1,time_step+1))
        if cur_dell < 0:
            color = 'red'
        else:
            color = 'blue'
        grad_str += '{}<span style="color:{}">{:+.3f}</span>'.format(' + ' if time_step > 1 else '', color, cur_dell)
        result += cur_dell
    grad_str += ' = <span style="color:{}; text-decoration:underline">{:+.3f}</span></div>'.format(
                    'red' if result < 0 else 'blue', result)
    # printmd(grad_str)
    info[0] += grad_str
    return result

def activate_memory_gated():
    memory_history.append(ws*memory_history[-1] + input_history[-1]*gate_history[-1])
    return memory_history[-1]

def predict_gated(seq, bilinear_gate=True):
    for char in seq:
        activate_input(*embedding[char])
        # Forward the gate type so the forward pass matches the gradient formulas
        activate_gate(*embedding[char], bilinear_gate=bilinear_gate)
        activate_output(activate_memory_gated())
    result = output_history[:]
    return result

def experiment_gated(test_seq, _w1=1.0, _w2=1.0, _w3=1.0, _w4=1.0, _w5=1.0, alpha=1e-1, max_iter=750,
                     bilinear_gate=True, fixed_w3=True, fixed_w4=False, fixed_w5=False):
    global w1, w2, w3, w4, w5
    reset_gated()
    w1 = _w1
    w2 = _w2
    w3 = _w3
    w4 = _w4
    w5 = _w5
    correct = gold(test_seq)
    print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}, w4={:+.3f}, w5={:+.3f}'.format(w1, w2, w3, w4, w5))
    for iter_num in range(max_iter):
        result = predict_gated(test_seq, bilinear_gate)
        if iter_num < 15 or (iter_num % 50 == 49):
            printmd('<div style="font-weight:bold">Iteration {}</div>'.format(iter_num))
            print('Output: ', end='')
            print_list(result[1:])
            print('Target: ', end='')
            print_list(correct[1:])
            print('Memory: ', end='')
            print_list(memory_history[1:])
            print('Input : ', end='')
            print_list(input_history[1:])
            print('Gate  : ', end='')
            print_list(gate_history[1:])
        total_loss, per_position_loss = loss(correct[1:], result[1:])
        info = ['', iter_num]
        info[0] = ('<div>Loss: <span style="font-weight:bold">{:.5f}</span>' +
                   '= <span style="font-family:monaco">').format(total_loss)
        for idx, per_pos_loss in enumerate(per_position_loss):
            info[0] += '{}{:.3f}'.format(' + ' if idx > 0 else '', per_pos_loss)
        info[0] += '</span></div>'
        # printmd(loss_str)
        w1 -= alpha * dLdw1_gated(test_seq, correct, result, memory_history, input_history, gate_history,
                                  info, bilinear_gate)
        w2 -= alpha * dLdw2_gated(test_seq, correct, result, memory_history, input_history, gate_history,
                                  info, bilinear_gate)
        if not fixed_w3:
            w3 -= alpha * dLdw3(test_seq, correct, result, memory_history, info, bilinear_gate)
        if not fixed_w4:
            w4 -= alpha * dLdw4_gated(test_seq, correct, result, memory_history, input_history, gate_history,
                                      info, bilinear_gate)
        if not fixed_w5:
            w5 -= alpha * dLdw5_gated(test_seq, correct, result, memory_history, input_history, gate_history,
                                      info, bilinear_gate)
        if iter_num < 15 or (iter_num % 50 == 49):
            printmd(info[0])
            print('w1={:+.3f}, w2={:+.3f}, w3={:+.3f}, w4={:+.3f}, w5={:+.3f}'.format(w1, w2, w3, w4, w5))
            print()
        reset_gated()
    return w1, w2, w3, w4, w5
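
The gradient functions above can be read as the following derivation for the bilinear gate. This is a sketch under two assumptions that are not restated in this cell: the self-loop weight `ws` is 1 (the CEC), and `activate_output` computes $\sigma(w_3 c_t)$. The symbols $x_{1,t}, x_{2,t}$ (embedding of the character at step $t$), $i_t$ (cell input), $g_t$ (gate) and $c_t$ (memory, with $c_0 = 0$) are introduced here for illustration.

$$
\begin{eqnarray}
i_t &=& w_1 x_{1,t} + w_2 x_{2,t} \nonumber\\
g_t &=& w_4 + w_5 x_{1,t} x_{2,t} \nonumber\\
c_t &=& c_{t-1} + g_t i_t = \sum_{s \le t} g_s i_s \nonumber\\
o_t &=& \sigma(w_3 c_t) \nonumber
\end{eqnarray}
$$

For a sigmoid output with the cross-entropy loss defined earlier, the derivative of the loss at step $t$ with respect to $w_3 c_t$ is $o_t - \hat{o}_t$. Applying the chain rule through $c_t$ gives:

$$
\begin{eqnarray}
\frac{\partial L}{\partial w_1} &=& \sum_t (o_t - \hat{o}_t)\, w_3 \sum_{s \le t} x_{1,s}\, g_s \nonumber\\
\frac{\partial L}{\partial w_2} &=& \sum_t (o_t - \hat{o}_t)\, w_3 \sum_{s \le t} x_{2,s}\, g_s \nonumber\\
\frac{\partial L}{\partial w_4} &=& \sum_t (o_t - \hat{o}_t)\, w_3 \sum_{s \le t} i_s \nonumber\\
\frac{\partial L}{\partial w_5} &=& \sum_t (o_t - \hat{o}_t)\, w_3 \sum_{s \le t} x_{1,s}\, x_{2,s}\, i_s \nonumber
\end{eqnarray}
$$

Each outer term corresponds to one `cur_dell` in the code, and the inner sums run over all steps up to $t$ because the CEC carries every earlier contribution forward unchanged.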

In [22]:
embedding['a'] = (1.0, 1)
embedding['b'] = (-1, -1)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)

experiment_gated('ab(ab)bb', _w1=1.0, _w2=1.0, _w4=1.0, _w5=1.0, alpha=1e-1, max_iter=250, fixed_w3=True)


w1=+1.000, w2=+1.000, w3=+1.000, w4=+1.000, w5=+1.000
Iteration 0
Output: [+0.982, +0.500, +0.731, +0.993, +0.731, +0.881, +0.119, +0.002]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+4.000, +0.000, +1.000, +5.000, +1.000, +2.000, -2.000, -6.000]
Input : [+2.000, -2.000, +1.000, +2.000, -2.000, +1.000, -2.000, -2.000]
Gate  : [+2.000, +2.000, +1.000, +2.000, +2.000, +1.000, +2.000, +2.000]
Loss: 5.27111= 1.325 + 0.000 + 0.000 + 0.769 + 0.000 + 0.434 + 0.434 + 2.309
dL/dw1 = +0.964 + +0.000 + +0.000 + +0.787 + +0.000 + +0.381 + +0.381 + +1.493 = +4.005
dL/dw2 = +0.964 + +0.000 + +0.000 + +0.524 + +0.000 + +0.381 + +0.381 + +1.493 = +3.743
dL/dw4 = +0.964 + +0.000 + +0.000 + +0.787 + +0.000 + +0.762 + -0.000 + +0.995 = +3.507
dL/dw5 = +0.964 + +0.000 + +0.000 + +0.524 + +0.000 + +0.000 + +0.762 + +1.990 = +4.240
w1=+0.600, w2=+0.626, w3=+1.000, w4=+0.649, w5=+0.576

Iteration 1
Output: [+0.818, +0.500, +0.596, +0.869, +0.596, +0.689, +0.331, +0.099]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.501, +0.000, +0.389, +1.890, +0.389, +0.795, -0.706, -2.207]
Input : [+1.225, -1.225, +0.600, +1.225, -1.225, +0.626, -1.225, -1.225]
Gate  : [+1.225, +1.225, +0.649, +1.225, +1.225, +0.649, +1.225, +1.225]
Loss: 1.05796= 0.259 + 0.000 + 0.040 + 0.067 + 0.040 + 0.077 + 0.061 + 0.515
dL/dw1 = +0.389 + +0.000 + -0.088 + +0.258 + -0.088 + +0.123 + +0.098 + +0.722 = +1.415
dL/dw2 = +0.389 + +0.000 + -0.000 + +0.169 + -0.000 + +0.123 + +0.098 + +0.722 = +1.500
dL/dw4 = +0.389 + +0.000 + -0.081 + +0.251 + -0.081 + +0.232 + -0.000 + +0.491 = +1.202
dL/dw5 = +0.389 + +0.000 + -0.000 + +0.169 + -0.000 + +0.000 + +0.208 + +0.982 = +1.748
w1=+0.458, w2=+0.476, w3=+1.000, w4=+0.529, w5=+0.401

Iteration 2
Output: [+0.704, +0.500, +0.560, +0.752, +0.560, +0.621, +0.407, +0.224]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.869, +0.000, +0.242, +1.111, +0.242, +0.494, -0.375, -1.243]
Input : [+0.934, -0.934, +0.458, +0.934, -0.934, +0.476, -0.934, -0.934]
Gate  : [+0.930, +0.930, +0.529, +0.930, +0.930, +0.529, +0.930, +0.930]
Loss: 0.44676= 0.091 + 0.000 + 0.062 + 0.001 + 0.062 + 0.030 + 0.017 + 0.182
dL/dw1 = +0.190 + +0.000 + -0.090 + +0.031 + -0.090 + +0.064 + +0.037 + +0.368 = +0.509
dL/dw2 = +0.190 + +0.000 + -0.000 + +0.020 + -0.000 + +0.064 + +0.037 + +0.368 = +0.679
dL/dw4 = +0.191 + +0.000 + -0.078 + +0.030 + -0.078 + +0.113 + +0.000 + +0.258 = +0.435
dL/dw5 = +0.191 + +0.000 + -0.000 + +0.020 + -0.000 + +0.000 + +0.086 + +0.516 = +0.813
w1=+0.407, w2=+0.408, w3=+1.000, w4=+0.486, w5=+0.320

Iteration 3
Output: [+0.658, +0.500, +0.549, +0.701, +0.549, +0.598, +0.435, +0.286]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.656, +0.000, +0.198, +0.854, +0.198, +0.396, -0.261, -0.917]
Input : [+0.815, -0.815, +0.407, +0.815, -0.815, +0.408, -0.815, -0.815]
Gate  : [+0.806, +0.806, +0.486, +0.806, +0.806, +0.486, +0.806, +0.806]
Loss: 0.32491= 0.053 + 0.000 + 0.070 + 0.002 + 0.070 + 0.019 + 0.008 + 0.102
dL/dw1 = +0.128 + +0.000 + -0.088 + -0.038 + -0.088 + +0.047 + +0.021 + +0.241 = +0.222
dL/dw2 = +0.128 + +0.000 + -0.000 + -0.024 + -0.000 + +0.047 + +0.021 + +0.241 = +0.413
dL/dw4 = +0.129 + +0.000 + -0.074 + -0.036 + -0.074 + +0.080 + -0.000 + +0.175 = +0.199
dL/dw5 = +0.129 + +0.000 + -0.000 + -0.024 + -0.000 + +0.000 + +0.053 + +0.350 = +0.507
w1=+0.385, w2=+0.366, w3=+1.000, w4=+0.466, w5=+0.269

Iteration 4
Output: [+0.635, +0.500, +0.545, +0.675, +0.545, +0.587, +0.450, +0.320]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.552, +0.000, +0.179, +0.731, +0.179, +0.350, -0.202, -0.754]
Input : [+0.751, -0.751, +0.385, +0.751, -0.751, +0.366, -0.751, -0.751]
Gate  : [+0.735, +0.735, +0.466, +0.735, +0.735, +0.466, +0.735, +0.735]
Loss: 0.28193= 0.038 + 0.000 + 0.074 + 0.007 + 0.074 + 0.015 + 0.005 + 0.070
dL/dw1 = +0.099 + +0.000 + -0.087 + -0.067 + -0.087 + +0.040 + +0.014 + +0.181 = +0.093
dL/dw2 = +0.099 + +0.000 + -0.000 + -0.041 + -0.000 + +0.040 + +0.014 + +0.181 = +0.293
dL/dw4 = +0.101 + +0.000 + -0.072 + -0.064 + -0.072 + +0.065 + -0.000 + +0.135 = +0.095
dL/dw5 = +0.101 + +0.000 + -0.000 + -0.042 + -0.000 + +0.000 + +0.038 + +0.271 = +0.368
w1=+0.376, w2=+0.337, w3=+1.000, w4=+0.456, w5=+0.232

Iteration 5
Output: [+0.620, +0.500, +0.543, +0.660, +0.543, +0.581, +0.459, +0.342]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.491, +0.000, +0.171, +0.662, +0.171, +0.325, -0.166, -0.657]
Input : [+0.713, -0.713, +0.376, +0.713, -0.713, +0.337, -0.713, -0.713]
Gate  : [+0.689, +0.689, +0.456, +0.689, +0.689, +0.456, +0.689, +0.689]
Loss: 0.26115= 0.030 + 0.000 + 0.075 + 0.012 + 0.075 + 0.013 + 0.003 + 0.053
dL/dw1 = +0.083 + +0.000 + -0.086 + -0.082 + -0.086 + +0.037 + +0.010 + +0.146 = +0.022
dL/dw2 = +0.083 + +0.000 + -0.000 + -0.049 + -0.000 + +0.037 + +0.010 + +0.146 = +0.226
dL/dw4 = +0.086 + +0.000 + -0.071 + -0.078 + -0.071 + +0.057 + +0.000 + +0.113 = +0.037
dL/dw5 = +0.086 + +0.000 + -0.000 + -0.051 + -0.000 + +0.000 + +0.029 + +0.226 = +0.290
w1=+0.373, w2=+0.315, w3=+1.000, w4=+0.453, w5=+0.203

Iteration 6
Output: [+0.611, +0.500, +0.542, +0.650, +0.542, +0.577, +0.465, +0.356]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.451, +0.000, +0.169, +0.620, +0.169, +0.311, -0.140, -0.591]
Input : [+0.688, -0.688, +0.373, +0.688, -0.688, +0.315, -0.688, -0.688]
Gate  : [+0.656, +0.656, +0.453, +0.656, +0.656, +0.453, +0.656, +0.656]
Loss: 0.24872= 0.025 + 0.000 + 0.075 + 0.015 + 0.075 + 0.012 + 0.002 + 0.043
dL/dw1 = +0.073 + +0.000 + -0.085 + -0.090 + -0.085 + +0.035 + +0.007 + +0.123 = -0.022
dL/dw2 = +0.073 + +0.000 + -0.000 + -0.053 + -0.000 + +0.035 + +0.007 + +0.123 = +0.185
dL/dw4 = +0.076 + +0.000 + -0.071 + -0.086 + -0.071 + +0.053 + -0.000 + +0.099 = +0.001
dL/dw5 = +0.076 + +0.000 + -0.000 + -0.056 + -0.000 + +0.000 + +0.024 + +0.198 = +0.242
w1=+0.376, w2=+0.296, w3=+1.000, w4=+0.452, w5=+0.179

Iteration 7
Output: [+0.604, +0.500, +0.542, +0.644, +0.542, +0.575, +0.470, +0.367]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.424, +0.000, +0.170, +0.594, +0.170, +0.304, -0.120, -0.545]
Input : [+0.672, -0.672, +0.376, +0.672, -0.672, +0.296, -0.672, -0.672]
Gate  : [+0.631, +0.631, +0.452, +0.631, +0.631, +0.452, +0.631, +0.631]
Loss: 0.23997= 0.022 + 0.000 + 0.075 + 0.017 + 0.075 + 0.011 + 0.002 + 0.037
dL/dw1 = +0.066 + +0.000 + -0.085 + -0.094 + -0.085 + +0.034 + +0.005 + +0.108 = -0.052
dL/dw2 = +0.066 + +0.000 + -0.000 + -0.055 + -0.000 + +0.034 + +0.005 + +0.108 = +0.158
dL/dw4 = +0.070 + +0.000 + -0.071 + -0.091 + -0.071 + +0.051 + -0.000 + +0.089 = -0.023
dL/dw5 = +0.070 + +0.000 + -0.000 + -0.058 + -0.000 + +0.000 + +0.020 + +0.178 = +0.211
w1=+0.381, w2=+0.280, w3=+1.000, w4=+0.455, w5=+0.158

Iteration 8
Output: [+0.600, +0.500, +0.543, +0.641, +0.543, +0.575, +0.474, +0.375]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.405, +0.000, +0.173, +0.578, +0.173, +0.301, -0.104, -0.510]
Input : [+0.661, -0.661, +0.381, +0.661, -0.661, +0.280, -0.661, -0.661]
Gate  : [+0.613, +0.613, +0.455, +0.613, +0.613, +0.455, +0.613, +0.613]
Loss: 0.23302= 0.020 + 0.000 + 0.075 + 0.019 + 0.075 + 0.011 + 0.001 + 0.032
dL/dw1 = +0.061 + +0.000 + -0.085 + -0.097 + -0.085 + +0.034 + +0.004 + +0.096 = -0.072
dL/dw2 = +0.061 + +0.000 + -0.000 + -0.055 + -0.000 + +0.034 + +0.004 + +0.096 = +0.140
dL/dw4 = +0.066 + +0.000 + -0.072 + -0.094 + -0.072 + +0.049 + +0.000 + +0.082 = -0.040
dL/dw5 = +0.066 + +0.000 + -0.000 + -0.060 + -0.000 + +0.000 + +0.017 + +0.165 = +0.188
w1=+0.388, w2=+0.266, w3=+1.000, w4=+0.459, w5=+0.139

Iteration 9
Output: [+0.597, +0.500, +0.544, +0.639, +0.544, +0.574, +0.477, +0.382]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.391, +0.000, +0.178, +0.569, +0.178, +0.300, -0.091, -0.482]
Input : [+0.654, -0.654, +0.388, +0.654, -0.654, +0.266, -0.654, -0.654]
Gate  : [+0.598, +0.598, +0.459, +0.598, +0.598, +0.459, +0.598, +0.598]
Loss: 0.22701= 0.019 + 0.000 + 0.074 + 0.019 + 0.074 + 0.011 + 0.001 + 0.029
dL/dw1 = +0.058 + +0.000 + -0.086 + -0.098 + -0.086 + +0.034 + +0.003 + +0.087 = -0.087
dL/dw2 = +0.058 + +0.000 + -0.000 + -0.055 + -0.000 + +0.034 + +0.003 + +0.087 = +0.127
dL/dw4 = +0.063 + +0.000 + -0.072 + -0.096 + -0.072 + +0.049 + -0.000 + +0.077 = -0.052
dL/dw5 = +0.063 + +0.000 + -0.000 + -0.061 + -0.000 + +0.000 + +0.015 + +0.155 = +0.172
w1=+0.397, w2=+0.254, w3=+1.000, w4=+0.464, w5=+0.122

Iteration 10
Output: [+0.594, +0.500, +0.546, +0.638, +0.546, +0.575, +0.480, +0.387]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.381, +0.000, +0.184, +0.565, +0.184, +0.302, -0.079, -0.460]
Input : [+0.650, -0.650, +0.397, +0.650, -0.650, +0.254, -0.650, -0.650]
Gate  : [+0.586, +0.586, +0.464, +0.586, +0.586, +0.464, +0.586, +0.586]
Loss: 0.22150= 0.018 + 0.000 + 0.073 + 0.020 + 0.073 + 0.011 + 0.001 + 0.026
dL/dw1 = +0.055 + +0.000 + -0.086 + -0.098 + -0.086 + +0.035 + +0.002 + +0.080 = -0.098
dL/dw2 = +0.055 + +0.000 + -0.000 + -0.055 + -0.000 + +0.035 + +0.002 + +0.080 = +0.118
dL/dw4 = +0.061 + +0.000 + -0.073 + -0.098 + -0.073 + +0.049 + +0.000 + +0.074 = -0.061
dL/dw5 = +0.061 + +0.000 + -0.000 + -0.061 + -0.000 + +0.000 + +0.013 + +0.147 = +0.160
w1=+0.406, w2=+0.242, w3=+1.000, w4=+0.470, w5=+0.106

Iteration 11
Output: [+0.592, +0.500, +0.548, +0.637, +0.548, +0.576, +0.483, +0.391]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.373, +0.000, +0.191, +0.564, +0.191, +0.305, -0.069, -0.442]
Input : [+0.648, -0.648, +0.406, +0.648, -0.648, +0.242, -0.648, -0.648]
Gate  : [+0.576, +0.576, +0.470, +0.576, +0.576, +0.470, +0.576, +0.576]
Loss: 0.21627= 0.017 + 0.000 + 0.071 + 0.020 + 0.071 + 0.012 + 0.001 + 0.024
dL/dw1 = +0.053 + +0.000 + -0.086 + -0.098 + -0.086 + +0.036 + +0.002 + +0.074 = -0.106
dL/dw2 = +0.053 + +0.000 + -0.000 + -0.054 + -0.000 + +0.036 + +0.002 + +0.074 = +0.111
dL/dw4 = +0.060 + +0.000 + -0.075 + -0.099 + -0.075 + +0.049 + -0.000 + +0.070 = -0.069
dL/dw5 = +0.060 + +0.000 + -0.000 + -0.061 + -0.000 + +0.000 + +0.011 + +0.141 = +0.151
w1=+0.417, w2=+0.231, w3=+1.000, w4=+0.477, w5=+0.091

Iteration 12
Output: [+0.591, +0.500, +0.550, +0.638, +0.550, +0.577, +0.485, +0.395]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.368, +0.000, +0.199, +0.567, +0.199, +0.309, -0.059, -0.427]
Input : [+0.648, -0.648, +0.417, +0.648, -0.648, +0.231, -0.648, -0.648]
Gate  : [+0.568, +0.568, +0.477, +0.568, +0.568, +0.477, +0.568, +0.568]
Loss: 0.21120= 0.017 + 0.000 + 0.070 + 0.020 + 0.070 + 0.012 + 0.000 + 0.023
dL/dw1 = +0.052 + +0.000 + -0.087 + -0.097 + -0.087 + +0.037 + +0.001 + +0.069 = -0.112
dL/dw2 = +0.052 + +0.000 + -0.000 + -0.053 + -0.000 + +0.037 + +0.001 + +0.069 = +0.106
dL/dw4 = +0.059 + +0.000 + -0.076 + -0.099 + -0.076 + +0.050 + -0.000 + +0.068 = -0.074
dL/dw5 = +0.059 + +0.000 + -0.000 + -0.060 + -0.000 + +0.000 + +0.010 + +0.136 = +0.144
w1=+0.428, w2=+0.220, w3=+1.000, w4=+0.484, w5=+0.076

Iteration 13
Output: [+0.590, +0.500, +0.552, +0.639, +0.552, +0.578, +0.488, +0.398]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.363, +0.000, +0.207, +0.571, +0.207, +0.314, -0.050, -0.413]
Input : [+0.648, -0.648, +0.428, +0.648, -0.648, +0.220, -0.648, -0.648]
Gate  : [+0.561, +0.561, +0.484, +0.561, +0.561, +0.484, +0.561, +0.561]
Loss: 0.20622= 0.016 + 0.000 + 0.068 + 0.019 + 0.068 + 0.012 + 0.000 + 0.021
dL/dw1 = +0.050 + +0.000 + -0.087 + -0.096 + -0.087 + +0.038 + +0.001 + +0.065 = -0.116
dL/dw2 = +0.050 + +0.000 + -0.000 + -0.052 + -0.000 + +0.038 + +0.001 + +0.065 = +0.102
dL/dw4 = +0.058 + +0.000 + -0.077 + -0.099 + -0.077 + +0.050 + -0.000 + +0.066 = -0.078
dL/dw5 = +0.058 + +0.000 + -0.000 + -0.060 + -0.000 + +0.000 + +0.008 + +0.132 = +0.139
w1=+0.440, w2=+0.210, w3=+1.000, w4=+0.492, w5=+0.063

Iteration 14
Output: [+0.589, +0.500, +0.554, +0.640, +0.554, +0.579, +0.490, +0.401]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.360, +0.000, +0.216, +0.577, +0.216, +0.320, -0.041, -0.401]
Input : [+0.650, -0.650, +0.440, +0.650, -0.650, +0.210, -0.650, -0.650]
Gate  : [+0.555, +0.555, +0.492, +0.555, +0.555, +0.492, +0.555, +0.555]
Loss: 0.20131= 0.016 + 0.000 + 0.067 + 0.019 + 0.067 + 0.013 + 0.000 + 0.020
dL/dw1 = +0.049 + +0.000 + -0.087 + -0.095 + -0.087 + +0.039 + +0.001 + +0.061 = -0.119
dL/dw2 = +0.049 + +0.000 + -0.000 + -0.050 + -0.000 + +0.039 + +0.001 + +0.061 = +0.100
dL/dw4 = +0.058 + +0.000 + -0.078 + -0.099 + -0.078 + +0.051 + -0.000 + +0.064 = -0.081
dL/dw5 = +0.058 + +0.000 + -0.000 + -0.059 + -0.000 + +0.000 + +0.007 + +0.129 = +0.134
w1=+0.452, w2=+0.200, w3=+1.000, w4=+0.500, w5=+0.049

Iteration 49
Output: [+0.572, +0.500, +0.637, +0.701, +0.637, +0.608, +0.537, +0.464]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.291, +0.000, +0.561, +0.852, +0.561, +0.440, +0.149, -0.143]
Input : [+0.595, -0.595, +0.759, +0.595, -0.595, -0.164, -0.595, -0.595]
Gate  : [+0.490, +0.490, +0.739, +0.490, +0.490, +0.739, +0.490, +0.490]
Loss: 0.08241= 0.011 + 0.000 + 0.020 + 0.002 + 0.020 + 0.024 + 0.003 + 0.003
dL/dw1 = +0.035 + +0.000 + -0.070 + -0.037 + -0.070 + +0.080 + +0.009 + +0.009 = -0.043
dL/dw2 = +0.035 + +0.000 + -0.000 + -0.015 + -0.000 + +0.080 + +0.009 + +0.009 = +0.119
dL/dw4 = +0.043 + +0.000 + -0.072 + -0.041 + -0.072 + +0.064 + +0.000 + +0.021 = -0.055
dL/dw5 = +0.043 + +0.000 + -0.000 + -0.018 + -0.000 + +0.000 + -0.022 + +0.042 = +0.046
w1=+0.763, w2=-0.175, w3=+1.000, w4=+0.745, w5=-0.254

Iteration 99
Output: [+0.528, +0.500, +0.700, +0.723, +0.700, +0.543, +0.515, +0.487]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.111, +0.000, +0.848, +0.960, +0.848, +0.171, +0.059, -0.052]
Input : [+0.168, -0.168, +0.837, +0.168, -0.168, -0.669, -0.168, -0.168]
Gate  : [+0.661, +0.661, +1.013, +0.661, +0.661, +1.013, +0.661, +0.661]
Loss: 0.01076= 0.002 + 0.000 + 0.002 + 0.000 + 0.002 + 0.004 + 0.000 + 0.000
dL/dw1 = +0.018 + +0.000 + -0.031 + -0.013 + -0.031 + +0.043 + +0.005 + +0.004 = -0.005
dL/dw2 = +0.018 + +0.000 + -0.000 + -0.005 + -0.000 + +0.043 + +0.005 + +0.004 = +0.065
dL/dw4 = +0.005 + +0.000 + -0.026 + -0.008 + -0.026 + +0.007 + +0.000 + +0.002 = -0.046
dL/dw5 = +0.005 + +0.000 + -0.000 + -0.001 + -0.000 + +0.000 + -0.002 + +0.004 = +0.005
w1=+0.838, w2=-0.675, w3=+1.000, w4=+1.018, w5=-0.353

Iteration 149
Output: [+0.504, +0.500, +0.726, +0.730, +0.726, +0.506, +0.502, +0.498]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.017, +0.000, +0.975, +0.993, +0.975, +0.025, +0.008, -0.009]
Input : [+0.022, -0.022, +0.856, +0.022, -0.022, -0.834, -0.022, -0.022]
Gate  : [+0.780, +0.780, +1.140, +0.780, +0.780, +1.140, +0.780, +0.780]
Loss: 0.00026= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.003 + +0.000 + -0.006 + -0.003 + -0.006 + +0.007 + +0.001 + +0.001 = -0.002
dL/dw2 = +0.003 + +0.000 + -0.000 + -0.001 + -0.000 + +0.007 + +0.001 + +0.001 = +0.011
dL/dw4 = +0.000 + +0.000 + -0.004 + -0.001 + -0.004 + +0.000 + +0.000 + +0.000 = -0.009
dL/dw5 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.000 + -0.000 + +0.000 = +0.000
w1=+0.856, w2=-0.835, w3=+1.000, w4=+1.141, w5=-0.360

Iteration 199
Output: [+0.500, +0.500, +0.730, +0.731, +0.730, +0.501, +0.500, +0.500]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.002, +0.000, +0.997, +0.999, +0.997, +0.003, +0.001, -0.001]
Input : [+0.002, -0.002, +0.859, +0.002, -0.002, -0.857, -0.002, -0.002]
Gate  : [+0.800, +0.800, +1.160, +0.800, +0.800, +1.160, +0.800, +0.800]
Loss: 0.00000= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.000 + -0.001 + -0.000 + -0.001 + +0.001 + +0.000 + +0.000 = -0.000
dL/dw2 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.001 + +0.000 + +0.000 = +0.001
dL/dw4 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 = -0.001
dL/dw5 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.000 + -0.000 + +0.000 = +0.000
w1=+0.860, w2=-0.857, w3=+1.000, w4=+1.160, w5=-0.360

Iteration 249
Output: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.000, +0.000, +1.000, +1.000, +1.000, +0.000, +0.000, -0.000]
Input : [+0.000, -0.000, +0.860, +0.000, -0.000, -0.860, -0.000, -0.000]
Gate  : [+0.803, +0.803, +1.162, +0.803, +0.803, +1.162, +0.803, +0.803]
Loss: 0.00000= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 = -0.000
dL/dw2 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 = +0.000
dL/dw4 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 = -0.000
dL/dw5 = +0.000 + +0.000 + -0.000 + -0.000 + -0.000 + +0.000 + -0.000 + +0.000 = +0.000
w1=+0.860, w2=-0.860, w3=+1.000, w4=+1.162, w5=-0.360

Out[22]:
(0.8599643112688363,
 -0.859699702729852,
 1.0,
 1.162480041288255,
 -0.3597956247833731)

Discussion

We see that after adding an input gate (assuming the bilinear gate used here can stand in for a true input gate), the network reaches the optimum (loss = 0.0) faster (by iteration 199) than the network without an input gate (only by iteration 249). This holds even though the gated network has two more parameters to learn ($w_4$ and $w_5$) and starts with a higher loss (due to the initially incorrect gate values).

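We also see that the learned gate is not the true gate that we want: at the end of training it is about 0.80 for `a` and `b`, not 0, and about 1.16 for the brackets. This is because the input is already separable even without an input gate, so the network can zero out `a` and `b` through $w_1$ and $w_2$ alone.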
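
To make this concrete, the sketch below replays the forward pass with the final weights from `Out[22]` (rounded to five decimals). It assumes a CEC self-loop weight of 1 and an output of $\sigma(w_3 \cdot \mathrm{memory})$. The helper `replay` is hypothetical and is not part of the notebook's code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Embedding used in the gated experiment on this page.
embedding = {'a': (1, 1), 'b': (-1, -1), '(': (1, 0), ')': (0, 1)}

# Final weights, rounded from Out[22].
w1, w2, w3, w4, w5 = 0.85996, -0.85970, 1.0, 1.16248, -0.35980

def replay(seq):
    """Hypothetical helper: forward pass of the gated CEC, one row per character."""
    memory = 0.0
    rows = []
    for ch in seq:
        x1, x2 = embedding[ch]
        cell_input = w1 * x1 + w2 * x2   # input unit
        gate = w4 + w5 * x1 * x2         # bilinear gate
        memory += gate * cell_input      # CEC: self-loop weight assumed to be 1
        rows.append((ch, cell_input, gate, memory, sigmoid(w3 * memory)))
    return rows

rows = replay('ab(ab)bb')
for ch, cell_input, gate, memory, output in rows:
    print('{}: input={:+.3f} gate={:+.3f} memory={:+.3f} output={:+.3f}'.format(
        ch, cell_input, gate, memory, output))
```

For `a` and `b` the gate stays open (around 0.8), but the cell input is almost exactly zero because $w_1 \approx -w_2$. The filtering is done by the input weights, not by the gate.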

Noisy embedding

In the previous experiment, the learned input gate was not the true gate that we want, but that is because the input embedding is ideal, i.e., it allows separation even without an input gate.

Now let's experiment with noisy embedding, in which the true function cannot be obtained without input gate.


In [23]:
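import random
from pprint import pprint

# Perturb the first coordinate of 'a' and 'b' by up to +/-0.1 and set the second
# coordinate to its reciprocal, so that x1 * x2 stays 1 (up to rounding).
a_1 = 1.0 + 0.2*(random.random()-0.5)
a_2 = 1.0/a_1
b_1 = -1.0 + 0.2*(random.random()-0.5)
b_2 = 1.0/b_1
embedding['a'] = (a_1, a_2)
embedding['b'] = (b_1, b_2)
# The brackets use a noise-free embedding.
embedding['('] = (1, 0)
embedding[')'] = (0, 1)

pprint(embedding)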


{'(': (1, 0),
 ')': (0, 1),
 'a': (1.0367606488244225, 0.9645427815319715),
 'b': (-0.9331636393562619, -1.0716234086122824)}

Here we construct the input embedding so that the non-bracket characters (a and b) carry noise, which the network should learn to ignore.
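
To see why no ungated network can fit this embedding exactly, consider a sketch of the constraints. It assumes the ungated update adds $w_1 x_1 + w_2 x_2$ to the memory at every step, that ')' is embedded as $(0, 1)$, and that the target memory equals the bracket depth. The opening bracket must add 1 and the closing bracket must subtract 1, which fixes both weights and leaves nothing free to cancel the letters:

$$ \begin{eqnarray} w_1 \cdot 1 + w_2 \cdot 0 = 1 &\Rightarrow& w_1 = 1 \nonumber\\ w_1 \cdot 0 + w_2 \cdot 1 = -1 &\Rightarrow& w_2 = -1 \nonumber\\ w_1 a_1 + w_2 a_2 &=& a_1 - \frac{1}{a_1} \approx 0.072 \nonumber\\ w_1 b_1 + w_2 b_2 &=& b_1 - \frac{1}{b_1} \approx 0.138 \nonumber \end{eqnarray} $$

With the values printed above, every a or b therefore still shifts the memory; the residue vanishes only if $a_1 = \pm 1$ and $b_1 = \pm 1$. A gate that multiplies these contributions by 0 removes it.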

Let's see how the two models perform in this case.


In [24]:
embedding['a'] = (a_1, a_2)
embedding['b'] = (b_1, b_2)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)

experiment('ab(ab)bb', _w1=1.0, _w2=1.0, alpha=1e-1, max_iter=250, fixed_w3=True)


w1=+1.000, w2=+1.000, w3=+1.000
Iteration 0
Output: [+0.881, +0.499, +0.730, +0.952, +0.730, +0.880, +0.497, +0.117]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+2.001, -0.003, +0.997, +2.998, +0.993, +1.993, -0.012, -2.017]
Loss: 1.57825= 0.434 + 0.000 + 0.000 + 0.273 + 0.000 + 0.431 + 0.000 + 0.440
dL/dw1 = +0.395 + -0.000 + -0.001 + +0.474 + -0.002 + +0.459 + -0.001 + +0.252 = +1.576
dL/dw2 = +0.367 + +0.000 + +0.000 + +0.190 + +0.000 + +0.299 + +0.001 + +0.519 = +1.376
w1=+0.842, w2=+0.862, w3=+1.000

Iteration 1
Output: [+0.846, +0.499, +0.698, +0.927, +0.697, +0.845, +0.496, +0.151]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.705, -0.005, +0.837, +2.542, +0.832, +1.695, -0.016, -1.726]
Loss: 1.16557= 0.326 + 0.000 + 0.003 + 0.177 + 0.003 + 0.323 + 0.000 + 0.334
dL/dw1 = +0.359 + -0.000 + -0.037 + +0.420 + -0.041 + +0.416 + -0.001 + +0.230 = +1.346
dL/dw2 = +0.334 + +0.000 + +0.004 + +0.168 + +0.007 + +0.271 + +0.001 + +0.474 = +1.259
w1=+0.708, w2=+0.736, w3=+1.000

Iteration 2
Output: [+0.809, +0.499, +0.669, +0.895, +0.667, +0.807, +0.496, +0.188]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.444, -0.006, +0.702, +2.146, +0.697, +1.433, -0.017, -1.466]
Loss: 0.85001= 0.241 + 0.000 + 0.009 + 0.106 + 0.009 + 0.237 + 0.000 + 0.248
dL/dw1 = +0.320 + -0.000 + -0.069 + +0.352 + -0.077 + +0.371 + -0.001 + +0.206 = +1.102
dL/dw2 = +0.298 + +0.000 + +0.007 + +0.141 + +0.014 + +0.242 + +0.001 + +0.424 = +1.126
w1=+0.598, w2=+0.624, w3=+1.000

Iteration 3
Output: [+0.772, +0.499, +0.644, +0.860, +0.643, +0.771, +0.496, +0.224]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.221, -0.005, +0.593, +1.814, +0.588, +1.212, -0.015, -1.241]
Loss: 0.62204= 0.176 + 0.000 + 0.017 + 0.057 + 0.018 + 0.173 + 0.000 + 0.181
dL/dw1 = +0.282 + -0.000 + -0.096 + +0.276 + -0.106 + +0.327 + -0.001 + +0.182 = +0.863
dL/dw2 = +0.263 + +0.000 + +0.009 + +0.110 + +0.019 + +0.213 + +0.001 + +0.374 = +0.989
w1=+0.511, w2=+0.525, w3=+1.000

Iteration 4
Output: [+0.738, +0.499, +0.624, +0.824, +0.624, +0.737, +0.498, +0.259]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.036, -0.003, +0.508, +1.544, +0.505, +1.030, -0.010, -1.050]
Loss: 0.46547= 0.129 + 0.000 + 0.025 + 0.027 + 0.026 + 0.127 + 0.000 + 0.132
dL/dw1 = +0.247 + -0.000 + -0.118 + +0.199 + -0.130 + +0.286 + -0.001 + +0.159 = +0.642
dL/dw2 = +0.230 + +0.000 + +0.011 + +0.080 + +0.023 + +0.186 + +0.001 + +0.327 = +0.858
w1=+0.447, w2=+0.439, w3=+1.000

Iteration 5
Output: [+0.708, +0.500, +0.610, +0.791, +0.610, +0.708, +0.499, +0.291]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.887, -0.001, +0.446, +1.333, +0.446, +0.885, -0.003, -0.891]
Loss: 0.36166= 0.095 + 0.000 + 0.033 + 0.010 + 0.033 + 0.095 + 0.000 + 0.096
dL/dw1 = +0.216 + -0.000 + -0.134 + +0.129 + -0.147 + +0.251 + -0.000 + +0.138 = +0.453
dL/dw2 = +0.201 + +0.000 + +0.013 + +0.052 + +0.026 + +0.163 + +0.000 + +0.284 = +0.739
w1=+0.402, w2=+0.365, w3=+1.000

Iteration 6
Output: [+0.683, +0.501, +0.600, +0.764, +0.600, +0.684, +0.501, +0.319]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.769, +0.003, +0.404, +1.173, +0.407, +0.772, +0.006, -0.761]
Loss: 0.29370= 0.072 + 0.000 + 0.038 + 0.003 + 0.037 + 0.073 + 0.000 + 0.071
dL/dw1 = +0.190 + +0.000 + -0.145 + +0.070 + -0.158 + +0.222 + +0.000 + +0.120 = +0.299
dL/dw2 = +0.177 + -0.000 + +0.014 + +0.028 + +0.028 + +0.145 + -0.000 + +0.246 = +0.637
w1=+0.372, w2=+0.302, w3=+1.000

Iteration 7
Output: [+0.663, +0.502, +0.593, +0.742, +0.595, +0.665, +0.504, +0.342]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.676, +0.006, +0.378, +1.054, +0.384, +0.686, +0.016, -0.654]
Loss: 0.24857= 0.056 + 0.000 + 0.041 + 0.000 + 0.040 + 0.058 + 0.000 + 0.053
dL/dw1 = +0.169 + +0.000 + -0.152 + +0.023 + -0.164 + +0.199 + +0.001 + +0.104 = +0.180
dL/dw2 = +0.157 + -0.000 + +0.015 + +0.009 + +0.029 + +0.130 + -0.001 + +0.214 = +0.553
w1=+0.354, w2=+0.246, w3=+1.000

Iteration 8
Output: [+0.647, +0.503, +0.590, +0.725, +0.593, +0.650, +0.507, +0.362]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.604, +0.010, +0.364, +0.968, +0.374, +0.621, +0.027, -0.567]
Loss: 0.21745= 0.045 + 0.000 + 0.043 + 0.000 + 0.042 + 0.047 + 0.000 + 0.040
dL/dw1 = +0.152 + +0.000 + -0.156 + -0.013 + -0.167 + +0.182 + +0.002 + +0.091 = +0.091
dL/dw2 = +0.141 + -0.000 + +0.015 + -0.005 + +0.030 + +0.118 + -0.002 + +0.188 = +0.484
w1=+0.345, w2=+0.198, w3=+1.000

Iteration 9
Output: [+0.634, +0.504, +0.589, +0.713, +0.592, +0.639, +0.509, +0.379]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.548, +0.015, +0.359, +0.908, +0.374, +0.572, +0.038, -0.496]
Loss: 0.19479= 0.037 + 0.000 + 0.044 + 0.001 + 0.042 + 0.040 + 0.000 + 0.030
dL/dw1 = +0.139 + +0.000 + -0.157 + -0.040 + -0.167 + +0.168 + +0.003 + +0.080 = +0.026
dL/dw2 = +0.129 + -0.000 + +0.015 + -0.016 + +0.030 + +0.109 + -0.003 + +0.165 = +0.429
w1=+0.342, w2=+0.155, w3=+1.000

Iteration 10
Output: [+0.623, +0.505, +0.589, +0.704, +0.594, +0.631, +0.512, +0.393]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.504, +0.019, +0.361, +0.865, +0.380, +0.535, +0.050, -0.436]
Loss: 0.17732= 0.031 + 0.000 + 0.044 + 0.002 + 0.041 + 0.035 + 0.000 + 0.024
dL/dw1 = +0.128 + +0.000 + -0.156 + -0.058 + -0.166 + +0.158 + +0.003 + +0.071 = -0.020
dL/dw2 = +0.119 + -0.001 + +0.015 + -0.023 + +0.029 + +0.103 + -0.004 + +0.146 = +0.384
w1=+0.344, w2=+0.116, w3=+1.000

Iteration 11
Output: [+0.615, +0.506, +0.591, +0.698, +0.596, +0.624, +0.515, +0.405]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.469, +0.023, +0.367, +0.837, +0.391, +0.507, +0.061, -0.385]
Loss: 0.16317= 0.027 + 0.000 + 0.043 + 0.003 + 0.040 + 0.032 + 0.000 + 0.018
dL/dw1 = +0.119 + +0.001 + -0.155 + -0.071 + -0.163 + +0.150 + +0.004 + +0.063 = -0.052
dL/dw2 = +0.111 + -0.001 + +0.015 + -0.029 + +0.029 + +0.098 + -0.004 + +0.129 = +0.348
w1=+0.349, w2=+0.082, w3=+1.000

Iteration 12
Output: [+0.609, +0.507, +0.593, +0.694, +0.600, +0.619, +0.518, +0.416]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.441, +0.027, +0.377, +0.818, +0.404, +0.486, +0.072, -0.341]
Loss: 0.15126= 0.024 + 0.000 + 0.042 + 0.003 + 0.038 + 0.029 + 0.001 + 0.014
dL/dw1 = +0.112 + +0.001 + -0.152 + -0.080 + -0.159 + +0.144 + +0.005 + +0.056 = -0.073
dL/dw2 = +0.105 + -0.001 + +0.015 + -0.032 + +0.028 + +0.094 + -0.005 + +0.115 = +0.318
w1=+0.357, w2=+0.050, w3=+1.000

Iteration 13
Output: [+0.603, +0.508, +0.596, +0.691, +0.603, +0.615, +0.521, +0.425]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.418, +0.032, +0.388, +0.806, +0.420, +0.470, +0.084, -0.303]
Loss: 0.14097= 0.022 + 0.000 + 0.040 + 0.004 + 0.036 + 0.027 + 0.001 + 0.011
dL/dw1 = +0.107 + +0.001 + -0.149 + -0.085 + -0.154 + +0.139 + +0.006 + +0.050 = -0.086
dL/dw2 = +0.099 + -0.001 + +0.014 + -0.034 + +0.027 + +0.091 + -0.006 + +0.102 = +0.293
w1=+0.365, w2=+0.021, w3=+1.000

Iteration 14
Output: [+0.598, +0.509, +0.599, +0.690, +0.607, +0.612, +0.524, +0.433]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.399, +0.036, +0.401, +0.800, +0.437, +0.457, +0.094, -0.269]
Loss: 0.13193= 0.020 + 0.000 + 0.038 + 0.004 + 0.034 + 0.026 + 0.001 + 0.009
dL/dw1 = +0.102 + +0.001 + -0.146 + -0.088 + -0.149 + +0.136 + +0.006 + +0.044 = -0.094
dL/dw2 = +0.095 + -0.001 + +0.014 + -0.035 + +0.026 + +0.088 + -0.007 + +0.091 = +0.271
w1=+0.375, w2=-0.007, w3=+1.000

Iteration 49
Output: [+0.552, +0.526, +0.663, +0.707, +0.685, +0.593, +0.567, +0.542]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.207, +0.102, +0.675, +0.882, +0.777, +0.376, +0.272, +0.167]
Loss: 0.05422= 0.005 + 0.001 + 0.011 + 0.001 + 0.005 + 0.018 + 0.009 + 0.003
dL/dw1 = +0.053 + +0.003 + -0.075 + -0.051 + -0.055 + +0.112 + +0.018 + -0.027 = -0.022
dL/dw2 = +0.050 + -0.003 + +0.007 + -0.020 + +0.010 + +0.073 + -0.019 + -0.056 = +0.041
w1=+0.575, w2=-0.405, w3=+1.000

Iteration 99
Output: [+0.544, +0.529, +0.675, +0.712, +0.699, +0.591, +0.576, +0.561]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.176, +0.115, +0.730, +0.906, +0.845, +0.367, +0.305, +0.244]
Loss: 0.05201= 0.004 + 0.002 + 0.007 + 0.001 + 0.002 + 0.017 + 0.012 + 0.007
dL/dw1 = +0.046 + +0.003 + -0.062 + -0.040 + -0.038 + +0.109 + +0.021 + -0.040 = -0.002
dL/dw2 = +0.042 + -0.003 + +0.006 + -0.016 + +0.007 + +0.071 + -0.022 + -0.082 = +0.003
w1=+0.615, w2=-0.478, w3=+1.000

Iteration 149
Output: [+0.543, +0.529, +0.676, +0.713, +0.701, +0.590, +0.576, +0.562]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.174, +0.116, +0.734, +0.908, +0.850, +0.366, +0.308, +0.250]
Loss: 0.05199= 0.004 + 0.002 + 0.007 + 0.001 + 0.002 + 0.017 + 0.012 + 0.008
dL/dw1 = +0.045 + +0.003 + -0.061 + -0.039 + -0.037 + +0.109 + +0.021 + -0.041 = -0.000
dL/dw2 = +0.042 + -0.003 + +0.006 + -0.016 + +0.007 + +0.071 + -0.022 + -0.084 = +0.000
w1=+0.618, w2=-0.484, w3=+1.000

Iteration 199
Output: [+0.543, +0.529, +0.676, +0.713, +0.701, +0.590, +0.576, +0.562]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.174, +0.116, +0.735, +0.908, +0.851, +0.366, +0.308, +0.250]
Loss: 0.05199= 0.004 + 0.002 + 0.007 + 0.001 + 0.002 + 0.017 + 0.012 + 0.008
dL/dw1 = +0.045 + +0.003 + -0.061 + -0.039 + -0.037 + +0.109 + +0.021 + -0.041 = -0.000
dL/dw2 = +0.042 + -0.003 + +0.006 + -0.016 + +0.007 + +0.071 + -0.022 + -0.085 = +0.000
w1=+0.619, w2=-0.485, w3=+1.000

Iteration 249
Output: [+0.543, +0.529, +0.676, +0.713, +0.701, +0.590, +0.576, +0.562]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.174, +0.116, +0.735, +0.908, +0.851, +0.366, +0.308, +0.250]
Loss: 0.05199= 0.004 + 0.002 + 0.007 + 0.001 + 0.002 + 0.017 + 0.012 + 0.008
dL/dw1 = +0.045 + +0.003 + -0.061 + -0.039 + -0.037 + +0.109 + +0.021 + -0.041 = -0.000
dL/dw2 = +0.042 + -0.003 + +0.006 + -0.016 + +0.007 + +0.071 + -0.022 + -0.085 = +0.000
w1=+0.619, w2=-0.485, w3=+1.000

Out[24]:
(0.6186425813596521, -0.48489064055179726, 1.0)

In [25]:
embedding['a'] = (a_1, a_2)
embedding['b'] = (b_1, b_2)
embedding['('] = (1, 0)
embedding[')'] = (0, 1)

experiment_gated('ab(ab)bb', _w1=1.0, _w2=1.0, _w4=1.0, _w5=1.0, alpha=1e-1, max_iter=250, fixed_w3=True)


w1=+1.000, w2=+1.000, w3=+1.000, w4=+1.000, w5=+1.000
Iteration 0
Output: [+0.982, +0.498, +0.730, +0.993, +0.728, +0.879, +0.117, +0.002]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+4.003, -0.007, +0.993, +4.996, +0.986, +1.986, -2.024, -6.033]
Input : [+2.001, -2.005, +1.000, +2.001, -2.005, +1.000, -2.005, -2.005]
Gate  : [+2.000, +2.000, +1.000, +2.000, +2.000, +1.000, +2.000, +2.000]
Loss: 5.29140= 1.326 + 0.000 + 0.000 + 0.768 + 0.000 + 0.428 + 0.443 + 2.326
dL/dw1 = +1.000 + -0.000 + -0.002 + +0.860 + -0.004 + +0.537 + +0.173 + +1.154 = +3.717
dL/dw2 = +0.930 + +0.000 + +0.000 + +0.450 + +0.001 + +0.217 + +0.602 + +1.849 = +4.049
dL/dw4 = +0.965 + +0.000 + -0.001 + +0.786 + -0.003 + +0.756 + +0.005 + +1.003 = +3.511
dL/dw5 = +0.965 + +0.000 + +0.000 + +0.524 + +0.000 + -0.003 + +0.771 + +1.999 = +4.256
w1=+0.628, w2=+0.595, w3=+1.000, w4=+0.649, w5=+0.574

Iteration 1
Output: [+0.817, +0.500, +0.601, +0.871, +0.601, +0.689, +0.332, +0.100]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+1.499, +0.002, +0.409, +1.908, +0.411, +0.797, -0.700, -2.198]
Input : [+1.225, -1.224, +0.628, +1.225, -1.224, +0.595, -1.224, -1.224]
Gate  : [+1.223, +1.223, +0.649, +1.223, +1.223, +0.649, +1.223, +1.223]
Loss: 1.04987= 0.258 + 0.000 + 0.037 + 0.069 + 0.037 + 0.077 + 0.060 + 0.511
dL/dw1 = +0.403 + +0.000 + -0.101 + +0.286 + -0.117 + +0.171 + +0.040 + +0.552 = +1.234
dL/dw2 = +0.375 + -0.000 + +0.017 + +0.147 + +0.034 + +0.073 + +0.155 + +0.894 = +1.695
dL/dw4 = +0.389 + +0.000 + -0.082 + +0.259 + -0.082 + +0.232 + -0.000 + +0.489 = +1.205
dL/dw5 = +0.389 + +0.000 + -0.000 + +0.171 + -0.000 + +0.001 + +0.205 + +0.978 = +1.744
w1=+0.505, w2=+0.426, w3=+1.000, w4=+0.528, w5=+0.400

Iteration 2
Output: [+0.704, +0.502, +0.568, +0.758, +0.569, +0.623, +0.412, +0.228]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.867, +0.006, +0.273, +1.140, +0.279, +0.504, -0.357, -1.218]
Input : [+0.934, -0.927, +0.505, +0.934, -0.927, +0.426, -0.927, -0.927]
Gate  : [+0.928, +0.928, +0.528, +0.928, +0.928, +0.528, +0.928, +0.928]
Loss: 0.42849= 0.091 + 0.000 + 0.057 + 0.002 + 0.056 + 0.031 + 0.016 + 0.175
dL/dw1 = +0.197 + +0.000 + -0.102 + +0.042 + -0.117 + +0.089 + +0.013 + +0.275 = +0.397
dL/dw2 = +0.183 + -0.000 + +0.016 + +0.021 + +0.032 + +0.041 + +0.059 + +0.451 = +0.803
dL/dw4 = +0.191 + +0.000 + -0.084 + +0.039 + -0.084 + +0.117 + -0.001 + +0.247 = +0.424
dL/dw5 = +0.191 + +0.000 + -0.001 + +0.025 + -0.002 + +0.002 + +0.081 + +0.500 = +0.795
w1=+0.465, w2=+0.345, w3=+1.000, w4=+0.486, w5=+0.321

Iteration 3
Output: [+0.659, +0.502, +0.559, +0.709, +0.561, +0.602, +0.441, +0.292]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.658, +0.009, +0.235, +0.893, +0.244, +0.412, -0.237, -0.885]
Input : [+0.815, -0.804, +0.465, +0.815, -0.804, +0.345, -0.804, -0.804]
Gate  : [+0.807, +0.807, +0.486, +0.807, +0.807, +0.486, +0.807, +0.807]
Loss: 0.30269= 0.053 + 0.000 + 0.064 + 0.001 + 0.062 + 0.021 + 0.007 + 0.095
dL/dw1 = +0.133 + +0.000 + -0.098 + -0.030 + -0.111 + +0.066 + +0.006 + +0.177 = +0.142
dL/dw2 = +0.123 + -0.000 + +0.015 + -0.015 + +0.029 + +0.032 + +0.032 + +0.294 = +0.511
dL/dw4 = +0.129 + +0.000 + -0.082 + -0.028 + -0.083 + +0.085 + -0.002 + +0.161 = +0.180
dL/dw5 = +0.129 + +0.000 + -0.002 + -0.018 + -0.004 + +0.002 + +0.046 + +0.330 = +0.484
w1=+0.451, w2=+0.294, w3=+1.000, w4=+0.468, w5=+0.272

Iteration 4
Output: [+0.636, +0.503, +0.555, +0.685, +0.558, +0.592, +0.457, +0.328]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.556, +0.011, +0.222, +0.778, +0.234, +0.371, -0.174, -0.718]
Input : [+0.751, -0.736, +0.451, +0.751, -0.736, +0.294, -0.736, -0.736]
Gate  : [+0.740, +0.740, +0.468, +0.740, +0.740, +0.468, +0.740, +0.740]
Loss: 0.25675= 0.038 + 0.000 + 0.066 + 0.005 + 0.064 + 0.017 + 0.004 + 0.063
dL/dw1 = +0.104 + +0.000 + -0.096 + -0.060 + -0.107 + +0.057 + +0.003 + +0.131 = +0.032
dL/dw2 = +0.097 + -0.000 + +0.014 + -0.029 + +0.027 + +0.028 + +0.021 + +0.220 = +0.378
dL/dw4 = +0.102 + +0.000 + -0.082 + -0.056 + -0.083 + +0.071 + -0.002 + +0.120 = +0.070
dL/dw5 = +0.102 + +0.000 + -0.003 + -0.035 + -0.005 + +0.003 + +0.031 + +0.248 = +0.341
w1=+0.448, w2=+0.256, w3=+1.000, w4=+0.461, w5=+0.238

Iteration 5
Output: [+0.622, +0.503, +0.555, +0.672, +0.558, +0.587, +0.467, +0.350]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.497, +0.013, +0.220, +0.717, +0.233, +0.351, -0.133, -0.617]
Input : [+0.712, -0.693, +0.448, +0.712, -0.693, +0.256, -0.693, -0.693]
Gate  : [+0.699, +0.699, +0.461, +0.699, +0.699, +0.461, +0.699, +0.699]
Loss: 0.23339= 0.031 + 0.000 + 0.066 + 0.008 + 0.064 + 0.015 + 0.002 + 0.047
dL/dw1 = +0.088 + +0.000 + -0.094 + -0.074 + -0.105 + +0.053 + +0.002 + +0.105 = -0.026
dL/dw2 = +0.082 + -0.000 + +0.013 + -0.035 + +0.026 + +0.027 + +0.015 + +0.178 = +0.305
dL/dw4 = +0.087 + +0.000 + -0.082 + -0.070 + -0.084 + +0.064 + -0.002 + +0.096 = +0.010
dL/dw5 = +0.087 + +0.000 + -0.003 + -0.043 + -0.007 + +0.003 + +0.022 + +0.202 = +0.260
w1=+0.450, w2=+0.226, w3=+1.000, w4=+0.460, w5=+0.212

Iteration 6
Output: [+0.613, +0.504, +0.555, +0.664, +0.559, +0.584, +0.474, +0.366]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.460, +0.015, +0.222, +0.682, +0.237, +0.341, -0.104, -0.549]
Input : [+0.685, -0.662, +0.450, +0.685, -0.662, +0.226, -0.662, -0.662]
Gate  : [+0.672, +0.672, +0.460, +0.672, +0.672, +0.460, +0.672, +0.672]
Loss: 0.21859= 0.026 + 0.000 + 0.066 + 0.010 + 0.063 + 0.014 + 0.001 + 0.037
dL/dw1 = +0.079 + +0.000 + -0.093 + -0.082 + -0.103 + +0.051 + +0.001 + +0.088 = -0.060
dL/dw2 = +0.073 + -0.000 + +0.013 + -0.038 + +0.025 + +0.027 + +0.010 + +0.151 = +0.260
dL/dw4 = +0.077 + +0.000 + -0.083 + -0.077 + -0.085 + +0.061 + -0.002 + +0.081 = -0.028
dL/dw5 = +0.077 + +0.000 + -0.004 + -0.047 + -0.008 + +0.004 + +0.016 + +0.171 = +0.210
w1=+0.456, w2=+0.200, w3=+1.000, w4=+0.463, w5=+0.191

Iteration 7
Output: [+0.607, +0.504, +0.557, +0.660, +0.561, +0.584, +0.480, +0.378]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.435, +0.017, +0.228, +0.663, +0.245, +0.337, -0.081, -0.500]
Input : [+0.666, -0.640, +0.456, +0.666, -0.640, +0.200, -0.640, -0.640]
Gate  : [+0.654, +0.654, +0.463, +0.654, +0.654, +0.463, +0.654, +0.654]
Loss: 0.20766= 0.024 + 0.000 + 0.065 + 0.012 + 0.062 + 0.014 + 0.001 + 0.031
dL/dw1 = +0.073 + +0.000 + -0.092 + -0.086 + -0.102 + +0.050 + +0.000 + +0.076 = -0.081
dL/dw2 = +0.068 + -0.000 + +0.012 + -0.040 + +0.024 + +0.027 + +0.008 + +0.132 = +0.230
dL/dw4 = +0.071 + +0.000 + -0.084 + -0.082 + -0.086 + +0.059 + -0.001 + +0.070 = -0.053
dL/dw5 = +0.071 + +0.000 + -0.005 + -0.049 + -0.009 + +0.004 + +0.012 + +0.150 = +0.176
w1=+0.464, w2=+0.177, w3=+1.000, w4=+0.468, w5=+0.174

Iteration 8
Output: [+0.603, +0.505, +0.559, +0.658, +0.563, +0.584, +0.484, +0.387]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.418, +0.019, +0.236, +0.654, +0.255, +0.338, -0.062, -0.462]
Input : [+0.652, -0.623, +0.464, +0.652, -0.623, +0.177, -0.623, -0.623]
Gate  : [+0.642, +0.642, +0.468, +0.642, +0.642, +0.468, +0.642, +0.642]
Loss: 0.19871= 0.022 + 0.000 + 0.063 + 0.012 + 0.060 + 0.014 + 0.000 + 0.026
dL/dw1 = +0.069 + +0.000 + -0.092 + -0.088 + -0.101 + +0.050 + -0.000 + +0.068 = -0.094
dL/dw2 = +0.064 + -0.000 + +0.012 + -0.040 + +0.023 + +0.028 + +0.006 + +0.118 = +0.210
dL/dw4 = +0.067 + +0.000 + -0.085 + -0.084 + -0.088 + +0.058 + -0.001 + +0.062 = -0.070
dL/dw5 = +0.067 + +0.000 + -0.005 + -0.050 + -0.010 + +0.005 + +0.009 + +0.135 = +0.151
w1=+0.474, w2=+0.156, w3=+1.000, w4=+0.475, w5=+0.158

Iteration 9
Output: [+0.600, +0.505, +0.561, +0.657, +0.566, +0.584, +0.489, +0.394]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.406, +0.021, +0.246, +0.652, +0.266, +0.340, -0.046, -0.432]
Input : [+0.642, -0.609, +0.474, +0.642, -0.609, +0.156, -0.609, -0.609]
Gate  : [+0.633, +0.633, +0.475, +0.633, +0.633, +0.475, +0.633, +0.633]
Loss: 0.19087= 0.021 + 0.000 + 0.062 + 0.013 + 0.058 + 0.014 + 0.000 + 0.023
dL/dw1 = +0.066 + +0.000 + -0.092 + -0.088 + -0.100 + +0.051 + -0.000 + +0.061 = -0.102
dL/dw2 = +0.061 + -0.000 + +0.012 + -0.040 + +0.022 + +0.029 + +0.004 + +0.108 = +0.195
dL/dw4 = +0.064 + +0.000 + -0.086 + -0.084 + -0.089 + +0.058 + -0.001 + +0.056 = -0.082
dL/dw5 = +0.064 + +0.000 + -0.006 + -0.050 + -0.011 + +0.005 + +0.006 + +0.123 = +0.133
w1=+0.484, w2=+0.136, w3=+1.000, w4=+0.483, w5=+0.145

Iteration 10
Output: [+0.598, +0.506, +0.564, +0.658, +0.569, +0.585, +0.492, +0.400]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.398, +0.022, +0.256, +0.654, +0.278, +0.344, -0.031, -0.407]
Input : [+0.633, -0.598, +0.484, +0.633, -0.598, +0.136, -0.598, -0.598]
Gate  : [+0.628, +0.628, +0.483, +0.628, +0.628, +0.483, +0.628, +0.628]
Loss: 0.18371= 0.020 + 0.000 + 0.060 + 0.012 + 0.056 + 0.015 + 0.000 + 0.021
dL/dw1 = +0.064 + +0.000 + -0.092 + -0.088 + -0.099 + +0.052 + -0.000 + +0.056 = -0.106
dL/dw2 = +0.059 + -0.000 + +0.011 + -0.039 + +0.022 + +0.030 + +0.003 + +0.100 = +0.185
dL/dw4 = +0.062 + +0.000 + -0.087 + -0.084 + -0.090 + +0.059 + -0.001 + +0.051 = -0.090
dL/dw5 = +0.062 + +0.000 + -0.006 + -0.049 + -0.012 + +0.006 + +0.004 + +0.113 = +0.119
w1=+0.495, w2=+0.118, w3=+1.000, w4=+0.492, w5=+0.133

Iteration 11
Output: [+0.597, +0.506, +0.566, +0.659, +0.572, +0.587, +0.496, +0.405]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.392, +0.024, +0.268, +0.659, +0.292, +0.350, -0.018, -0.386]
Input : [+0.626, -0.588, +0.495, +0.626, -0.588, +0.118, -0.588, -0.588]
Gate  : [+0.625, +0.625, +0.492, +0.625, +0.625, +0.492, +0.625, +0.625]
Loss: 0.17703= 0.019 + 0.000 + 0.058 + 0.012 + 0.054 + 0.015 + 0.000 + 0.018
dL/dw1 = +0.063 + +0.000 + -0.092 + -0.087 + -0.099 + +0.054 + -0.000 + +0.052 = -0.108
dL/dw2 = +0.058 + -0.000 + +0.011 + -0.039 + +0.021 + +0.031 + +0.001 + +0.094 = +0.178
dL/dw4 = +0.061 + +0.000 + -0.088 + -0.083 + -0.091 + +0.060 + -0.000 + +0.046 = -0.096
dL/dw5 = +0.061 + +0.000 + -0.006 + -0.048 + -0.012 + +0.007 + +0.002 + +0.105 = +0.108
w1=+0.505, w2=+0.100, w3=+1.000, w4=+0.502, w5=+0.122

Iteration 12
Output: [+0.596, +0.506, +0.569, +0.661, +0.576, +0.588, +0.499, +0.409]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.387, +0.026, +0.280, +0.667, +0.306, +0.356, -0.006, -0.367]
Input : [+0.621, -0.579, +0.505, +0.621, -0.579, +0.100, -0.579, -0.579]
Gate  : [+0.624, +0.624, +0.502, +0.624, +0.624, +0.502, +0.624, +0.624]
Loss: 0.17071= 0.019 + 0.000 + 0.056 + 0.011 + 0.052 + 0.016 + 0.000 + 0.017
dL/dw1 = +0.062 + +0.000 + -0.092 + -0.085 + -0.098 + +0.056 + -0.000 + +0.048 = -0.109
dL/dw2 = +0.058 + -0.000 + +0.011 + -0.038 + +0.021 + +0.032 + +0.000 + +0.088 = +0.172
dL/dw4 = +0.059 + +0.000 + -0.088 + -0.082 + -0.091 + +0.061 + -0.000 + +0.043 = -0.099
dL/dw5 = +0.059 + +0.000 + -0.007 + -0.047 + -0.013 + +0.007 + +0.001 + +0.097 = +0.099
w1=+0.516, w2=+0.083, w3=+1.000, w4=+0.512, w5=+0.113

Iteration 13
Output: [+0.595, +0.507, +0.572, +0.663, +0.579, +0.590, +0.502, +0.413]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.384, +0.028, +0.292, +0.676, +0.320, +0.362, +0.006, -0.350]
Input : [+0.615, -0.571, +0.516, +0.615, -0.571, +0.083, -0.571, -0.571]
Gate  : [+0.624, +0.624, +0.512, +0.624, +0.624, +0.512, +0.624, +0.624]
Loss: 0.16468= 0.018 + 0.000 + 0.054 + 0.011 + 0.050 + 0.016 + 0.000 + 0.015
dL/dw1 = +0.061 + +0.000 + -0.091 + -0.083 + -0.097 + +0.057 + +0.000 + +0.045 = -0.107
dL/dw2 = +0.057 + -0.000 + +0.011 + -0.037 + +0.020 + +0.034 + -0.000 + +0.083 = +0.168
dL/dw4 = +0.058 + +0.000 + -0.089 + -0.080 + -0.092 + +0.062 + +0.000 + +0.039 = -0.101
dL/dw5 = +0.058 + +0.000 + -0.007 + -0.045 + -0.014 + +0.008 + -0.001 + +0.091 = +0.091
w1=+0.527, w2=+0.066, w3=+1.000, w4=+0.522, w5=+0.103

Iteration 14
Output: [+0.594, +0.507, +0.576, +0.665, +0.583, +0.591, +0.504, +0.417]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.381, +0.030, +0.305, +0.686, +0.334, +0.369, +0.017, -0.335]
Input : [+0.610, -0.563, +0.527, +0.610, -0.563, +0.066, -0.563, -0.563]
Gate  : [+0.625, +0.625, +0.522, +0.625, +0.625, +0.522, +0.625, +0.625]
Loss: 0.15892= 0.018 + 0.000 + 0.052 + 0.010 + 0.048 + 0.017 + 0.000 + 0.014
dL/dw1 = +0.061 + +0.000 + -0.091 + -0.081 + -0.097 + +0.059 + +0.000 + +0.043 = -0.105
dL/dw2 = +0.057 + -0.000 + +0.010 + -0.035 + +0.020 + +0.035 + -0.001 + +0.079 = +0.164
dL/dw4 = +0.058 + +0.000 + -0.089 + -0.078 + -0.092 + +0.063 + +0.001 + +0.036 = -0.102
dL/dw5 = +0.058 + +0.000 + -0.007 + -0.043 + -0.014 + +0.009 + -0.002 + +0.085 = +0.085
w1=+0.538, w2=+0.050, w3=+1.000, w4=+0.532, w5=+0.095

Iteration 49
Output: [+0.565, +0.522, +0.652, +0.708, +0.671, +0.601, +0.559, +0.516]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.260, +0.087, +0.628, +0.887, +0.715, +0.408, +0.236, +0.063]
Input : [+0.336, -0.224, +0.687, +0.336, -0.224, -0.389, -0.224, -0.224]
Gate  : [+0.772, +0.772, +0.787, +0.772, +0.772, +0.787, +0.772, +0.772]
Loss: 0.06145= 0.008 + 0.001 + 0.014 + 0.001 + 0.008 + 0.021 + 0.007 + 0.000
dL/dw1 = +0.052 + +0.002 + -0.069 + -0.038 + -0.056 + +0.095 + +0.013 + -0.008 = -0.009
dL/dw2 = +0.048 + -0.002 + +0.007 + -0.015 + +0.010 + +0.063 + -0.012 + -0.016 = +0.082
dL/dw4 = +0.022 + +0.002 + -0.063 + -0.026 + -0.054 + +0.053 + +0.018 + +0.001 = -0.048
dL/dw5 = +0.022 + +0.002 + -0.009 + -0.010 + -0.013 + +0.023 + +0.000 + -0.003 = +0.011
w1=+0.688, w2=-0.398, w3=+1.000, w4=+0.792, w5=-0.016

Iteration 99
Output: [+0.536, +0.526, +0.678, +0.708, +0.700, +0.583, +0.573, +0.563]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.144, +0.103, +0.744, +0.888, +0.847, +0.333, +0.293, +0.252]
Input : [+0.188, -0.053, +0.712, +0.188, -0.053, -0.571, -0.053, -0.053]
Gate  : [+0.762, +0.762, +0.900, +0.762, +0.762, +0.900, +0.762, +0.762]
Loss: 0.04661= 0.003 + 0.001 + 0.007 + 0.001 + 0.002 + 0.014 + 0.011 + 0.008
dL/dw1 = +0.028 + +0.002 + -0.052 + -0.040 + -0.033 + +0.087 + +0.025 + -0.023 = -0.005
dL/dw2 = +0.026 + -0.002 + +0.004 + -0.015 + +0.005 + +0.061 + -0.006 + -0.056 = +0.018
dL/dw4 = +0.007 + +0.003 + -0.045 + -0.023 + -0.031 + +0.034 + +0.026 + +0.019 = -0.010
dL/dw5 = +0.007 + +0.003 + -0.007 + -0.007 + -0.008 + +0.022 + +0.016 + +0.010 = +0.036
w1=+0.713, w2=-0.572, w3=+1.000, w4=+0.901, w5=-0.141

Iteration 149
Output: [+0.518, +0.522, +0.688, +0.703, +0.707, +0.560, +0.565, +0.569]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.070, +0.089, +0.792, +0.862, +0.881, +0.241, +0.260, +0.279]
Input : [+0.118, +0.032, +0.744, +0.118, +0.032, -0.678, +0.032, +0.032]
Gate  : [+0.596, +0.596, +0.943, +0.596, +0.596, +0.943, +0.596, +0.596]
Loss: 0.03474= 0.001 + 0.001 + 0.004 + 0.002 + 0.001 + 0.007 + 0.008 + 0.010
dL/dw1 = +0.011 + +0.001 + -0.043 + -0.046 + -0.026 + +0.064 + +0.033 + -0.003 = -0.008
dL/dw2 = +0.010 + -0.001 + +0.003 + -0.014 + +0.003 + +0.049 + +0.011 + -0.032 = +0.028
dL/dw4 = +0.002 + +0.003 + -0.038 + -0.028 + -0.025 + +0.022 + +0.026 + +0.030 = -0.009
dL/dw5 = +0.002 + +0.003 + -0.006 + -0.008 + -0.007 + +0.018 + +0.021 + +0.025 = +0.049
w1=+0.745, w2=-0.681, w3=+1.000, w4=+0.944, w5=-0.352

Iteration 199
Output: [+0.501, +0.517, +0.703, +0.704, +0.717, +0.521, +0.537, +0.553]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [+0.004, +0.067, +0.863, +0.866, +0.930, +0.084, +0.147, +0.211]
Input : [+0.009, +0.164, +0.800, +0.009, +0.164, -0.850, +0.164, +0.164]
Gate  : [+0.385, +0.385, +0.995, +0.385, +0.385, +0.995, +0.385, +0.385]
Loss: 0.01388= 0.000 + 0.001 + 0.002 + 0.002 + 0.000 + 0.001 + 0.003 + 0.006
dL/dw1 = +0.000 + +0.001 + -0.029 + -0.039 + -0.015 + +0.023 + +0.026 + +0.019 = -0.014
dL/dw2 = +0.000 + -0.001 + +0.001 + -0.009 + +0.001 + +0.019 + +0.018 + +0.005 = +0.035
dL/dw4 = +0.000 + +0.003 + -0.027 + -0.027 + -0.016 + +0.006 + +0.017 + +0.033 = -0.011
dL/dw5 = +0.000 + +0.003 + -0.005 + -0.005 + -0.005 + +0.007 + +0.019 + +0.036 = +0.050
w1=+0.802, w2=-0.854, w3=+1.000, w4=+0.996, w5=-0.614

Iteration 249
Output: [+0.498, +0.512, +0.720, +0.718, +0.729, +0.497, +0.510, +0.524]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.500, +0.500, +0.500]
Memory: [-0.009, +0.047, +0.943, +0.934, +0.990, -0.014, +0.042, +0.097]
Input : [-0.037, +0.231, +0.868, -0.037, +0.231, -0.972, +0.231, +0.231]
Gate  : [+0.240, +0.240, +1.033, +0.240, +0.240, +1.033, +0.240, +0.240]
Loss: 0.00246= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.001
dL/dw1 = -0.001 + +0.000 + -0.012 + -0.017 + -0.002 + -0.004 + +0.009 + +0.015 = -0.011
dL/dw2 = -0.001 + -0.000 + +0.000 + -0.003 + +0.000 + -0.003 + +0.008 + +0.011 = +0.012
dL/dw4 = +0.000 + +0.002 + -0.012 + -0.013 + -0.003 + -0.001 + +0.005 + +0.018 = -0.003
dL/dw5 = +0.000 + +0.002 + -0.002 + -0.002 + -0.001 + -0.001 + +0.006 + +0.021 = +0.023
w1=+0.869, w2=-0.973, w3=+1.000, w4=+1.033, w5=-0.795

Out[25]:
(0.8691986382286676,
 -0.9728195862949829,
 1.0,
 1.0330824913276644,
 -0.7952812244235943)

Now we see that the input gate is closer to the true gate: it tries to ignore irrelevant inputs by pushing the gate value for those inputs toward 0. Although it is still far from the true gate (the irrelevant inputs still get a positive gate value, about 0.24), it already has a clear effect on the loss, which is more than an order of magnitude lower (0.00246 versus 0.05199 without the gate). In fact, with more iterations the gate is eventually learned correctly ($w_4 = 1.0$, $w_5 = -1.0$).
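
The following sketch checks what the true gate does on the noisy embedding. It assumes the bilinear gate has the form gate = w4 + w5*x1*x2 and that the memory update is memory += gate * (w1*x1 + w2*x2). This form is inferred from the Gate and Input rows of the logs, not taken from the source of `experiment_gated`, and `forward_gated` and `NOISY` are hypothetical names.

```python
import math

# Noisy embedding, copied from the pprint output of the notebook.
NOISY = {
    'a': (1.0367606488244225, 0.9645427815319715),
    'b': (-0.9331636393562619, -1.0716234086122824),
    '(': (1.0, 0.0),
    ')': (0.0, 1.0),
}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward_gated(text, w1, w2, w4, w5, emb):
    """Return the CEC memory after each character, with an assumed bilinear input gate."""
    memory, trace = 0.0, []
    for ch in text:
        x1, x2 = emb[ch]
        gate = w4 + w5 * x1 * x2              # assumed gate form (inferred from the logs)
        memory += gate * (w1 * x1 + w2 * x2)  # gated write into the memory
        trace.append(memory)
    return trace


# True gate: w4 = 1, w5 = -1 closes the gate for 'a' and 'b' because x1 * x2 = 1 for them.
memory = forward_gated('ab(ab)bb', 1.0, -1.0, 1.0, -1.0, NOISY)
print([round(sigmoid(m), 3) for m in memory])
```

Under these assumptions the rounded outputs coincide with the Target row. With the initial weights $w_4 = w_5 = 1$, the same formula gives a gate of about 2.0 for the letters and 1.0 for the brackets, which agrees with the Gate row of iteration 0.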

Notice that in the network without an input gate, the overall gradient at the end is zero, but the gradient at each position in the sequence is not, and its magnitude is not small (up to about 0.1). The per-position gradients merely cancel each other out, so the network ends up at a non-optimal point. In the gated version, by contrast, the gradient approaches zero at every position.
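
The cancellation can be reproduced with a small sketch. It assumes the ungated update memory += w1*x1 + w2*x2, a sigmoid output, and the cross-entropy loss defined earlier, for which the gradient of the loss at step t with respect to the memory is output minus target. The helper `per_position_grad_w1` is hypothetical, and the weights are the ones returned in Out[24].

```python
import math

# Noisy embedding, copied from the pprint output of the notebook.
NOISY = {
    'a': (1.0367606488244225, 0.9645427815319715),
    'b': (-0.9331636393562619, -1.0716234086122824),
    '(': (1.0, 0.0),
    ')': (0.0, 1.0),
}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def per_position_grad_w1(text, targets, w1, w2, emb):
    """Per-step dL_t/dw1 for the ungated CEC (self-loop weight fixed to 1)."""
    memory, sum_x1, grads = 0.0, 0.0, []
    for ch, target in zip(text, targets):
        x1, x2 = emb[ch]
        memory += w1 * x1 + w2 * x2
        sum_x1 += x1  # d(memory_t)/d(w1) is the sum of x1 over all steps up to t
        grads.append((sigmoid(memory) - target) * sum_x1)
    return grads


depths = [0, 0, 1, 1, 1, 0, 0, 0]       # bracket depth after each character of 'ab(ab)bb'
targets = [sigmoid(d) for d in depths]  # the target output is sigmoid(depth)
grads = per_position_grad_w1('ab(ab)bb', targets,
                             0.6186425813596521, -0.48489064055179726, NOISY)
print([round(g, 3) for g in grads], round(sum(grads), 3))
```

Under these assumptions the individual terms agree with the dL/dw1 row of iteration 249 in the ungated log: several have magnitude above 0.04, yet their sum is close to zero, so gradient descent stops moving although no single position is fitted exactly.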


In [28]:
# Trying nested brackets
experiment_gated('ab(aaa(bab)b)')


w1=+1.000, w2=+1.000, w3=+1.000, w4=+1.000, w5=+1.000
Iteration 0
Output: [+0.982, +0.498, +0.730, +0.993, +1.000, +1.000, +1.000, +1.000, +1.000, +1.000, +1.000, +0.999, +1.000]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+4.003, -0.007, +0.993, +4.996, +8.998, +13.001, +14.001, +9.991, +13.994, +9.984, +10.984, +6.975, +7.975]
Input : [+2.001, -2.005, +1.000, +2.001, +2.001, +2.001, +1.000, -2.005, +2.001, -2.005, +1.000, -2.005, +1.000]
Gate  : [+2.000, +2.000, +1.000, +2.000, +2.000, +2.000, +1.000, +2.000, +2.000, +2.000, +1.000, +2.000, +1.000]
Loss: 18.06454= 1.326 + 0.000 + 0.000 + 0.768 + 1.838 + 2.914 + 1.304 + 0.826 + 1.303 + 0.825 + 2.372 + 1.295 + 3.295
dL/dw1 = +1.000 + -0.000 + -0.002 + +0.860 + +1.439 + +1.998 + +1.005 + +0.782 + +1.029 + +0.807 + +1.820 + +1.314 + +2.449 = +14.501
dL/dw2 = +0.930 + +0.000 + +0.000 + +0.450 + +0.980 + +1.499 + +0.664 + +0.409 + +0.639 + +0.383 + +1.134 + +0.555 + +1.535 = +9.178
dL/dw4 = +0.965 + +0.000 + -0.001 + +0.786 + +1.344 + +1.883 + +0.954 + +0.714 + +0.953 + +0.714 + +1.880 + +1.337 + +2.992 = +14.520
dL/dw5 = +0.965 + +0.000 + +0.000 + +0.524 + +1.075 + +1.614 + +0.715 + +0.476 + +0.715 + +0.476 + +1.074 + +0.533 + +0.993 = +9.159
w1=-0.450, w2=+0.082, w3=+1.000, w4=-0.452, w5=+0.084

Iteration 1
Output: [+0.536, +0.505, +0.556, +0.591, +0.625, +0.657, +0.702, +0.675, +0.706, +0.680, +0.672, +0.644, +0.636]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.142, +0.020, +0.224, +0.366, +0.509, +0.651, +0.855, +0.733, +0.875, +0.753, +0.716, +0.594, +0.557]
Input : [-0.387, +0.332, -0.450, -0.387, -0.387, -0.387, -0.450, +0.332, -0.387, +0.332, +0.082, +0.332, +0.082]
Gate  : [-0.368, -0.368, -0.452, -0.368, -0.368, -0.368, -0.452, -0.368, -0.368, -0.368, -0.452, -0.368, -0.452]
Loss: 0.61597= 0.003 + 0.000 + 0.065 + 0.043 + 0.025 + 0.013 + 0.091 + 0.114 + 0.087 + 0.110 + 0.008 + 0.017 + 0.038
dL/dw1 = -0.014 + -0.000 + +0.086 + +0.122 + +0.133 + +0.121 + +0.374 + +0.358 + +0.372 + +0.358 + +0.106 + +0.125 + -0.195 = +1.946
dL/dw2 = -0.013 + +0.000 + -0.007 + +0.044 + +0.071 + +0.076 + +0.184 + +0.130 + +0.172 + +0.119 + +0.062 + +0.056 + -0.149 = +0.746
dL/dw4 = -0.014 + -0.000 + +0.089 + +0.125 + +0.136 + +0.123 + +0.380 + +0.367 + +0.380 + +0.370 + +0.104 + +0.124 + -0.182 = +2.002
dL/dw5 = -0.014 + -0.000 + +0.010 + +0.062 + +0.088 + +0.090 + +0.218 + +0.182 + +0.223 + +0.189 + +0.056 + +0.053 + -0.083 = +1.074
w1=-0.645, w2=+0.008, w3=+1.000, w4=-0.652, w5=-0.023
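
The rows above can be cross-checked against each other. The sketch below assumes, as an inference from the printed numbers rather than from the original training code, that the memory at each step is the previous memory (carried over with self-loop weight 1) plus the product of the `Input` and `Gate` values, and that the output is the sigmoid of the memory. The function name `replay` is a hypothetical helper introduced only for this check.

```python
import math

# Rows copied from the "Iteration 1" block of the log.
inputs = [-0.387, +0.332, -0.450, -0.387, -0.387, -0.387, -0.450,
          +0.332, -0.387, +0.332, +0.082, +0.332, +0.082]
gates = [-0.368, -0.368, -0.452, -0.368, -0.368, -0.368, -0.452,
         -0.368, -0.368, -0.368, -0.452, -0.368, -0.452]
logged_memory = [+0.142, +0.020, +0.224, +0.366, +0.509, +0.651, +0.855,
                 +0.733, +0.875, +0.753, +0.716, +0.594, +0.557]
logged_output = [+0.536, +0.505, +0.556, +0.591, +0.625, +0.657, +0.702,
                 +0.675, +0.706, +0.680, +0.672, +0.644, +0.636]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def replay(inputs, gates):
    """Assumed recurrence: memory_t = memory_{t-1} + input_t * gate_t."""
    memory = []
    state = 0.0  # assumption: the memory starts empty
    for x, g in zip(inputs, gates):
        state = state + x * g  # self-loop weight of 1 keeps the old value intact
        memory.append(state)
    outputs = [sigmoid(m) for m in memory]
    return memory, outputs


memory, outputs = replay(inputs, gates)
max_mem_err = max(abs(a - b) for a, b in zip(memory, logged_memory))
max_out_err = max(abs(a - b) for a, b in zip(outputs, logged_output))
print(f"max memory error: {max_mem_err:.4f}")
print(f"max output error: {max_out_err:.4f}")
```

Both errors stay within the rounding of the three-decimal log values, so the assumed recurrence is at least consistent with this iteration.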

Iteration 2
Output: [+0.610, +0.511, +0.614, +0.714, +0.796, +0.859, +0.903, +0.861, +0.906, +0.867, +0.866, +0.812, +0.812]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.446, +0.046, +0.466, +0.913, +1.359, +1.806, +2.226, +1.825, +2.272, +1.871, +1.866, +1.465, +1.460]
Input : [-0.661, +0.593, -0.645, -0.661, -0.661, -0.661, -0.645, +0.593, -0.661, +0.593, +0.008, +0.593, +0.008]
Gate  : [-0.675, -0.675, -0.652, -0.675, -0.675, -0.675, -0.652, -0.675, -0.675, -0.675, -0.652, -0.675, -0.652]
Loss: 0.46103= 0.025 + 0.000 + 0.030 + 0.001 + 0.012 + 0.056 + 0.003 + 0.002 + 0.004 + 0.001 + 0.063 + 0.020 + 0.246
dL/dw1 = -0.077 + -0.001 + +0.084 + +0.025 + -0.137 + -0.361 + -0.076 + +0.056 + -0.091 + +0.042 + -0.393 + -0.186 + -0.712 = -1.826
dL/dw2 = -0.072 + +0.001 + -0.008 + +0.010 + -0.079 + -0.240 + -0.041 + +0.023 + -0.047 + +0.015 + -0.235 + -0.082 + -0.519 = -1.274
dL/dw4 = -0.073 + -0.001 + +0.083 + +0.024 + -0.131 + -0.344 + -0.073 + +0.054 + -0.088 + +0.040 + -0.379 + -0.180 + -0.687 = -1.754
dL/dw5 = -0.073 + -0.001 + +0.008 + +0.013 + -0.090 + -0.262 + -0.045 + +0.029 + -0.054 + +0.022 + -0.206 + -0.076 + -0.290 = -1.025
w1=-0.462, w2=+0.135, w3=+1.000, w4=-0.477, w5=+0.079
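
The weight lines of consecutive iterations also let us check the update rule. The sketch below assumes plain gradient descent with a step size of 0.1; that value is inferred from the printed numbers and is not stated in the log itself.

```python
# Values copied from the log: the weights printed at the end of Iteration 1,
# the gradient totals of Iteration 2, and the weights printed at the end of Iteration 2.
w_before = {"w1": -0.645, "w2": +0.008, "w4": -0.652, "w5": -0.023}
grad = {"w1": -1.826, "w2": -1.274, "w4": -1.754, "w5": -1.025}
w_after_logged = {"w1": -0.462, "w2": +0.135, "w4": -0.477, "w5": +0.079}

LEARNING_RATE = 0.1  # assumption inferred from the log, not stated in it

# Plain gradient descent: move each weight against its gradient.
w_after = {k: w_before[k] - LEARNING_RATE * grad[k] for k in w_before}
max_err = max(abs(w_after[k] - w_after_logged[k]) for k in w_before)

for k in w_before:
    print(f"{k}: computed {w_after[k]:+.3f}, logged {w_after_logged[k]:+.3f}")
```

Note that `w3` is printed as `+1.000` in every iteration and has no gradient line, so it is left out of the update here.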

Iteration 3
Output: [+0.535, +0.506, +0.561, +0.595, +0.628, +0.660, +0.707, +0.683, +0.712, +0.688, +0.674, +0.649, +0.634]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.139, +0.025, +0.245, +0.384, +0.522, +0.661, +0.881, +0.768, +0.906, +0.792, +0.728, +0.614, +0.550]
Input : [-0.349, +0.286, -0.462, -0.349, -0.349, -0.349, -0.462, +0.286, -0.349, +0.286, +0.135, +0.286, +0.135]
Gate  : [-0.398, -0.398, -0.477, -0.398, -0.398, -0.398, -0.477, -0.398, -0.398, -0.398, -0.477, -0.398, -0.477]
Loss: 0.57942= 0.002 + 0.000 + 0.062 + 0.041 + 0.024 + 0.012 + 0.086 + 0.107 + 0.082 + 0.103 + 0.008 + 0.015 + 0.037
dL/dw1 = -0.014 + -0.000 + +0.088 + +0.127 + +0.139 + +0.126 + +0.388 + +0.368 + +0.383 + +0.366 + +0.108 + +0.126 + -0.205 = +1.998
dL/dw2 = -0.013 + +0.000 + -0.007 + +0.046 + +0.075 + +0.079 + +0.192 + +0.135 + +0.180 + +0.123 + +0.063 + +0.057 + -0.156 = +0.774
dL/dw4 = -0.012 + -0.000 + +0.089 + +0.119 + +0.126 + +0.112 + +0.353 + +0.345 + +0.353 + +0.348 + +0.095 + +0.114 + -0.168 = +1.875
dL/dw5 = -0.012 + -0.000 + +0.011 + +0.056 + +0.079 + +0.079 + +0.193 + +0.163 + +0.197 + +0.170 + +0.050 + +0.049 + -0.080 = +0.954
w1=-0.662, w2=+0.058, w3=+1.000, w4=-0.664, w5=-0.016

Iteration 4
Output: [+0.606, +0.513, +0.620, +0.715, +0.794, +0.855, +0.902, +0.863, +0.906, +0.869, +0.864, +0.814, +0.808]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.429, +0.051, +0.491, +0.920, +1.349, +1.778, +2.218, +1.839, +2.268, +1.890, +1.852, +1.474, +1.435]
Input : [-0.631, +0.556, -0.662, -0.631, -0.631, -0.631, -0.662, +0.556, -0.631, +0.556, +0.058, +0.556, +0.058]
Gate  : [-0.681, -0.681, -0.664, -0.681, -0.681, -0.681, -0.664, -0.681, -0.681, -0.681, -0.664, -0.681, -0.664]
Loss: 0.44256= 0.023 + 0.000 + 0.027 + 0.001 + 0.011 + 0.052 + 0.002 + 0.001 + 0.004 + 0.001 + 0.062 + 0.020 + 0.238
dL/dw1 = -0.075 + -0.001 + +0.081 + +0.023 + -0.135 + -0.355 + -0.074 + +0.052 + -0.091 + +0.035 + -0.393 + -0.191 + -0.713 = -1.836
dL/dw2 = -0.069 + +0.001 + -0.008 + +0.009 + -0.078 + -0.236 + -0.040 + +0.021 + -0.046 + +0.013 + -0.234 + -0.085 + -0.521 = -1.274
dL/dw4 = -0.067 + -0.001 + +0.082 + +0.022 + -0.126 + -0.327 + -0.069 + +0.049 + -0.086 + +0.034 + -0.367 + -0.181 + -0.658 = -1.694
dL/dw5 = -0.067 + -0.001 + +0.008 + +0.011 + -0.084 + -0.245 + -0.041 + +0.025 + -0.052 + +0.018 + -0.198 + -0.077 + -0.286 = -0.987
w1=-0.478, w2=+0.185, w3=+1.000, w4=-0.495, w5=+0.083

Iteration 5
Output: [+0.533, +0.507, +0.566, +0.598, +0.629, +0.659, +0.710, +0.688, +0.716, +0.694, +0.675, +0.652, +0.631]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.131, +0.029, +0.265, +0.396, +0.527, +0.658, +0.895, +0.792, +0.923, +0.821, +0.729, +0.627, +0.536]
Input : [-0.317, +0.248, -0.478, -0.317, -0.317, -0.317, -0.478, +0.248, -0.317, +0.248, +0.185, +0.248, +0.185]
Gate  : [-0.412, -0.412, -0.495, -0.412, -0.412, -0.412, -0.495, -0.412, -0.412, -0.412, -0.495, -0.412, -0.495]
Loss: 0.55541= 0.002 + 0.000 + 0.058 + 0.039 + 0.023 + 0.012 + 0.084 + 0.103 + 0.079 + 0.097 + 0.007 + 0.014 + 0.035
dL/dw1 = -0.014 + -0.000 + +0.089 + +0.129 + +0.142 + +0.132 + +0.396 + +0.372 + +0.389 + +0.368 + +0.111 + +0.126 + -0.208 = +2.031
dL/dw2 = -0.013 + +0.000 + -0.007 + +0.047 + +0.077 + +0.083 + +0.196 + +0.136 + +0.182 + +0.124 + +0.065 + +0.057 + -0.158 = +0.789
dL/dw4 = -0.010 + -0.000 + +0.090 + +0.115 + +0.121 + +0.108 + +0.338 + +0.333 + +0.338 + +0.335 + +0.091 + +0.108 + -0.154 = +1.813
dL/dw5 = -0.010 + -0.000 + +0.011 + +0.052 + +0.072 + +0.074 + +0.175 + +0.149 + +0.180 + +0.157 + +0.048 + +0.047 + -0.078 = +0.875
w1=-0.681, w2=+0.106, w3=+1.000, w4=-0.676, w5=-0.005

Iteration 6
Output: [+0.601, +0.514, +0.626, +0.717, +0.792, +0.852, +0.901, +0.865, +0.906, +0.871, +0.863, +0.815, +0.804]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.412, +0.056, +0.517, +0.928, +1.340, +1.751, +2.212, +1.856, +2.268, +1.912, +1.840, +1.485, +1.413]
Input : [-0.604, +0.522, -0.681, -0.604, -0.604, -0.604, -0.681, +0.522, -0.604, +0.522, +0.106, +0.522, +0.106]
Gate  : [-0.681, -0.681, -0.676, -0.681, -0.681, -0.681, -0.676, -0.681, -0.681, -0.681, -0.676, -0.681, -0.676]
Loss: 0.42587= 0.021 + 0.000 + 0.025 + 0.001 + 0.011 + 0.049 + 0.002 + 0.001 + 0.004 + 0.000 + 0.060 + 0.021 + 0.231
dL/dw1 = -0.072 + -0.001 + +0.078 + +0.021 + -0.133 + -0.347 + -0.073 + +0.046 + -0.092 + +0.028 + -0.393 + -0.197 + -0.712 = -1.845
dL/dw2 = -0.067 + +0.001 + -0.008 + +0.008 + -0.076 + -0.230 + -0.039 + +0.019 + -0.046 + +0.010 + -0.234 + -0.088 + -0.523 = -1.271
dL/dw4 = -0.061 + -0.001 + +0.080 + +0.020 + -0.121 + -0.312 + -0.067 + +0.044 + -0.085 + +0.027 + -0.358 + -0.184 + -0.633 = -1.652
dL/dw5 = -0.061 + -0.001 + +0.009 + +0.010 + -0.079 + -0.229 + -0.039 + +0.022 + -0.050 + +0.014 + -0.192 + -0.078 + -0.283 = -0.959
w1=-0.497, w2=+0.233, w3=+1.000, w4=-0.511, w5=+0.091

Iteration 7
Output: [+0.530, +0.508, +0.571, +0.601, +0.629, +0.657, +0.712, +0.693, +0.719, +0.700, +0.675, +0.655, +0.627]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.122, +0.032, +0.286, +0.408, +0.530, +0.652, +0.906, +0.816, +0.938, +0.848, +0.729, +0.639, +0.520]
Input : [-0.290, +0.214, -0.497, -0.290, -0.290, -0.290, -0.497, +0.214, -0.290, +0.214, +0.233, +0.214, +0.233]
Gate  : [-0.420, -0.420, -0.511, -0.420, -0.420, -0.420, -0.511, -0.420, -0.420, -0.420, -0.511, -0.420, -0.511]
Loss: 0.53373= 0.002 + 0.000 + 0.055 + 0.037 + 0.023 + 0.013 + 0.082 + 0.098 + 0.077 + 0.092 + 0.008 + 0.013 + 0.033
dL/dw1 = -0.013 + -0.000 + +0.089 + +0.129 + +0.145 + +0.137 + +0.400 + +0.371 + +0.392 + +0.366 + +0.114 + +0.125 + -0.207 = +2.046
dL/dw2 = -0.012 + +0.000 + -0.007 + +0.047 + +0.078 + +0.086 + +0.197 + +0.135 + +0.183 + +0.122 + +0.067 + +0.056 + -0.159 = +0.794
dL/dw4 = -0.009 + -0.001 + +0.092 + +0.113 + +0.117 + +0.106 + +0.327 + +0.324 + +0.327 + +0.326 + +0.089 + +0.104 + -0.143 = +1.772
dL/dw5 = -0.009 + -0.001 + +0.012 + +0.048 + +0.067 + +0.070 + +0.160 + +0.137 + +0.166 + +0.146 + +0.046 + +0.046 + -0.076 = +0.812
w1=-0.702, w2=+0.154, w3=+1.000, w4=-0.688, w5=+0.010

Iteration 8
Output: [+0.597, +0.515, +0.633, +0.718, +0.791, +0.848, +0.901, +0.867, +0.906, +0.874, +0.861, +0.817, +0.800]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.393, +0.060, +0.543, +0.936, +1.329, +1.722, +2.205, +1.872, +2.265, +1.933, +1.827, +1.495, +1.389]
Input : [-0.579, +0.490, -0.702, -0.579, -0.579, -0.579, -0.702, +0.490, -0.579, +0.490, +0.154, +0.490, +0.154]
Gate  : [-0.679, -0.679, -0.688, -0.679, -0.679, -0.679, -0.688, -0.679, -0.679, -0.679, -0.688, -0.679, -0.688]
Loss: 0.40831= 0.019 + 0.000 + 0.022 + 0.000 + 0.010 + 0.045 + 0.002 + 0.001 + 0.003 + 0.000 + 0.058 + 0.022 + 0.224
dL/dw1 = -0.068 + -0.001 + +0.075 + +0.019 + -0.129 + -0.337 + -0.071 + +0.041 + -0.091 + +0.022 + -0.390 + -0.202 + -0.709 = -1.843
dL/dw2 = -0.063 + +0.001 + -0.007 + +0.007 + -0.074 + -0.222 + -0.038 + +0.016 + -0.046 + +0.008 + -0.232 + -0.090 + -0.523 = -1.262
dL/dw4 = -0.056 + -0.001 + +0.078 + +0.017 + -0.116 + -0.296 + -0.064 + +0.039 + -0.083 + +0.020 + -0.349 + -0.187 + -0.610 = -1.609
dL/dw5 = -0.056 + -0.001 + +0.009 + +0.009 + -0.074 + -0.214 + -0.036 + +0.019 + -0.048 + +0.010 + -0.186 + -0.080 + -0.281 = -0.931
w1=-0.517, w2=+0.280, w3=+1.000, w4=-0.527, w5=+0.103

Iteration 9
Output: [+0.528, +0.509, +0.576, +0.604, +0.630, +0.656, +0.715, +0.699, +0.722, +0.706, +0.675, +0.658, +0.624]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.113, +0.035, +0.308, +0.421, +0.534, +0.647, +0.920, +0.843, +0.956, +0.878, +0.730, +0.653, +0.505]
Input : [-0.266, +0.183, -0.517, -0.266, -0.266, -0.266, -0.517, +0.183, -0.266, +0.183, +0.280, +0.183, +0.280]
Gate  : [-0.425, -0.425, -0.527, -0.425, -0.425, -0.425, -0.527, -0.425, -0.425, -0.425, -0.527, -0.425, -0.527]
Loss: 0.50985= 0.002 + 0.000 + 0.052 + 0.036 + 0.023 + 0.013 + 0.080 + 0.093 + 0.074 + 0.087 + 0.007 + 0.012 + 0.032
dL/dw1 = -0.012 + -0.000 + +0.088 + +0.129 + +0.146 + +0.141 + +0.401 + +0.368 + +0.391 + +0.360 + +0.116 + +0.123 + -0.207 = +2.043
dL/dw2 = -0.012 + +0.000 + -0.007 + +0.046 + +0.078 + +0.088 + +0.196 + +0.132 + +0.180 + +0.119 + +0.068 + +0.055 + -0.159 = +0.787
dL/dw4 = -0.008 + -0.001 + +0.093 + +0.110 + +0.114 + +0.104 + +0.318 + +0.315 + +0.317 + +0.317 + +0.086 + +0.099 + -0.133 = +1.734
dL/dw5 = -0.008 + -0.001 + +0.013 + +0.045 + +0.062 + +0.066 + +0.146 + +0.127 + +0.153 + +0.137 + +0.044 + +0.044 + -0.074 = +0.754
w1=-0.722, w2=+0.201, w3=+1.000, w4=-0.701, w5=+0.027

Iteration 10
Output: [+0.592, +0.516, +0.639, +0.720, +0.789, +0.844, +0.900, +0.868, +0.905, +0.876, +0.859, +0.818, +0.796]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.373, +0.065, +0.570, +0.943, +1.316, +1.689, +2.195, +1.887, +2.260, +1.952, +1.811, +1.502, +1.361]
Input : [-0.554, +0.458, -0.722, -0.554, -0.554, -0.554, -0.722, +0.458, -0.554, +0.458, +0.201, +0.458, +0.201]
Gate  : [-0.673, -0.673, -0.701, -0.673, -0.673, -0.673, -0.701, -0.673, -0.673, -0.673, -0.701, -0.673, -0.701]
Loss: 0.38915= 0.017 + 0.001 + 0.019 + 0.000 + 0.009 + 0.042 + 0.002 + 0.001 + 0.003 + 0.000 + 0.056 + 0.023 + 0.216
dL/dw1 = -0.064 + -0.001 + +0.071 + +0.017 + -0.125 + -0.324 + -0.068 + +0.036 + -0.090 + +0.016 + -0.386 + -0.207 + -0.704 = -1.829
dL/dw2 = -0.060 + +0.001 + -0.007 + +0.007 + -0.071 + -0.212 + -0.036 + +0.014 + -0.045 + +0.006 + -0.229 + -0.092 + -0.522 = -1.245
dL/dw4 = -0.051 + -0.002 + +0.075 + +0.015 + -0.111 + -0.280 + -0.061 + +0.034 + -0.081 + +0.015 + -0.339 + -0.190 + -0.586 = -1.561
dL/dw5 = -0.051 + -0.002 + +0.009 + +0.007 + -0.069 + -0.199 + -0.033 + +0.016 + -0.046 + +0.007 + -0.179 + -0.082 + -0.278 = -0.899
w1=-0.539, w2=+0.326, w3=+1.000, w4=-0.545, w5=+0.117

Iteration 11
Output: [+0.526, +0.510, +0.582, +0.607, +0.632, +0.656, +0.719, +0.705, +0.727, +0.713, +0.676, +0.661, +0.620]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.104, +0.039, +0.332, +0.437, +0.541, +0.645, +0.939, +0.873, +0.977, +0.912, +0.734, +0.669, +0.491]
Input : [-0.244, +0.154, -0.539, -0.244, -0.244, -0.244, -0.539, +0.154, -0.244, +0.154, +0.326, +0.154, +0.326]
Gate  : [-0.427, -0.427, -0.545, -0.427, -0.427, -0.427, -0.545, -0.427, -0.427, -0.427, -0.545, -0.427, -0.545]
Loss: 0.48277= 0.001 + 0.000 + 0.048 + 0.034 + 0.022 + 0.013 + 0.077 + 0.088 + 0.071 + 0.081 + 0.007 + 0.011 + 0.030
dL/dw1 = -0.012 + -0.000 + +0.088 + +0.128 + +0.146 + +0.144 + +0.399 + +0.362 + +0.386 + +0.353 + +0.117 + +0.119 + -0.206 = +2.023
dL/dw2 = -0.011 + +0.000 + -0.007 + +0.045 + +0.077 + +0.089 + +0.193 + +0.129 + +0.177 + +0.115 + +0.068 + +0.054 + -0.159 = +0.771
dL/dw4 = -0.006 + -0.001 + +0.094 + +0.108 + +0.111 + +0.102 + +0.308 + +0.307 + +0.307 + +0.308 + +0.084 + +0.095 + -0.124 = +1.691
dL/dw5 = -0.006 + -0.001 + +0.013 + +0.041 + +0.057 + +0.062 + +0.133 + +0.118 + +0.141 + +0.127 + +0.042 + +0.042 + -0.073 = +0.697
w1=-0.741, w2=+0.249, w3=+1.000, w4=-0.714, w5=+0.048

Iteration 12
Output: [+0.587, +0.517, +0.645, +0.721, +0.786, +0.839, +0.899, +0.870, +0.905, +0.877, +0.857, +0.819, +0.791]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.352, +0.069, +0.598, +0.950, +1.302, +1.654, +2.183, +1.900, +2.252, +1.968, +1.791, +1.508, +1.330]
Input : [-0.528, +0.425, -0.741, -0.528, -0.528, -0.528, -0.741, +0.425, -0.528, +0.425, +0.249, +0.425, +0.249]
Gate  : [-0.666, -0.666, -0.714, -0.666, -0.666, -0.666, -0.714, -0.666, -0.666, -0.666, -0.714, -0.666, -0.714]
Loss: 0.36830= 0.015 + 0.001 + 0.017 + 0.000 + 0.009 + 0.038 + 0.002 + 0.001 + 0.003 + 0.000 + 0.054 + 0.023 + 0.207
dL/dw1 = -0.060 + -0.001 + +0.067 + +0.015 + -0.119 + -0.309 + -0.064 + +0.032 + -0.087 + +0.010 + -0.380 + -0.210 + -0.696 = -1.803
dL/dw2 = -0.056 + +0.001 + -0.006 + +0.006 + -0.067 + -0.201 + -0.033 + +0.013 + -0.043 + +0.004 + -0.225 + -0.094 + -0.519 = -1.221
dL/dw4 = -0.046 + -0.002 + +0.073 + +0.014 + -0.105 + -0.263 + -0.057 + +0.030 + -0.079 + +0.010 + -0.328 + -0.191 + -0.561 = -1.504
dL/dw5 = -0.046 + -0.002 + +0.009 + +0.006 + -0.064 + -0.183 + -0.030 + +0.014 + -0.043 + +0.005 + -0.172 + -0.083 + -0.274 = -0.863
w1=-0.561, w2=+0.371, w3=+1.000, w4=-0.563, w5=+0.134

Iteration 13
Output: [+0.524, +0.510, +0.589, +0.612, +0.634, +0.656, +0.724, +0.713, +0.732, +0.721, +0.677, +0.665, +0.617]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.096, +0.042, +0.358, +0.454, +0.550, +0.646, +0.962, +0.908, +1.004, +0.950, +0.741, +0.687, +0.478]
Input : [-0.224, +0.126, -0.561, -0.224, -0.224, -0.224, -0.561, +0.126, -0.224, +0.126, +0.371, +0.126, +0.371]
Gate  : [-0.429, -0.429, -0.563, -0.429, -0.429, -0.429, -0.563, -0.429, -0.429, -0.429, -0.563, -0.429, -0.563]
Loss: 0.45267= 0.001 + 0.000 + 0.044 + 0.032 + 0.021 + 0.013 + 0.073 + 0.082 + 0.067 + 0.075 + 0.007 + 0.010 + 0.028
dL/dw1 = -0.011 + -0.000 + +0.087 + +0.126 + +0.145 + +0.146 + +0.394 + +0.354 + +0.380 + +0.343 + +0.116 + +0.115 + -0.205 = +1.989
dL/dw2 = -0.010 + +0.000 + -0.007 + +0.044 + +0.076 + +0.090 + +0.188 + +0.124 + +0.171 + +0.110 + +0.067 + +0.052 + -0.159 = +0.748
dL/dw4 = -0.005 + -0.001 + +0.094 + +0.105 + +0.107 + +0.100 + +0.297 + +0.297 + +0.296 + +0.297 + +0.080 + +0.090 + -0.117 = +1.641
dL/dw5 = -0.005 + -0.001 + +0.014 + +0.038 + +0.053 + +0.058 + +0.121 + +0.108 + +0.129 + +0.118 + +0.040 + +0.040 + -0.072 = +0.641
w1=-0.760, w2=+0.296, w3=+1.000, w4=-0.727, w5=+0.070
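
Looking at the total loss across Iterations 1 to 13, the value does not fall monotonically: it zig-zags between a higher value on odd iterations and a lower value on even ones, while each of the two subsequences shrinks on its own. The short check below makes that pattern explicit. It only inspects the logged totals and assumes nothing about the training code.

```python
# Total loss per iteration, copied from the "Loss:" lines of Iterations 1 to 13.
losses = [0.61597, 0.46103, 0.57942, 0.44256, 0.55541, 0.42587, 0.53373,
          0.40831, 0.50985, 0.38915, 0.48277, 0.36830, 0.45267]

odd_iters = losses[0::2]   # Iterations 1, 3, 5, ...
even_iters = losses[1::2]  # Iterations 2, 4, 6, ...


def strictly_decreasing(xs):
    return all(a > b for a, b in zip(xs, xs[1:]))


# Sign of each change between consecutive iterations: +1 if the loss rose, -1 if it fell.
signs = [1 if b > a else -1 for a, b in zip(losses, losses[1:])]
alternates = all(s != t for s, t in zip(signs, signs[1:]))

print("odd iterations decreasing: ", strictly_decreasing(odd_iters))
print("even iterations decreasing:", strictly_decreasing(even_iters))
print("direction alternates:      ", alternates)
```

One possible reading is that each update overshoots and the next one corrects it, but the log does not state the cause.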

Iteration 14
Output: [+0.582, +0.518, +0.651, +0.722, +0.783, +0.834, +0.897, +0.871, +0.904, +0.879, +0.854, +0.819, +0.785]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.330, +0.073, +0.625, +0.955, +1.286, +1.616, +2.168, +1.911, +2.241, +1.983, +1.768, +1.511, +1.295]
Input : [-0.502, +0.392, -0.760, -0.502, -0.502, -0.502, -0.760, +0.392, -0.502, +0.392, +0.296, +0.392, +0.296]
Gate  : [-0.658, -0.658, -0.727, -0.658, -0.658, -0.658, -0.727, -0.658, -0.658, -0.658, -0.727, -0.658, -0.727]
Loss: 0.34592= 0.014 + 0.001 + 0.015 + 0.000 + 0.008 + 0.034 + 0.001 + 0.000 + 0.003 + 0.000 + 0.051 + 0.024 + 0.196
dL/dw1 = -0.056 + -0.001 + +0.063 + +0.013 + -0.113 + -0.293 + -0.059 + +0.029 + -0.084 + +0.005 + -0.372 + -0.212 + -0.687 = -1.767
dL/dw2 = -0.052 + +0.001 + -0.006 + +0.005 + -0.063 + -0.189 + -0.030 + +0.011 + -0.041 + +0.002 + -0.220 + -0.095 + -0.515 = -1.191
dL/dw4 = -0.041 + -0.002 + +0.069 + +0.012 + -0.098 + -0.245 + -0.052 + +0.027 + -0.075 + +0.005 + -0.315 + -0.191 + -0.533 = -1.440
dL/dw5 = -0.041 + -0.002 + +0.009 + +0.005 + -0.058 + -0.167 + -0.027 + +0.012 + -0.040 + +0.002 + -0.164 + -0.083 + -0.269 = -0.823
w1=-0.583, w2=+0.415, w3=+1.000, w4=-0.583, w5=+0.152
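
The per-step terms on each `Loss:` line can be recomputed from the `Output` and `Target` rows using the loss defined earlier, $\mathrm{Loss}(\hat{o}_t, o_t) = \mathrm{CE}(\hat{o}_t, o_t) - \mathrm{CE}(\hat{o}_t, \hat{o}_t)$. The sketch below does this for Iteration 14. Because the log prints only three decimals, small differences from the logged terms are expected.

```python
import math

# Rows copied from the "Iteration 14" block of the log.
outputs = [0.582, 0.518, 0.651, 0.722, 0.783, 0.834, 0.897,
           0.871, 0.904, 0.879, 0.854, 0.819, 0.785]
targets = [0.500, 0.500, 0.731, 0.731, 0.731, 0.731, 0.881,
           0.881, 0.881, 0.881, 0.731, 0.731, 0.500]
logged_terms = [0.014, 0.001, 0.015, 0.000, 0.008, 0.034, 0.001,
                0.000, 0.003, 0.000, 0.051, 0.024, 0.196]
logged_total = 0.34592


def cross_entropy(x, y):
    """CE(x, y) = -(x log y + (1 - x) log(1 - y))."""
    return -(x * math.log(y) + (1.0 - x) * math.log(1.0 - y))


def step_loss(target, output):
    """Cross-entropy minus the entropy of the target, so a perfect output gives 0."""
    return cross_entropy(target, output) - cross_entropy(target, target)


terms = [step_loss(t, o) for t, o in zip(targets, outputs)]
total = sum(terms)
max_term_err = max(abs(a - b) for a, b in zip(terms, logged_terms))

print(f"recomputed total: {total:.5f} (logged {logged_total:.5f})")
print(f"largest per-step difference: {max_term_err:.4f}")
```

The last position contributes more than half of the total here: the target is 0.500 while the output is still 0.785.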

Iteration 49
Output: [+0.494, +0.523, +0.710, +0.705, +0.699, +0.694, +0.835, +0.850, +0.847, +0.862, +0.714, +0.737, +0.529]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [-0.024, +0.090, +0.894, +0.869, +0.845, +0.820, +1.624, +1.738, +1.714, +1.829, +0.914, +1.029, +0.115]
Input : [+0.052, -0.243, -0.847, +0.052, +0.052, +0.052, -0.847, -0.243, +0.052, -0.243, +0.965, -0.243, +0.965]
Gate  : [-0.472, -0.472, -0.948, -0.472, -0.472, -0.472, -0.948, -0.472, -0.472, -0.472, -0.948, -0.472, -0.948]
Loss: 0.03034= 0.000 + 0.001 + 0.001 + 0.002 + 0.002 + 0.003 + 0.008 + 0.004 + 0.005 + 0.002 + 0.001 + 0.000 + 0.002
dL/dw1 = +0.003 + -0.001 + +0.021 + +0.039 + +0.062 + +0.091 + +0.155 + +0.090 + +0.116 + +0.058 + +0.052 + -0.015 + -0.074 = +0.599
dL/dw2 = +0.003 + +0.001 + -0.001 + +0.011 + +0.027 + +0.048 + +0.060 + +0.025 + +0.042 + +0.015 + +0.029 + -0.007 + -0.062 = +0.192
dL/dw4 = -0.000 + -0.004 + +0.022 + +0.026 + +0.030 + +0.032 + +0.079 + +0.060 + +0.064 + +0.042 + +0.021 + -0.008 + -0.014 = +0.349
dL/dw5 = -0.000 + -0.004 + +0.004 + +0.004 + +0.003 + +0.001 + +0.002 + +0.008 + +0.008 + +0.009 + +0.008 + -0.004 + -0.020 = +0.018
w1=-0.907, w2=+0.945, w3=+1.000, w4=-0.983, w5=+0.474

Iteration 99
Output: [+0.495, +0.518, +0.734, +0.730, +0.726, +0.722, +0.869, +0.880, +0.878, +0.887, +0.730, +0.748, +0.505]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [-0.020, +0.074, +1.014, +0.995, +0.975, +0.955, +1.896, +1.989, +1.970, +2.063, +0.994, +1.088, +0.018]
Input : [+0.056, -0.266, -0.932, +0.056, +0.056, +0.056, -0.932, -0.266, +0.056, -0.266, +1.060, -0.266, +1.060]
Gate  : [-0.352, -0.352, -1.009, -0.352, -0.352, -0.352, -1.009, -0.352, -0.352, -0.352, -1.009, -0.352, -1.009]
Loss: 0.00266= 0.000 + 0.001 + 0.000 + 0.000 + 0.000 + 0.000 + 0.001 + 0.000 + 0.000 + 0.000 + 0.000 + 0.001 + 0.000
dL/dw1 = +0.002 + -0.001 + -0.003 + +0.001 + +0.009 + +0.019 + +0.036 + +0.003 + +0.010 + -0.019 + +0.003 + -0.043 + -0.012 = +0.007
dL/dw2 = +0.002 + +0.001 + +0.000 + +0.000 + +0.003 + +0.009 + +0.011 + +0.001 + +0.003 + -0.004 + +0.002 + -0.020 + -0.010 = -0.003
dL/dw4 = -0.000 + -0.004 + -0.003 + +0.001 + +0.005 + +0.009 + +0.022 + +0.002 + +0.007 + -0.015 + +0.002 + -0.027 + -0.002 = -0.005
dL/dw5 = -0.000 + -0.004 + -0.001 + +0.000 + +0.000 + +0.000 + +0.000 + +0.000 + +0.001 + -0.003 + +0.001 + -0.013 + -0.004 = -0.022
w1=-0.933, w2=+1.060, w3=+1.000, w4=-1.008, w5=+0.659

Iteration 149
Output: [+0.498, +0.514, +0.732, +0.730, +0.729, +0.727, +0.873, +0.880, +0.879, +0.886, +0.730, +0.742, +0.500]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [-0.008, +0.054, +1.004, +0.996, +0.988, +0.980, +1.930, +1.992, +1.984, +2.047, +0.993, +1.055, +0.001]
Input : [+0.032, -0.246, -0.961, +0.032, +0.032, +0.032, -0.961, -0.246, +0.032, -0.246, +1.066, -0.246, +1.066]
Gate  : [-0.253, -0.253, -0.988, -0.253, -0.253, -0.253, -0.988, -0.253, -0.253, -0.253, -0.988, -0.253, -0.988]
Loss: 0.00112= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.001 + -0.000 + -0.001 + +0.001 + +0.004 + +0.007 + +0.021 + +0.002 + +0.005 + -0.012 + +0.004 + -0.025 + -0.000 = +0.005
dL/dw2 = +0.000 + +0.000 + +0.000 + +0.000 + +0.001 + +0.003 + +0.005 + +0.000 + +0.001 + -0.002 + +0.002 + -0.012 + -0.000 = -0.001
dL/dw4 = -0.000 + -0.003 + -0.001 + +0.001 + +0.003 + +0.004 + +0.015 + +0.002 + +0.004 + -0.012 + +0.002 + -0.018 + -0.000 = -0.003
dL/dw5 = -0.000 + -0.003 + -0.000 + +0.000 + +0.000 + +0.000 + +0.001 + +0.000 + +0.001 + -0.003 + +0.001 + -0.009 + -0.000 = -0.011
w1=-0.962, w2=+1.066, w3=+1.000, w4=-0.988, w5=+0.736

Iteration 199
Output: [+0.499, +0.511, +0.731, +0.730, +0.730, +0.729, +0.875, +0.880, +0.880, +0.884, +0.730, +0.739, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [-0.003, +0.042, +0.999, +0.997, +0.994, +0.991, +1.948, +1.993, +1.991, +2.035, +0.994, +1.038, -0.004]
Input : [+0.013, -0.229, -0.981, +0.013, +0.013, +0.013, -0.981, -0.229, +0.013, -0.229, +1.068, -0.229, +1.068]
Gate  : [-0.196, -0.196, -0.975, -0.196, -0.196, -0.196, -0.975, -0.196, -0.196, -0.196, -0.975, -0.196, -0.975]
Loss: 0.00060= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = +0.000 + -0.000 + +0.000 + +0.001 + +0.002 + +0.003 + +0.014 + +0.002 + +0.003 + -0.009 + +0.003 + -0.017 + +0.002 = +0.003
dL/dw2 = +0.000 + +0.000 + -0.000 + +0.000 + +0.000 + +0.001 + +0.003 + +0.000 + +0.001 + -0.001 + +0.002 + -0.008 + +0.002 = -0.000
dL/dw4 = -0.000 + -0.002 + +0.000 + +0.001 + +0.001 + +0.002 + +0.012 + +0.002 + +0.002 + -0.009 + +0.002 + -0.013 + +0.001 = -0.002
dL/dw5 = -0.000 + -0.002 + +0.000 + +0.000 + +0.000 + +0.000 + +0.001 + +0.000 + +0.000 + -0.002 + +0.001 + -0.006 + +0.001 = -0.007
w1=-0.981, w2=+1.068, w3=+1.000, w4=-0.975, w5=+0.781

Iteration 249
Output: [+0.500, +0.509, +0.730, +0.730, +0.730, +0.731, +0.876, +0.880, +0.880, +0.884, +0.730, +0.737, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.000, +0.034, +0.997, +0.997, +0.997, +0.997, +1.960, +1.994, +1.994, +2.028, +0.995, +1.029, -0.005]
Input : [-0.001, -0.216, -0.995, -0.001, -0.001, -0.001, -0.995, -0.216, -0.001, -0.216, +1.069, -0.216, +1.069]
Gate  : [-0.157, -0.157, -0.967, -0.157, -0.157, -0.157, -0.967, -0.157, -0.157, -0.157, -0.967, -0.157, -0.967]
Loss: 0.00037= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.001 + +0.001 + +0.001 + +0.011 + +0.002 + +0.002 + -0.007 + +0.002 + -0.012 + +0.002 = +0.003
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + +0.000 + +0.000 + +0.002 + +0.000 + +0.000 + -0.001 + +0.001 + -0.006 + +0.002 = -0.000
dL/dw4 = -0.000 + -0.002 + +0.001 + +0.001 + +0.001 + +0.001 + +0.010 + +0.002 + +0.002 + -0.008 + +0.002 + -0.010 + +0.001 = -0.001
dL/dw5 = -0.000 + -0.002 + +0.000 + +0.000 + +0.000 + +0.000 + +0.001 + +0.000 + +0.000 + -0.002 + +0.001 + -0.005 + +0.001 = -0.005
w1=-0.996, w2=+1.069, w3=+1.000, w4=-0.967, w5=+0.810

Iteration 299
Output: [+0.500, +0.507, +0.730, +0.730, +0.731, +0.731, +0.877, +0.880, +0.880, +0.883, +0.730, +0.735, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.002, +0.028, +0.995, +0.997, +0.999, +1.000, +1.967, +1.994, +1.996, +2.023, +0.996, +1.022, -0.005]
Input : [-0.012, -0.206, -1.007, -0.012, -0.012, -0.012, -1.007, -0.206, -0.012, -0.206, +1.069, -0.206, +1.069]
Gate  : [-0.130, -0.130, -0.961, -0.130, -0.130, -0.130, -0.961, -0.130, -0.130, -0.130, -0.961, -0.130, -0.961]
Loss: 0.00024= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.001 + +0.000 + -0.000 + +0.008 + +0.001 + +0.001 + -0.005 + +0.002 + -0.009 + +0.002 = +0.002
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + +0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.000 + +0.001 + -0.005 + +0.002 = -0.000
dL/dw4 = -0.000 + -0.002 + +0.001 + +0.001 + +0.000 + -0.000 + +0.008 + +0.002 + +0.001 + -0.006 + +0.001 + -0.008 + +0.001 = -0.001
dL/dw5 = -0.000 + -0.002 + +0.000 + +0.000 + +0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.002 + +0.001 + -0.004 + +0.001 = -0.004
w1=-1.007, w2=+1.069, w3=+1.000, w4=-0.961, w5=+0.831

Iteration 349
Output: [+0.501, +0.506, +0.730, +0.731, +0.731, +0.731, +0.878, +0.880, +0.880, +0.883, +0.730, +0.735, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.002, +0.024, +0.995, +0.997, +1.000, +1.002, +1.973, +1.995, +1.997, +2.019, +0.996, +1.018, -0.004]
Input : [-0.022, -0.198, -1.016, -0.022, -0.022, -0.022, -1.016, -0.198, -0.022, -0.198, +1.069, -0.198, +1.069]
Gate  : [-0.109, -0.109, -0.956, -0.109, -0.109, -0.109, -0.956, -0.109, -0.109, -0.109, -0.956, -0.109, -0.956]
Loss: 0.00017= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.001 + +0.000 + -0.001 + +0.006 + +0.001 + +0.001 + -0.004 + +0.002 + -0.007 + +0.002 = +0.002
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + +0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.000 + +0.001 + -0.004 + +0.002 = -0.000
dL/dw4 = -0.000 + -0.001 + +0.001 + +0.001 + +0.000 + -0.001 + +0.007 + +0.001 + +0.001 + -0.005 + +0.001 + -0.007 + +0.001 = -0.001
dL/dw5 = -0.000 + -0.001 + +0.000 + +0.000 + +0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.001 + +0.001 + -0.003 + +0.001 = -0.003
w1=-1.016, w2=+1.069, w3=+1.000, w4=-0.956, w5=+0.847

Iteration 399
Output: [+0.501, +0.505, +0.730, +0.731, +0.731, +0.732, +0.878, +0.880, +0.881, +0.882, +0.730, +0.734, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.003, +0.020, +0.995, +0.998, +1.000, +1.003, +1.977, +1.995, +1.998, +2.015, +0.997, +1.015, -0.004]
Input : [-0.029, -0.191, -1.023, -0.029, -0.029, -0.029, -1.023, -0.191, -0.029, -0.191, +1.069, -0.191, +1.069]
Gate  : [-0.092, -0.092, -0.952, -0.092, -0.092, -0.092, -0.952, -0.092, -0.092, -0.092, -0.952, -0.092, -0.952]
Loss: 0.00012= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.001 + -0.000 + -0.001 + +0.005 + +0.001 + +0.001 + -0.003 + +0.001 + -0.006 + +0.002 = +0.001
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.000 + +0.001 + -0.003 + +0.002 = -0.000
dL/dw4 = -0.000 + -0.001 + +0.001 + +0.001 + -0.000 + -0.001 + +0.006 + +0.001 + +0.001 + -0.004 + +0.001 + -0.005 + +0.001 = -0.001
dL/dw5 = -0.000 + -0.001 + +0.000 + +0.000 + -0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.001 + +0.000 + -0.003 + +0.001 = -0.002
w1=-1.023, w2=+1.069, w3=+1.000, w4=-0.952, w5=+0.860

Iteration 449
Output: [+0.501, +0.504, +0.730, +0.731, +0.731, +0.732, +0.879, +0.880, +0.881, +0.882, +0.731, +0.733, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.003, +0.017, +0.995, +0.998, +1.001, +1.003, +1.981, +1.996, +1.998, +2.013, +0.998, +1.012, -0.003]
Input : [-0.036, -0.185, -1.030, -0.036, -0.036, -0.036, -1.030, -0.185, -0.036, -0.185, +1.069, -0.185, +1.069]
Gate  : [-0.079, -0.079, -0.950, -0.079, -0.079, -0.079, -0.950, -0.079, -0.079, -0.079, -0.950, -0.079, -0.950]
Loss: 0.00009= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.004 + +0.001 + +0.000 + -0.003 + +0.001 + -0.005 + +0.002 = +0.001
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.001 + -0.002 + +0.002 = -0.000
dL/dw4 = -0.000 + -0.001 + +0.001 + +0.001 + -0.000 + -0.001 + +0.005 + +0.001 + +0.000 + -0.004 + +0.001 + -0.005 + +0.001 = -0.001
dL/dw5 = -0.000 + -0.001 + +0.000 + +0.000 + -0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.001 + +0.000 + -0.002 + +0.001 = -0.002
w1=-1.030, w2=+1.069, w3=+1.000, w4=-0.949, w5=+0.871

Iteration 499
Output: [+0.501, +0.504, +0.730, +0.731, +0.731, +0.732, +0.879, +0.880, +0.881, +0.882, +0.731, +0.733, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.003, +0.015, +0.995, +0.998, +1.001, +1.004, +1.984, +1.996, +1.999, +2.011, +0.998, +1.010, -0.003]
Input : [-0.041, -0.180, -1.035, -0.041, -0.041, -0.041, -1.035, -0.180, -0.041, -0.180, +1.069, -0.180, +1.069]
Gate  : [-0.067, -0.067, -0.947, -0.067, -0.067, -0.067, -0.947, -0.067, -0.067, -0.067, -0.947, -0.067, -0.947]
Loss: 0.00007= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.004 + +0.001 + +0.000 + -0.002 + +0.001 + -0.004 + +0.001 = +0.001
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.002 + +0.001 = -0.000
dL/dw4 = -0.000 + -0.001 + +0.001 + +0.001 + -0.000 + -0.001 + +0.004 + +0.001 + +0.000 + -0.003 + +0.001 + -0.004 + +0.001 = -0.000
dL/dw5 = -0.000 + -0.001 + +0.000 + +0.000 + -0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.001 + +0.000 + -0.002 + +0.001 = -0.002
w1=-1.035, w2=+1.069, w3=+1.000, w4=-0.947, w5=+0.880

Iteration 549
Output: [+0.501, +0.503, +0.730, +0.731, +0.731, +0.732, +0.879, +0.880, +0.881, +0.882, +0.731, +0.733, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.003, +0.013, +0.996, +0.998, +1.001, +1.004, +1.986, +1.996, +1.999, +2.009, +0.998, +1.008, -0.002]
Input : [-0.046, -0.176, -1.040, -0.046, -0.046, -0.046, -1.040, -0.176, -0.046, -0.176, +1.069, -0.176, +1.069]
Gate  : [-0.058, -0.058, -0.945, -0.058, -0.058, -0.058, -0.945, -0.058, -0.058, -0.058, -0.945, -0.058, -0.945]
Loss: 0.00005= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.003 + +0.001 + +0.000 + -0.002 + +0.001 + -0.003 + +0.001 = +0.001
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.002 + +0.001 = -0.000
dL/dw4 = -0.000 + -0.001 + +0.001 + +0.000 + -0.000 + -0.001 + +0.004 + +0.001 + +0.000 + -0.003 + +0.001 + -0.003 + +0.001 = -0.000
dL/dw5 = -0.000 + -0.001 + +0.000 + +0.000 + -0.000 + -0.000 + +0.001 + +0.000 + +0.000 + -0.001 + +0.000 + -0.002 + +0.001 = -0.001
w1=-1.040, w2=+1.069, w3=+1.000, w4=-0.945, w5=+0.888

Iteration 599
Output: [+0.501, +0.503, +0.730, +0.731, +0.731, +0.732, +0.880, +0.880, +0.881, +0.882, +0.731, +0.732, +0.499]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.002, +0.011, +0.996, +0.998, +1.001, +1.003, +1.988, +1.997, +1.999, +2.008, +0.999, +1.007, -0.002]
Input : [-0.050, -0.172, -1.044, -0.050, -0.050, -0.050, -1.044, -0.172, -0.050, -0.172, +1.070, -0.172, +1.070]
Gate  : [-0.050, -0.050, -0.944, -0.050, -0.050, -0.050, -0.944, -0.050, -0.050, -0.050, -0.944, -0.050, -0.944]
Loss: 0.00004= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.003 + +0.001 + +0.000 + -0.002 + +0.001 + -0.003 + +0.001 = +0.001
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.001 + +0.001 = -0.000
dL/dw4 = -0.000 + -0.001 + +0.001 + +0.000 + -0.000 + -0.001 + +0.003 + +0.001 + +0.000 + -0.002 + +0.001 + -0.003 + +0.000 = -0.000
dL/dw5 = -0.000 + -0.001 + +0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.001 + +0.000 + -0.001 + +0.001 = -0.001
w1=-1.044, w2=+1.070, w3=+1.000, w4=-0.944, w5=+0.894

Iteration 649
Output: [+0.501, +0.502, +0.730, +0.731, +0.731, +0.732, +0.880, +0.880, +0.881, +0.881, +0.731, +0.732, +0.500]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.002, +0.009, +0.996, +0.999, +1.001, +1.003, +1.990, +1.997, +1.999, +2.007, +0.999, +1.006, -0.002]
Input : [-0.054, -0.169, -1.047, -0.054, -0.054, -0.054, -1.047, -0.169, -0.054, -0.169, +1.070, -0.169, +1.070]
Gate  : [-0.043, -0.043, -0.942, -0.043, -0.043, -0.043, -0.942, -0.043, -0.043, -0.043, -0.942, -0.043, -0.942]
Loss: 0.00003= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.002 + +0.001 + +0.000 + -0.001 + +0.000 + -0.002 + +0.001 = +0.001
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.001 + +0.001 = -0.000
dL/dw4 = -0.000 + -0.001 + +0.001 + +0.000 + -0.000 + -0.001 + +0.003 + +0.001 + +0.000 + -0.002 + +0.000 + -0.002 + +0.000 = -0.000
dL/dw5 = -0.000 + -0.001 + +0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.001 + +0.000 + -0.001 + +0.000 = -0.001
w1=-1.047, w2=+1.070, w3=+1.000, w4=-0.942, w5=+0.900

Iteration 699
Output: [+0.501, +0.502, +0.730, +0.731, +0.731, +0.732, +0.880, +0.881, +0.881, +0.881, +0.731, +0.732, +0.500]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.002, +0.008, +0.997, +0.999, +1.001, +1.003, +1.991, +1.997, +2.000, +2.006, +0.999, +1.005, -0.002]
Input : [-0.057, -0.166, -1.050, -0.057, -0.057, -0.057, -1.050, -0.166, -0.057, -0.166, +1.070, -0.166, +1.070]
Gate  : [-0.037, -0.037, -0.941, -0.037, -0.037, -0.037, -0.941, -0.037, -0.037, -0.037, -0.941, -0.037, -0.941]
Loss: 0.00002= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.002 + +0.001 + +0.000 + -0.001 + +0.000 + -0.002 + +0.001 = +0.001
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.001 + +0.001 = -0.000
dL/dw4 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.002 + +0.001 + +0.000 + -0.002 + +0.000 + -0.002 + +0.000 = -0.000
dL/dw5 = -0.000 + -0.000 + +0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.001 + +0.000 = -0.001
w1=-1.050, w2=+1.070, w3=+1.000, w4=-0.941, w5=+0.905

Iteration 749
Output: [+0.500, +0.502, +0.730, +0.731, +0.731, +0.732, +0.880, +0.881, +0.881, +0.881, +0.731, +0.732, +0.500]
Target: [+0.500, +0.500, +0.731, +0.731, +0.731, +0.731, +0.881, +0.881, +0.881, +0.881, +0.731, +0.731, +0.500]
Memory: [+0.002, +0.007, +0.997, +0.999, +1.001, +1.003, +1.993, +1.998, +2.000, +2.005, +0.999, +1.004, -0.001]
Input : [-0.060, -0.164, -1.053, -0.060, -0.060, -0.060, -1.053, -0.164, -0.060, -0.164, +1.070, -0.164, +1.070]
Gate  : [-0.031, -0.031, -0.940, -0.031, -0.031, -0.031, -0.940, -0.031, -0.031, -0.031, -0.940, -0.031, -0.940]
Loss: 0.00001= 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.000
dL/dw1 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.002 + +0.000 + +0.000 + -0.001 + +0.000 + -0.002 + +0.001 = +0.000
dL/dw2 = -0.000 + +0.000 + -0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.001 + +0.001 = -0.000
dL/dw4 = -0.000 + -0.000 + +0.001 + +0.000 + -0.000 + -0.001 + +0.002 + +0.001 + +0.000 + -0.001 + +0.000 + -0.002 + +0.000 = -0.000
dL/dw5 = -0.000 + -0.000 + +0.000 + +0.000 + -0.000 + -0.000 + +0.000 + +0.000 + +0.000 + -0.000 + +0.000 + -0.001 + +0.000 = -0.001
w1=-1.053, w2=+1.070, w3=+1.000, w4=-0.940, w5=+0.909

Final weights (w1, w2, w3, w4, w5) returned by the training run:
(-1.0529977543939428,
 1.0696492814766454,
 1.0,
 -0.9401480352201477,
 0.9087447843399465)
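As a sanity check on the last log entry, we can replay the memory trace from the printed Input and Gate rows. The sketch below assumes, based only on the logged numbers, that the memory is updated as $c_t = c_{t-1} + \mathrm{input}_t \cdot \mathrm{gate}_t$ starting from $c_0 = 0$, and that the output is $\sigma(c_t)$. The helper name `replay_memory` is hypothetical, and the lists are copied from the "Iteration 749" entry, so agreement is only expected up to the three-decimal rounding of the log.

```python
import math

# Values copied verbatim from the "Iteration 749" log entry (rounded to 3 decimals).
inputs = [-0.060, -0.164, -1.053, -0.060, -0.060, -0.060, -1.053,
          -0.164, -0.060, -0.164, +1.070, -0.164, +1.070]
gates = [-0.031, -0.031, -0.940, -0.031, -0.031, -0.031, -0.940,
         -0.031, -0.031, -0.031, -0.940, -0.031, -0.940]
memory_logged = [+0.002, +0.007, +0.997, +0.999, +1.001, +1.003, +1.993,
                 +1.998, +2.000, +2.005, +0.999, +1.004, -0.001]
output_logged = [+0.500, +0.502, +0.730, +0.731, +0.731, +0.732, +0.880,
                 +0.881, +0.881, +0.881, +0.731, +0.732, +0.500]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def replay_memory(inputs, gates):
    """Replay the memory trace under the ASSUMED rule c_t = c_{t-1} + input_t * gate_t."""
    memory, c = [], 0.0
    for x, g in zip(inputs, gates):
        c += x * g  # self-loop weight 1.0: the old memory is carried over unchanged
        memory.append(c)
    return memory


memory_replayed = replay_memory(inputs, gates)
# ASSUMPTION: the output unit applies the sigmoid directly to the memory value.
output_replayed = [sigmoid(c) for c in memory_logged]

max_memory_gap = max(abs(a - b) for a, b in zip(memory_replayed, memory_logged))
max_output_gap = max(abs(a - b) for a, b in zip(output_replayed, output_logged))
print("largest memory mismatch:", round(max_memory_gap, 4))
print("largest output mismatch:", round(max_output_gap, 4))
```

If both mismatches stay within rounding error, the log is consistent with the reading that the gated input is close to +1 at each opening bracket, close to -1 at each closing bracket, and close to 0 elsewhere, so the memory effectively counts the current bracket depth.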