The Hopfield network

A Hopfield network is a form of recurrent artificial neural network popularized by John Hopfield in 1982. It is an autoassociative network: it can store a piece of data, given as a pattern of activation over its own units, and later retrieve it when only a small subpopulation of units corresponding to a tiny sample of that same piece of data is triggered. Learning is done by changing the weights of the connections between units in a Hebbian-like way: each input pattern is treated as if it were the vector of unit activations, and the weights of the connections between concurrently active units are strengthened.

Figure 1: an abstract example showing the functioning of an autoassociative network. Left: the network weights after learning. Center: only one unit is triggered. Right: all the associated units become activated.


Spreading of activations

The activation of a unit in a Hopfield network is updated by evaluating the sign of the weighted sum of its inputs:

$$ x_i = \begin{cases} 1 & \quad \text{if } \sum_{j} w_{ij}x_j > 0\\ -1 & \quad \text{otherwise}\\ \end{cases} $$
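As a concrete sketch of this rule (the weights and activations below are made-up toy values, and `sgn` is one possible numpy definition that maps non-positive sums to $-1$):

```python
import numpy as np

def sgn(v):
    # +1 where the weighted sum is positive, -1 otherwise
    return np.where(v > 0, 1, -1)

# toy 3-unit network: symmetric weights, zero diagonal
w = np.array([[ 0.0,  0.5, -0.5],
              [ 0.5,  0.0,  0.5],
              [-0.5,  0.5,  0.0]])
x = np.array([1, -1, 1])

# weighted sums are [-1.0, 1.0, -1.0]
print(sgn(w.dot(x)))  # → [-1  1 -1]
```

Note that `sgn` applies the update rule to all the weighted sums at once, which is used below.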

There are two ways to implement the update of the units: synchronous and asynchronous update.

Synchronous update

All the units are updated together at each step, using as inputs the frozen activations of the units at the previous step.

In python you can write:

n = len(x)
for i in range(n):
    w_sum = 0
    for j in range(n):
        w_sum += w[i, j] * x_previous[j]
    x[i] = sgn(w_sum)
# copy, so that the next step sees frozen values
x_previous = x.copy()

Using linear algebra we can rewrite it in a shorter form as: $$ \mathbf{x} = sgn(\mathbf{W}\mathbf{x}) $$ where $\mathbf{x}$ is the vector of activations, $\mathbf{W}$ is the matrix of inner weights and $\mathbf{W}\mathbf{x}$ is the dot product, the linear algebra operation that computes all the weighted sums at once.

Given two vectors of the same size $\mathbf{a} \in \mathbb{R}^n$ and $\mathbf{b} \in \mathbb{R}^n$, the dot product $\mathbf{a}^T\mathbf{b}$ produces the scalar value $a_1 b_1 + \dots + a_n b_n$.
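For example, with numpy (the values here are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# dot product: the sum of the elementwise products
print(np.dot(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
```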

Linear algebra notation is a far better way to implement neural networks than iterating over single units. In particular, writing formulas in linear algebra form in Python (through the numpy library) improves both speed and readability. For the unit activations, in Python you can write:

x = sgn(dot(w,x))

Using linear algebra in a neural network implementation is far simpler than writing loops, it is less error prone and it also produces much more efficient code in terms of speed!

Asynchronous update

At each step a single unit is randomly chosen and updated, while the others remain unchanged. In Python you write:

i = chosen_index
w_sum = 0
for j in range(n):
    w_sum += w[i, j] * x[j]
x[i] = sgn(w_sum)

Using linear algebra leads to a shorter form also in this case:

i = chosen_index
x[i] = sgn(dot(w[i, :], x))

Note how we don't need to store previous values here.
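A minimal runnable sketch of the asynchronous rule (the weights, the network size and the number of steps are all illustrative choices, and `sgn` is a numpy helper implementing the update rule above):

```python
import numpy as np

def sgn(v):
    return np.where(v > 0, 1, -1)

rng = np.random.default_rng(0)
n = 4

# toy symmetric weights with zero diagonal
w = rng.standard_normal((n, n))
w = 0.5 * (w + w.T)
np.fill_diagonal(w, 0.0)

x = np.array([1, -1, 1, -1])
for _ in range(20):
    i = rng.integers(n)               # pick one unit at random
    x[i] = sgn(np.dot(w[i, :], x))    # update only that unit, in place
print(x)
```

Because the update reads the current state `x` directly, no copy of the previous activations is needed.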

In the simulation described below we will use asynchronous updates (which, in a way, seem more biologically plausible).

Learning

The learning of the weights is done offline at the start of the simulation. To update the weights you just treat each pattern $\mathbf{p}_k$ (with $k = 1,...,n_p$) as if it were the current vector of activations of the network units and increase those weights whose two endings are concurrently active (Hebbian learning). In practice this is done by multiplying the two activities at the endings of each weight. The total update is obtained by adding together the increments due to all the patterns to be learned. Finally the result is divided by the number of patterns so that the weights are normalized. Putting it all together you have: $$ w_{i,j} = \frac{1}{n_p} \sum_k p^k_i p^k_j $$ In python we can write:

n_p = len(P)  # number of patterns
w = zeros([n, n])
for p in P:
    for i in range(n):
        for j in range(n):
            # autoconnections are
            # forbidden in Hopfield nets
            if i != j:
                w[i, j] += (1 / float(n_p)) * p[i] * p[j]

Where P is an array whose $n_p$ rows are the patterns to be learned.

Once again linear algebra can help us and we can rewrite the learning rule as: $$ \mathbf{W} = \frac{1}{n_p}\sum_k \mathbf{p}_k \mathbf{p}_k^T $$ Where $ \mathbf{p}_k \mathbf{p}_k^T$ is the outer product of $\mathbf{p}_k$ with itself.

Given two vectors $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^n$, the outer product $\mathbf{a}\mathbf{b}^T$ produces a matrix $\mathbf{M} \in \mathbb{R}^{m\times n}$: $$ \begin{bmatrix} a_{1} b_{1} & \dots & a_{1} b_{n} \\ \vdots & \ddots & \vdots \\ a_{m} b_{1} & \dots & a_{m} b_{n} \end{bmatrix} $$
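For example, with numpy (again with arbitrary values):

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4, 5])

# outer product: M[i, j] = a[i] * b[j]
M = np.outer(a, b)
print(M)
# [[ 3  4  5]
#  [ 6  8 10]]
```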

In python it becomes:

n_p = len(P)
# accumulate outer products
for p in P:
    w += (1 / float(n_p)) * outer(p, p)

# cut autoconnections afterward
w *= 1.0 - eye(n, n)

In this case we have only one explicit loop instead of three!

  • A general rule: the dot product is for spreading the activity throughout a network, while the outer product is for weight update

Energy function

In each state of the network we can calculate a value that is given by: $$ E = -\frac{1}{2}\sum_{i,j}w_{i,j}x_{i}x_{j} $$ or in linear algebra notation: $$ E = -\frac{1}{2} \mathbf{x}^T\mathbf{W}\mathbf{x} $$ Hopfield proved that this value can only decrease or stay the same. When $E$ ceases to change, the network has reached one of its attractor states. If the learned patterns are sufficiently orthogonal to each other, the attractor states correspond to the recalled patterns.

In python it is:

E = -0.5*dot(x,dot(w,x))
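Putting all the pieces together, here is a minimal end-to-end sketch (the network size, random seed and number of corrupted units are arbitrary choices): we store one random pattern with the Hebbian rule, corrupt a few units, and check that asynchronous updates never increase the energy while the pattern is recalled.

```python
import numpy as np

def sgn(v):
    return np.where(v > 0, 1, -1)

def energy(w, x):
    return -0.5 * np.dot(x, np.dot(w, x))

rng = np.random.default_rng(0)
n = 16
p = sgn(rng.standard_normal(n))          # one random +-1 pattern

# Hebbian learning for a single pattern (n_p = 1)
w = np.outer(p, p).astype(float)
w *= 1.0 - np.eye(n)                     # autoconnections are forbidden

x = p.copy()
flip = rng.choice(n, size=3, replace=False)
x[flip] *= -1                            # corrupt three units

e = energy(w, x)
for _ in range(2):                       # two asynchronous sweeps
    for i in range(n):
        x[i] = sgn(np.dot(w[i, :], x))
        e_new = energy(w, x)
        assert e_new <= e + 1e-9         # energy never increases
        e = e_new

print(np.array_equal(x, p))  # → True: the stored pattern is recalled
```

With a single stored pattern and only three corrupted units, every asynchronous update pushes a unit back toward the pattern, so the network settles into the stored attractor.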
