The Hopfield network

A Hopfield network is a form of recurrent artificial neural network popularized by John Hopfield in 1982. It is an autoassociative network: it can store a piece of data, given as a pattern of activation over its own units, and later retrieve it when only a small subpopulation of units corresponding to a tiny sample of that same piece of data is triggered. Learning is done by changing the weights of the connections between units in a Hebbian-like way: each input pattern is treated as if it were the vector of unit activations, and the weights of the connections between concurrently active units are strengthened.

Figure 1: an abstract example showing the functioning of an autoassociative network. Left: the network weights after learning. Center: only one unit is triggered. Right: all the associated units become activated.


Spreading of activations

The activation of a unit in a Hopfield network is updated by evaluating the sign of the weighted sum of its inputs:

$$ x_i = \begin{cases} 1 & \quad \text{if } \sum_{j} w_{ij}x_j > 0\\ -1 & \quad \text{otherwise}\\ \end{cases} $$
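As a concrete sketch of this rule (the weights and activations below are made-up toy values, and `sgn` is one possible numpy definition that maps non-positive sums to $-1$):

```python
import numpy as np

def sgn(v):
    # +1 where the weighted sum is positive, -1 otherwise
    return np.where(v > 0, 1, -1)

# toy 3-unit network: symmetric weights, zero diagonal
w = np.array([[ 0.0,  0.5, -0.5],
              [ 0.5,  0.0,  0.5],
              [-0.5,  0.5,  0.0]])
x = np.array([1, -1, 1])

# weighted sums are [-1.0, 1.0, -1.0]
print(sgn(w.dot(x)))  # → [-1  1 -1]
```

Note that `sgn` applies the update rule to all the weighted sums at once, which is used below.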

There are two ways to implement the update of the units: synchronous and asynchronous update.

Synchronous update

All the units are updated together at each step, using as inputs the frozen activations of the units at the previous step.

In python you can write:

n = len(x)
for i in range(n):
    w_sum = 0
    for j in range(n):
        w_sum += w[i, j] * x_previous[j]
    x[i] = sgn(w_sum)
# copy, so that the next step sees frozen values
x_previous = x.copy()

Using linear algebra we can rewrite it in a shorter form as: $$ \mathbf{x} = sgn(\mathbf{W}\mathbf{x}) $$ where $\mathbf{x}$ is the vector of activations, $\mathbf{W}$ is the matrix of inner weights and $\mathbf{W}\mathbf{x}$ is the dot product, the linear algebra operation that computes all the weighted sums at once.

Given two vectors of the same size $\mathbf{a} \in \mathbb{R}^n$ and $\mathbf{b} \in \mathbb{R}^n$, the dot product $\mathbf{a}^T\mathbf{b}$ produces the scalar value $a_1 b_1 + \dots + a_n b_n$.
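For example, with numpy (the values here are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# dot product: the sum of the elementwise products
print(np.dot(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
```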

Linear algebra notation is a far better way to implement neural networks than iterating over single units. In particular, writing formulas in linear algebra form in Python (through the numpy library) improves both speed and readability. For the unit activations, in Python you can write:

x = sgn(dot(w,x))

Using linear algebra in a neural network implementation is far simpler than writing loops, it is less error prone and it also produces much more efficient code in terms of speed!

Asynchronous update

At each step a single unit is randomly chosen and updated, while the others remain unchanged. In Python you write:

i = chosen_index
w_sum = 0
for j in range(n):
    w_sum += w[i, j] * x[j]
x[i] = sgn(w_sum)

Using linear algebra leads to a shorter form also in this case:

i = chosen_index
x[i] = sgn(dot(w[i, :], x))

Note how we don't need to store previous values here.
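A minimal runnable sketch of the asynchronous rule (the weights, the network size and the number of steps are all illustrative choices, and `sgn` is a numpy helper implementing the update rule above):

```python
import numpy as np

def sgn(v):
    return np.where(v > 0, 1, -1)

rng = np.random.default_rng(0)
n = 4

# toy symmetric weights with zero diagonal
w = rng.standard_normal((n, n))
w = 0.5 * (w + w.T)
np.fill_diagonal(w, 0.0)

x = np.array([1, -1, 1, -1])
for _ in range(20):
    i = rng.integers(n)               # pick one unit at random
    x[i] = sgn(np.dot(w[i, :], x))    # update only that unit, in place
print(x)
```

Because the update reads the current state `x` directly, no copy of the previous activations is needed.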

In the simulation described below we will use asynchronous updates (which, in a way, seem more biologically plausible).

Learning

The learning of the weights is done offline at the start of the simulation. To update the weights you just treat each pattern $\mathbf{p}_k$ (with $k = 1,...,n_p$) as if it were the current vector of activations of the network units and increase those weights whose two endings are concurrently active (Hebbian learning). In practice this is done by multiplying the two activities at the endings of each weight. The total update is obtained by adding together the increments due to all the patterns to be learned. Finally the result is divided by the number of patterns so that the weights are normalized. Putting it all together you have: $$ w_{i,j} = \frac{1}{n_p} \sum_k p^k_i p^k_j $$ In python we can write:

n_p = len(P)  # number of patterns
w = zeros([n, n])
for p in P:
    for i in range(n):
        for j in range(n):
            # autoconnections are
            # forbidden in Hopfield nets
            if i != j:
                w[i, j] += (1 / float(n_p)) * p[i] * p[j]

Where P is an array whose $n_p$ rows are the patterns to be learned.

Once again linear algebra can help us and we can rewrite the learning rule as: $$ \mathbf{W} = \frac{1}{n_p}\sum_k \mathbf{p}_k \mathbf{p}_k^T $$ Where $ \mathbf{p}_k \mathbf{p}_k^T$ is the outer product of $\mathbf{p}_k$ with itself.

Given two vectors $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^n$, the outer product $\mathbf{a}\mathbf{b}^T$ produces a matrix $\mathbf{M} \in \mathbb{R}^{m\times n}$: $$ \begin{bmatrix} a_{1} b_{1} & \dots & a_{1} b_{n} \\ \vdots & \ddots & \vdots \\ a_{m} b_{1} & \dots & a_{m} b_{n} \end{bmatrix} $$
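For example, with numpy (again with arbitrary values):

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4, 5])

# outer product: M[i, j] = a[i] * b[j]
M = np.outer(a, b)
print(M)
# [[ 3  4  5]
#  [ 6  8 10]]
```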

In python it becomes:

n_p = len(P)
# accumulate outer products
for p in P:
    w += (1 / float(n_p)) * outer(p, p)

# cut autoconnections afterward
w *= 1.0 - eye(n, n)

In this case we have only one explicit loop instead of three!

  • A general rule: the dot product is for spreading the activity throughout a network, while the outer product is for weight update

Energy function

In each state of the network we can calculate a value that is given by: $$ E = -\frac{1}{2}\sum_{i,j}w_{i,j}x_{i}x_{j} $$ or in linear algebra notation: $$ E = -\frac{1}{2} \mathbf{x}^T\mathbf{W}\mathbf{x} $$ Hopfield proved that this value can only decrease or stay the same. When $E$ ceases to change, the network has reached one of its attractor states. If the learned patterns are sufficiently orthogonal to each other, the attractor states correspond to the recalled patterns.

In python it is:

E = -0.5*dot(x,dot(w,x))
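Putting all the pieces together, here is a minimal end-to-end sketch (the network size, random seed and number of corrupted units are arbitrary choices): we store one random pattern with the Hebbian rule, corrupt a few units, and check that asynchronous updates never increase the energy while the pattern is recalled.

```python
import numpy as np

def sgn(v):
    return np.where(v > 0, 1, -1)

def energy(w, x):
    return -0.5 * np.dot(x, np.dot(w, x))

rng = np.random.default_rng(0)
n = 16
p = sgn(rng.standard_normal(n))          # one random +-1 pattern

# Hebbian learning for a single pattern (n_p = 1)
w = np.outer(p, p).astype(float)
w *= 1.0 - np.eye(n)                     # autoconnections are forbidden

x = p.copy()
flip = rng.choice(n, size=3, replace=False)
x[flip] *= -1                            # corrupt three units

e = energy(w, x)
for _ in range(2):                       # two asynchronous sweeps
    for i in range(n):
        x[i] = sgn(np.dot(w[i, :], x))
        e_new = energy(w, x)
        assert e_new <= e + 1e-9         # energy never increases
        e = e_new

print(np.array_equal(x, p))  # → True: the stored pattern is recalled
```

With a single stored pattern and only three corrupted units, every asynchronous update pushes a unit back toward the pattern, so the network settles into the stored attractor.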
