Here I test the idea of binary, or very discrete, coding in an adaptation of Larry Abbott's FORCE learning [0].
I want to make sense of a world where the neural code is binary [6] (or very discrete) but organisms must still make smooth, regular movements and have continuous-feeling thoughts. A coarse code and a smooth world seem, naively, to contradict each other. I started playing with FORCE really out of curiosity, and because of all the learning rules I know it seemed the least likely to work! And then it worked really well. ...Science! I'm surprised and pleased that my adaptation, discrete FORCE, or DFORCE, works so well.
While the learning part of FORCE is not biologically plausible (recursive least squares, at least as implemented here), key ideas do have theoretical and empirical support. FORCE uses the idea of high-dimensional random (non-)linear encoding tied to linear decoding, which is found in all the 'echo state' and 'liquid state' learning systems (google for more) and which seems to be supported by data from visual areas [1] and motor cortex [3], among other areas [5]. Separately, Nengo/NEF also used non-linear/linear methods to build SPAUN [4].
[0] Original FORCE learning paper
[1] Application of recurrent neural network learning to model dynamics in primate visual cortex
[2] How recurrent neural networks respond to sinusoidal input
[3] Motor movement and dynamics
[4] SPAUN, a real big functional 'brain' model
[5] Reservoir computing and reinforcement learning
[6] Evidence for discrete decisions
FORCE is a variant of the 'liquid state machine' or 'echo state machine' family of supervised learning systems; it can tame chaotic patterns to match a target time series. Read [0] before you go any further.
In the model, neurons are replaced by units, or neural mass models, that represent the aggregate firing rates of many neurons. This is a common and useful theoretical abstraction, in case you're not familiar with it.
First I'm going to show you classic (continuous) FORCE, then my discrete version.
We simulate the model:
$$\frac{d\mathbf{x}}{dt} = -\mathbf{x} + g J \tanh{[\mathbf{x}]} $$with $\mathbf{x} \in \mathbb{R}^N$ (a vector), $J \in \mathbb{R}^{N \times N}$ (a matrix), and $g \in \mathbb{R}$ (a scalar). Each element of $J$ is drawn from a Gaussian distribution with zero mean and variance $1/N$. We characterize the output of the system for increasing values of $g$. If $g$ is greater than 1.5 the system behaves chaotically; if $g = 1$ the system reduces from FORCE to a traditional 'echo state' system.
In [4]:
%pylab inline
from scipy.integrate import odeint, ode
In [3]:
def f1(x, t0):
    return -x + g * dot(J, tanh(x))

N = 500
J = normal(0, sqrt(1 / N), (N, N))
x0 = uniform(-0.5, 0.5, N)
t = linspace(0, 50, 500)

plt.figure(figsize=(10, 5))
for s, g in enumerate(linspace(0.5, 1.5, 3)):
    plt.subplot(1, 3, s + 1)
    x = odeint(f1, x0, t)
    plt.plot(t, x[:, choice(N, 10)])
    plt.title('g = ' + str(g), fontweight='bold')
plt.show()
You can linearly reweight the chaos into doing very useful computation. We call this learner the readout unit.
Model an output or readout unit for the network as:
$$z = \mathbf{w}^T \tanh[\mathbf{x}]$$The output $z$ is a scalar formed by the dot product of two N-dimensional vectors ($\mathbf{w}^T$ denotes the transpose of $\mathbf{w}$). We will implement the FORCE learning rule (Sussillo & Abbott, 2009) by adjusting the readout weights, $w_i$, so that $z$ matches a target function:
$$f(t) = \cos\left(\frac{2 \pi t}{50} \right)$$The rule works by implementing recursive least-squares:
$$\mathbf{w} \rightarrow \mathbf{w} + c(f-z) \mathbf{q}$$$$\mathbf{q} = P \tanh [\mathbf{x}]$$$$c = \frac{1}{1+ \mathbf{q}^T \tanh[\mathbf{x}]}$$$$P_{ij} \rightarrow P_{ij} - c q_i q_j$$
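Before wiring RLS into the network, it can help to see the update rule in isolation. Here is a minimal, self-contained sketch (all names are illustrative, not from the notebook): the "rates" are just fixed random-phase cosines, and RLS fits readout weights so their dot product tracks a cosine target.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
phases = rng.uniform(0, 2 * np.pi, N)  # hypothetical fixed random features
w = np.zeros(N)
P = np.eye(N)  # running estimate of the inverse correlation matrix

def rls_step(w, P, r, f):
    """One recursive-least-squares update; returns new (w, P) and the pre-update output z."""
    z = w @ r
    q = P @ r
    c = 1.0 / (1.0 + q @ r)
    P = P - c * np.outer(q, q)
    w = w + c * (f - z) * q
    return w, P, z

errors = []
for t in range(500):
    r = np.cos(2 * np.pi * t / 50 + phases)  # feature vector at time t
    f = np.cos(2 * np.pi * t / 50)           # target value at time t
    w, P, z = rls_step(w, P, r, f)
    errors.append(abs(f - z))

print(errors[0], errors[-1])  # error collapses after a few updates
```

Because the target lies in the span of the features, the error drops to near zero within a handful of updates; in the network below the same four lines do the work, with $\tanh[\mathbf{x}]$ in place of `r`.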
In [56]:
target = lambda t0: cos(2 * pi * t0 / 50) # target pattern
def f3(t0, x):
    # tanh_x (the cached rates), w, u, g, and J are globals set below
    return -x + g * dot(J, tanh_x) + dot(w, tanh_x) * u
dt = 1 # time step
tmax = 800 # simulation length
tstop = 300
N = 300
J = normal(0, sqrt(1 / N), (N, N))
x0 = uniform(-0.5, 0.5, N)
t = linspace(0, 50, 500)
g = 1.5
u = uniform(-1, 1, N)
w = uniform(-1 / sqrt(N), 1 / sqrt(N), N) # initial weights
P = eye(N) # Running estimate of the inverse correlation matrix
lr = 1.0 # learning rate
# simulation data: state, output, time, weight updates
x, z, t, wu = [x0], [], [0], [0]
# Set up ode solver
solver = ode(f3)
solver.set_initial_value(x0)
# Integrate ode, update weights, repeat
while t[-1] < tmax:
    tanh_x = tanh(x[-1])  # cache the rates; f3 uses this
    z.append(dot(w, tanh_x))
    error = target(t[-1]) - z[-1]

    # RLS update of the readout weights
    q = dot(P, tanh_x)
    c = lr / (1 + dot(q, tanh_x))
    P = P - c * outer(q, q)
    w = w + c * error * q

    # Stop learning here
    if t[-1] > tstop:
        lr = 0

    wu.append(np.sum(np.abs(c * error * q)))

    solver.integrate(solver.t + dt)
    x.append(solver.y)
    t.append(solver.t)

# last update for readout neuron
z.append(dot(w, tanh_x))
x = np.array(x)
t = np.array(t)
plt.figure(figsize=(10, 5))
plt.subplot(2, 1, 1)
plt.plot(t, target(t), '-r', lw=2)
plt.plot(t, z, '-b')
plt.legend(('target', 'output'))
plt.ylim([-1.1, 3])
plt.xticks([])
plt.subplot(2, 1, 2)
plt.plot(t, wu, '-k')
plt.yscale('log')
plt.ylabel(r'$|\Delta w|$', fontsize=20)
plt.xlabel('time', fontweight='bold', fontsize=16)
plt.show()
FORCE does a pretty nice job of learning to be a sine wave. If you rerun this a few times, you'll see the quality of the fit varies. Such is working with randomness and chaos.
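Since fit quality varies run to run, it helps to score it rather than eyeball it. A small helper (the name `fit_mse` and the synthetic demo data are mine, not from the notebook) that measures mean squared error over the post-training period only:

```python
import numpy as np

def fit_mse(t, z, target_fn, tstop):
    """MSE between output z and the target, restricted to t > tstop
    (i.e., after learning has been switched off)."""
    t = np.asarray(t)
    z = np.asarray(z)
    mask = t > tstop
    return np.mean((z[mask] - target_fn(t[mask])) ** 2)

# Demo on synthetic data: a noisy cosine scored against the clean one
target = lambda t0: np.cos(2 * np.pi * t0 / 50)
t = np.linspace(0, 800, 801)
rng = np.random.default_rng(1)
z = target(t) + 0.1 * rng.standard_normal(t.size)
print(fit_mse(t, z, target, tstop=300))  # ~0.01, the injected noise variance
```

Calling `fit_mse(t, z, target, tstop)` on the arrays from the cell above gives a single number you can compare across reruns.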
In [34]:
def decode(x, rho):
    xd = zeros_like(x)
    xd[x > rho] = 1
    xd[x < -rho] = -1
    return xd
Uniformly selected random thresholds convert the rates to binary codes. I don't have any idea how this really works in the brain, so random seems as good a guess as any.
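To see exactly what the quantizer does, here is a quick self-contained check with a scalar threshold (the input values are arbitrary, chosen for illustration; `decode` is restated so the snippet runs on its own):

```python
import numpy as np

def decode(x, rho):
    # same ternary quantizer as above: values map to -1, 0, or +1
    xd = np.zeros_like(x)
    xd[x > rho] = 1
    xd[x < -rho] = -1
    return xd

x = np.array([-0.8, -0.05, 0.0, 0.03, 0.6])
print(decode(x, rho=0.1))  # [-1.  0.  0.  0.  1.]
```

Anything inside the band $[-\rho, \rho]$ is silenced to 0; everything else saturates to $\pm 1$.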
In [39]:
def f1(x, t0):
    return -x + g * dot(J, tanh(x))

N = 500
J = normal(0, sqrt(1 / N), (N, N))
x0 = uniform(-0.5, 0.5, N)
t = linspace(0, 50, 500)

rho = uniform(0, 0.1, N)  # random thresholds!
# rho = 0.5  # fixed threshold!

plt.figure(figsize=(10, 5))
for s, g in enumerate(linspace(0.5, 1.5, 3)):
    plt.subplot(1, 3, s + 1)
    x = odeint(f1, x0, t)
    xd = decode(x, rho)
    plt.plot(t, xd[:, choice(N, 10)])
    plt.title('g = ' + str(g), fontweight='bold')
    plt.ylim(-2, 2)
plt.show()
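One way to build intuition for the threshold $\rho$: the larger it is, the more units sit silent at 0 at any moment. A standalone sketch (a Gaussian stand-in for the network's $\tanh$ rates, not the actual simulation; `decode` restated so it runs alone):

```python
import numpy as np

def decode(x, rho):
    # ternary quantizer, as in the notebook
    xd = np.zeros_like(x)
    xd[x > rho] = 1
    xd[x < -rho] = -1
    return xd

rng = np.random.default_rng(2)
x = np.tanh(rng.normal(0, 1, 100_000))  # stand-in for tanh(x) activity

for rho in (0.05, 0.1, 0.5):
    frac_silent = np.mean(decode(x, rho) == 0)
    print(f"rho={rho}: {frac_silent:.2f} of samples silenced to 0")
```

The silenced fraction grows with $\rho$, so $\rho$ trades off how sparse (and how lossy) the binary code is.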
In [59]:
target = lambda t0: cos(2 * pi * t0 / 50) # target pattern
def f3(t0, x):
    # tanh_x (the cached rates), w, u, g, and J are globals set below
    return -x + g * dot(J, tanh_x) + dot(w, tanh_x) * u
dt = 1 # time step
tmax = 800 # simulation length
tstop = 500
N = 300
J = normal(0, sqrt(1 / N), (N, N))
x0 = uniform(-0.5, 0.5, N)
t = linspace(0, 50, 500)
# Alternative thresholds (uncomment to try):
# rho = 0.1  # fixed scalar threshold
# rho = uniform(0, 0.1, N)  # random thresholds
g = 1.5
u = uniform(-1, 1, N)
w = uniform(-1 / sqrt(N), 1 / sqrt(N), N) # initial weights
P = eye(N) # Running estimate of the inverse correlation matrix
lr = .4 # learning rate
rho = repeat(0.05, N)  # thresholds actually used below
# simulation data: state, output, time, weight updates
x, z, t, wu = [x0], [], [0], [0]
# Set up ode solver
solver = ode(f3)
solver.set_initial_value(x0)
# Integrate ode, update weights, repeat
while t[-1] < tmax:
    tanh_x = tanh(x[-1])
    tanh_xd = decode(tanh_x, rho)  # BINARY CODE INTRODUCED HERE!
    z.append(dot(w, tanh_xd))
    error = target(t[-1]) - z[-1]

    # RLS update, now on the decoded (discrete) rates
    q = dot(P, tanh_xd)
    c = lr / (1 + dot(q, tanh_xd))
    P = P - c * outer(q, q)
    w = w + c * error * q

    # Stop training here
    if t[-1] > tstop:
        lr = 0

    wu.append(np.sum(np.abs(c * error * q)))

    solver.integrate(solver.t + dt)
    x.append(solver.y)
    t.append(solver.t)

# last update for readout neuron (use the decoded rates)
z.append(dot(w, tanh_xd))
# plot
x = np.array(x)
t = np.array(t)
plt.figure(figsize=(10, 5))
plt.subplot(2, 1, 1)
plt.plot(t, target(t), '-r', lw=2)
plt.plot(t, z, '-b')
plt.legend(('target', 'output'))
plt.ylim([-1.1, 3])
plt.xticks([])
plt.subplot(2, 1, 2)
plt.plot(t, wu, '-k')
plt.yscale('log')
plt.ylabel(r'$|\Delta w|$', fontsize=20)
plt.xlabel('time', fontweight='bold', fontsize=16)
plt.show()