SYDE 556/750: Simulating Neurobiological Systems

Learning

What do we mean by learning?
- When we use an integrator to keep track of location, is that learning?
- What about the learning used to complete a pattern in the Raven's Progressive Matrices task?
- Neither of these requires any connection weights to change in the model
- But both allow future performance to be affected by past performance
- I suggest the term 'adaptation' to capture all such future-affected-by-past phenomena
So, we'll stick with a simple definition of learning
- Changing connection weights between groups of neurons

Why might we want to change connection weights?
This is what traditional neural network approaches do
- Change connection weights until the network performs the desired task
- Once it's doing the task, stop changing the weights
But we have a method for just solving for the optimal connection weights
- So why bother learning?

Why learning might be useful

We might not know the function at the beginning of the task
- Example: a creature explores its environment and learns that eating red objects is bad, but eating green objects is good
  - what are the inputs and outputs here?
The desired function might change
- Example: an ensemble whose input is a desired hand position, but the output is the muscle tension (or joint angles) needed to get there
  - why would this change?
The optimal weights we solve for might not be optimal
- How could they not be optimal?
- What assumptions are we making?

The simplest approach

What's the easiest way to deal with this, given what we know?

If we need new decoders
- Let's solve for them while the model's running
- Gather data to build up our $\Gamma$ and $\Upsilon$ matrices
Example: eating red but not green objects
- Decoder from state to $Q$ value (utility of action) for eating
- State is some high-dimensional vector that includes the colour of what we're looking for
  - And probably some other things, like whether it's small enough to be eaten
- Initially doesn't use colour to get output
- But we might experience a few bad outcomes after red, and good after green
- These become new $x$ samples, with corresponding $f(x)$ outputs
- Gather a few, recompute decoder
  - Could even do this after every timestep
Example: converting hand position to muscle commands
- Send random signals to muscles
- Observe hand position
- Use that to train decoders
Example: going from optimal to even more optimal
- As the model runs, we gather $x$ values
- Recompute decoder for those $x$ values
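
As a rough illustration of the "solve while running" idea above, the sketch below accumulates $\Gamma$ and $\Upsilon$ one sample at a time and then re-solves for the decoders. The rectified-linear rate neurons, the target function $x^2$, the regularization value, and all variable names are assumptions made for this example, not part of the course model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rate-neuron population with rectified-linear tuning over a 1-D x.
n_neurons = 50
encoders = rng.choice([-1.0, 1.0], size=n_neurons)
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)

def activities(x):
    """Return an (n_samples, n_neurons) matrix of firing rates for inputs x."""
    x = np.atleast_1d(x)
    return np.maximum(0.0, gains * encoders * x[:, None] + biases)

def target(x):
    return x ** 2  # stands in for a function we only discover while running

# Running sums: Gamma = A^T A and Upsilon = A^T f(x), built up sample by sample.
gamma = np.zeros((n_neurons, n_neurons))
upsilon = np.zeros(n_neurons)
reg = 0.1  # ridge regularization (assumed value)

for step in range(500):
    x = rng.uniform(-1, 1, size=1)
    a = activities(x)[0]
    gamma += np.outer(a, a)
    upsilon += a * target(x)[0]

# Re-solve for the decoders from everything gathered so far.
decoders = np.linalg.solve(gamma + reg * np.eye(n_neurons), upsilon)

x_test = np.linspace(-1, 1, 101)
rmse = np.sqrt(np.mean((activities(x_test) @ decoders - target(x_test)) ** 2))
print("RMSE after 500 samples:", rmse)
```

Note that every re-solve needs the full $N \times N$ matrix, which is the storage and computation cost that makes this approach hard to justify biologically.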

What's wrong with this approach

Feels like cheating
Why?

Two kinds of problems:
- Not biologically realistic
  - How are neurons supposed to do all this?
  - store data
  - solve decoders
  - timing
- Computationally expensive
  - Even if we're not worried about realism

Traditional neural networks

Traditionally, learning is the main method of constructing a model network
How?
Incremental learning
- As you get examples, shift the connection weights slightly based on that example
- Don't have to consider all the data when making an update
Example: Perceptron learning (1957)
- $\Delta w_i = \alpha(y_d - y)x_i$

Problems with perceptron
- Can't do all possible functions
- Effectively just linear functions of $x$ (with a threshold; i.e. a linear classifier)
- Is that a problem (X)OR not?
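
To make the limitation concrete, here is a minimal sketch of the perceptron rule applied to AND (linearly separable) and XOR (not linearly separable). The helper names, learning rate, and epoch count are assumptions chosen for this example.

```python
def predict(w, x):
    """Threshold unit: 1 if the weighted sum plus the bias weight is positive."""
    total = sum(w_i * x_i for w_i, x_i in zip(w, x)) + w[-1]
    return 1.0 if total > 0 else 0.0

def train_perceptron(samples, alpha=0.5, epochs=50):
    """Apply w_i += alpha * (y_d - y) * x_i one example at a time."""
    w = [0.0, 0.0, 0.0]  # two input weights and one bias weight
    for _ in range(epochs):
        for x, y_d in samples:
            y = predict(w, x)
            for i, x_i in enumerate(x):
                w[i] += alpha * (y_d - y) * x_i
            w[-1] += alpha * (y_d - y)  # the bias sees a constant input of 1
    return w

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_samples = list(zip(inputs, [0.0, 0.0, 0.0, 1.0]))
xor_samples = list(zip(inputs, [0.0, 1.0, 1.0, 0.0]))

w_and = train_perceptron(and_samples)
w_xor = train_perceptron(xor_samples)

and_correct = sum(predict(w_and, x) == y_d for x, y_d in and_samples)
xor_correct = sum(predict(w_xor, x) == y_d for x, y_d in xor_samples)
print("AND correct:", and_correct, "of 4")
print("XOR correct:", xor_correct, "of 4")
```

No single threshold unit can get all four XOR cases right, however long it trains, because no line separates the two classes.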

Backprop and the NEF

How are nonlinear functions included?
- Multiple layers

But now a new rule is needed
- Standard answer: backprop
- Same as perceptron for first layer
- Backprop adds: Estimate correct "hidden layer" input, and repeat
What would this be in NEF terms?

Remember that we're already fine with linear decoding
- Encoders (and $\alpha$ and $J^{bias}$) are first layer of weights, decoders are second layer
- Note that in the NEF, we combine many of these layers together
We can just use the standard perceptron rule for decoders
- As long as there are lots of neurons, and we've initialized them well with the desired intercepts, maximum rates, and encoders, we should be able to decode
- So, what might backprop add to that?

Biologically realistic perceptron learning

(MacNeil & Eliasmith, 2011) derive a simple, plausible learning rule starting with a delta rule
$E = 1/2 \int (x-\hat{x})^2 dx$
$\delta E/\delta d_i = -(x-\hat{x})a_i$ (as usual for finding decoders, since $\hat{x} = \sum_i a_i d_i$)
So, to move down the gradient:
- $\Delta d_i = \kappa (x - \hat{x})a_i$ (NEF notation)
- $\Delta d_i = \kappa (y_d - y)a_i$ (the standard perceptron/delta rule)
How do we make it realistic?
Decoders don't exist in the brain
- Need weights
The NEF tells us:
- $\omega_{ij} = \alpha_j d_i \cdot e_j$
- $\Delta \omega_{ij} = \alpha_j \kappa (y_d - y)a_i \cdot e_j$
Let's write $(y_d - y)$ as $E$
- $\Delta \omega_{ij} = \alpha_j \kappa a_i E \cdot e_j$
- $\Delta \omega_{ij} = \kappa a_i (\alpha_j E \cdot e_j)$
What's $\alpha_j E \cdot e_j$?
- That's the current that this neuron would get if it had $E$ as an input
- But we don't want this current to drive the neuron
- Rather, we want it to change the weight
- It's a modulatory input
This is the "Prescribed Error Sensitivity" (PES) rule
- Any model in the NEF could use this instead of computing decoders
- Requires some other neural group computing the error $E$
- Used in Spaun for Q-value learning (reinforcement task)
- Can even be used to learn circular convolution
  - Only demonstrated up to 3 dimensions in (Bekolay et al., 2013)
  - Why not more? Patience.
Is this realistic?
- Local information only
- Need an error signal
  - Does anything like this seem to happen in the brain?
  - Yes
    - Retinal slip error is computed in oculomotor system
    - Dopamine seems to act as prediction error
- Weight changes proportional to pre-synaptic activity
- Most evidence about weight changes is fairly circumstantial
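
The derivation above can be checked numerically. The sketch below applies the delta rule to decoders and, in parallel, the PES update directly to a full weight matrix, then confirms the two stay equivalent through $\omega_{ij} = \alpha_j d_i \cdot e_j$. The rectified-linear rate neurons, population sizes, and learning rate are assumptions for illustration. The error here follows the derivation's sign convention ($E = y_d - y$), whereas the Nengo examples in these notes feed in actual minus target.

```python
import numpy as np

rng = np.random.default_rng(1)

n_pre, n_post, dims = 40, 30, 2
kappa = 1e-3  # learning rate (assumed value)

# Hypothetical rate neurons for the pre population (rectified-linear tuning).
pre_enc = rng.standard_normal((n_pre, dims))
pre_enc /= np.linalg.norm(pre_enc, axis=1, keepdims=True)
pre_gain = rng.uniform(1.0, 2.0, n_pre)
pre_bias = rng.uniform(-0.5, 1.0, n_pre)

# Post population gains (alpha_j) and encoders (e_j).
post_gain = rng.uniform(1.0, 2.0, n_post)
post_enc = rng.standard_normal((n_post, dims))
post_enc /= np.linalg.norm(post_enc, axis=1, keepdims=True)

def pre_rates(x):
    return np.maximum(0.0, pre_gain * (pre_enc @ x) + pre_bias)

decoders = np.zeros((n_pre, dims))  # d_i, one row per pre neuron
# omega_ij = alpha_j e_j . d_i, stored with rows = post (j), columns = pre (i)
weights = (post_gain[:, None] * post_enc) @ decoders.T

errors = []
for step in range(5000):
    x = rng.uniform(-1, 1, dims)
    a = pre_rates(x)
    y = a @ decoders
    E = x - y  # y_d - y, with y_d = x (a communication channel)
    # Decoder form: delta d_i = kappa * a_i * E
    decoders += kappa * np.outer(a, E)
    # Weight form: delta omega_ij = kappa * a_i * (alpha_j e_j . E)
    weights += kappa * np.outer(post_gain * (post_enc @ E), a)
    errors.append(np.linalg.norm(E))

early, late = np.mean(errors[:100]), np.mean(errors[-100:])
same = np.allclose(weights, (post_gain[:, None] * post_enc) @ decoders.T)
print("mean |E| early:", early, "late:", late)
print("weight update matches decoder update:", same)
```

The weight-form update uses only the pre-synaptic activity $a_i$ and the per-neuron quantity $\alpha_j e_j \cdot E$, which is the sense in which the rule is local.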



In [120]:

    
#From the learning examples in nengo - a Communication Channel
%pylab inline
import nengo
from nengo.processes import WhiteSignal

model = nengo.Network('Learn a Communication Channel')
with model:
    stim = nengo.Node(output=WhiteSignal(10, high=5, rms=0.5), size_out=2)
    
    pre = nengo.Ensemble(60, dimensions=2)
    post = nengo.Ensemble(60, dimensions=2)
    
    nengo.Connection(stim, pre)
    conn = nengo.Connection(pre, post, function=lambda x: np.random.random(2))
    
    inp_p = nengo.Probe(stim)
    pre_p = nengo.Probe(pre, synapse=0.01)
    post_p = nengo.Probe(post, synapse=0.01)

sim = nengo.Simulator(model)
sim.run(10.0)



In [121]:

    
t=sim.trange()

figure(figsize=(12, 8))
subplot(2, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Random function computation')
    
subplot(2, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');



In [122]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'pre_learn.py.cfg')



In [123]:

    
#Now learn
with model:
    error = nengo.Ensemble(60, dimensions=2)
    error_p = nengo.Probe(error, synapse=0.03)
    
    # Error = actual - target = post - pre
    nengo.Connection(post, error)
    nengo.Connection(pre, error, transform=-1)
    
    # Add the learning rule to the connection
    conn.learning_rule_type = nengo.PES()
    
    # Connect the error into the learning rule
    learn_conn = nengo.Connection(error, conn.learning_rule)

sim = nengo.Simulator(model)
sim.run(10.0)



In [124]:

    
t=sim.trange()

figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a communication channel')
    
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');

subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')












In [125]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'simple_learn.py.cfg')



In [109]:

    
#Turning learning on and off to test generalization
def inhibit(t):
    return 2.0 if t > 10.0 else 0.0

with model:
    inhib = nengo.Node(inhibit)
    inhib_conn = nengo.Connection(inhib, error.neurons, transform=[[-1]] * error.n_neurons)
    
sim = nengo.Simulator(model)
sim.run(16.0)



In [110]:

    
t=sim.trange()

figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a communication channel')
    
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');

subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')












In [111]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'control_learn.py.cfg')



In [112]:

    
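# Compute a nonlinear function
# Note: the earlier pre -> error connection (transform=-1) is still in the model,
# so the error here is post - pre - nonlinear(pre) unless that connection is also removed.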
#model.connections.remove(err_fcn) #uncomment to try other fcns
#del err_fcn
model.connections.remove(inhib_conn)
del inhib_conn
model.nodes.remove(inhib)
model.connections.remove(learn_conn)
del learn_conn

def nonlinear(x):
    return x[0]*x[0], x[1]*x[1]

with model:
    err_fcn = nengo.Connection(pre, error, function=nonlinear, transform=-1)
    
    conn.learning_rule_type = nengo.PES(learning_rate=1e-4)
    # Connect the error into the learning rule
    learn_conn = nengo.Connection(error, conn.learning_rule)    
    
sim = nengo.Simulator(model)
sim.run(26.0)



In [113]:

    
t=sim.trange()

figure(figsize=(12, 8))
subplot(3, 1, 1)
plot(t, sim.data[inp_p].T[0], c='k', label='Input')
plot(t, sim.data[pre_p].T[0], c='b', label='Pre')
plot(t, sim.data[post_p].T[0], c='r', label='Post')
ylabel("Dimension 1")
legend(loc='best')
title('Learn a nonlinear function')
    
subplot(3, 1, 2)
plot(t, sim.data[inp_p].T[1], c='k', label='Input')
plot(t, sim.data[pre_p].T[1], c='b', label='Pre')
plot(t, sim.data[post_p].T[1], c='r', label='Post')
ylabel("Dimension 2")
legend(loc='best');

subplot(3, 1, 3)
plot(sim.trange(), sim.data[error_p], c='b')
ylim(-1, 1)
legend(("Error[0]", "Error[1]"), loc='best');
title('Error')












In [116]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'square_learn.py.cfg')

This rule can be used to learn any nonlinear vector function
It does as well as, or better than, the NEF decoder optimization
It's a 'spike-based' rule... meaning it works in a spiking network
It has been used for 'constant supervision' as well as 'reinforcement learning' (occasional supervision) tasks (Spaun uses it for the RL task)
It moves the focus of learning research from weight changes or 'learning rules' to error signals
Backprop is one way of propagating error signals (unfortunately not bio-plausible)
Pretty much ignores encoders (which should maybe be about capturing all the incoming information, so as to compute any function over that information)

Applications of PES

Classical conditioning

Classical or Pavlovian conditioning uses an unconditioned stimulus (US) (meat for a dog) that elicits an unconditioned response (UR) (salivating) to cause a conditioned response (CR) (salivating after learning) to be elicited by a conditioned stimulus (CS) (ringing a bell).

The best known model of this is the Rescorla-Wagner model, which states: $\Delta V_x = \alpha (\lambda - \sum_x V)$, where $V_x$ is the value of x, $\alpha$ is a learning rate and salience parameter, and $\lambda$ is the max value (usually 1). In the model below there is only 1 element in $\sum V$. The difference in brackets is like a reward prediction error.
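
As a quick numerical illustration of the Rescorla-Wagner update with a single CS (so the sum has one element), the sketch below repeats the update over a series of paired trials. The learning rate of 0.3 and the 20 trials are assumed values for this example.

```python
alpha = 0.3  # learning rate / salience (assumed value)
lam = 1.0    # maximum associative strength
V = 0.0      # associative strength of the single CS

history = []
for trial in range(20):
    prediction_error = lam - V  # the bracketed term, with one element in the sum
    V += alpha * prediction_error
    history.append(V)

print([round(v, 3) for v in history[:5]])
print("V after 20 trials:", round(history[-1], 4))
```

The prediction error shrinks by a factor of $(1 - \alpha)$ on every trial, so learning is fast at first and then levels off as $V$ approaches $\lambda$.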

In this model:

There are three different US that are provided to the model, one after the other.
Each has a different hardwired UR.
There is also a CS provided (a different one for each US)
The model attempts to learn to trigger the correct CR in response to the CS.
After learning, the CR should start to respond before the corresponding UR.



In [12]:

    
import nengo
import numpy as np

D = 3
N = D*50

def us_stim(t):
    # cycle through the three US
    t = t % 3
    if 0.9 < t < 1: return [1, 0, 0]
    if 1.9 < t < 2: return [0, 1, 0]
    if 2.9 < t < 3: return [0, 0, 1]
    return [0, 0, 0]

def cs_stim(t):
    # cycle through the three CS
    t = t % 3
    if 0.7 < t < 1: return [0.7, 0, 0.5]
    if 1.7 < t < 2: return [0.6, 0.7, 0.8]
    if 2.7 < t < 3: return [0, 1, 0]
    return [0, 0, 0]

model = nengo.Network(label="Classical Conditioning")
with model:
    us_stim = nengo.Node(us_stim)
    cs_stim = nengo.Node(cs_stim)

    us = nengo.Ensemble(N, D)
    cs = nengo.Ensemble(N*2, D*2)

    nengo.Connection(us_stim, us[:D])
    nengo.Connection(cs_stim, cs[:D])
    nengo.Connection(cs[:D], cs[D:], synapse=0.2)

    ur = nengo.Ensemble(N, D)
    nengo.Connection(us, ur)
    
    cr = nengo.Ensemble(N, D)
    learn_conn = nengo.Connection(cs, cr, function=lambda x: [0]*D)
    learn_conn.learning_rule_type = nengo.PES(learning_rate=3e-4)

    error = nengo.Ensemble(N, D)
    nengo.Connection(error, learn_conn.learning_rule)
    nengo.Connection(ur, error, transform=-1)
    nengo.Connection(cr, error, transform=1, synapse=0.1)

    stop_learn = nengo.Node([0])
    nengo.Connection(stop_learn, error.neurons, transform=-10*np.ones((N, 1)))



In [13]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'learning2-conditioning.py.cfg')

Cortical Consolidation

There is evidence that when you first learn a skill, it takes a lot of effort and you tend to perform fairly slowly. We would think of this as requiring a lot of intervention from the basal ganglia in selecting actions. As you get better at the skill you become much faster, and BG is used less because cortex 'takes over' central aspects of that skill, consolidating it into cortico-cortical connections. This model shows a toy version of this kind of behaviour.

In this model:

there is a slow mapping from pre->wm->target (because of long synaptic time constants)
there is a fast, direct connection from pre->post
the fast connection is trained using the error signal from the slow system
the fast system learns to produce the correct output before the slow system
if you change the 'context', the fast system will learn the new output



In [7]:

    
import nengo
import numpy as np

tau_slow = 0.2

model = nengo.Network("Cortical Consolidation")
with model:
    pre_value = nengo.Node(lambda t: np.sin(t))
    
    pre = nengo.Ensemble(100, 1)
    post = nengo.Ensemble(100, 1)
    target = nengo.Ensemble(100, 1)
    nengo.Connection(pre_value, pre)

    conn = nengo.Connection(pre, post, function=lambda x: np.random.random(),
                learning_rule_type=nengo.PES())
    
    wm = nengo.Ensemble(300, 2, radius=1.4)
    context = nengo.Node(1)
    nengo.Connection(context, wm[1])
    nengo.Connection(pre, wm[0], synapse=tau_slow)
    
    nengo.Connection(wm, target, synapse=tau_slow, 
                     function=lambda x: x[0]*x[1])
                     
    error = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(post, error, synapse=tau_slow*2, transform=1) #Delay the fast connection so they line up
    nengo.Connection(target, error, transform=-1)
    
    nengo.Connection(error, conn.learning_rule)

    stop_learn = nengo.Node([0])
    nengo.Connection(stop_learn, error.neurons, transform=-10*np.ones((100,1)))
    
    both = nengo.Node(None, size_in=2) #For plotting
    nengo.Connection(post, both[0], synapse=None)
    nengo.Connection(target, both[1], synapse=None)



In [8]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'learning3-consolidation.py.cfg')









    




Reinforcement Learning

As mentioned in the last lecture, RL is a useful way to think about action selection. You have a set of actions and a set of states, and you figure out the value of each action in each state, letting you construct a big table $Q(s,a)$ which you can use to pick good actions. RL figures out what those values are through trial and error.

$\Delta Q(s,a) = \alpha (R + \gamma Q_{predicted} - Q_{old})$ where $R$ is reward, $\alpha$ is a learning rate and $\gamma$ is a discount factor.

In the model:

the agent has three actions (go forward, turn left, and turn right)
its only senses are three range finders (radar)
initially it should always go forward
it gets reward proportional to its forward speed, but a large negative reward for hitting walls
the error signal is simply the difference between the computed utility and the instantaneous reward
$\Delta Q(s,a) = \alpha (R - Q_{current})$
this error will only be applied to whatever action is currently being chosen, which means it cannot learn to do actions that will lead to future rewards
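
A minimal tabular sketch of this simplified rule, $\Delta Q(s,a) = \alpha (R - Q_{current})$, for a single state. The initial utilities match the ones used in the model below, but the reward values (going forward always hits a wall) and the learning rate are hypothetical.

```python
alpha = 0.1
Q = [0.8, 0.6, 0.7]         # initial utilities: forward, left, right
reward = [-1.0, 0.0, 0.0]   # hypothetical: forward hits a wall in this state

chosen = []
for step in range(200):
    a = max(range(3), key=lambda i: Q[i])  # pick the highest-utility action
    Q[a] += alpha * (reward[a] - Q[a])     # only the chosen action is updated
    chosen.append(a)

print("final Q:", [round(q, 4) for q in Q])
print("times forward was chosen:", chosen.count(0))
```

The utility of going forward drops below the others after a handful of collisions and is never revisited. Because only the immediate reward enters the error, nothing here can credit an action for rewards that arrive later.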



In [5]:

    
# requires CCMSuite https://github.com/tcstewar/ccmsuite/
import ccm.lib.grid
import ccm.lib.continuous
import ccm.ui.nengo

mymap="""
#########
#       #
#       #
#   ##  #
#   ##  #
#       #
#########

"""

class Cell(ccm.lib.grid.Cell):
    def color(self):
        return 'black' if self.wall else None
    def load(self, char):
        if char == '#':
            self.wall = True

world = ccm.lib.grid.World(Cell, map=mymap, directions=4)

body = ccm.lib.continuous.Body()
world.add(body, x=1, y=3, dir=2)

import nengo
import numpy as np    

def move(t, x):
    speed, rotation = x
    dt = 0.001
    max_speed = 20.0
    max_rotate = 10.0
    body.turn(rotation * dt * max_rotate)
    success = body.go_forward(speed * dt * max_speed)
    if not success: #Hit a wall
        return -1
    else:
        return speed

model = nengo.Network("Simple RL", seed=2)
with model:
    env = ccm.ui.nengo.GridNode(world, dt=0.005)
    
    movement_node = nengo.Node(move, size_in=2, label='reward')
    movement = nengo.Ensemble(n_neurons=100, dimensions=2, radius=1.4)    
    nengo.Connection(movement, movement_node)

    def detect(t):
        angles = (np.linspace(-0.5, 0.5, 3) + body.dir ) % world.directions
        return [body.detect(d, max_distance=4)[0] for d in angles]
    stim_radar = nengo.Node(detect)

    radar = nengo.Ensemble(n_neurons=50, dimensions=3, radius=4)
    nengo.Connection(stim_radar, radar)
    
    
    bg = nengo.networks.actionselection.BasalGanglia(3)
    thal = nengo.networks.actionselection.Thalamus(3)
    nengo.Connection(bg.output, thal.input)
    
    def u_fwd(x):
        return 0.8
    def u_left(x):
        return 0.6
    def u_right(x):
        return 0.7

    conn_fwd = nengo.Connection(radar, bg.input[0], function=u_fwd, learning_rule_type=nengo.PES())
    conn_left = nengo.Connection(radar, bg.input[1], function=u_left, learning_rule_type=nengo.PES())
    conn_right = nengo.Connection(radar, bg.input[2], function=u_right, learning_rule_type=nengo.PES())
        
    nengo.Connection(thal.output[0], movement, transform=[[1],[0]])
    nengo.Connection(thal.output[1], movement, transform=[[0],[1]])
    nengo.Connection(thal.output[2], movement, transform=[[0],[-1]])
    
    errors = nengo.networks.EnsembleArray(n_neurons=50, n_ensembles=3)
    nengo.Connection(movement_node, errors.input, transform=-np.ones((3,1)))
    nengo.Connection(bg.output[0], errors.ensembles[0].neurons, transform=np.ones((50,1))*4)    
    nengo.Connection(bg.output[1], errors.ensembles[1].neurons, transform=np.ones((50,1))*4)    
    nengo.Connection(bg.output[2], errors.ensembles[2].neurons, transform=np.ones((50,1))*4)    
    nengo.Connection(bg.input, errors.input, transform=1)
    
    nengo.Connection(errors.ensembles[0], conn_fwd.learning_rule)
    nengo.Connection(errors.ensembles[1], conn_left.learning_rule)
    nengo.Connection(errors.ensembles[2], conn_right.learning_rule)



In [2]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'learning5-utility.py.cfg')

Better RL

To improve our RL it would be good to predict future rewards more accurately. It would be good to learn the function $Q(s,a)$. Let's assume that your policy is fixed, so future actions are fixed. As well, future rewards are 90% as good as current rewards (i.e. they are discounted). Consequently, we have

$Q(s,t) = R(s,t) + 0.9 R(s+1, t+1) + 0.9^2 R(s+2, t+2) + ...$.

So also,

$Q(s+1,t+1) = R(s+1,t+1) + 0.9 R(s+2, t+2) + 0.9^2 R(s+3, t+3) + ...$.

Multiplying both sides by 0.9:

$0.9 Q(s+1,t+1) = 0.9 R(s+1,t+1) + 0.9^2 R(s+2, t+2) + 0.9^3 R(s+3, t+3) + ...$.

Substituting this last equation into the first gives $Q(s,t) = R(s,t) + 0.9 Q(s+1, t+1)$. This suggests an error rule: $Error(t) = Q(s-1) - R(s-1) - 0.9 Q(s)$
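
The recursion and the error rule can be checked on a small discrete example. The sketch below computes the discounted values for a made-up reward sequence, then learns the same values using only the error $Q(s-1) - R(s-1) - 0.9 Q(s)$. The reward sequence, step size, and number of sweeps are assumptions for illustration.

```python
gamma = 0.9
rewards = [0.0, 0.0, 1.0, 0.0, -1.0, 0.0]  # hypothetical reward sequence
n = len(rewards)

# True values: Q[t] = R[t] + 0.9 R[t+1] + 0.9^2 R[t+2] + ..., computed backwards.
Q = [0.0] * (n + 1)  # Q[n] = 0: nothing happens after the sequence ends
for t in reversed(range(n)):
    Q[t] = rewards[t] + gamma * Q[t + 1]

# Learn the same values from the error rule alone, starting from zero.
V = [0.0] * (n + 1)
lr = 0.5
for sweep in range(200):
    for s in range(1, n + 1):
        error = V[s - 1] - rewards[s - 1] - gamma * V[s]
        V[s - 1] -= lr * error  # subtract the error, as PES does

print("true Q:   ", [round(q, 3) for q in Q[:n]])
print("learned V:", [round(v, 3) for v in V[:n]])
```

The learned values become non-zero a few steps before the reward and the punishment, which is the anticipation effect the model is meant to show.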

In this model:

the agent always moves randomly, it's not using what it learns to change its movement (it is just trying to anticipate future rewards)
the agent is given a reward whenever it is in the green square, and a punishment (negative reward) whenever it is in the red square
it learns to anticipate the reward/punishment as shown in the value graph
we convert the error rule into the continuous domain by using a long time constant for s-1 and a short time constant for s (assuming we switch states at each time step).



In [16]:

    
# requires CCMSuite https://github.com/tcstewar/ccmsuite/
import ccm.lib.grid
import ccm.lib.continuous
import ccm.ui.nengo

mymap="""
#######
#     #
# # # #
# # # #
#G   R#
#######
"""

class Cell(ccm.lib.grid.Cell):
    def color(self):
        if self.wall:
            return 'black'
        elif self.reward > 0:
            return 'green'
        elif self.reward < 0:
            return 'red'
        return None
    def load(self, char):
        self.reward = 0
        if char == '#':
            self.wall = True
        if char == 'G':
            self.reward = 10
        elif char == 'R':
            self.reward = -10

world = ccm.lib.grid.World(Cell, map=mymap, directions=4)

body = ccm.lib.continuous.Body()
world.add(body, x=1, y=2, dir=2)

import nengo
import numpy as np 

tau=0.1

def move(t, x):
    speed, rotation = x
    dt = 0.001
    max_speed = 20.0
    max_rotate = 10.0
    body.turn(rotation * dt * max_rotate)
    body.go_forward(speed * dt * max_speed)
    
    if int(body.x) == 1:
        world.grid[4][4].wall = True
        world.grid[4][2].wall = False
    if int(body.x) == 4:
        world.grid[4][2].wall = True
        world.grid[4][4].wall = False

model = nengo.Network("Predict Value", seed=2)
with model:
    env = ccm.ui.nengo.GridNode(world, dt=0.005)

    movement = nengo.Node(move, size_in=2)
    
    def detect(t):
        angles = (np.linspace(-0.5, 0.5, 3) + body.dir) % world.directions
        return [body.detect(d, max_distance=4)[0] for d in angles]
    stim_radar = nengo.Node(detect)
    radar = nengo.Ensemble(n_neurons=50, dimensions=3, radius=4, seed=2,
                noise=nengo.processes.WhiteSignal(10, 0.1, rms=1))
    nengo.Connection(stim_radar, radar)

    def braiten(x):
        turn = x[2] - x[0]
        spd = x[1] - 0.5
        return spd, turn
    nengo.Connection(radar, movement, function=braiten)  
    
    def position_func(t):
        return body.x / world.width * 2 - 1, 1 - body.y/world.height * 2, body.dir / world.directions
    position = nengo.Node(position_func)
    state = nengo.Ensemble(100, 3)
    nengo.Connection(position, state, synapse=None)
    
    reward = nengo.Node(lambda t: body.cell.reward)
        
    value = nengo.Ensemble(n_neurons=50, dimensions=1)

    learn_conn = nengo.Connection(state, value, function=lambda x: 0,
                                  learning_rule_type=nengo.PES(learning_rate=1e-4,
                                                               pre_tau=tau))
    nengo.Connection(reward, learn_conn.learning_rule, 
                     transform=-1, synapse=tau)
    nengo.Connection(value, learn_conn.learning_rule, 
                     transform=-0.9, synapse=0.01)
    nengo.Connection(value, learn_conn.learning_rule, 
                     transform=1, synapse=tau)



In [17]:

    
from nengo_gui.ipython import IPythonViz
IPythonViz(model,'learning6-value.py.cfg')

Unsupervised learning

Hebbian learning
- Neurons that fire together, wire together
- $\Delta \omega_{ij} = \kappa a_i a_j$
- Just that would be unstable
  - Why?
BCM rule (Bienenstock, Cooper, & Munro, 1982)
- $\Delta \omega_{ij} = \kappa a_i a_j (a_j-\theta)$
- $\theta$ is an activity threshold
  - If post-synaptic neuron is more active than this threshold, increase strength
  - Otherwise decrease it
- Other than that, it's a standard Hebbian rule
- Where would we get $\theta$?
  - Need to store something about the overall recent activity of neuron $j$ so it can be compared to its current activity
  - Just have $\theta$ be a PSTC-filtered version of the spiking activity $a_j$
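
The two rules above can be sketched with rate values standing in for filtered spikes. The first part shows one way plain Hebbian learning runs away when rates are non-negative; the second shows how the BCM term changes sign around $\theta$. The linear-rectified post-synaptic neuron, the learning rate, and the specific activity values are assumptions made for this example.

```python
import numpy as np

kappa = 0.01

def hebbian_delta(a_pre, a_post):
    """Plain Hebbian: delta omega_ij = kappa * a_i * a_j (rows = post, cols = pre)."""
    return kappa * np.outer(a_post, a_pre)

def bcm_delta(a_pre, a_post, theta):
    """BCM: delta omega_ij = kappa * a_i * a_j * (a_j - theta_j)."""
    return kappa * np.outer(a_post * (a_post - theta), a_pre)

# With non-negative rates, Hebbian updates can only increase the weights, and
# larger weights give larger post-synaptic rates: a positive feedback loop.
rng = np.random.default_rng(3)
w = np.full((1, 20), 0.05)
norms = []
for step in range(300):
    a_pre = rng.uniform(0, 1, 20)
    a_post = np.maximum(0.0, w @ a_pre)
    w += hebbian_delta(a_pre, a_post)
    norms.append(np.linalg.norm(w))
print("Hebbian weight norm, start vs end:", norms[0], norms[-1])

# BCM strengthens when the post neuron is above threshold, weakens when below.
a_pre = np.array([1.0, 0.5])
theta = np.array([0.4])
up = bcm_delta(a_pre, np.array([0.8]), theta)
down = bcm_delta(a_pre, np.array([0.2]), theta)
print("above threshold:", up)
print("below threshold:", down)
```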



In [90]:

    
import nengo
import numpy as np

model = nengo.Network()
with model:
    sin = nengo.Node(lambda t: np.sin(t*4))
    
    pre = nengo.Ensemble(100, dimensions=1)
    post = nengo.Ensemble(100, dimensions=1)

    nengo.Connection(sin, pre)
    conn = nengo.Connection(pre, post, solver=nengo.solvers.LstsqL2(weights=True))

    pre_p = nengo.Probe(pre, synapse=0.01)
    post_p = nengo.Probe(post, synapse=0.01)

sim = nengo.Simulator(model)
sim.run(2.0)









    






In [91]:

    
plot(sim.trange(), sim.data[pre_p], label="Pre")
plot(sim.trange(), sim.data[post_p], label="Post")
ylabel("Decoded value")
legend(loc="best");



In [93]:

    
conn.learning_rule_type = nengo.BCM(learning_rate=5e-10)

with model:
    trans_p = nengo.Probe(conn, 'weights', synapse=0.01, sample_every=0.01)

sim = nengo.Simulator(model)
sim.run(20.0)









    






In [94]:

    
figure(figsize=(12, 8))
subplot(2, 1, 1)
plot(sim.trange(), sim.data[pre_p], label="Pre")
plot(sim.trange(), sim.data[post_p], label="Post")
ylabel("Decoded value")
ylim(-1.6, 1.6)
legend(loc="lower left")

subplot(2, 1, 2)
# Find weight row with max variance
neuron = np.argmax(np.mean(np.var(sim.data[trans_p], axis=0), axis=1))
plot(sim.trange(dt=0.01), sim.data[trans_p][..., neuron])
ylabel("Connection weight");



In [95]:

    
def sparsity_measure(vector):
    # Max sparsity = 1 (single 1 in the vector)
    v = np.sort(np.abs(vector))
    n = v.shape[0]
    k = np.arange(n) + 1
    l1norm = np.sum(v)
    summation = np.sum((v / l1norm) * ((n - k + 0.5) / n))
    return 1 - 2 * summation

print("Starting sparsity: {0}".format(sparsity_measure(sim.data[trans_p][0])))
print("Ending sparsity: {0}".format(sparsity_measure(sim.data[trans_p][-1])))









    



Starting sparsity: 0.223842561067
Ending sparsity: 0.475518530635

Result: only a few neurons will fire
- Sparsification
What would this do in NEF terms?
- Still represent $x$, but with very sparse encoders (assuming the function doesn't change)
This is still a rule on the weight matrix, but functionally seems to be more about encoders than decoders
- What could we do, given that?

The homeostatic Prescribed Error Sensitivity (hPES) rule

Just do them both (Bekolay et al., 2013)
And have a parameter $S$ to adjust how much of each
$\Delta \omega_{ij} = \kappa \alpha_j a_i (S e_j \cdot E + (1-S) a_j (a_j-\theta))$
Works as well (or better) than PES
- Seems to be a bit more stable, but analysis is ongoing
Biological evidence?
- Spike-Timing Dependent Plasticity
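
A minimal sketch of the combined update as a single function, so the role of $S$ is easy to see. It assumes the leading activity factor is the pre-synaptic $a_i$, as in the PES and BCM rules earlier in these notes. The function name, argument names, and test values are all hypothetical.

```python
import numpy as np

def hpes_delta(a_pre, a_post, theta, post_gain, post_enc, E, S, kappa=1e-3):
    """Combined weight change; rows index post neurons (j), columns pre neurons (i)."""
    supervised = post_enc @ E                 # e_j . E, the error-driven (PES) term
    unsupervised = a_post * (a_post - theta)  # a_j (a_j - theta), the BCM term
    mix = S * supervised + (1 - S) * unsupervised
    return kappa * np.outer(post_gain * mix, a_pre)

rng = np.random.default_rng(4)
n_pre, n_post, dims = 5, 4, 2
a_pre = rng.uniform(0, 1, n_pre)
a_post = rng.uniform(0, 1, n_post)
theta = rng.uniform(0, 1, n_post)
post_gain = rng.uniform(1, 2, n_post)
post_enc = rng.standard_normal((n_post, dims))
E = rng.standard_normal(dims)

pure_pes = hpes_delta(a_pre, a_post, theta, post_gain, post_enc, E, S=1.0)
pure_bcm = hpes_delta(a_pre, a_post, theta, post_gain, post_enc, E, S=0.0)
half = hpes_delta(a_pre, a_post, theta, post_gain, post_enc, E, S=0.5)
print("S=0.5 is the average of the two extremes:",
      np.allclose(half, 0.5 * (pure_pes + pure_bcm)))
```

With $S = 1$ the function reduces to the PES weight update, and with $S = 0$ it reduces to a gain-scaled BCM update.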

Still work to do for comparison, but seems promising
Error-driven for improving decoders
Hebbian sparsification to improve encoders
- Perhaps to sparsify connections (energy savings in the brain, but not necessarily in simulation)


