Stefano Fusi and Xiao-Jing Wang, Arbib book chapter, 2014.

Formalism

For the $\mu$th memory, the synaptic weights are updated as $w_{ij} \to w_{ij}+\Delta w^\mu_{ij}$.

The strength of the memory trace for memory $\mu$ can be estimated as the correlation between the synaptic weight matrix and the weight change induced by the $\mu$th memory, subtracting the part that could arise by chance (i.e. when none of the memories under consideration are stored in $w_{ij}$):

$M^\mu = \frac{1}{N_{syn}} \sum_{ij}w_{ij}\Delta w^\mu_{ij} - \left( \frac{1}{N_{syn}} \sum_{ij}w_{ij} \right)\left( \frac{1}{N_{syn}} \sum_{ij}\Delta w^\mu_{ij} \right)$
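As a concrete illustration (my own sketch, not from the chapter), $M^\mu$ can be computed in a few lines of numpy; the helper name `memory_trace` and the additive $\pm 1$ toy weights are illustrative choices:

```python
# Minimal numpy sketch of the memory-trace measure M^mu defined above.
# The function name and the toy additive +-1 weights are illustrative
# assumptions, not the chapter's code.
import numpy as np

def memory_trace(w, dw):
    """Correlation between the weight matrix w and the update dw that stored
    one memory, minus the chance term, averaged over all N_syn synapses."""
    w, dw = np.ravel(w), np.ravel(dw)
    return np.mean(w * dw) - np.mean(w) * np.mean(dw)

# Toy example: p random +-1 patterns stored additively (Hopfield-like).
rng = np.random.default_rng(0)
N, p = 100, 20
patterns = rng.choice([-1, 1], size=(p, N))
dws = [np.outer(x, x) for x in patterns]   # Delta w^mu for each memory
w = np.sum(dws, axis=0)                    # unbounded additive synapses

print(memory_trace(w, dws[0]))             # M^1 (of order 1; later memories add interference)
```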

Unbounded synapses: Hopfield model

In a Hopfield network, synapses are unbounded and each stored memory changes every weight by $\Delta w^\mu_{ij} = \pm1$.

Also, the mean synaptic weight change is zero (on average, for random patterns):

$\sum_{ij}\Delta w^\mu_{ij} = 0$

so,

$M^\mu = \frac{1}{N_{syn}} \sum_{ij}w_{ij}\Delta w^\mu_{ij}.$

For the first memory starting from zero weights:

$M^1 = \frac{1}{N_{syn}} \sum_{ij} (\Delta w^1_{ij})^2 = 1$.

Signal for memory $1$ is

$S=\left< M^1\right> = \left< \frac{1}{N_{syn}} \sum_{ij}w_{ij}\Delta w^1_{ij} \right> = \left< \frac{1}{N_{syn}} \sum_{ij}\sum_{\mu=1}^{p} \Delta w^\mu_{ij} \Delta w^1_{ij} \right>$

where $w_{ij}=\sum_{\mu=1}^{p}\Delta w^\mu_{ij}$ accumulates all $p$ stored memories, and the average is over all random sequences of $p$ memories, weighted by their occurrence probabilities.

For a Hopfield network, $S=1$ for arbitrary $p$, up to the blackout catastrophe.

Memory noise $N=\sqrt{\left<\left(M^1 - \left<M^1\right>\right)^2 \right>}$.

For a Hopfield model, $N=\frac{1}{2} \sqrt{\frac{p}{N_{syn}}}$. [I'm not getting the factor of $1/2$.]
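A quick Monte Carlo check of these two quantities (my own sketch, not the chapter's; self-connections are excluded, and the noise prefactor depends on such conventions):

```python
# Hedged numerical sketch: estimate S = <M^1> and N = std(M^1) for unbounded
# additive +-1 synapses (Hopfield-like), averaging over random memory sequences.
# Self-connections are excluded; the exact noise prefactor depends on conventions.
import numpy as np

def hopfield_signal_noise(N=50, p=25, trials=200, seed=1):
    rng = np.random.default_rng(seed)
    mask = ~np.eye(N, dtype=bool)                          # exclude self-connections
    traces = []
    for _ in range(trials):
        patterns = rng.choice([-1, 1], size=(p, N))
        dws = np.einsum('mi,mj->mij', patterns, patterns)  # Delta w^mu_ij = x_i x_j
        w = dws.sum(axis=0)                                 # additive, unbounded
        w_f, d1_f = w[mask], dws[0][mask]
        traces.append(np.mean(w_f * d1_f) - np.mean(w_f) * np.mean(d1_f))
    traces = np.asarray(traces)
    return traces.mean(), traces.std()

S_est, N_est = hopfield_signal_noise()
n_syn = 50 * 49
print(S_est, N_est, np.sqrt(25 / n_syn))  # compare N_est to sqrt(p/N_syn), order of magnitude only
```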

Memories are accessible while $S>N$.

This gives an upper bound of $p \sim N_{syn}$. However, a more accurate estimate for the Hopfield network is $p\approx 0.14C$, where $C$ is the number of synapses per neuron.

Bounded synapses

With bounded synapses and offline (one-shot) learning, $p \sim 0.1C$: the signal scales as $S\sim 1/\sqrt{p}$, while the noise is $N\sim 1/\sqrt{N_{syn}}$.
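One hedged way to realize bounded synapses with one-shot offline learning (my own choice of rule for illustration, not necessarily the chapter's) is to store the memories additively and then clip each weight to $\pm 1$; the measured signal for a single memory then falls off roughly as $1/\sqrt{p}$:

```python
# Hedged sketch: bounded synapses via sign-clipping of additively stored
# Hopfield-style weights (an illustrative choice of offline rule).
# The signal for memory 1 should shrink roughly as 1/sqrt(p).
import numpy as np

def clipped_signal(N=200, p=11, trials=20, seed=3):
    rng = np.random.default_rng(seed)
    mask = ~np.eye(N, dtype=bool)
    vals = []
    for _ in range(trials):
        patterns = rng.choice([-1, 1], size=(p, N))
        w = np.sign(np.einsum('mi,mj->ij', patterns, patterns))  # clip to +-1
        dw1 = np.outer(patterns[0], patterns[0])
        w_f, d_f = w[mask], dw1[mask]
        vals.append(np.mean(w_f * d_f) - np.mean(w_f) * np.mean(d_f))
    return float(np.mean(vals))

for p in [11, 41, 161]:                                   # odd p avoids zero-weight ties
    print(p, clipped_signal(p=p), (2 / (np.pi * p)) ** 0.5)  # ~1/sqrt(p) trend
```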

For a memory system in which memories are added at a constant rate, the number of memories $p$ can be used as a proxy for time $t$.

With online learning, a two-state (binary) synapse erases any previously stored information whenever it is overwritten by a new memory. To slow this forgetting, each synapse is updated only with probability $q$ per incoming memory.

$S(t)\approx q\exp(-qt) \approx q\exp(-qp)$

$N\approx \frac{1}{2\sqrt{N_{syn}}}$

Catastrophic forgetting occurs (setting $S(p)=N$) when $p\gtrsim \frac{\log(q\sqrt{N_{syn}})}{q}$. This is much smaller than the bounds above.
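A small simulation sketch of this picture (my own toy version of the stochastic binary-synapse model, not the chapter's exact rule): each incoming memory demands a $\pm 1$ state at every synapse, and each synapse adopts it with probability $q$; the tracked signal for memory 1 then decays roughly as $q\,e^{-qt}$.

```python
# Hedged toy simulation of online learning with two-state synapses:
# every new memory demands a +-1 state per synapse, adopted with probability q,
# overwriting whatever was stored before. Track the signal of memory 1 over time.
import numpy as np

def signal_decay(n_syn=20_000, q=0.2, t_max=40, trials=20, seed=2):
    rng = np.random.default_rng(seed)
    signals = np.zeros(t_max + 1)
    for _ in range(trials):
        w = rng.choice([-1, 1], size=n_syn)        # arbitrary initial states
        dw1 = rng.choice([-1, 1], size=n_syn)      # states demanded by memory 1
        upd = rng.random(n_syn) < q
        w[upd] = dw1[upd]                          # store memory 1
        for t in range(t_max + 1):
            signals[t] += np.mean(w * dw1) - np.mean(w) * np.mean(dw1)
            dw = rng.choice([-1, 1], size=n_syn)   # next memory overwrites
            upd = rng.random(n_syn) < q
            w[upd] = dw[upd]
    return signals / trials

q = 0.2
S_sim = signal_decay(q=q)
t = np.arange(len(S_sim))
print(S_sim[:5])
print(q * np.exp(-q * t[:5]))                      # compare to q*exp(-q t)
```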

Stability-plasticity dilemma

Slow synapses ($q\to 0$) retain memories over long time scales but don't form new ones easily: the memory lifetime $p$ is long, but the initial $S/N$ ($\approx 2q\sqrt{N_{syn}}$) is small. There is a limit to how slow the synapses can be, $q\sim 1/\sqrt{N_{syn}}$, if the initial $S/N$ is to remain finite; at this limit the lifetime scales as $p\sim \sqrt{N_{syn}}$, but the initial $S/N$ is not extensive, i.e. it does not grow with $N_{syn}$.

Fast synapses ($q\to 1$) form new memories easily but lose old ones quickly: the initial $S/N$ is large ($\sim\sqrt{N_{syn}}$), but the memory lifetime is not extensive, $p\sim \log N_{syn}$, i.e. it scales only very slowly with $N_{syn}$.
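Plugging numbers into the expressions above makes the tradeoff concrete (my own illustration, using $S(p)\approx q\,e^{-qp}$ and $N\approx 1/(2\sqrt{N_{syn}})$, with the lifetime defined by $S(p)=N$):

```python
# Hedged illustration of the stability-plasticity tradeoff from the formulas
# above: initial SNR = S(0)/N = 2 q sqrt(N_syn), and memory lifetime obtained
# by solving S(p) = N, i.e. p* = log(2 q sqrt(N_syn)) / q (natural log).
import numpy as np

for n_syn in [1e4, 1e6, 1e8]:
    noise = 1 / (2 * np.sqrt(n_syn))
    for label, q in [("fast", 1.0), ("slow", 1 / np.sqrt(n_syn))]:
        snr0 = q / noise                       # initial S/N
        lifetime = np.log(q / noise) / q       # memories stored until S drops to N
        print(f"N_syn={n_syn:.0e}  {label}  q={q:.1e}  "
              f"initial S/N={snr0:.1f}  lifetime={lifetime:.1f}")
```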

Multiple time scales

This is one possible solution to the stability-plasticity dilemma; see Roxin and Fusi 2013.

Aside:

Marcus Benna and Stefano Fusi optimized the decay of synaptic strength for large memory capacity and long memory lifetime, and found that the decay should go as $1/\sqrt{t}$. This is reminiscent of a diffusive process, but it is not simply diffusion of synaptic receptors [there was some argument against this -- perhaps that there is a biochemical cascade, and receptors are bound to the membrane rather than freely diffusing?].

