Reinforcement Learning Using Spiking Neural Networks

Latest Results
The chart below is based on trained neuron responses: each letter of the alphabet was presented three times in repetition before the next letter was presented. Neurons are plotted as y-axis values, and points represent neurons responding (firing spikes) to the input. A neuron whose points correspond to only one letter (three points in a row) shows the model has learned well; anything else marks an area where learning can improve. The highest performance after parameter adjustment was measured as 97.6% overall accuracy, (TP+TN)/(TP+FP+TN+FN), and 71.8% precision, TP/(TP+FP), relative to the input (the result has been repeated, but more repeatability testing is needed). Work is underway to auto-optimize parameters and increase performance even further: Parameter Optimization
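As a quick check of those two formulas, the arithmetic below uses hypothetical confusion-matrix counts (the counts are made up for illustration, not measured results):

TP, FP, TN, FN = 70, 28, 680, 2  # hypothetical counts, for illustration only
accuracy = float(TP + TN) / (TP + FP + TN + FN)
precision = float(TP) / (TP + FP)
print accuracy, precision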

Realistic Neuron Property Modeling. Key features: Spike Timing Dependent Plasticity * Active Dendrites * Direct to Soma Signaling * Lateral Inhibition * Learning Rate * Neurosci. Toolkit


26 Char Test Results with Spike Occurrences of Neurons

Letters used as stimuli were represented as 15-pixel (3x5) images and presented sequentially, 100 ms at a time, three times in a row. A spike occurs when a neuron's voltage reaches 10 mV. Once a spike occurs the voltages of the other neurons are inhibited, which is visible as their large drops on the plot. The inhibition and excitation are stronger than wanted (-65 mV at some points), but work is being done to scale them better. Another area of active work is improving the discrimination of characters whose pixel representations are close to each other; for example, 'B' and 'D' activate the same neuron in the video below. Nate will work further on performance after his current work on joining a lab or other position, and Ignacio as well if he gets the chance, but we welcome contributions to this open-source project. The video below shows a close-up view of the measured voltage responses of four neurons trained to respond to example letters (A, B, C, and D). We are trying to improve the simulation's eyesight, like a human trying to read an eye chart! Overall the neurons specialized reasonably well to the input, but there is room for greater performance.

Intro material: ReadMe and Intro. Authors: Nate and Ignacio. Our code.
Based on: Character Recognition using Spiking Neural Networks.

You may ask, how did the neurons get trained? Let me explain!


In [1]:
from IPython.display import HTML;HTML('<center><div style="font-size:1.5em"><iframe width="80%" height="700" src="http://www.youtube.com/embed/DUZldskw51A" loop=1 allowfullscreen></iframe><br><a target="_blank" href="/notebooks/furtherFormulas/3dAnimScatterPlotHdf5.ipynb">Animation</a> <a target="_blank" href="/github/tartavull/snn-rl/blob/master/testVoltage.mp4">Video Download</a></div></center>')




Training
Values are trained for the model by displaying the characters one after another, each for one spike interval (100 ms).
Time and Refractory Period
Time is initially incremented and the refractory-period status variables are processed. Time&Refrac
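As a rough sketch of that bookkeeping (the names and the time step below are assumptions for illustration, not the project's actual variables):

dt = 0.0002          # assumed simulation step, in seconds
spikeInterval = 0.1  # one character presentation lasts 100 ms

def advanceTime(time, refractoryUntil):
	# advance simulation time and report refractory/interval status
	time += dt
	refractoryActive = time < refractoryUntil
	refractoryPointSwitch = (time % spikeInterval) < dt  # interval boundary reached
	return time, refractoryActive, refractoryPointSwitch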
First layer and Dirac function
Spikes from the first layer are represented as presynaptic spike times across an 11-epoch (30000 ms) timeframe. They are coded into a variable that is queried to look up when spikes have occurred. A spike found there triggers a Dirac function, which is one of the cofactors in the dendrite and somaDirect equations; those build the signals that lead to the soma potential. Dirac Function
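In spirit the lookup works like the sketch below, a simplification of the real diracCalc (the names and structure here are assumed):

def diracLookup(presynSpikeTimes, channel, intervalStart, intervalEnd):
	# presynSpikeTimes: list of (input neuron index, spike time) pairs
	for (index, spikeTime) in presynSpikeTimes:
		if index == channel and intervalStart <= spikeTime < intervalEnd:
			return 1.0  # a spike occurred: the Dirac cofactor fires
	return 0.0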
Weight
The weights are set at random initially, but within a range of values that allows Dirac-activated dendrite and somaDirect signals to combine into postsynaptic spikes (from the output neurons) from the beginning, or soon enough afterward. Once initial spiking has occurred, the reinforcement learning comes into effect: character input that causes a presynaptic spike followed by a postsynaptic spike causes a weight increase. The increase is defined by the weighting equations: weight change, returnDeltaW, returnNewW
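A stripped-down sketch of that rule (the real computation is in returnDeltaW and returnNewW; the constants here are assumptions):

def deltaW(preSpikeTime, postSpikeTime, positiveReinforcement=1.0, negativeReinforcement=1.0):
	# presynaptic spike before the postsynaptic spike: strengthen; otherwise weaken
	if preSpikeTime <= postSpikeTime:
		return 0.1 * positiveReinforcement
	return -0.1 * negativeReinforcement

def newW(w, dW, wMin=0.0, wMax=1.0):
	return min(max(w + dW, wMin), wMax)  # keep the weight within assumed bounds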

In [2]:
from IPython.display import HTML;HTML('<center><br><div style="font-size:1.5em">Weights of All Neurons During the Training Process with 4 Characters<br><br><iframe width="60%" height="500" src="http://www.youtube.com/embed/-ae2j13XQBU" loop=1 allowfullscreen></iframe><br><a target="_blank" href="/github/tartavull/snn-rl/blob/master/furtherFormulas/3dBarChartAnim.ipynb#returnNewW">3dBarChartAnim</a></div></center>')


Out[2]:
[Embedded video: Weights of All Neurons During the Training Process with 4 Characters (3dBarChartAnim)]
The video below shows the final weights produced after training 26 neurons with the full alphabet for 30000 ms of total time. Each neuron is intended to have weights specialized for only one distinct character.

In [3]:
from IPython.display import HTML;HTML('<center><br><div style="font-size:1.5em">Weights of All Neurons After Training with 26 Characters<br><br><iframe width="60%" height="500" src="http://www.youtube.com/embed/8R_BrMk8VYM" loop=1 allowfullscreen></iframe><br><a target="_blank" href="/github/tartavull/snn-rl/blob/master/furtherFormulas/3dBarChartRotatingAnim.ipynb#returnNewW">3dBarChartRotatingAnim</a></div></center>')


Out[3]:
[Embedded video: Weights of All Neurons After Training with 26 Characters (3dBarChartRotatingAnim)]
Weights of the Neurons After Training with 4 Characters: 3dBarChartGenerator
Tau
Tau, the learning rate for the dendrite, depends on the result of the weight calculation. It produces faster learning when a stronger weight is present. Tau
Resistance
Resistance depends on tau and helps the voltage reach spiking level even with lower weights and tau. Resistance
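Their dependence can be sketched as below; the functional forms and constants are assumptions for illustration, and the project's actual versions are tauDCalc and resistanceCalc:

tauMax, tauMin = 0.030, 0.002  # assumed bounds on tau, in seconds

def tauFromWeight(w):
	# assumed: a stronger (normalized) weight gives a larger tau, i.e. faster learning
	return tauMin + w * (tauMax - tauMin)

def resistanceFromTau(tau, rBase=80.0):
	# assumed inverse scaling: a smaller tau (weaker weight) gets more resistance,
	# helping the voltage reach spiking level even with low weights
	return rBase * (tauMax / tau)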

In [4]:
from IPython.display import HTML;HTML('<center><div style="font-size:1.5em">Relationship Between Weight, Tau, and Resistance:<br><br></div><iframe width="60%" height="500" src="http://www.youtube.com/embed/f9QN9Q1FqPY" loop=1 allowfullscreen></iframe><br><div style="font-size:1.5em"><a target="_blank" href="/github/tartavull/snn-rl/blob/master/furtherFormulas/3dBarWTauRAnim.ipynb#returnNewW">3dBarWTauRAnim</a></div></center>')


Out[4]:
[Embedded video: Relationship Between Weight, Tau, and Resistance (3dBarWTauRAnim)]


Dendrite Input
This is one receptor of presynaptic input; it translates signals into other values that are passed to the soma. The equation below specifies how. Cofactors in the equation are divided by their units to normalize the values.
The dendrite equation (equ. 1) in the paper is $\tau_d^i \frac{dI_d^i(t)}{dt} = -I_d^i(t) + R_d^i w^i \delta(t - t_f^i)$
The equation in Brian2 (an open-source simulator): dv/dt = (((-v/mV)+((r/mV)*(w/volt)*(dirac/volt)))/(tau))*mV : volt Dendrite Equations
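To see how such an equation is entered in practice, here is a self-contained Brian2 sketch using the same dendrite equation (the parameter values are assumed, not trained ones):

from brian2 import *

dendrite = NeuronGroup(15, '''
	dv/dt = (((-v/mV)+((r/mV)*(w/volt)*(dirac/volt)))/(tau))*mV : volt
	r : volt
	w : volt
	dirac : volt
	tau : second
	''')
dendrite.r = 80*mV       # assumed resistance
dendrite.w = 0.5*volt    # assumed weight
dendrite.dirac = 1*volt  # a presynaptic spike is present
dendrite.tau = 30*ms     # assumed time constant
run(100*ms)
print dendrite.v         # v relaxes toward r*w*dirac (about 40 mV here)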
Soma Direct Input
Another receptor of presynaptic input.
The somaDirect equation (equ. 4) in the paper is $\tau_s \frac{dI_s(t)}{dt} = -I_s(t) + \sum_i w^i \delta(t - t_f^i)$
In Brian2: dv/dt = (((-v/mV)+(summedWandDirac/volt))/(tauS))*mV : volt SomaDirect Equations
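The summed weight-and-Dirac term is just a dot product over one neuron's dendrites, mirroring the somaDirectCalcs code in the main program further below:

def summedWandDirac(weights, diracs):
	# dot product of dendrite weights and their current Dirac values
	return sum(w * d for (w, d) in zip(weights, diracs))

print summedWandDirac([0.4, 0.9, 0.1], [1.0, 1.0, 0.0])  # prints 1.3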
Lateral Inhibition
Due to competition among neurons, inhibition signals are sent from one to another when they receive input. A winner-take-all style implementation was created where the inhibition is auto-tuned: the neuron with the greatest soma membrane-potential change is the only one allowed to create a postsynaptic spike. The membrane potential is scaled based on the inhibition. Lateral Inhibition
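A minimal winner-take-all sketch of that rule (the project's full, auto-tuned version is in lateralInhibition; the inhibition factor here is an assumption):

def winnerTakeAll(potentialChanges, inhibitionScale=0.5):
	# only the neuron with the largest membrane-potential change keeps its
	# full value; all others are scaled down by the assumed inhibition factor
	winner = max(range(len(potentialChanges)), key=lambda i: potentialChanges[i])
	return [dUm if i == winner else dUm * inhibitionScale
		for (i, dUm) in enumerate(potentialChanges)]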
Soma Membrane Potential Charge (Um)
Um (equ. 5) in the paper: $\tau_m \frac{dU_m(t)}{dt} = -U_m(t) + R_m (I_d(t) + I_s(t))$
Um prior to lat. inh.: dprelimV/dt = (-prelimV+((Rm/mV)*(SynI+DendI*1.0)))/(tauM) : volt (unless refractory)
Um after lat. inh.: dv/dt = v/(1*second): volt
Soma Membrane Potential Equations


Spikes of Neurons During Training
The dots represent a spike fired for an input character stimulus. Notice how, over time, the neurons designate themselves to only one character each. That is the reinforcement learning causing the neurons to specialize; in this example it takes close to 10000 ms. In some training simulations not all of the neurons correctly become designated to one character.

Below, the spikes generated during training with the full alphabet are shown. Greater specialization toward the end indicates the effectiveness of the training.


26 Char Training with Neuron Spikes

Check for resets
After each spike interval has passed, some logic resets values upon spike occurrences. Resets
Testing
The weights, and subsequently the tau and resistance generated from them, are used as the trained model values, and tests are run to evaluate performance. Input characters are presented three times in a row, for three spike intervals (300 ms total). Spikes observed to fire or not fire are compared to the expected values and the performance is reported. initializeTrainedModelParameters, evaluateClassifierPerf, OutputEvaluationResults
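The counting behind those reports can be sketched as follows (a simplification, not the actual evaluateClassifierPerf):

def tallyCounts(observed, expected):
	# observed/expected: booleans per (neuron, spike interval) pair
	TP = FP = TN = FN = 0
	for (o, e) in zip(observed, expected):
		if o and e: TP += 1
		elif o and not e: FP += 1
		elif (not o) and (not e): TN += 1
		else: FN += 1
	return TP, FP, TN, FN  # feed these into the accuracy and precision formulas above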

Further descriptions of the simulation results are here
Main code for the simulation:

In [ ]:
'''
	Copyright 2015, Nate Sutton and Ignacio Tartavull
	This is the main file for the Spiking Neural Networks
	Reinforcement Learning simulation.  

	More info:
	https://github.com/tartavull/snn-rl/blob/master/README.md
	http://nbviewer.ipython.org/github/tartavull/snn-rl/blob/master/notebooks/introduction.ipynb
	http://nbviewer.ipython.org/github/tartavull/snn-rl/blob/master/FFSSN.ipynb
'''

from furtherFormulas.architecture_further_formulas import *
from furtherFormulas.cofactorCalculations import *
from furtherFormulas.timeAndRefracCalcs import *
from furtherFormulas.outputPrinting import *
from furtherFormulas.testingProcesses import *
from furtherFormulas.lateralInhibition import *
from furtherFormulas.generalUtilities import *

timeAndRefrac = timeAndRefrac	

class gupta_paper:
	'''
		Main program variables are set and the simulation is run.  Specific neuron models are
		defined for computing processing of electrophysiology in the soma, direct to soma
		signals, and active dendrites.  Equations with variables
		are defined in the equations() objects.
	'''
	neuralnet = Network()
	dictionary = dictionary()
	spiketimes = dictionary.spikeTimes(dictionaryLongitude, spikeInterval, testingSpikesPerChar, testingEpochs)
	trainingSpikeTimes = dictionary.spikeTimes(dictionaryLongitude, spikeInterval, trainingSpikesPerChar, trainingEpochs)
	LIK = SpikeGeneratorGroup(N=15, indices=spiketimes[:,0], times=spiketimes[:,1]*ms)
	# W = W and other lines below are to avoid an odd 'reference before assignment' error
	W = W
	tauM = tauM
	neuronIndex = neuronIndex
	generalClockDt = generalClockDt
	runTime = runTime
	runTimeScaling = runTimeScaling
	evaluateClassifier = evaluateClassifier
	accelerateTraining = accelerateTraining
	diracScaling = diracScaling
	somaDirectScaling = somaDirectScaling
	negativeWeightReinforcement = negativeWeightReinforcement
	positiveWeightReinforcement = positiveWeightReinforcement
	timeAndRefrac = timeAndRefrac	
	testRun = testRun
	latInhibSettings = latInhibSettings
	standardPrint = standardPrint
	verbosePrint = verbosePrint
	testingRunTime = testingRunTime
	optResultsFile = optResultsFile
	minWeightsRand = minWeightsRand
	maxWeightsRand = maxWeightsRand
	totalSpikeIntervals = totalSpikeIntervals

	def run_model(self):
		neuralnet = self.neuralnet
		dictionary = self.dictionary

		eqs = Equations('''
			dv/dt = v/(1*second): volt
			dprelimV/dt = (-prelimV+((Rm/mV)*(SynI+DendI*1.0)))/(tauM) : volt (unless refractory)
			Rm = 80*mV : volt
			tauM = 30*ms : second
			V : volt
			DendI : volt
			SynI : volt
			v2 : volt
			UmSpikeFired : volt
			beginRefrac : volt
			''')

		dendriteEqs = Equations('''
			dv/dt = (((-v/mV)+((r/mV)*(w/volt)*(dirac/volt)))/(tau))*mV : volt
			V : volt
			r : volt
			w : volt
			dirac : volt
			tau : second
			v2 : volt
			''')

		directToSomaEqs = Equations('''
			dv/dt = (((-v/mV)+(summedWandDirac/volt))/(tauS))*mV : volt
			tauS = 2*ms : second
			V : volt
			summedWandDirac : volt
			v2 : volt
			spikeFired : boolean
			''')		

		class ADDSNeuronModel(NeuronGroup, gupta_paper): 
			'''
				This is the model used for electrophysiology occurring in the soma
			'''
			neuronIndex = self.neuronIndex
			generalClockDt = self.generalClockDt

			def __init__(self, params):
				self = parseArgs(self, sys.argv, dictionaryLongitude)		

				NeuronGroup.__init__(self, N=dictionaryLongitude, model=eqs,threshold='v>10*mV', reset='v=-0.002 * mV; dv=0; v2=10*mV;UmSpikeFired=1*mV;beginRefrac=1*mV;inhibitionVoltage=prelimV',refractory=8*ms,clock=Clock(dt=self.generalClockDt))
				@network_operation(dt=self.generalClockDt)
				def additionToNetwork(): 
					neuronIndex = self.neuronIndex
					timeAndRefrac.spikeIntervalCounter = (floor(timeAndRefrac.time/timeAndRefrac.spikeIntervalUnformatted) * timeAndRefrac.spikeIntervalUnformatted)*10

					def dendCalcs(neuronIndex, ADDSObj, dendObj, spiketimes, evaluationActive):
						'''
							Dirac, tau, and resistance are calculated in sequence at the end of each spike-time interval.
							The resulting dendrite current (DendI) is added to the Um calculation for the ADDS soma.
						'''
						timeAndRefrac = self.timeAndRefrac

						# Dirac
						dendObj[neuronIndex].dirac = diracCalc(dendObj, neuronIndex, spiketimes, timeAndRefrac.time, timeAndRefrac.lastSpikeInterval)

						# Initialize weights
						if (evaluationActive==False and timeAndRefrac.time == 0.000):
							dendObj[neuronIndex].w = W[neuronIndex]*volt

						if (evaluationActive==False and timeAndRefrac.refractoryPointSwitch==True):
							# Only change weights of neuron fired
							if ADDSObj.UmSpikeFired[neuronIndex] == 1*mV:
								# Weights
								WeightChangeCalculation(neuronIndex, spiketimes, timeAndRefrac.time, self.negativeWeightReinforcement, self.positiveWeightReinforcement, M, dendObj)	
							# Tau
							tauDCalc(neuronIndex, dendObj)
							# Resistance
							resistanceCalc(neuronIndex, dendObj, self.tauM)

						# TODO: check do I need additional loop below?
						for indexOfDend in range(dictionaryLongitude):
							ADDSObj.DendI[indexOfDend] = sum(dendObj[indexOfDend].v[:])*dendCalcScaling

						#print 'ADDSObj.t',ADDSObj.t,'ADDSObj.DendI',ADDSObj.DendI,'neuronIndex',neuronIndex,'dendObj[neuronIndex].dirac',dendObj[neuronIndex].dirac,'dendObj[neuronIndex].tau',dendObj[neuronIndex].tau,'dendObj[neuronIndex].w',dendObj[neuronIndex].w,'dendObj[neuronIndex].r',dendObj[neuronIndex].r

					def somaDirectCalcs(neuronIndex, ADDSObj, somaDirectObj, dendObj):
						# Dot product of the dendrite weights and Dirac values drives the soma-direct signal
						dotProductWandDirac = sum(w*d for w,d in zip(dendObj[neuronIndex].w[:], dendObj[neuronIndex].dirac[:]))
						somaDirectObj.summedWandDirac[neuronIndex] = ((dotProductWandDirac*volt)/(volt**2))*self.somaDirectScaling

						for neuronNumber in range(dictionaryLongitude):
							ADDSObj.SynI[neuronNumber] = somaDirectObj.v[neuronNumber]

						#print 'ADDSObj.t',ADDSObj.t,'ADDSObj.SynI',ADDSObj.SynI,'neuronIndex',neuronIndex,'somaDirectObj.summedWandDirac',somaDirectObj.summedWandDirac[neuronIndex],'dendObj[neuronIndex].w',dendObj[neuronIndex].w,'dendObj[neuronIndex].dirac',dendObj[neuronIndex].dirac

					def mainSimulationCalcs(ADDSObj, dendObj, somaDirectObj, spiketimes, evaluationActive):
						'''
							dend then somaDirect calcs are done which are then used to set lat inhib.
							Soma Um calcs are done automatically using equations entered for brian
							once dend and somaDirect are updated
						'''
						preTNorm = self.timeAndRefrac.time
						# tNorm: time elapsed within the current 100 ms spike interval
						tNorm = preTNorm - (floor((preTNorm/.001)*.01) * .1)
						
						self.timeAndRefrac = timePeriodAndRefractoryCalcs(self.timeAndRefrac)

						if (evaluationActive==True) and (timeAndRefrac.time == 0.000 or timeAndRefrac.time == 0.001):
							initializeTrainedModelParameters(dendObj)

						# Option to accelerate computations for training
						if self.accelerateTraining == False or (evaluationActive == False and (tNorm <= .005 or tNorm >= .096)):													
							if self.accelerateTraining == True and (tNorm >= .096 and tNorm < .099):
								for i in range(dictionaryLongitude):
									ADDSObj.DendI[i]=0*mV
									ADDSObj.SynI[i]=0*mV
									ADDSObj.prelimV[i]=0*mV
									ADDSObj.v[i]=0*mV								
									for i2 in range(len(dendObj[0].v)):
										dendObj[i].v[i2] = 0*mV
									somaDirectObj.v[i] = 0*mV		

							for neuronIndex in range(dictionaryLongitude):
								dendCalcs(neuronIndex, ADDSObj, dendObj, spiketimes, evaluationActive)

								somaDirectCalcs(neuronIndex, ADDSObj, somaDirectObj, dendObj)								

							ADDSObj.v, self.latInhibSettings = lateralInhibition(ADDSObj, self.timeAndRefrac, self.latInhibSettings)

							#if ADDSObj.t > 100*ms:
								#for i in range(dictionaryLongitude):
								#	somaDirectObj.summedWandDirac[i] += 20*mV
								#print 'ADDS.t',ADDS.t,'ADDS.SynI',ADDS.SynI,'ADDS.DendI',ADDS.DendI,'ADDSObj.v',ADDSObj.v,'somaDirectObj.summedWandDirac',somaDirectObj.summedWandDirac,'somaDirectObj.v',somaDirectObj.v

							for neuronIndex in range(dictionaryLongitude): 
								ADDSObj.v2, somaDirectObj.v2, self.timeAndRefrac = checkForResets(neuronIndex, ADDSObj, dendObj, somaDirectObj, self.timeAndRefrac)

							ADDSObj.UmSpikeFired, self.testRun = evaluateClassifierPerf2(ADDSObj, self.testRun)

						#roundedSecondsTime = math.floor(Decimal(format((ADDSObj.t), '.1f'))/Decimal(format((1.0*second), '.1f')))
						#if printWeights < roundedSecondsTime:
						#	self.printWeights = roundedSecondsTime; printWeights(dendObj);
					if self.evaluateClassifier == False:
						mainSimulationCalcs(ADDS, dend, somaDirect, self.trainingSpikeTimes, False)
					else:
						mainSimulationCalcs(ADDS, dend, somaDirect, self.spiketimes, True)					

				self.contained_objects.append(additionToNetwork)				

		class DendriteNeuronModel(NeuronGroup):
			generalClockDt = self.generalClockDt
			def __init__(self, params): 
				NeuronGroup.__init__(self, N=15, model=dendriteEqs,clock=Clock(dt=self.generalClockDt))
				@network_operation(dt=self.generalClockDt)
				def additionToNetwork(): 
					placeHolderForLaterContent = True
				self.contained_objects.append(additionToNetwork)

		class SomaDirectNeuronModel(NeuronGroup): 
			generalClockDt = self.generalClockDt
			def __init__(self, params): 
				NeuronGroup.__init__(self, N=dictionaryLongitude, model=directToSomaEqs,clock=Clock(dt=self.generalClockDt))
				@network_operation(dt=self.generalClockDt)
				def additionToNetwork(): 
					placeHolderForLaterContent = True
				self.contained_objects.append(additionToNetwork)		

		dend = [None]*dictionaryLongitude
		weightMonitors = [None]*dictionaryLongitude
		for firstLayerIndex in range(dictionaryLongitude):
			dend[firstLayerIndex] = DendriteNeuronModel(15)	
			weightMonitors[firstLayerIndex] = StateMonitor(dend[firstLayerIndex], 'w', record=True)
			neuralnet.add(dend[firstLayerIndex])
			neuralnet.add(weightMonitors[firstLayerIndex])
		somaDirect = SomaDirectNeuronModel(dictionaryLongitude)
		neuralnet.add(somaDirect)
		ADDS = ADDSNeuronModel(self)			
		M = SpikeMonitor(ADDS)
		Mv = StateMonitor(ADDS, 'V', record=True)
		UmM = StateMonitor(ADDS, 'v2', record=True)
		self.M = M # for ipython compatibility
		self.UmM = UmM 
		self.weightMonitors = weightMonitors

		neuralnet.add(ADDS)
		neuralnet.add(M)
		neuralnet.add(UmM)
		if (ADDS.evaluateClassifier==True): ADDS.runTime = ADDS.testingRunTime
		ADDS.runTime *= ADDS.runTimeScaling # scaling factor
		if ADDS.standardPrint: neuralnet.run(ADDS.runTime,report='text')
		else: neuralnet.run(ADDS.runTime,report='stderr')

		OutputEvaluationResults(dend, self.testRun, ADDS.verbosePrint, ADDS.evaluateClassifier)
		accuracyPerc = totalCorrectPercentage()
		precisionPerc = precisionPercentage()
		writeOptimizationResults(ADDS, self.testRun, accuracyPerc)

		neuronToPlot = 1
		colors = ['blue', 'green', 'magenta', 'cyan']
		subplot(211)
		plot(M.t/ms, M.i, '.')
		legend(['A','B','C','D'], loc='upper left')			
		subplot(212)
		plot(UmM.t/ms, UmM.v2.T/mV)
		xlabel('Time (ms)')
		ylabel('Membrane Potential (mV)')
		if (showPlot==True):
			show()	

		return ADDS.evaluateClassifier, precisionPerc

def main():
	run_gupta_paper = gupta_paper()
	evaluateClassifier, precisionPerc = run_gupta_paper.run_model()
	if evaluateClassifier: print precisionPerc
	return precisionPerc

if __name__ == '__main__': main()