Now that we have imported the raw data into an HDF5 store and have a working spike detector, we can take a look at networks with known structure and try to visualize what actually causes neurons to spike, and what doesn't.
We start with the usual setup and define the methods we need to detect spikes and create animations using matplotlib.
The code below is copied from the second worksheet (Spike Detection).
In [2]:
# Some core imports
import os
import sys
from subprocess import call
import time
import pandas
import numpy as np
import h5py
import theano
import theano.tensor as T
import theano.tensor.nnet as tnn
import matplotlib.pyplot as plt
# These are IPython specific
from IPython.display import display, clear_output
%matplotlib inline
%load_ext cythonmagic
In [3]:
# Core configuration
datadir = "/data/quick/connectomics"
mirror = "http://www.causality.inf.ethz.ch/data/connectomics"
global nbdir
if 'nbdir' not in globals():
    nbdir = os.getcwd()
os.chdir(datadir)
In [4]:
store = h5py.File(datadir + '/store.h5')
print store.keys()
In [5]:
net = store['small-1'] # We will use this network for initial testing purposes
print net.keys()
In [6]:
import matplotlib.animation as manimation
from IPython.display import HTML, FileLink
from base64 import b64encode
import matplotlib
matplotlib.verbose.set_level('silent')
def display_video_inline(filename):
    video = open(filename, "rb").read()
    video_encoded = b64encode(video).decode('ascii')
    video_tag = '<video controls alt="Video" src="data:video/x-m4v;base64,{0}"/>'.format(video_encoded)
    return HTML(data=video_tag)

def display_video(filename):
    link = FileLink(os.path.abspath(filename))
    video_tag = '<video controls alt="Video" src="{0}{1}" /><br/><a href="{0}{1}">Source</a>'.format(link.url_prefix, link.path)
    return HTML(data=video_tag)

def display_video_link(filename):
    link = FileLink(os.path.abspath(filename))
    video_tag = '<a href="{0}{1}">{1}</a>'.format(link.url_prefix, link.path)
    return HTML(data=video_tag)
video_dpi = 100
FFMpegWriter = manimation.writers['ffmpeg']
metadata = dict(title='Connectomics Visualization Video', artist='K. Londenberg')
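The writer configured above is not used again in this section; for reference, here is a minimal sketch of how it would combine with the display helpers. The file name demo.mp4 and the sine-wave frames are made up for illustration.

# Sketch: render a short animation with FFMpegWriter and link to the result
writer = FFMpegWriter(fps=15, metadata=metadata)
fig = plt.figure(figsize=(6, 4))
xs = np.linspace(0, 2*np.pi, 100)
line, = plt.plot(xs, np.sin(xs))
with writer.saving(fig, "demo.mp4", dpi=video_dpi):
    for step in range(30):
        line.set_ydata(np.sin(xs + 0.2*step))  # advance the wave one step per frame
        writer.grab_frame()
display_video_link("demo.mp4")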
In [7]:
print net.keys()
net = store['normal-1']
In [11]:
spd = net['signalDistanceMatrix'].value
spdflat = pandas.Series(spd.reshape((spd.shape[0]*spd.shape[1])))
spatial = net['spatialDistanceMatrix'].value
for v in sorted(spdflat.unique()):
    # Histogram of spatial distances for each signal distance value
    if (v == 0.0):
        continue
    selector = (spd == v)  # was "sdd", which is undefined in this cell
    f = pandas.Series(spatial[selector].flatten())
    avg = f.mean()
    med = f.median()
    if (len(f) < 10):
        continue
    plt.xlim(0.0, 1.5)
    plt.ylim(0.0, 1.5)
    plt.text(1.0, 1.4, "Mean=%f" % (avg))
    plt.text(1.0, 1.3, "Median=%f" % (med))
    n, bins, patches = plt.hist(f, np.linspace(0.0, 1.5, 50), normed=True, cumulative=False, facecolor='g', alpha=0.75)
    plt.xlabel('Distance')
    plt.ylabel('Counts')
    plt.title('Neuron Distance for Signal Distance %r' % (v))
    plt.show()
Neuron distance and signal distance don't correlate in any significant way here, which is unsurprising. So we won't take spatial distance into account.
For every neuron spike we look back (and ahead) within a time window and record, for every other neuron, whether it also spiked at certain points in time. We aggregate this information by spike and group the "incoming" spikes by signal distance.
If that didn't quite parse, that's OK; the visualization below will make it much clearer.
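To make the bookkeeping concrete before the real implementation below, here is a toy sketch of the per-spike window extraction, using a made-up 3-neuron raster and a window of two steps on either side:

# Toy sketch: cut a window out of all spike trains around each reference spike
toy = np.array([[0, 1, 0, 0, 1, 0],   # neuron 0 (reference)
                [0, 0, 1, 0, 0, 1],   # neuron 1
                [1, 0, 0, 1, 0, 0]])  # neuron 2
ref, w = 0, 2
for t in np.nonzero(toy[ref])[0]:
    lo, hi = max(0, t - w), min(toy.shape[1], t + w + 1)
    print "spike at t=%d -> window:\n%r" % (t, toy[:, lo:hi])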
In [1]:
def spike_causality_counts(net, intervals, do_full_trace=True, spatial_distance_range=None):
    sdd = net['signalDistanceMatrix'].value
    fluor = net['fluorescence'].value
    sddflat = pandas.Series(sdd.flatten())
    signal_distances = sorted(sddflat.unique())
    spikes = net['spikes'].value
    ncount = fluor.shape[0]
    maxtime = fluor.shape[1]
    from sys import stdout
    print spikes.shape
    spike_locations = spikes.nonzero()
    # Calculate max/min range of interest
    mini = intervals[0][0]
    maxi = intervals[0][1]
    for (mn, mx) in intervals:
        assert mn < mx
        mini = min(mn, mini, mx)
        maxi = max(mn, maxi, mx)
        print "mx=%d mn=%d -> mini=%d ; maxi=%d" % (mn, mx, mini, maxi)
    if (spatial_distance_range is not None):
        spatial = net['spatialDistanceMatrix'].value
    assert maxi > mini
    print "-> mini=%d ; maxi=%d" % (mini, maxi)
    result = np.zeros((1+len(signal_distances), len(intervals)), dtype=np.float64)
    result_sample_size = np.zeros((1+len(signal_distances), len(intervals)), dtype=np.int32)
    nlen = maxi-mini+1
    ssizes = [0.0]*(1+len(signal_distances))
    if (do_full_trace):
        spike_trace = np.zeros((1+len(signal_distances), len(spike_locations[0]), nlen))
        maxfluor_trace = np.zeros((1+len(signal_distances), len(spike_locations[0]), nlen))
    for spi in range(len(spike_locations[0])):
        neuron = spike_locations[0][spi]
        t = spike_locations[1][spi]
        assert spikes[neuron, t] == 1.0
        if (spi % 1000 == 0):
            print "Counting causing spikes of neuron %d at time %d: %d of %d " % (neuron, t, spi, len(spike_locations[0]))
            stdout.flush()
        sdslice = sdd[:, neuron].flatten()  # All neurons signaling to this one
        for i in range(1+len(signal_distances)):
            if (i == 0):
                # Index 0 means "all neurons". The selector must be boolean;
                # an int array would be interpreted as fancy indexing below.
                src_selector = np.ones((ncount), dtype=bool)
            else:
                # Select all neurons which can signal with distance signal_distances[i-1] to the current neuron
                src_selector = (sdslice == signal_distances[i-1])
            src_ncount_nr = float(np.sum(src_selector))
            if (spatial_distance_range is not None):
                c1 = spatial[neuron, :] >= spatial_distance_range[0]
                c2 = spatial[neuron, :] < spatial_distance_range[1]
                src_selector = src_selector & c1 & c2
            src_ncount = float(np.sum(src_selector))
            ssizes[i] += src_ncount
            if (src_ncount == 0.0):
                if (i == 1):
                    spike_trace[i, spi, -maxi-1] = 1.0
                continue
            # Select all selected neurons' spikes in the region of interest around the current spike
            soff = max(0, min(t+mini, maxtime))
            eoff = max(0, min(t+maxi+1, maxtime))
            if (eoff-soff == 0):
                if (i == 1):
                    print "Time = %d, mini=%d, maxi=%d - soff=%d, eoff=%d, maxtime=%d" % (t, mini, maxi, soff, eoff, maxtime)
                    spike_trace[i, spi, -maxi-1] = 1.0
                continue
            rs = spikes[src_selector, slice(soff, eoff)]
            if (i == 1):
                assert np.all(rs[:, -maxi-1] == 1.0)
            if (rs.shape[1] < nlen):
                rspikes = np.zeros((rs.shape[0], nlen), dtype=np.float32)
                rspikes[:, -rs.shape[1]:] = rs
            else:
                rspikes = rs
            # And sum up their counts at the corresponding time slices
            rmean = np.mean(rspikes, axis=0)
            if (i == 1):
                assert rmean[-maxi-1] == 1.0
            mlen = rmean.shape[0]
            if (do_full_trace):
                rf = fluor[src_selector, slice(soff, eoff)]
                if (rf.shape[1] < nlen):
                    rfluor = np.zeros((rf.shape[0], nlen), dtype=np.float32)
                    rfluor[:, -rf.shape[1]:] = rf
                else:
                    rfluor = rf
                rmfluor = np.mean(rfluor, axis=0)
                assert len(rmean) == nlen
                spike_trace[i, spi, :] = rmean
                maxfluor_trace[i, spi, :] = rmfluor
            for iidx, (mn, mx) in enumerate(intervals):
                mi = min(max(0, mlen-1+mn), mlen)
                ma = min(max(0, mlen-1+mx), mlen)
                if (ma > mi):
                    # Mean over the time window (was rspikes[mi:ma], which
                    # sliced neurons instead of time steps)
                    rs = float(np.mean(rspikes[:, mi:ma]))
                    result[i, iidx] += rs
                    result_sample_size[i, iidx] += 1
    print "Sample Sizes were %r" % (ssizes)
    if (do_full_trace):
        return (np.log(result)-np.log(result_sample_size), signal_distances, spike_trace, maxfluor_trace)
    return (result/result_sample_size, signal_distances)
In [13]:
spks = spike_causality_counts(store['normal-4-lownoise'], [(-10, 10),(-200,-1),(-10,0),(-50,-11),(-200,-51)])
In [14]:
(result, signal_distances, spike_trace, fluor_trace) = spks
In [15]:
fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_variance = np.var(fluor_trace, axis=1)  # computed for reference, unused below
for i in range(len(signal_distances)+1):
    fig = plt.figure(figsize=(12, 8))  # plt.figure, not plt.Figure, so pyplot draws into it
    fluor = fluor_mean[i, :]
    spikes = spike_mean[i, :]
    #plt.plot(np.linspace(-len(fluor), 0.0, len(fluor)), fluor/float(np.max(fluor)), color='g', alpha=0.3)
    plt.plot(np.linspace(-len(spikes), 0.0, len(spikes)), np.sqrt(spikes), color='r', alpha=1.0)
    plt.xlabel('t')
    if (i == 0):
        dist = "All"
    else:
        dist = signal_distances[i-1]
    plt.title('Fluorescence and spikes for signal distance %r' % (dist))
    plt.show()
The graphs above are pretty interesting. They lead to a clear conclusion:
If a neuron spikes in the "silent" range roughly 10 to 60 time steps before the reference neuron's spike, we can be almost certain that it does not cause the reference neuron to fire.
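One way to exploit this observation is sketched below with a hypothetical helper (not used elsewhere in this worksheet): count how often a candidate neuron fires inside that silent window before the reference neuron's spikes; a large count argues against a direct connection.

# Sketch: spikes in the "silent" window (-60..-10) before each reference spike
def silent_window_count(spikes, ref, cand, lo=-60, hi=-10):
    total = 0
    for t in np.nonzero(spikes[ref])[0]:
        a, b = max(0, t + lo), max(0, t + hi)
        total += int(np.sum(spikes[cand, a:b]))
    return total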
Let's zoom in on the interesting details now.
In [17]:
fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_stddev = np.std(fluor_trace, axis=1)
assert spike_mean[1, -11] == 1.0

def plot_traces(title, ind, show_fluor, slc):
    # Note: slc is currently unused; the window is hard-coded to the last 90 steps
    sel = np.zeros((fluor_mean.shape[0]), dtype=bool)  # boolean mask, not int8
    for i in ind:
        sel[i] = True
    fig = plt.figure(figsize=(16, 10))
    print sel
    if (len(ind) > 1):
        fluor = np.mean(fluor_mean[sel, -90:], axis=0)
        spikes = np.mean(spike_mean[sel, -90:], axis=0)
        std = np.mean(fluor_stddev[sel, -90:], axis=0)
    else:
        fluor = fluor_mean[ind[0], -90:]
        spikes = spike_mean[ind[0], -90:]
        std = fluor_stddev[ind[0], -90:]
        if (ind[0] == 1):
            assert spike_mean[1, -11] == 1.0
            assert spikes[-11] == 1.0
    plt.ylim(-0.1, 1.2)
    #plt.plot(np.linspace(-len(spikes)+11.0, 11.0, len(spikes)), spikes, color='b', alpha=1.0)
    plt.plot(np.linspace(-len(spikes)+11.0, 11.0, len(spikes)), spikes**0.3, color='r', alpha=0.4)
    if (show_fluor):
        plt.plot(np.linspace(-len(spikes)+11.0, 11.0, len(spikes)), fluor, color='g', alpha=1.0)
        plt.plot(np.linspace(-len(spikes)+11.0, 11.0, len(spikes)), fluor+std, color='g', alpha=0.6)
        plt.plot(np.linspace(-len(spikes)+11.0, 11.0, len(spikes)), fluor-std, color='g', alpha=0.6)
        plt.fill_between(np.linspace(-len(spikes)+11.0, 11.0, len(spikes)), fluor+std,
                         fluor-std, facecolor='g', alpha=0.3)
    plt.title(title)
    plt.show()
In [19]:
plot_traces('All Neurons', [0], False, slice(-90))
plot_traces('Reference Neuron', [1], True, slice(-200))
plot_traces('Connected to Reference, 1 Hop ', [2], False, slice(-90))
plot_traces('Connected to Reference, 2 Hops ', [3], False, slice(-90))
plot_traces('Connected to Reference, 3 Hops ', [4], False, slice(-90))
plot_traces('Connected to Reference, 4 Hops ', [5], False, slice(-90))
plot_traces('Connected to Reference, 5 Hops ', [6], False, slice(-90))
plot_traces('Connected to Reference, 6 Hops ', [7], False, slice(-90))
plot_traces('Connected to Reference, Infinite Hops', [8], False, slice(-90))
Interesting, indeed. It should be possible to use the correlation within a very tight window around the time of the spike as a feature for training a classifier to predict the signal distance. But we need to take a few more factors into account.
Because it's easy, we start by checking whether spatial distance affects the measured spikes due to light diffusion. If it does, we might have to change our spike detector to be less sensitive to these effects.
In [422]:
net = store['normal-4-lownoise']
spatial = net['spatialDistanceMatrix'].value
sps = pandas.Series(spatial.flatten())
spks_near = spike_causality_counts(net, [(-10, 10),(-200,-1),(-10,0),(-50,-11),(-200,-51)], True, (0.0, sps.quantile(0.3)))
spks_far = spike_causality_counts(net, [(-10, 10),(-200,-1),(-10,0),(-50,-11),(-200,-51)], True, (sps.quantile(0.7), float('inf')))
In [425]:
(result, signal_distances, spike_trace, fluor_trace) = spks_near
fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_stddev = np.std(fluor_trace, axis=1)
plot_traces('Near: All Neurons', [0], False, slice(-90))
#plot_traces('Near: Sig1 Neurons', [2], False, slice(-90))
#plot_traces('Near: Sig2 Neurons', [3], False, slice(-90))
#plot_traces('Near: SigInf Neurons', [8], False, slice(-90))
(result, signal_distances, spike_trace, fluor_trace) = spks_far
fluor_mean = np.mean(fluor_trace, axis=1)
spike_mean = np.mean(spike_trace, axis=1)
fluor_stddev = np.std(fluor_trace, axis=1)
plot_traces('Far: All Neurons', [0], False, slice(-90))
#plot_traces('Far: Sig1 Neurons', [2], False, slice(-90))
#plot_traces('Far: Sig2 Neurons', [3], False, slice(-90))
#plot_traces('Far: Sig3 Neurons', [4], False, slice(-90))
#plot_traces('Far: SigInf Neurons', [8], False, slice(-90))
Light diffusion to nearby neurons doesn't seem to have an effect on the detected spikes.
We should create a running statistic of spike distance or frequency and see what we can deduce from it. But first we look at how the average spike frequencies are distributed and how they correlate with the connectivity of the neurons.
In [491]:
spikes = net['spikes'].value
spikefreq = pandas.Series(np.mean(spikes, axis=1))
print spikefreq.describe()  # a bare describe() mid-cell would not display anything
fig = plt.figure(figsize=(12, 8))
plt.hist(spikefreq, 40, normed=True, log=True, color='g', alpha=0.8)
plt.show()
The frequencies don't seem to follow any standard distribution. We should take more networks into account.
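A sketch of that comparison, assuming every group in the store with a 'spikes' dataset is a network we want to include:

# Sketch: overlay the spike-frequency histograms of all stored networks
plt.figure(figsize=(12, 8))
for name in store.keys():
    grp = store[name]
    if 'spikes' not in grp:
        continue
    freq = np.mean(grp['spikes'].value, axis=1)
    plt.hist(freq, 40, normed=True, log=True, alpha=0.4, label=name)
plt.legend()
plt.show()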
In [451]:
import networkx as nx

def create_nx_graph(net, digraph=True):
    ''' Load a neural network with known connectivity as a NetworkX directed graph '''
    if (not 'connectionMatrix' in net):
        return None
    if (digraph):
        G = nx.DiGraph()
    else:
        G = nx.Graph()
    conn = net['connectionMatrix'].value  # read once; element-wise h5py access is slow
    for i in range(conn.shape[0]):
        G.add_node(i)
    for i in range(conn.shape[0]):
        for j in range(conn.shape[0]):
            if (conn[i, j] > 0.0):
                G.add_edge(i, j)
    return G

G = create_nx_graph(net, True)
print G
In [447]:
print spikefreq.describe()
lowfreq_neurons = np.nonzero(spikefreq<=spikefreq.quantile(0.003))[0]
highfreq_neurons = np.nonzero(spikefreq>=spikefreq.quantile(1.0-0.003))[0]
print lowfreq_neurons
print highfreq_neurons
In [475]:
indegrees = pandas.Series(np.array([float(G.in_degree(i)) for i in range(len(spikefreq))], dtype=np.float32))
outdegrees = pandas.Series(np.array([float(G.out_degree(i)) for i in range(len(spikefreq))], dtype=np.float32))
ds = pandas.DataFrame({ 'spikefreq' : spikefreq, 'indegree' : indegrees, 'outdegree' : outdegrees})
In [498]:
print '''Pearson Correlation Coefficients between average spike frequency and
Indegree: %f
Outdegree: %f
''' % (ds.spikefreq.corr(ds.indegree, method='pearson'),
ds.spikefreq.corr(ds.outdegree, method='pearson'))
In [499]:
fig = plt.figure(figsize=(12, 8))
plt.hist(ds.indegree, 20, normed=True, color='g', alpha=0.8)
plt.title('Indegree distribution')
plt.show()
fig = plt.figure(figsize=(12, 8))
plt.hist(ds.outdegree, 20, normed=True, color='g', alpha=0.8)
plt.title('Outdegree distribution')
plt.show()
In [509]:
# Let's see how sparsely the spikes are actually distributed
s = np.sum(spikes, axis=0)
spikelocs = np.array(np.nonzero(s)[0], dtype=np.int32)
spikenz = pandas.Series(s[spikelocs])
print len(spikelocs)
print spikes.shape[1]
print spikes.shape[1]/len(spikelocs)
plt.figure(figsize=(12, 8))
plt.hist(spikenz, 30, color='r', alpha=0.8)
plt.show()
spikenz.describe()
We will count the number of spikes a neuron received from incoming neurons since its last spike. From that we will create a set of exhaustive probability mass functions, each of which we will at first visualize as a histogram.
Let $ NE $ be the set of neurons, and let $ ||NE|| $ denote the size of this set.
Let $ Pa(n) $ be the true set of parents of neuron $ n \in NE $; $ ||Pa(n)|| $ denotes the size of this set. We use $ cap(n) $ to denote a specific candidate set of parents for neuron $ n $, which may be identical, partially identical, or completely disjoint from the true set of parents $ Pa(n) $. We use $ ||cap(n)|| $ to denote the size of this set.
We assume that the number of parents of a neuron is bounded above by a known constant $ max(||Pa(NE)||) $; we won't consider parent sets larger than this maximum.
Let $ Spikecount(j, cap(n), n) $ denote observation $ j \in \{1, \ldots, J_n\} $ of the number of spikes of the neurons in set $ cap(n) $ leading up to spike $ j $ of neuron $ n $, where $ J_n $ denotes the number of spikes of neuron $ n $.
We are interested in the general probability mass function
$$ P(Pa(n) = cap(n) \mid Spikecount(j, cap(n), n)) $$
We can't get this directly, so we use Bayes' formula:
$$ P(Pa(n) = cap(n) \mid Spikecount(j, cap(n), n)) = \frac{P(Spikecount(j, cap(n), n) \mid Pa(n) = cap(n)) \cdot P(Pa(n) = cap(n))}{Z} $$
So let's attack these terms one by one:
This is effectively the likelihood of observation $ j $ given that $ cap(n) $ is actually the true set of parents $ Pa(n) $.
Let's assume all neurons with a given number of parents behave similarly: they require a minimum number of spikes from their parents before they fire themselves.
$$ \forall j : Spikecount(j, Pa(n), n) \geq K_n $$
If we assume $ cap(n) = Pa(n) $, we can simply estimate $ K_n $ as the minimum of all observations, or a value very close to the minimum (say, the 1% quantile) to allow for some measurement error:
$$ \min_j Spikecount(j, Pa(n), n) = K_n $$
Let's further assume that we can describe the spike count as a constant plus a stochastic error term determined by the number of parents, i.e. $ ||Pa(n)|| $:
$$ Spikecount(j, Pa(n), n) = K_n + E(||Pa(n)||) $$
We can learn the probability distribution of this error term $ E(||Pa(n)||) $ for all sizes of the parent set from our training data. So we have:
$$ AddSpikes(j, Pa(n), n) = Spikecount(j, Pa(n), n) - K_n $$
$$ P(Spikecount(j, cap(n), n) \mid Pa(n) = cap(n)) = P(E(||cap(n)||) = AddSpikes(j, Pa(n), n)) $$
The prior can be modeled as the probability of having a parent set of size $ ||cap(n)|| $, multiplied by the probability of having randomly drawn this specific subset of $ ||cap(n)|| $ neurons from all candidate parent sets of that size. Let $ numcap(n, k) $ be the number of candidate parent sets of size $ k $ for neuron $ n $. Then:
$$ P(Pa(n) = cap(n)) = \sum_{k=0}^{max(||Pa(NE)||)} \frac{P(||Pa(n)|| = k)}{numcap(n, k)} \cdot [\,||cap(n)|| = k\,] $$
(the Iverson bracket keeps only the term with $ k = ||cap(n)|| $). Note that we can learn $ P(||Pa(n)|| = ||cap(n)||) $ from the data.
Z is simply a normalization factor whose exact meaning depends on context; we have to ensure the probability mass function sums to 1 over all possible inputs.
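To make the estimation steps concrete, here is a minimal sketch, assuming we already have, from training networks with known connectivity, a dict mapping each parent-set size k to the observed Spikecount values of neurons with exactly k parents. The names fit_error_model and likelihood are made up for illustration.

# Minimal sketch of the estimation steps above (hypothetical helper names)
def fit_error_model(counts):
    ''' counts: dict mapping parent-set size k -> 1D array of observed Spikecount values '''
    model = {}
    for k, obs in counts.items():
        obs = np.asarray(obs, dtype=np.float64)
        K = np.percentile(obs, 1.0)   # estimate K_n as the 1% quantile, not the strict minimum
        add = obs - K                 # AddSpikes(j, Pa(n), n) = Spikecount - K_n
        pmf = (pandas.Series(add).value_counts() / float(len(add))).to_dict()
        model[k] = (K, pmf)           # empirical distribution of E(k)
    return model

def likelihood(model, k, spikecount):
    ''' P(Spikecount | Pa(n) = cap(n)) for a candidate parent set of size k '''
    K, pmf = model[k]
    return pmf.get(spikecount - K, 0.0)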
The above is a nice procedure for obtaining probabilities once we have limited the set of possible parents of a neuron to a tractable number. The big question now is: how do we limit them?
We have already seen above that neurons which are direct parents of the reference neuron tend to fire more often in exactly the same time slice. While the true difference between one hop and two hops may be modest (say, from roughly 73% to 63% correlation), repeated measurements give a strong hint as to which neurons might be directly connected and which are not, as the sketch below illustrates.
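A sketch of that candidate-limiting step, using a hypothetical helper that operates on the dense spikes array loaded above: rank potential parents of a neuron by how often they fire in exactly the same time slice as the reference neuron.

# Sketch: rank candidate parents of neuron n by same-time-slice co-firing
def rank_candidate_parents(spikes, n, top=10):
    ref = spikes[n]               # the reference neuron's 0/1 spike train
    co = np.dot(spikes, ref)      # co-firing count of every neuron with the reference
    co[n] = -1.0                  # exclude the reference neuron itself
    return np.argsort(co)[::-1][:top]  # indices of the strongest co-firers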