This demo shows how to predict spiking probabilities from calcium imaging data for which no ground truth is available. Two approaches are presented:
I. Based on a simple, pretrained convolutional network.
II. Based on the same network, but using a projection of the dataset to be analyzed into an embedding space in order to retrain the network on a focused subset of the ground truth.
Authors: Peter Rupprecht, peter.rupprecht@fmi.ch; Stephan Gerhard, stephan.gerhard@fmi.ch
In [1]:
from __future__ import print_function
import numpy as np
import scipy.io as sio
from sklearn.decomposition import PCA
from copy import deepcopy
import matplotlib.pyplot as plt
from elephant.utils2 import extract_stats, genhurst, map_between_spaces
from elephant.utils import norm
from elephant.c2s_preprocessing import preprocess, percentile_filter
In [2]:
trace = sio.loadmat('demo_dataset/Adult_zebrafish_110neurons.mat')['FFF']  # calcium traces, one column per neuron
fs = 28  # frame rate of the recording [Hz]
Preprocess data (upsample to 100 Hz, normalize offset/amplitude).
The preprocessing is based on Theis et al., 2016.
In [3]:
tracex, fsx = preprocess(trace,fs)
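For intuition, here is a minimal sketch of what such a preprocessing step can look like. This is a simplified, hypothetical stand-in, assuming linear interpolation to 100 Hz and a running-percentile baseline, with NaN handling omitted; the actual implementation lives in elephant.c2s_preprocessing and may differ in detail.

import numpy as np
from scipy.ndimage import percentile_filter as nd_percentile_filter

def preprocess_sketch(trace, fs, fs_target=100):
    # Upsample each neuron's trace to fs_target Hz, subtract a slow
    # running-percentile baseline, and normalize the amplitude.
    n_in, n_neurons = trace.shape
    t_in = np.arange(n_in) / fs
    t_out = np.arange(0, t_in[-1], 1.0 / fs_target)
    out = np.zeros((len(t_out), n_neurons))
    for k in range(n_neurons):
        x = np.interp(t_out, t_in, trace[:, k])                  # upsample
        x = x - nd_percentile_filter(x, 8, size=20 * fs_target)  # remove offset
        out[:, k] = x / np.std(x)                                # normalize amplitude
    return out, fs_target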
In [4]:
exec(open("elephant/config_elephant.py").read())
exec(open("elephant/2_model.py").read())
In [12]:
model.load_weights("models/model1.h5")
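The network architecture itself is created by elephant/2_model.py and its pretrained weights are loaded above. Purely as a hypothetical illustration of this kind of model (a window of windowsize samples in, one spiking probability out), a small 1D convolutional network could be defined in Keras roughly like this; the real architecture and loss may differ:

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_model_sketch(windowsize=128):
    # Hypothetical stand-in for elephant/2_model.py.
    m = Sequential()
    m.add(Conv1D(30, 31, activation='relu', input_shape=(windowsize, 1)))
    m.add(Conv1D(40, 9, activation='relu'))
    m.add(MaxPooling1D(2))
    m.add(Conv1D(50, 5, activation='relu'))
    m.add(Flatten())
    m.add(Dense(30, activation='relu'))
    m.add(Dense(1, activation='linear'))  # spiking probability for the window
    m.compile(loss='mean_squared_error', optimizer='adam')
    return m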
In [23]:
Ypredict = np.zeros( tracex.shape ) * np.nan  # with NaN padding
for k in range(0, trace.shape[1]):
    if np.mod(k, 20) == 0:
        print('Predicting spikes for neuron %s out of %s' % (k+1, trace.shape[1]))
    x1x = tracex[:, k]
    idx = ~np.isnan(x1x)
    calcium_traceX = norm(x1x[idx])
    # initialize the input matrix: one sliding window per time point
    XX = np.zeros((calcium_traceX.shape[0]-windowsize, windowsize, 1), dtype=np.float32)
    for jj in range(0, calcium_traceX.shape[0]-windowsize):
        XX[jj, :, 0] = calcium_traceX[jj:(jj+windowsize)]
    A = model.predict(XX, batch_size=4096)
    # write predictions back, offset by the part of the window before the prediction point
    indices = slice(int(windowsize*before_frac), int(windowsize*before_frac+len(A)))
    Ypredict[indices, k] = A[:, 0]
In [24]:
%matplotlib inline
plt.figure(102)
indizes = range(int(windowsize*before_frac),calcium_traceX.shape[0]-int(windowsize*after_frac))
plt.imshow(np.transpose(tracex[indizes,:]), aspect='auto')
plt.jet()
plt.clim(0,8)
plt.axis('off')
Plot the deconvolved data (predicted spiking probabilities) as a 2D matrix. The Y-axis corresponds to the 110 neurons, the X-axis to time (ca. 180 sec in total).
In [25]:
plt.figure(101)
plt.imshow(np.transpose(Ypredict[indizes,:]), aspect='auto')
plt.gray()
plt.axis('off')
plt.clim(0.2,20)
Plot a single neuronal calcium trace, together with the predictions.
In [26]:
plt.figure(103)
neuron_index = 103
timet = np.arange(0,Ypredict.shape[0])/100  # time axis in seconds (predictions are at 100 Hz)
plt.plot(timet,Ypredict[:,neuron_index]*0.5)  # predicted spiking probabilities (scaled for display)
plt.plot(timet,tracex[:,neuron_index]+9)  # calcium trace (offset for display)
plt.xlim(36,55)
plt.ylim(-1.2, 15)
plt.xlabel('Time [sec]')
This result was generated with the basic pretrained model. Part II retrains the model to better adapt it to the particularities of the dataset being analyzed.
First, compute statistical properties such as skewness, autocorrelation, power spectral densities, and Hurst coefficients. These are used to map the dataset into the embedding space created from the ground truth datasets, in order to choose the ground truth data best suited for retraining the model.
In [11]:
stats = extract_stats(tracex[:,:])
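extract_stats returns one row of summary statistics per neuron. As a simplified, hypothetical sketch of the kind of features involved (the real function in elephant.utils2 computes 18 parameters, including Hurst coefficients via genhurst):

import numpy as np
from scipy.stats import skew, kurtosis
from scipy.signal import welch

def stats_sketch(tracex, fs=100):
    # One feature row per neuron: distribution shape, lag-1 autocorrelation,
    # and the fraction of power below 1 Hz from the Welch spectrum.
    feats = []
    for k in range(tracex.shape[1]):
        x = tracex[:, k]
        x = x[~np.isnan(x)]
        ac1 = np.corrcoef(x[:-1], x[1:])[0, 1]
        f, pxx = welch(x, fs=fs)
        lowpower = np.sum(pxx[f < 1.0]) / np.sum(pxx)
        feats.append([skew(x), kurtosis(x), ac1, lowpower])
    return np.array(feats)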
First, the statistical parameters of the ground truth datasets (Parameters174, based on 174 neurons) are loaded. This parameter space is then projected into a 2D embedding space (pca2). The same transformation is used to project the statistics computed above (stats) into the embedding space. This yields a single 2D point: the location of the dataset to be analyzed in the embedding space.
In [12]:
A = sio.loadmat('statEmbedding/Parameters174py.mat',variable_names=['DasetS','Parameters174','Parameters174temp'])
DasetS = A['DasetS']
Parameters174temp = A['Parameters174temp']
Parameters174 = A['Parameters174']
goodindizes = sio.loadmat('statEmbedding/embedding_spacesX.mat')['goodindizes']
Parameters174 = Parameters174[:,goodindizes]
DasetS = DasetS[goodindizes]
ParametersXX = np.squeeze(deepcopy(Parameters174))
# average the parameters within each of the 10 ground truth datasets
for k in range(0,10):
    indizes = np.where(DasetS==k+1)[0]
    for j in range(0,18):
        ParametersXX[j,indizes] = np.mean(ParametersXX[j,indizes])
pca2 = PCA(n_components=2)
pca2.fit(np.transpose(ParametersXX))
# z-score the statistics of the new dataset with respect to the ground truth
for k in range(0,18):
    stats[:,k] = (stats[:,k] - np.mean(Parameters174temp[:,k]))/np.std(Parameters174temp[:,k])
P1 = pca2.transform(stats)
P1mean = np.mean(P1,axis=0)
In [13]:
distances = map_between_spaces(P1mean)
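map_between_spaces returns the distances between this dataset's location P1mean and the locations of the ground truth datasets in the embedding space. A hypothetical sketch of the basic idea, assuming an array dataset_centroids of shape (10, 2) holding the embedded ground truth datasets (the actual function in elephant.utils2 additionally maps between embedding spaces):

import numpy as np

def map_between_spaces_sketch(P1mean, dataset_centroids):
    # Euclidean distance from the new dataset's 2D embedding location
    # to each of the 10 ground truth dataset centroids.
    return np.sqrt(np.sum((dataset_centroids - P1mean)**2, axis=1))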
First, load the ground truth data used for retraining the model.
Since the ground truth datasets are not equally large, they have to be weighted accordingly during retraining. To this end, the number of samples in each dataset must be known.
In [14]:
exec(open("elephant/1_load_data.py").read())
dataset_sizes = np.array((740777, 672625, 563966, 165048, 149976, 171687, 875247, 486090, 584103, 697437),dtype=float)
dataset_fraction_to_take = np.min(dataset_sizes)/dataset_sizes
In [15]:
distancesX = np.exp(-(distances)/3.5)
distancesX = distancesX/np.max(distancesX)
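The exponential converts embedding distances into relative training weights: nearby ground truth datasets dominate, while remote ones still contribute a little. A quick numerical example with hypothetical distances:

d = np.array([0.5, 2.0, 7.0])   # distances in the embedding space
w = np.exp(-d/3.5)              # exponential decay with length scale 3.5
print(w/np.max(w))              # -> approx. [1.00, 0.65, 0.16]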
In [16]:
for jjj in range(0,5):
    XX0 = np.empty((0,128,1))
    YY0 = np.empty((0,1))
    # assemble a weighted training set from the 10 ground truth datasets
    for kk in range(0,10):
        dataset_chosen = kk + 1
        datasets_to_train = {}
        IY = datasets == dataset_chosen
        datasets_to_train[dataset_chosen] = neurons[IY]
        verbosity = 0
        exec( open("elephant/3_preprocessing.py").read() )
        # keep a fraction of the samples according to embedding distance and dataset size
        X = X[0:int(X.shape[0]*distancesX[kk]*dataset_fraction_to_take[kk]),:,:]
        Y = Y[0:int(Y.shape[0]*distancesX[kk]*dataset_fraction_to_take[kk]),:]
        XX0 = np.concatenate((XX0,X),axis=0)
        YY0 = np.concatenate((YY0,Y),axis=0)
    learning_rate = 0.0033
    model.optimizer.lr.assign(learning_rate)
    model.fit(XX0, YY0, batch_size=batch_size, epochs=1)
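A note on the learning-rate assignment: model.optimizer.lr.assign(...) assumes the optimizer stores its learning rate as a TensorFlow variable. With older standalone Keras, the usual equivalent would be:

from keras import backend as K
K.set_value(model.optimizer.lr, learning_rate)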
In [18]:
Ypredict = np.zeros( tracex.shape ) * np.nan  # with NaN padding
for k in range(0, trace.shape[1]):
    if np.mod(k, 20) == 0:
        print('Predicting spikes for neuron %s out of %s' % (k+1, trace.shape[1]))
    x1x = tracex[:, k]
    idx = ~np.isnan(x1x)
    calcium_traceX = norm(x1x[idx])
    # initialize the input matrix: one sliding window per time point
    XX = np.zeros((calcium_traceX.shape[0]-windowsize, windowsize, 1), dtype=np.float32)
    for jj in range(0, calcium_traceX.shape[0]-windowsize):
        XX[jj, :, 0] = calcium_traceX[jj:(jj+windowsize)]
    A = model.predict(XX, batch_size=4096)
    # write predictions back, offset by the part of the window before the prediction point
    index = slice(int(windowsize*before_frac), int(windowsize*before_frac+len(A)))
    Ypredict[index, k] = A[:, 0]
In [24]:
plt.figure(103)
indizes = range(int(windowsize*before_frac),calcium_traceX.shape[0]-int(windowsize*after_frac))
plt.imshow(np.transpose(Ypredict[indizes,:]), aspect='auto')
plt.gray()
plt.clim(0.2,20)
plt.figure(104)
plt.imshow(np.transpose(tracex[indizes,:]), aspect='auto')
plt.jet()
plt.clim(0,8)
In [25]:
plt.figure(105)
neuron_index = 103
timet = np.arange(0,Ypredict.shape[0])/100
plt.plot(timet,Ypredict[:,neuron_index]*0.5)
plt.plot(timet,tracex[:,neuron_index]+9)
plt.xlim(36,55)
plt.ylim(-1.2, 15)
plt.xlabel('Time [sec]')