In this notebook we'll focus on predicting whether a neuron belongs to layer 6 or not knowing only the following information:
and the following technique:
In [0]:
##force not printing
%%capture
%matplotlib inline
!pip install h5py
!pip install pygsp
import numpy as np
import numpy.ma as ma
import h5py
from scipy import sparse
import IPython.display as ipd
import matplotlib.pyplot as plt
import matplotlib.colors as col
import re
import networkx as nx
import scipy as sp
import scipy.sparse as sps
from scipy.sparse import coo_matrix, hstack, vstack, csr_matrix
from sklearn.cluster import KMeans
import sklearn.metrics as skm
import pygsp as gsp
from pygsp import graphs, filters, plotting, features
import pygsp
In [0]:
##read .h5 file format containing the information about the microcolumn of the averege individual. Run load_ntds_project before this cell!
file_name='cons_locs_pathways_mc2_Column.h5'
h5=h5py.File(file_name,'r')
In [0]:
#read .h5 file format containing the symmetric connection matrix
file_name='conn_matrix.h5'
h5_conn=h5py.File(file_name,'r')
conn_tot=h5_conn['binary']
In [0]:
N=31346
m_values=list(h5['populations'].keys())
In [0]:
##to plot signal
def singleplt2dembeddings(x,y,col1,colmap=None,size=0.3,title1='Embedding with normalized Laplacian eigenmaps'):
plt.figure(figsize=(15,5))
plt.title(title1)
plt.scatter(x, y,c=col1,cmap=colmap, alpha=0.6,s=size)
plt.xlabel('Eigenvector 1')
plt.ylabel('Eigenvector 2')
plt.colorbar()
In [0]:
##preparing layers labels
m_type=dict()
for i in range(0, len(m_values)):
m_type[i] =m_values[i]
num_neuron=dict()
for i in range(0, len(m_values)):
num_neuron[i]=len(list(h5['populations'][m_type[i]]['locations']))
label_layer=[]
label_layer_morpho=[]
for i in m_type.keys():
label_layer.extend(num_neuron[i]*list(map(int, re.findall(r'^\D*(\d+)', m_type[i]))))
label_layer = [v if v!= 23 else 2 for v in label_layer]
label_layer_array = np.asarray(label_layer)
In [0]:
x = np.ones(N)
x[label_layer_array == 6] = -1
def prepare_observations(x,p):
#Prepare observations, where p is the percentage of values to keep
rs = np.random.RandomState(42)
M = np.array(rs.uniform(size=N) < p)
return np.multiply(M,x), M
In [8]:
y, M = prepare_observations(x,p=0.1) #we keep "only" 10 percent of the original data
plt.hist(y);
The problem we then want to solve is $$\mathbf{x}^* = \operatorname*{arg\,min}_{\mathbf{x} \in \mathbb{R}^N} \|\mathbf{y} - \mathbf{Mx}\|_2^2 + \alpha \mathbf{x}^\intercal \mathbf{L} \mathbf{x},$$ where $\alpha$ is an hyper-parameter which controls the trade-off between the data fidelity term and the smoothness prior.
This problem is a convex problem. One particularity is, that there is a global minimum. Thus, if we compute the derivative, we should be able to retrieve this global minimum, because that will be where the derivative is equal to zero. Finally we need to solve :
$
\left( \mathbf{M}^2 + \alpha \mathbf{L} \right) \mathbf{x}^* = \mathbf{My}
$</font>
In [9]:
#create conn_tot and L sparse matrices
conn_tot=csr_matrix(conn_tot)
L=sps.csgraph.laplacian(conn_tot,normed=True)
To solve this problem, the lsqr solver is an efficient way to do it. It is an iterative method to approximate the solution.
In [0]:
#retrieve the first 3 eigenvetors and eigenvalues
evalues_norm, evect_norm =sp.sparse.linalg.eigsh(L,k=3,which="SM")
x_norm = evect_norm[:,1]
y_norm = evect_norm[:,2]
In [0]:
M=[int(M[i]) for i in range(len(M))]
In [0]:
def solve(y,M, L, alpha):
M1=sp.sparse.diags(M)
A= M1.dot(M1)+L.multiply(alpha)
B = M1.dot(y)
x_star, istop, itn, r1norm,r2norm = sp.sparse.linalg.lsqr(A, B)[:5]
return x_star
x_star = solve(y, M, L, alpha=0.01)
In [13]:
colors = ['r','y','w','g','b','purple']
ycoord=[x for x in range(31346)]
plt.scatter(ycoord,x_star,s=0.5,c=label_layer,cmap=col.ListedColormap(colors))
Out[13]:
The result above shows that different layers are separated by a sharp discontinuity in the reconstructed signal. The understanding of "why" is not obvious, bit it is interesting to notice it.
In [14]:
singleplt2dembeddings(x_norm,y_norm, col1=x_star,colmap='cool',size=0.8, title1= '6 Layers', )
In [15]:
##deciding value
value=np.mean(x_star)
x_pred = x_star.copy()
x_pred[x_pred>value]=1
x_pred[x_pred<=value]=-1
# Be sure that the prediction is binary.
np.testing.assert_equal(abs(x_pred), 1)
# Error rate.
err = np.count_nonzero(x - x_pred)
print('We can predict if a neuron is in the 6th layer, or not, with a percentage of error equal to : {:.2%}, number of errors : {}'.format(err/N, err))
Result : By taking only 10% of the original data, the layer 6 could be predict with a small error (7.55%).
In [16]:
colors = ['b','purple']
singleplt2dembeddings(x_norm,y_norm, col1=x_pred,colmap=col.ListedColormap(colors),size=0.8, title1= '6 Layers', )
We will keep the same observations as for the unweighted matrix.
In [0]:
##load already computed weighted matrix
file_name='fail_weight.h5'
h5_dist=h5py.File(file_name,'r')
conn_fail=h5_dist['weight']
In [18]:
conn_fail=csr_matrix(conn_fail)
L_fail=sps.csgraph.laplacian(conn_fail,normed=True)
In [0]:
evalues_norm_fail, evect_norm_fail =sp.sparse.linalg.eigsh(L_fail,k=3,which="SM")
x_norm_fail = evect_norm_fail[:,1]
y_norm_fail = evect_norm_fail[:,2]
In [0]:
x_star_fail = solve(y, M, L_fail, alpha=0.01)
In [21]:
colors = ['r','y','w','g','b','purple']
ycoord=[x for x in range(31346)]
plt.scatter(ycoord,x_star_fail,s=0.5,c=label_layer,cmap=col.ListedColormap(colors))
Out[21]:
In [22]:
##deciding value
value_fail=np.mean(x_star_fail)
x_pred_fail = x_star_fail.copy()
x_pred_fail[x_pred_fail>value_fail]=1
x_pred_fail[x_pred_fail<=value_fail]=-1
# Be sure that the prediction is binary.
np.testing.assert_equal(abs(x_pred_fail), 1)
# Error rate.
err_fail = np.count_nonzero(x - x_pred_fail)
print('We can predict if a neuron in the 6th layer, or not, with a percentage of error equal to : {:.2%}, number of errors : {}'.format(err_fail/N, err_fail))
Results: The results obtained by the prediction with either of the matrices have a very low error rate. When keeping only 10% of the original data using the binary connectivity matrix results in an error rate of 7.55% and using the weighted matrix results in an error rate of 4.72%.
This supports the assumption, that the failure rate is a reasonable measure to be used when distinguishing between the different layers.
In [23]:
colors = ['b','purple']
singleplt2dembeddings(x_norm_fail,y_norm_fail, col1=x_pred_fail,colmap=col.ListedColormap(colors),size=0.8, title1= '6 Layers', )
label : result from : **Unweighted** connectivy matrix | **Weighted** connectivy matrix
p | $\alpha$ = 0.01 | $\alpha$ = 0.1 | $\alpha$ = 1 |
---|---|---|---|
0.05 | 15.6 | 8.05 | 16.7 | 8.60 | 25.7 | 14.6 |
0.1 | 7.55 | 4.72 | 8.04 | 4.86 | 12.7 | 6.41 |
0.2 | 4.93 | 3.68 | 7.55 | 3.69 | 6.95 | 4.24 |
0.5 | 2.59 | 2.31 | 2.62 | 2.29 | 2.89 | 2.24 |
As expected, the same behaviour can be observed as in the last assignment (NTDS assignment 4: graph signal processing). As $\alpha$ is the hyper-parameter which controls the trade-off between the data fidelity term and the smoothness prior, if it is increased, the percentage of errors increases. The opposite is also true. Concerning the probabiliy, it can be noticed (and that makes sense) that if more values are kept (bigger "p") from the original observations to compute the prediction, the amount of errors between the original and the prediction values decrease.
In this section we'll try to predict whether a neuron is excitatory or inhibitory (E\I) using as an underlying graph the structural connectivity matrix of the microcircuit weighted wit the failure mean. The technique used for the prediction is the same as in section 3.1) and 3.2).
The labels of E\I have been manually created by looking at the classification given in 'Reconstruction and Simulation of Neocortical Microcircuitry' (Markram et al., 2015; Cell).
In [0]:
lab_EI=[-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,1,-1,-1,-1,1,-1,-1,-1,-1,-1,-1,-1,-1]
In [0]:
label_EI=[]
for i in m_type.keys():
label_EI.extend(num_neuron[i]*[lab_EI[i]])
In [26]:
E=[label_EI[i] for i in range(N) if label_EI[i]==1]
E=sum(E)
I=N-E
print('In the microcircuit there are {} excitatory neurons and {} inhibitory neurons'.format(E,I))
In [27]:
##prepare observation
yEI, MEI = prepare_observations(label_EI,p=0.1) #we keep "only" 10 percent of E\I labels
plt.hist(yEI);
In [0]:
MEI=[int(MEI[i]) for i in range(len(MEI))]
In [0]:
## predict the remaining 90% of the E/I labels
x_star_EI = solve(yEI, MEI, L_fail, alpha=0.01)
In [30]:
colors = ['b','r']
ycoord=[x for x in range(31346)]
plt.scatter(ycoord,x_star_EI,s=0.5,c=label_EI,cmap=col.ListedColormap(colors))
Out[30]:
In [31]:
##deciding value
value_EI=np.mean(x_star_EI)
x_pred_EI = x_star_EI.copy()
x_pred_EI[x_pred_EI>value_EI]=1
x_pred_EI[x_pred_EI<=value_EI]=-1
# Be sure that the prediction is binary.
np.testing.assert_equal(abs(x_pred_EI), 1)
# Error rate.
err_EI = np.count_nonzero(label_EI - x_pred_EI)
print('We can predict if a neuron is inhibitory or excitatory with a percentage of error equal to : {:.2%}, number of errors : {}'.format(err_EI/N, err_EI))
In [32]:
colors = ['b','r']
singleplt2dembeddings(x_norm_fail,y_norm_fail, col1=x_pred_EI,colmap=col.ListedColormap(colors),size=0.8, title1= 'E/I', )
Result: The presented method is not an accurate prediction to distinguish between inhibitory and excitatory neurons. Probably the constructed underlying graph doesn't carry much information about this biological property of the neurons. Further investigation could be done by choosing more biologically related weights for the structural graph of the microcircuit.