"Bayesian Source Separation: Beyond PCA and ICA" (Mohammad-Djafari, 2006) http://djafari.free.fr/pdf/esann06.pdf

  • PCA and ICA assume data are independent and stationary, components are additive, etc.
  • Proposal: progressively account for different properties of the sources

Bayesian nonlinear independent component analysis by multi-layer perceptrons (Lappalainen and Honkela) https://www.hiit.fi/u/ahonkela/papers/ch7.pdf

  • PCA and ICA assume data generated by independent sources through a linear mapping
  • General form of models: observations $\mathbf{x}(t)$ are generated by sources $\mathbf{s}(t)$ mapped to observation space by a parameterized function $f()$, with additive noise $\mathbf{n}(t)$ $$\mathbf{x}(t) = f(\mathbf{s}(t)) + \mathbf{n}(t)$$

Variational methods for Bayesian Independent Component Analysis (Choudrey, 2002, Oxford PhD thesis) http://www.robots.ox.ac.uk/~parg/pubs/theses/RizwanChoudrey_thesis.pdf

Independent Component Analysis (ICA)


In [1]:
import numpy as np
import numpy.random as npr
import pylab as pl
%matplotlib inline

n = 100000
t = np.arange(n)
x1 = np.sin(t*0.001)
x1_noise = npr.randn(n)*0.1
x2 = np.cos(t*0.0001)
x2_noise = npr.randn(n)*0.01
S = np.c_[x1+x1_noise,x2+x2_noise]
S/= S.std(axis=0)
mix = np.array([[0.2,0.8,0.1],[0.1,0.8,0.2]])
obs = np.dot(S,mix)

In [142]:
y = S[:,0]+S[:,1]

In [150]:
pl.plot(y)
pl.figure()
pl.plot(S[:,0])
pl.figure()
pl.plot(S[:,1])


Out[150]:
[<matplotlib.lines.Line2D at 0x102d21e10>]

In [121]:
from sklearn.decomposition import FastICA

In [156]:
ica = FastICA(n_components=2)

In [157]:
S_ = ica.fit_transform(obs)

In [158]:
mix_ = ica.mixing_

In [159]:
mix_


Out[159]:
array([[  63.89837314,  -32.37485653],
       [ 258.22298612, -255.96319867],
       [  32.93524665,  -63.61134297]])

In [160]:
pl.plot(S_[:,0])
pl.figure()
pl.plot(S_[:,1])
pl.figure()
pl.plot(S_[:,2])


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-160-dfb3e932bd1f> in <module>()
      3 pl.plot(S_[:,1])
      4 pl.figure()
----> 5 pl.plot(S_[:,2])

IndexError: index 2 is out of bounds for axis 1 with size 2
<matplotlib.figure.Figure at 0x10da94fd0>

In [66]:

Notes on Hastie et al., 2001: Elements of Statistical Learning, Chapter 14 - Unsupervised Learning

Intro

  • Supervised learning problem: given inputs $X$ and outputs $Y$, represented by a joint probability density $(X,Y)$, determine properties of the conditional density $\Pr(Y \mid X)$, such as:
    • "location" parameters $\mu$ that minimize expected error at each data point $$\mu(x) = \text{argmin}_\theta \mathbb{E}_{Y\mid X} L(Y,\theta)$$
    • uncertainty, etc.
  • Unsupervised learning: given $N$ observations $(x_1,x_2,\dots,x_N)$, $x \in \mathbb{R}^p$, with joint density $\Pr(X)$, infer properties of $\Pr(X)$

Association rules

  • Find joint values of the variables that appear most frequently: "mode finding" / "bump hunting"
  • For binary-valued data vectors, this is called "market basket analysis"
  • For high-dimensional data, this is intractable-- we don't have enough samples to directly estimate association rules. How to make tractable:
    • Modify goal: seek regions of $X$-space with high probability content relative to their size/support, rather than values $x$ with large $\Pr(x)$
      • Let $\mathcal{S}_j$ be the support of the $j$th variable (the set of all possible values of the $j$th variable), and let $s_j \subseteq \mathcal{S}_j$ be a subset of these values.
      • Modified goal: find subsets of variable values $s_1,\dots,s_p$ such that the probability of each of the variables simultaneously assuming a value within its respective subset is relatively large. I.e. we want the following value to be large relative to the support of the subsets: $$ \Pr \left[ \bigcap_{j=1}^p{(X_j \in s_j)} \right]$$
      • where $\bigcap_{j=1}^p{(X_j \in s_j)}$ is called a conjunctive rule
    • Simplify analysis:
      • Only two types of subsets considered: $s_j$ is a single value of $X_j$ ($s_j = v_{0 j}$), or it consists of the entire set $\mathcal{S}_j$: reduces problem to finding a subset $\mathcal{J}$ of the indices $\{ 1,\dots, p\}$ (and corresponding values $v_{0 j}, j \in \mathcal{J}$ such that the following value is large: $$ \Pr \left[ \bigcap_{j \in \mathcal{J}}{(X_j = v_{0 j})} \right]$$
Unsupervised as Supervised Learning
  • Transform density estimation problem into one of supervised function approximation:
    • Let $g(x)$ be the unknown data probability density to be estimated, and $g_0(x)$ a specified probability density function used for reference (e.g. the uniform density over the range of the variables)
Principal Components, Curves, and Surfaces
  • The principal components of a dataset in $\mathbb{R}^p$ provide a sequence of best linear approximations to that data of all ranks $q \leq p$
  • The rank-$q$ linear model for representing the data: $$ f(\lambda) = \mu + \mathbf{V}_q \lambda$$
    • $\mu$: a location vector in $\mathbb{R}^p$
    • $\mathbf{V}_q$: a $p \times q$ matrix whose columns are orthogonal unit vectors
    • $\lambda$: a $q$-vector of parameters
  • Fit this model to data: choose $\mu, \mathbf{V}_q, \lambda$ to minimize the reconstruction error:
$$ \sum_{i=1}^N \| x_i - \mu - \mathbf{V}_q \lambda_i \|^2$$

In [296]:
p = 10
q = 5
N = 1000
X = npr.randn(N,p)
mu = np.mean(X,axis=0)
V = np.linalg.svd(X)[2][:q].T
lam = np.ones(q)

In [297]:
def reconstruction_error(X,mu,V,lam):
    return np.sum(X - mu - V.dot(lam))

In [298]:
reconstruction_error(X,mu,V,lam)


Out[298]:
4717.7172362706733

We can optimize for $\mu$ and $\lambda_i$ to obtain $$ \begin{align} \hat{\mu} & = \bar{x}\\ \hat{\lambda}_i & = \mathbf{V}_q^T (x_i - \bar{x}) \end{align} $$


In [299]:
lam = np.sum(V.dot(x - np.mean(x)),axis=0)

In [300]:
lam.shape


Out[300]:
(5,)

In [301]:
U,D,V =np.linalg.svd(X.T)

In [302]:
D.shape


Out[302]:
(10,)

In [303]:
PCs = (U*D).T
SVs = D

In [307]:
[pl.plot(PC) for PC in PCs];



In [305]:
pl.plot(SVs)


Out[305]:
[<matplotlib.lines.Line2D at 0x11813d250>]

In [247]:
np.sum(U.T.dot(U))


Out[247]:
1000.0

Independent Component Analysis and Exploratory Projection Pursuit

  • Multivariate data might comprise multiple indirect measurements from an indirectly observable underlying source
  • Factor analysis: identify latent sources (makes Gaussianity assumptions)
Latent variables and factor analysis
  • Latent variable representation of SVD ($\mathbf{X} = \mathbf{U D V}^T$):
    • $\mathbf{S} = \sqrt{N} \mathbf{U}$
    • $\mathbf{A}^T = \frac{1}{\sqrt{N}}\mathbf{D V}^T$
    • $\mathbf{X} = \mathbf{SA}^T$

In [333]:
U,D,Vt = np.linalg.svd(X.T,full_matrices=0)
D = np.diag(D)
N = len(X)
S = np.sqrt(N)*U
At = D*Vt / np.sqrt(N)
X = S * At


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-333-6780456e471a> in <module>()
      3 N = len(X)
      4 S = np.sqrt(N)*U
----> 5 At = D*Vt / np.sqrt(N)
      6 X = S * At

ValueError: operands could not be broadcast together with shapes (10,10) (10,1000) 

In [334]:
print(U.shape)
print(D.shape)
print(Vt.shape)


(10, 10)
(10, 10)
(10, 1000)

In [332]:
X.shape


Out[332]:
(1000, 10)

In [340]:
D.dot(Vt)


Out[340]:
(10, 1000)

In [350]:
U.conj().T == np.linalg.inv(U)


Out[350]:
array([[False,  True, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False,  True,
        False],
       [False, False, False, False, False,  True, False, False, False,
        False],
       [False, False, False, False, False, False,  True, False, False,
        False],
       [False, False, False, False, False,  True, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False],
       [False, False,  True, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False]], dtype=bool)

In [321]:
D


Out[321]:
array([[ 33.82941267,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ],
       [  0.        ,  33.4038151 ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,  33.0840585 ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,  32.29008612,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
         31.88805421,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,  31.53944821,   0.        ,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,  30.85506031,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,  30.11799476,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
         29.75476868,   0.        ],
       [  0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,   0.        ,   0.        ,   0.        ,
          0.        ,  28.7446322 ]])

ktICA notes

tICA notes:

  • Goal: find projections that decorrelate slowly, i.e. Given a markov chain in phase space $\{ \vec{x}_t \in \Omega\}$ find a vector $\vec{v} \in \Omega$ such that the autocorrelation of $\vec{x}_t$ projected onto $\vec{v}$ is maximized
  • Formally
$$ \max_{\vec{v}} \frac{\mathbb{E}[(\vec{v} \cdot \delta \vec{x}_t) (\vec{v} \cdot \delta \vec{x}_{t+\tau})]}{\mathbb{E}[(\vec{v} \cdot \delta \vec{x}_t)^2} $$

Also equivalent to the generalized eigenvalue problem: $$ C^{(\tau)} \vec{v} = \lambda \Sigma \vec{v}$$


In [437]:
def time_lag_corr_cov(X,tau=1):
    mu = (X[:-tau].mean(0) + X[tau:].mean(0)) / 2
    X_ = X-mu
    M = len(X) - tau
    dim = len(X.T)
    corr = np.zeros((dim,dim))
    cov = np.zeros((dim,dim))
    for i in range(M):
        corr += np.outer(X_[i],X_[i+tau]) + np.outer(X_[i+tau],X_[i])
        cov += np.outer(X_[i],X_[i]) + np.outer(X_[i+tau],X_[i+tau])
    return corr / (2.0*M),cov / (2.0*M)

In [438]:
X = np.ones((10000,100))

In [439]:
for i in range(10,len(X)):
    X[i] = X[i-10] + npr.randn(X.shape[1])

In [440]:
pl.imshow(np.outer(X[10],X[11]),interpolation='none')
pl.colorbar()


Out[440]:
<matplotlib.colorbar.Colorbar instance at 0x11dccb290>

In [441]:
corr,cov = time_lag_corr_cov(X,1)
pl.imshow(corr,interpolation='none')
pl.colorbar()


Out[441]:
<matplotlib.colorbar.Colorbar instance at 0x1445ba998>

In [442]:
corr,cov = time_lag_corr_cov(X,1)
pl.imshow(corr,interpolation='none')
pl.colorbar()
pl.figure()
pl.imshow(cov,interpolation='none')
pl.colorbar()


Out[442]:
<matplotlib.colorbar.Colorbar instance at 0x1132987e8>

In [443]:
np.sum(abs(corr-cov))


Out[443]:
1228608.6408781419

In [446]:
diff = []
for i in range(1,100):
    corr,cov = time_lag_corr_cov(X,i)
    diff.append(np.sum(abs(corr-cov)))
    if i % 10 == 0:
        print(i)
pl.plot(diff)


10
20
30
40
50
60
70
80
90
Out[446]:
[<matplotlib.lines.Line2D at 0x12dd3e3d0>]

In [447]:
pl.plot(np.log(diff))


Out[447]:
[<matplotlib.lines.Line2D at 0x12dd42510>]

In [449]:
a,b = time_lag_corr_cov(X,100)

In [450]:
from scipy import linalg

In [456]:
w,v = linalg.eigh(a,b)
w.shape,v.shape
w = w[::-1]
v = v[:,::-1]

In [457]:
pl.plot(w)


Out[457]:
[<matplotlib.lines.Line2D at 0x1426252d0>]

In [463]:
a,b = time_lag_corr_cov(df_megatraj_full[0],100)
w,v = linalg.eigh(a,b)
print(w.shape,v.shape)
w = w[::-1]
v = v[:,::-1]


((84,), (84, 84))

In [464]:
pl.plot(w)


Out[464]:
[<matplotlib.lines.Line2D at 0x12dccd310>]

In [466]:
pl.scatter(np.ones(len(w)),w,linewidth=0,alpha=0.5)


Out[466]:
<matplotlib.collections.PathCollection at 0x143b1bed0>

In [488]:
pca = PCA(2)
X_pca = pca.fit_transform(df_megatraj_full[0])
pl.scatter(X_pca[:,0],X_pca[:,1],c=range(len(X_pca)),linewidths=0)


Out[488]:
<matplotlib.collections.PathCollection at 0x199a13950>

In [467]:
from sklearn.decomposition import FastICA

In [479]:
ica = FastICA(3,fun='exp')
X_ = ica.fit_transform(df_megatraj_full[0])

In [480]:
pl.scatter(X_[:,0],X_[:,1],c=range(len(X_)),linewidths=0)


Out[480]:
<matplotlib.collections.PathCollection at 0x17ff74a10>

Duh-- idea: perform time-aware clustering. You can visually see distinct clusters when they're colored by time


In [484]:
X_.shape


Out[484]:
(280000, 3)

In [487]:
pl.plot(abs(np.sum(X_[1:]-X_[:-1],1)))


Out[487]:
[<matplotlib.lines.Line2D at 0x17ff2b8d0>]

In [160]:
def gaussian(x,y,bandwidth=1.0):
    return np.exp(-1.0/(2*bandwidth**2)*np.sum((x-y)**2))

In [168]:
%timeit gaussian(X[0],X[1])


100000 loops, best of 3: 9.96 µs per loop

In [177]:
type(gaussian) == type(np.exp)


Out[177]:
False

In [3]:
def time_lagged_K(kernel_matrix,tau=1):
    # to-do: accept either a kernel function or a precomputed kernel matrix
    # for now, just accept a precomputed kernel matrix
    M = len(kernel_matrix) - tau
    K = np.zeros((2*M,2*M))
    #for i in range(M):
        #for j in range(M):
            #K[i,j] = kernel(X[i],X[j])
            #K[i,j+M] = kernel(X[i],X[j+tau])
            #K[i+M,j] = kernel(X[i+tau],X[j])
            #K[i+M,j+M] = kernel(X[i+tau],X[j+tau])
    K[:M,:M] = kernel_matrix[:M,:M]
    K[:M,M:] = kernel_matrix[:M,tau:]
    K[M:,:M] = kernel_matrix[tau:,:M]
    K[M:,M:] = kernel_matrix[tau:,tau:]
    return K

In [382]:
from msmbuilder.example_datasets import AlanineDipeptide,FsPeptide
#from msmbuilder.featurizer import SuperposeFeaturizer
print(AlanineDipeptide.description())

dataset = AlanineDipeptide().get()
ala_trajectories = dataset.trajectories
t = ala_trajectories[0]
import mdtraj

n=5000
rmsd = np.zeros((n,n))
for i in range(n):
    rmsd[i,i:] = mdtraj.rmsd(t[i:n],t[i])
    #if i % 100 == 0:
    #    print(i)
rmsd_ala = rmsd + rmsd.T

dataset = FsPeptide().get()
fs_trajectories = dataset.trajectories
fs_t = fs_trajectories[0]

rmsd = np.zeros((n,n))
fs_t = fs_trajectories[0]

for i in range(n):
    rmsd[i,i:] = mdtraj.rmsd(fs_t[i:n],fs_t[i])

rmsd_fs = rmsd + rmsd.T


The dataset consists of ten 10ns trajectories of of alanine dipeptide,
simulated using OpenMM 6.0.1 (CUDA platform, NVIDIA GTX660) with the
AMBER99SB-ILDN force field at 300K (langevin dynamics, friction coefficient
of 91/ps, timestep of 2fs) with GBSA implicit solvent. The coordinates are
saved every 1ps. Each trajectory contains 9,999 snapshots.

The dataset, including the script used to generate the dataset
is available on figshare at

http://dx.doi.org/10.6084/m9.figshare.1026131

loading trajectory_1.xtc...
loading trajectory_10.xtc...
loading trajectory_11.xtc...
loading trajectory_12.xtc...
loading trajectory_13.xtc...
loading trajectory_14.xtc...
loading trajectory_15.xtc...
loading trajectory_16.xtc...
loading trajectory_17.xtc...
loading trajectory_18.xtc...
loading trajectory_19.xtc...
loading trajectory_2.xtc...
loading trajectory_20.xtc...
loading trajectory_21.xtc...
loading trajectory_22.xtc...
loading trajectory_23.xtc...
loading trajectory_24.xtc...
loading trajectory_25.xtc...
loading trajectory_26.xtc...
loading trajectory_27.xtc...
loading trajectory_28.xtc...
loading trajectory_3.xtc...
loading trajectory_4.xtc...
loading trajectory_5.xtc...
loading trajectory_6.xtc...
loading trajectory_7.xtc...
loading trajectory_8.xtc...
loading trajectory_9.xtc...

In [227]:
from msmbuilder import featurizer
rpf = featurizer.RawPositionsFeaturizer()
rpft = rpf.fit_transform(t.center_coordinates())

In [230]:
len(rpft),rpft[0].shape


Out[230]:
(9999, (1, 66))

In [237]:
print(rpft[0].shape)
rpft[0][:10]


(1, 66)
Out[237]:
array([[-0.08346879,  0.37814903,  0.05097657, -0.17036432,  0.31281137,
         0.04315591, -0.21876258,  0.2962383 , -0.05309349, -0.24813223,
         0.34655154,  0.11167407, -0.12860131,  0.18276775,  0.11027825,
        -0.1192168 ,  0.17883182,  0.2296772 , -0.09241003,  0.08390343,
         0.02667385, -0.09346884,  0.11739886, -0.06860435, -0.01379848,
        -0.0328213 ,  0.05494684,  0.03585249, -0.02314794,  0.15149844,
        -0.113886  , -0.15396959,  0.0687905 , -0.13873768, -0.18143648,
        -0.03372276, -0.06885797, -0.23928595,  0.11953104, -0.19898528,
        -0.10809737,  0.11913782,  0.08298725, -0.05516928, -0.0596621 ,
         0.08525664,  0.01455045, -0.16487294,  0.16623157, -0.15600061,
        -0.04842705,  0.16648954, -0.21376866,  0.03442091,  0.2711575 ,
        -0.18177128, -0.15055555,  0.33518779, -0.26732063, -0.12905061,
         0.21814322, -0.20783979, -0.24215746,  0.32738495, -0.09057325,
        -0.17061532]], dtype=float32)

In [239]:
np.array(rpft).shape


Out[239]:
(9999, 1, 66)

In [241]:
np.reshape(rpft,(9999,22,3)).shape


Out[241]:
(9999, 22, 3)

In [228]:
raw_pos = np.reshape(rpft,(9999,22,3))
raw_pos.shape


Out[228]:
(9999, 22, 3)

In [260]:
from numpy.linalg import det
BC = lambda x,y: det(x.T.dot(y))/ np.sqrt(det(x.T.dot(x)) * det(y.T.dot(y)))

In [261]:
BC(raw_pos[10],raw_pos[100])


Out[261]:
0.87963367

In [262]:
n = 5000

In [263]:
kernel_matrix = np.zeros((n,n))
for i in range(n):
    for j in range(n):
        kernel_matrix[i,j] = BC(raw_pos[i],raw_pos[j])
    if i % 500 == 0:
        print(i)


0
500
1000
1500
2000
2500
3000
3500
4000
4500

In [307]:
kernel_matrix[0,0]


Out[307]:
1.0

In [309]:
kernel_matrix.shape


Out[309]:
(1000, 1000)

In [163]:
kernel_matrix = 1-rmsd_ala

In [164]:
tau=5
M = len(kernel_matrix)-tau
from time import time
t = time()
K = time_lagged_K(kernel_matrix,tau=tau)
print(time() - t)
K.shape


0.00175809860229
Out[164]:
(990, 990)

In [200]:


In [169]:
# earlier way to do it, which scaled as O((2n)**2)

from time import time
t = time()
K = time_lagged_K(X[:1000])
print(time() - t)
K.shape


44.4125571251
Out[169]:
(1998, 1998)

In [165]:
def construct_R(M):
    ''' R is a 2x2 block matrix with MxM entries, [[0,I],[I,0]]'''
    R = np.zeros((2*M,2*M))
    R[M:,:M] = np.eye(M)
    R[:M,M:] = np.eye(M)
    return R

In [166]:
R = construct_R(M)
R.shape


Out[166]:
(990, 990)

In [167]:
K.shape


Out[167]:
(990, 990)

In [168]:
KRK = (K.dot(R)).dot(K)

In [169]:
KK = K.dot(K)

In [170]:
evals,evecs = linalg.eigh(KRK,KK+0.01*np.eye(2*M))

In [171]:
evals[0],evals[-1]


Out[171]:
(-0.76022852142120789, 0.99999918113912623)

In [172]:
pl.scatter(range(len(evals)),evals,linewidth=0,alpha=0.5)


Out[172]:
<matplotlib.collections.PathCollection at 0x113d7f450>

In [173]:
sum(evals>0.1)


Out[173]:
180

In [174]:
evecs[-1]


Out[174]:
array([ -8.27203140e-03,  -2.00527461e-02,   1.88509570e-01,
        -4.68172200e-02,   1.68828317e-01,   4.93318954e-03,
         3.61285099e-04,   4.96226058e-01,  -2.54952176e-01,
        -6.76890819e-02,  -9.06412769e-02,  -3.82150960e-01,
        -5.10345776e-02,   2.43780489e-02,  -5.11047491e-02,
        -9.82162260e-02,  -2.54489450e-01,  -2.03531649e-01,
        -2.11353592e-01,  -3.65767500e-02,   2.03930899e-01,
        -1.29145839e-01,   8.01311061e-02,   2.28703549e-01,
        -4.19444008e-01,   3.61573931e-02,  -6.83174535e-02,
         3.84256346e-02,   1.13634933e-01,   3.01137716e-02,
         2.15892769e-01,  -1.40475369e-01,  -1.92381494e-01,
         1.92312589e-01,  -4.17913220e-02,  -3.49252742e-01,
        -1.31959221e-01,  -1.12490176e-01,  -2.65749620e-02,
         2.73004308e-02,   1.92209209e-01,  -5.98768896e-02,
         1.40298356e-01,   1.07639569e-01,  -1.01605124e-02,
         1.59543000e-01,   8.98727331e-02,  -9.30032727e-02,
        -2.81540097e-02,   5.30606043e-02,   2.27786990e-01,
         9.33623297e-02,   5.37990519e-02,  -9.41657345e-02,
         4.53708245e-01,   4.33449571e-01,  -4.78234918e-02,
        -3.58706400e-01,   8.52832622e-02,  -1.83532878e-01,
        -5.03069599e-01,  -2.43267679e-01,  -1.74624297e-01,
         2.23994183e-03,  -5.67003180e-02,  -2.33783256e-01,
         3.17643636e-01,  -1.09401194e-02,  -3.79118655e-01,
        -1.76484326e-01,   8.90489144e-02,  -1.31669543e-01,
         8.03284904e-02,  -2.11680955e-01,   1.58753896e-01,
         1.07338714e-02,  -1.97182430e-01,  -6.49461468e-02,
        -1.00235851e-01,  -2.08878245e-01,   5.01009416e-02,
        -6.45498400e-03,  -2.69691928e-01,  -2.46172087e-01,
        -3.71274227e-01,   7.38942276e-02,  -1.22894076e-01,
        -2.48448132e-01,  -3.18288526e-01,  -1.85664241e-01,
        -3.31434198e-01,  -2.13158255e-01,   1.43675155e-01,
         4.30892741e-02,  -5.94966413e-02,  -1.09303991e-01,
         1.55451593e-01,  -2.59130189e-01,  -8.88496326e-02,
        -3.42992363e-01,  -2.99507782e-01,   2.09117519e-02,
         2.45653895e-01,  -2.35683294e-01,   2.19317849e-01,
        -1.65839274e-01,   1.95540622e-01,   7.48153227e-02,
         2.14477379e-01,  -3.55391393e-01,   1.69621459e-01,
        -2.88624715e-01,  -1.71476492e-01,   2.29430705e-01,
         8.09189247e-02,   1.59398941e-01,  -5.21276197e-02,
         1.88261648e-01,   7.75061582e-02,   1.78391011e-01,
         3.46411811e-01,  -1.59477730e-01,   3.38782504e-01,
         1.38770542e-01,   3.74458684e-02,   1.66483926e-01,
        -3.85604065e-01,   1.75466694e-01,  -2.27405007e-01,
         1.23827112e-01,  -3.45970946e-01,   1.16075434e-01,
        -1.55370762e-01,  -1.25730510e-01,   1.01961720e-02,
         1.29088938e-01,   2.25602003e-02,   2.36509636e-01,
        -4.09740080e-01,   1.04514993e-01,  -1.65585636e-01,
        -3.16436423e-01,   9.15949615e-02,   2.67230606e-01,
         5.49840468e-02,  -2.62881432e-01,  -3.12942540e-01,
        -6.97204666e-01,   3.71633329e-02,  -1.03476965e-01,
         1.11338551e-01,   3.00917085e-01,   8.67729721e-01,
        -2.46989527e-01,   1.73338415e-01,  -1.03349249e-02,
         2.17631502e-01,   4.84792041e-01,   3.09597153e-01,
        -1.95934681e-01,  -5.51095126e-02,   5.91243540e-01,
        -2.03573463e-01,  -7.80801035e-01,   3.12584694e-01,
        -1.96912779e-01,   4.16345575e-02,  -2.21626323e-01,
         1.54773603e-01,   3.08403227e-01,  -1.72078604e-02,
        -2.02132337e-01,   6.01950939e-02,  -2.70249977e-01,
        -2.76245430e-01,   3.79372286e-01,   1.20249417e-01,
        -1.74424445e-01,  -2.60106002e-01,  -1.44543480e-01,
         3.80386132e-01,  -1.04272494e+00,   1.21165144e+00,
        -4.45038293e-01,  -1.44107876e-01,   4.53516319e-01,
        -1.12620211e+00,  -5.14079848e-01,  -2.53672254e-01,
        -3.69377158e-01,  -1.22448838e+00,   2.32591662e-01,
         7.68364698e-02,  -1.22675117e+00,   1.06145430e+00,
         6.02021043e-02,  -3.28481355e-01,  -6.29792522e-02,
         8.99148284e-01,   1.21415098e+00,   1.04140634e+00,
        -2.10448447e-02,  -8.05345046e-03,   3.11276370e-01,
        -2.49785588e-01,   6.99665675e-01,   3.28019238e-01,
         5.51753367e-02,  -2.38881841e-01,   2.35072851e-02,
         7.57507758e-04,  -2.56055426e-01,   2.43958855e-01,
        -5.97183608e-01,  -1.17651703e-01,   1.17859836e+00,
        -1.37564612e+00,   2.83002269e-01,  -2.19304013e-01,
         6.44242372e-01,  -4.08569929e-01,   6.10128794e-01,
         1.31370326e+00,   2.90977089e-01,  -4.49479475e-01,
         2.20924413e-02,  -1.89460829e-01,   1.43193313e+00,
        -8.03886173e-01,  -3.30758708e-01,  -1.31687476e-01,
        -1.28303359e+00,   5.65799447e-01,  -8.70841444e-01,
        -1.59800306e-01,  -4.52863256e-01,   7.06501896e-01,
        -3.94478281e-01,   1.31459713e+00,   5.61339582e-01,
         3.18831060e-01,  -8.19578627e-01,   1.77268085e-01,
        -9.98886179e-01,  -5.26550573e-01,   9.26017940e-01,
         9.89938231e-01,  -4.10587843e-01,  -7.10504532e-01,
        -2.90407324e-01,   6.05878774e-09,   3.31340834e-09,
        -6.19324597e-09,  -5.60348482e-09,   4.51194876e-09,
         2.36650929e-09,  -3.15260861e-08,   3.45170283e-09,
         5.71270918e-09,  -1.50185297e-08,  -4.35126893e-09,
        -1.04391581e-08,   6.79582681e-09,   3.04775539e-09,
         3.15981105e-09,  -2.77164635e-08,  -8.03332530e-09,
        -8.90582703e-09,   2.49693333e-09,  -1.06394985e-08,
         3.98825866e-09,   6.41903576e-09,   5.11202234e-09,
        -1.88674314e-08,  -2.06023946e-09,   3.81457497e-09,
         1.68580313e-10,   6.19036605e-09,   2.44996473e-09,
        -8.14595766e-09,  -2.03295574e-08,  -5.35912689e-09,
         1.47268924e-08,   3.74791516e-09,   1.21041182e-08,
        -7.18378556e-09,  -4.27012917e-09,   8.16016477e-09,
        -4.08093429e-09,  -1.38327478e-08,  -7.95351127e-09,
        -7.78247965e-09,   1.13857924e-08,   2.03553549e-09,
         5.82344958e-09,   3.38825948e-09,   5.75736254e-09,
         3.54025405e-09,   5.36272394e-09,   9.08418844e-09,
        -1.35644700e-09,  -6.27168115e-09,  -5.93538269e-09,
        -8.01090250e-09,  -3.90718456e-09,  -6.91883562e-09,
         1.27193548e-08,  -4.78716912e-09,   1.19220141e-10,
         3.44493765e-09,   2.29307325e-09,  -9.69004209e-09,
        -4.31553432e-09,   6.81991073e-09,  -8.91593725e-09,
        -1.70955769e-08,  -4.12184282e-09,  -1.28040421e-08,
        -2.40023098e-10,  -1.94270054e-08,  -8.59561341e-09,
        -3.28856206e-09,  -1.02953699e-08,  -1.55286008e-09,
        -1.71168143e-08,   1.88836946e-09,  -8.02094257e-09,
        -4.29224701e-09,  -6.99669476e-09,  -7.23123563e-09,
        -6.75491842e-09,   8.65733179e-11,  -1.89507211e-09,
         4.80539414e-09,  -5.31664028e-09,   2.14966535e-09,
         4.63635859e-10,   1.65746377e-08,  -6.14730362e-09,
        -1.79351784e-08,  -6.53297374e-09,  -1.37661174e-09,
        -8.57552116e-09,   1.58475020e-08,   3.35271641e-09,
         3.67136627e-10,  -4.97187458e-10,  -9.78975826e-09,
         1.93284574e-09,  -4.13124995e-09,   5.37006564e-12,
        -7.55774606e-10,  -3.37045074e-09,   1.50259532e-09,
        -4.33022816e-09,   1.21517272e-08,   7.49192821e-09,
        -4.51378627e-09,   2.75673623e-09,   4.12115992e-09,
        -3.99237035e-09,   6.50760624e-10,  -1.96728046e-08,
         8.60043347e-09,  -1.72094999e-08,  -1.21902086e-08,
        -1.67807587e-09,  -8.61862218e-09,  -1.34890916e-08,
         9.32458933e-11,  -2.59406993e-09,  -5.01071027e-09,
         7.30023675e-10,   2.25282051e-09,  -2.78307951e-09,
         6.81133827e-09,  -3.03843751e-09,   7.89371474e-09,
        -5.90432441e-09,  -1.65253657e-09,   6.89071450e-09,
        -4.78557001e-09,   4.78353286e-09,  -1.45723679e-08,
         1.63246655e-09,  -6.11997181e-09,  -2.87555019e-09,
         1.99139536e-09,  -7.22819926e-09,   8.24714471e-09,
        -8.98706160e-09,   7.64063445e-09,   1.22592646e-09,
         3.53358945e-09,   3.95017371e-09,  -6.69550811e-09,
         4.60807508e-09,   3.31323776e-10,  -1.07303789e-08,
        -3.52668490e-09,   3.63410017e-09,   9.71854344e-09,
         1.07483575e-09,   1.66749197e-09,  -4.64007746e-09,
        -3.00173204e-09,   9.61356888e-09,  -4.05384455e-09,
         1.44665602e-09,  -6.46893032e-09,  -4.91254974e-09,
         3.79769353e-12,  -7.35968663e-09,  -7.93976339e-09,
         4.41647854e-09,   1.23082668e-09,  -1.60495412e-09,
        -8.05785372e-09,   6.07001726e-09,  -3.03885008e-09,
         1.39633708e-09,   2.05997768e-09,  -1.25841739e-08,
        -1.18645715e-08,   3.38999193e-10,  -1.00862342e-08,
         2.45840295e-09,  -1.73697986e-09,   2.06692187e-09,
         5.62047936e-09,   3.34786210e-10,  -1.06172857e-08,
        -1.08006198e-08,   4.49141499e-09,   3.57065930e-09,
         1.61767036e-08,   5.72216449e-09,  -5.70821493e-09,
        -9.19340976e-10,  -9.52352617e-10,  -1.13333601e-08,
        -6.79853868e-10,  -6.04813263e-09,  -9.16936415e-09,
        -5.30858824e-09,  -9.12700327e-09,   2.76168475e-09,
        -4.86561783e-09,  -6.27938783e-10,  -4.88732395e-09,
        -1.00656171e-08,  -3.95856559e-09,  -9.14615903e-09,
         3.30366309e-09,  -6.59814493e-09,  -2.34944811e-10,
        -1.11819029e-08,  -1.59096336e-09,  -1.04657943e-08,
         4.05746925e-09,  -3.55291205e-09,   8.31046379e-10,
        -2.87143795e-09,  -3.00049115e-09,   2.88910009e-09,
        -1.46933801e-08,  -1.64579507e-09,  -9.30223691e-09,
        -6.23765306e-10,  -1.24105783e-08,  -1.24427313e-09,
         1.63875333e-09,  -9.40793711e-09,  -3.52267445e-09,
        -1.23052045e-09,  -8.61353458e-09,   2.91396402e-09,
         4.41069521e-09,   6.55435275e-09,  -1.82080705e-09,
         1.03095576e-08,  -1.67883140e-09,  -2.03981013e-09,
        -2.09508873e-10,   2.97962883e-09,  -6.07595906e-09,
         9.83740557e-09,  -1.07375931e-08,  -2.72204223e-09,
        -9.12007879e-09,   6.95844048e-09,  -8.66027017e-09,
        -1.37163599e-09,  -6.32813355e-09,   2.86228583e-10,
        -1.11927547e-08,  -1.90317691e-10,   5.40127121e-10,
         1.83176783e-09,  -9.05665470e-09,   2.22425894e-09,
         4.36693714e-09,  -3.49339959e-10,  -5.78559894e-10,
        -3.66058901e-09,   1.06346272e-08,  -2.46122792e-09,
        -1.23127306e-08,   5.18205722e-09,  -4.81470132e-09,
         4.87445727e-09,  -7.02410813e-09,  -8.33506490e-09,
        -2.11013235e-10,  -1.74644535e-08,   3.14239678e-10,
        -1.00205012e-08,  -5.61937218e-10,  -6.35016385e-09,
        -5.33650789e-09,   7.66031334e-09,   2.92544468e-09,
         2.12814875e-09,   1.16510225e-09,  -9.41688129e-09,
         3.11097962e-09,  -7.51472772e-09,  -5.33785924e-09,
        -8.07616753e-09,  -3.82916329e-09,   3.94975902e-09,
        -1.73087526e-08,   1.39256487e-09,  -2.64607324e-09,
        -5.33013096e-09,  -8.13678314e-09,  -1.58050405e-09,
        -2.16706748e-09,   9.84535704e-09,  -1.41989872e-09,
        -7.74917815e-09,   4.34746688e-09,  -2.13556846e-09,
        -1.19732353e-08,  -3.92174977e-09,   3.05637542e-09,
        -3.86028035e-09,   9.52473821e-09,  -1.01666840e-08,
         6.59253905e-09,  -3.32374956e-09,  -1.53250106e-09,
        -3.67461371e-09,  -1.54939572e-08,  -2.38769107e-09,
         1.42282572e-10,  -8.37462013e-09,   1.47094860e-08,
        -9.08344880e-10,   5.12910491e-09,  -6.00993170e-09,
        -5.23728758e-09,  -6.76150386e-09,   1.24541738e-08,
        -8.20259814e-09,  -7.73258357e-09,   4.43935864e-09,
         6.65840047e-09,   1.15523622e-09,  -6.92791548e-10,
        -2.96903292e-09,   1.94413285e-09,   4.12637750e-09,
         7.31405110e-10,  -1.30009202e-08,  -9.46386077e-10,
         8.88264959e-09,  -3.97698693e-09,  -9.94427638e-09,
         2.69459311e-09,  -5.12146710e-09,  -9.21122687e-09,
         2.08994130e-09,   2.27311531e-09,   3.92647101e-10,
        -9.37795361e-09,  -3.10580675e-09,  -2.08868466e-09,
        -6.98287124e-09,  -5.07887634e-09,  -7.47037112e-09,
        -6.07831053e-09,  -1.03629375e-09,   5.02656944e-09,
        -8.23538623e-10,  -8.02569319e-10,   1.17413118e-09,
        -2.62026776e-09,  -9.96925332e-09,  -7.66276729e-10,
        -1.25729673e-09,  -9.29246215e-09,   4.38664698e-09,
        -3.06862726e-09,  -2.55195196e-09,  -7.64320312e-09,
         2.57603796e-09,   3.26845731e-09,  -9.51292499e-09,
        -1.02331769e-08,  -8.42978558e-09,  -4.83402999e-09,
         8.42167121e-09,  -7.39887295e-09,   6.70754434e-09,
         9.20650188e-10,  -8.58906647e-09,   3.82694663e-09,
         7.51008871e-09,  -5.75128088e-09,  -4.43777764e-09,
        -3.93751389e-09,  -1.04893833e-09,  -1.05021598e-09,
        -6.72148026e-09,   1.94804488e-09,  -1.06838519e-08,
        -1.10444777e-08,  -2.86460797e-09,  -3.66817330e-10,
         1.78965344e-08,  -8.40050764e-09,  -1.06775224e-08,
        -6.62983420e-09,   3.89609924e-10,  -3.57020605e-09,
         4.07757198e-09,   4.66550811e-09,   4.23069532e-11,
        -5.51584452e-09,  -2.80315046e-09,  -7.39535340e-10,
         1.50583933e-10,  -1.58500605e-09,  -7.53094106e-09,
        -5.00356951e-09,   8.13562093e-09,  -3.20413504e-09,
         1.30437707e-09,  -5.18570955e-10,  -5.04839973e-09,
         5.57606824e-09,   9.51108767e-09,   4.87305331e-09,
        -4.98139484e-09,   1.44495288e-09,  -2.19901454e-11,
         1.04725463e-08,  -6.74614504e-09,  -1.06131379e-09,
        -4.04223389e-09,   1.06550062e-10,   5.49352481e-09,
        -5.41529014e-09,  -1.01770942e-08,   6.14452669e-09,
         2.93802613e-09,   8.55203000e-09,  -4.82680170e-09,
         6.61382145e-09,  -1.81176275e-09,  -1.28181439e-08,
        -2.50983731e-09,  -2.42614673e-08,  -5.25352571e-09,
        -1.19364425e-09,   3.71843342e-09,  -3.63829412e-09,
        -1.77195844e-09,  -4.00236716e-09,   1.18267164e-10,
         8.25285050e-09,  -7.63646267e-10,   1.80450951e-09,
        -8.32363264e-09,  -1.77909854e-09,  -2.37287133e-09,
        -2.68207354e-08,   2.24330569e-09,   2.91080189e-09,
        -1.18830739e-08,  -1.63543843e-08,  -7.51796824e-09,
         8.07273017e-11,  -1.56956005e-08,   1.72128556e-08,
         1.39301483e-09,   2.08725074e-09,  -6.88804076e-09,
        -1.10269411e-08,  -5.75454584e-09,   3.88194090e-09,
        -4.92689173e-10,  -1.48877468e-09,   5.77421709e-09,
        -6.92185399e-09,   7.37023442e-09,   1.56043266e-09,
        -3.31808299e-09,  -8.52667915e-09,   6.94276333e-10,
        -4.20538046e-09,  -6.04947568e-09,   1.37080952e-08,
        -5.46443582e-09,  -3.88337077e-09,   7.07885759e-09,
        -1.35333623e-08,  -6.03620180e-09,   2.56553887e-09,
        -6.80859397e-09,  -9.59442099e-09,  -1.25231127e-08,
        -3.58668508e-09,   1.84255598e-08,  -2.56916577e-09,
        -1.52155785e-08,   8.51370159e-09,  -1.35478729e-08,
         6.40092201e-09,   1.30402680e-08,   1.75561909e-10,
         3.63950639e-09,  -1.98274888e-08,  -5.20848266e-09,
        -1.16929118e-10,  -7.82143540e-09,   4.91864271e-09,
        -6.79746290e-09,  -4.30308717e-08,  -1.50555853e+00,
         1.15187071e-01,   1.31561684e-01,  -1.46686064e-01,
        -9.43906355e-01,  -1.67481066e-01,   6.21474167e-01,
         1.03416701e+00,  -3.21888475e-02,   5.72747531e-01,
        -6.78519684e-01,   4.04318138e-01,  -6.46883965e-01,
        -1.47096131e+00,  -2.63405099e-01,   9.08335226e-01,
         3.78167608e-01,  -4.73648773e-01,  -3.86606463e-01,
         1.01311867e+00,   2.08923432e-02,  -5.05177139e-01,
        -2.13760695e-01,   1.77141621e-01,   2.40479710e-01,
        -1.02430394e-02,  -1.73173389e+00,   2.82418452e-01,
        -5.31992336e-01,   4.54446584e-01,  -2.04123199e-01,
        -4.39794152e-01,   6.59426310e-01,  -5.28741276e-01,
         1.34455964e+00,   1.77155686e-01,  -4.17528186e-01,
        -7.08383721e-01,  -4.55199467e-02,   6.02736745e-01,
        -5.87752792e-01,   3.57743428e-01,   3.53092671e-01,
        -4.01536689e-01,  -1.15279254e+00,   3.28200867e-01,
         2.07600829e-01,   1.47526611e+00,   3.56667013e-01,
         4.69541526e-01,   2.17111962e-01,   1.57781777e-01,
        -7.50503470e-01,   2.68513449e-01,   2.57040897e-01,
        -9.72537156e-02,   5.53575205e-01,  -2.59197585e-01,
         9.84686923e-02,   3.16200511e-01,  -4.38998308e-01,
        -4.33786202e-01,  -3.49767657e-02,  -4.23838272e-02,
         8.25486280e-01,  -1.05461129e+00,   2.76752043e-01,
         6.98784872e-02,   4.65415551e-01,  -3.64409734e-01,
         5.13885240e-02,  -2.29929166e-01,  -3.27278675e-01,
        -3.79104555e-01,  -7.24051778e-01,   9.60256493e-01,
         5.07291183e-01,   2.18309537e-01,   3.12100896e-02,
        -8.96134906e-01,   8.87752652e-03,   4.11487948e-01,
        -2.04104308e-01,  -4.18640982e-01,   1.77267263e-01,
         2.32236288e-01,  -7.11316601e-01,   7.74434258e-01,
         2.45427225e-01,   2.34173441e-01,   2.48905965e-01,
        -2.45317812e-01,   2.49659633e-01,   1.48277863e-01,
        -1.32693100e-01,  -6.82501198e-01,   3.36815096e-02,
        -4.10973544e-01,  -1.22138511e-01,  -5.82457545e-02,
        -7.49795389e-01,   1.71024825e-01,   7.59177777e-02,
         6.88720315e-01,   4.23675125e-01,  -1.39841417e-01,
         3.20248781e-01,  -6.91634753e-02,   1.10413608e-01,
        -2.28979244e-01,   8.65652982e-02,   1.99964945e-01,
         6.24507692e-02,  -4.53692242e-01,   3.55604886e-01,
         8.58052169e-02,   5.12837886e-01,   6.22130198e-02,
         1.20285865e-01,  -1.50086637e-01,   5.02531680e-02,
         6.47220985e-01,  -4.62934684e-01,  -2.76644621e-01,
         2.60555044e-02,  -6.02311332e-02,   7.30736686e-02,
         5.96031407e-02,   4.95521005e-01,  -2.67771127e-01,
        -1.90753044e-02,  -9.81783532e-02,  -1.62810088e-02,
        -4.70312965e-02,  -9.23511182e-02,   1.24217644e-01,
        -9.08849785e-02,   7.13776296e-02,   2.00186421e-01,
        -2.91850218e-01,  -3.28673612e-02,   1.52524779e-01,
         4.39842899e-01,  -1.99996291e-02,   2.04826163e-01,
         1.47569319e-01,   2.01613831e-01,  -7.66415567e-02,
         1.28429236e-01,   1.87026478e-01,   5.07781586e-02,
        -8.44403388e-02,   1.33363737e-02,   4.34801192e-01,
        -1.21087068e-01,   1.71312148e-01,  -3.40228933e-02,
         1.83656404e-03,  -2.81731508e-02,  -9.70160354e-02,
         2.62217844e-01,  -1.17011620e-01,  -2.90336455e-01,
         1.26973915e-01,  -5.08018485e-03,   6.90379586e-03,
        -3.39759328e-01,  -2.58906230e-01,   1.05539489e-01,
        -1.74839221e-01,   2.98301742e-02,  -2.58375421e-02,
         1.28659711e-01,   8.56400559e-02,   3.98258433e-01,
         1.43344727e-01,   2.04889309e-02,   4.49571679e-01,
        -1.51996349e-02,   1.38936939e-01,  -3.96540622e-02,
        -1.14631094e-01,   2.35567318e-02,   2.24161507e-03,
         1.46914824e-02,  -2.79812246e-02,   1.34847149e-01,
        -3.51277589e-02,  -2.31136601e-01,  -2.88755462e-01,
        -3.73435733e-01,  -7.15594882e-02,  -1.22411158e-01,
         1.33425345e-01,  -5.91594980e-02,   6.46451074e-02,
         3.36370370e-03,   2.63438765e-01,  -1.53670160e-01,
        -4.07858539e-02,   1.81156994e-02,   4.39125012e-01,
        -1.16223769e-01,  -1.80624454e-01,  -8.87196106e-02,
         1.57353093e-01,   1.07057564e-01,   8.67560108e-03,
        -2.99172714e-01,  -1.50633164e-01,  -4.58747061e-01,
         8.48885567e-02,  -3.31029410e-01,   2.13895281e-01,
        -2.52714172e-01,  -3.16119897e-01,  -3.18746988e-01,
        -1.30095832e-01,  -1.94091917e-01,   1.29751156e-02,
         2.61403085e-02,   3.40134669e-02,  -1.29573333e-01,
         2.05823921e-01,   2.05975509e-02,   2.28916165e-01,
        -2.58148190e-01,  -3.19435920e-01,  -3.01655471e-01,
        -4.70103769e-02,  -2.49682938e-01,  -1.86893176e-01,
         7.91982345e-02,   1.64945747e-01,  -1.59364967e-01,
        -1.01129804e-01,  -7.24996310e-02,   1.97004476e-01,
         3.72753143e-02,   2.68927129e-02,  -6.39480346e-02,
        -5.25062989e-02,   1.05475899e-01,   4.49406750e-02,
         3.03349350e-02,  -3.57662116e-02,  -1.93371407e-03,
         6.08009001e-02,  -1.93235987e-02,  -1.22498095e-04])

In [175]:
projection = np.zeros((len(rmsd_ala),2))
comp1,comp2 = (1,2)
for j in range(len(rmsd_ala)):
    phi_t1 = 0
    phi_t2 = 0
    for i in range(M):
        phi_t1 += evecs[comp1][i]*kernel_matrix[j,i] + evecs[comp1][i+M]*kernel_matrix[j,i+tau]
        phi_t2 += evecs[comp2][i]*kernel_matrix[j,i] + evecs[comp2][i+M]*kernel_matrix[j,i+tau]
    projection[j] = (phi_t1,phi_t2)

In [176]:
pl.scatter(projection[:,0],projection[:,1],c=range(len(projection)),
           alpha=0.5,linewidth=0)
pl.colorbar()


Out[176]:
<matplotlib.colorbar.Colorbar instance at 0x114095e18>

In [ ]:


In [139]:
from sklearn.manifold import SpectralEmbedding
se = SpectralEmbedding()
se.

In [146]:
X = npr.randn(100,10)

In [158]:
from scipy.spatial import distance
k_ = distance.squareform(distance.pdist(X))

In [183]:
rmsd_ala.shape


Out[183]:
(500, 500)

In [264]:
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(3,kernel='precomputed')

In [265]:
kpca_X = kpca.fit_transform(1-rmsd_ala)

In [266]:
pl.scatter(kpca_X[:,0],kpca_X[:,1],c=range(len(kpca_X)),linewidths=0)


Out[266]:
<matplotlib.collections.PathCollection at 0x11b060d90>

In [267]:
pl.scatter(kpca_X[:,0],kpca_X[:,1],c=dha[:len(kpca_X),3],linewidths=0)


Out[267]:
<matplotlib.collections.PathCollection at 0x14f3d2b90>

In [268]:
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(3,kernel='precomputed')
kpca_X = kpca.fit_transform(kernel_matrix)

In [269]:
pl.scatter(kpca_X[:,0],kpca_X[:,1],c=dha[:len(kpca_X),3],linewidths=0)


Out[269]:
<matplotlib.collections.PathCollection at 0x14f4a2a50>

In [272]:
kernel_matrix.shape


Out[272]:
(5000, 5000)

In [373]:
def kpca(kernel_matrix,n_components=2,c=None):
    M=len(kernel_matrix)-1
    evals,evecs=linalg.eigh(kernel_matrix,eigvals=(M-n_components,M))
    order = sorted(range(len(evals)),key=lambda i:-evals[i])
    alphas = evecs[:,order]
    lambdas = evals[order]
    pl.plot(range(1,len(lambdas)+1),lambdas)
    
    
    sqrt_lambdas = np.diag(np.sqrt(lambdas))
    if c==None:
        c=range(M+1)
    
    projection = alphas.dot(sqrt_lambdas)
    pl.figure()
    pl.scatter(projection[:,1],projection[:,2],c=c[:M+1],linewidths=0,alpha=0.5)
    pl.figure()
    pl.scatter(projection[:,2],projection[:,3],c=c[:M+1],linewidths=0,alpha=0.5)
    pl.figure()
    pl.scatter(projection[:,1],projection[:,3],c=c[:M+1],linewidths=0,alpha=0.5)
    
    return lambdas,alphas

In [374]:
M=len(kernel_matrix[:100,:100])-1
n_components=10
evals,evecs=linalg.eigh(kernel_matrix[:100,:100],eigvals=(M-n_components,M))

In [375]:
evecs.shape,evals.shape


Out[375]:
((100, 11), (11,))

In [376]:
lambdas_bc,alphas_bc=kpca(kernel_matrix[:5000,:5000],n_components=n_components,c=dha[:,3])



In [377]:
lambdas_rmsd,alphas_rmsd=kpca((1-rmsd_ala)[:5000,:5000],n_components=n_components,c=dha[:,3])



In [383]:
lambdas_rmsd,alphas_rmsd=kpca((1-rmsd_fs)[:5000,:5000],n_components=n_components)



In [401]:
thinned = [fs_trajectories[i][::30] for i in range(len(fs_trajectories))]
megatraj = thinned[0]
for thin in thinned[1:]:
    megatraj += thin
megatraj


Out[401]:
<mdtraj.Trajectory with 9352 frames, 264 atoms, 23 residues, without unitcells at 0x14f4a6050>

In [413]:
megatraj_full = fs_trajectories[0]
for full_traj in fs_trajectories[1:]:
    megatraj_full += full_traj
megatraj_full


Out[413]:
<mdtraj.Trajectory with 280000 frames, 264 atoms, 23 residues, without unitcells at 0x11321d8d0>

In [404]:
n = len(megatraj)
rmsd = np.zeros((n,n))

for i in range(n):
    rmsd[i,i:] = mdtraj.rmsd(megatraj[i:n],megatraj[i])

rmsd_fs_m = rmsd + rmsd.T

In [405]:
lambdas_rmsd,alphas_rmsd=kpca((1-rmsd_fs_m),n_components=n_components)



In [387]:
lambdas_rmsd,alphas_rmsd=kpca((1-rmsd_fs_m)[:5000,:5000],n_components=n_components)



In [415]:
df_megatraj = df.fit_transform([megatraj])
df_megatraj_full = df.fit_transform([megatraj_full])

In [492]:
from msmbuilder.decomposition import tICA
tica = tICA(2,500)
X_ = tica.fit_transform(df_megatraj)[0]
X_full = tica.fit_transform(df_megatraj_full)[0]

In [493]:
pl.scatter(X_[:,0],X_[:,1],c=range(len(X_)),linewidths=0,alpha=0.5)


Out[493]:
<matplotlib.collections.PathCollection at 0x1b272d310>

In [494]:
pl.scatter(X_full[:,0],X_full[:,1],c=range(len(X_full)),linewidths=0,alpha=0.5)


Out[494]:
<matplotlib.collections.PathCollection at 0x1a97cb790>

In [ ]:


In [389]:
sqrt_lambdas = np.diag(np.sqrt(lambdas_rmsd))
projection = alphas_rmsd.dot(sqrt_lambdas)

In [390]:
projection.shape


Out[390]:
(5000, 11)

In [398]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(projection[:,1],projection[:,2],projection[:,3],
           alpha=0.5,c=range(len(projection)),cmap='Blues',linewidths=0)


Out[398]:
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x143b98950>

In [378]:
np.cumsum(lambdas_bc)/np.sum(lambdas_bc)


Out[378]:
array([ 0.46736726,  0.66171606,  0.72044251,  0.77691215,  0.82800861,
        0.87840776,  0.92665106,  0.97237577,  0.98444308,  0.99287453,  1.        ])

In [379]:
np.cumsum(lambdas_rmsd)/np.sum(lambdas_rmsd)


Out[379]:
array([ 0.92712313,  0.95303629,  0.96000826,  0.96651669,  0.97297616,
        0.97920532,  0.98515035,  0.99104111,  0.99531238,  0.99851023,  1.        ])

In [ ]:


In [300]:
vec.shape,val.shape


Out[300]:
((3,), (100, 3))

In [290]:
kernel_matrix.shape


Out[290]:
(5000, 5000)

In [ ]:


In [ ]:


In [ ]:


In [230]:
from sklearn.decomposition import PCA
pca = PCA(2)
pca_X = pca.fit_transform(raw_pos.reshape(9999,66))

In [243]:
pl.scatter(pca_X[:,0],pca_X[:,1],c=dha[:len(pca_X),3],linewidths=0)


Out[243]:
<matplotlib.collections.PathCollection at 0x14e37ac10>

In [244]:
kpca_rbf = KernelPCA(2,kernel='rbf')
rbf_X = kpca_rbf.fit_transform(raw_pos.reshape(9999,66))

In [245]:
pl.scatter(rbf_X[:,0],rbf_X[:,1],c=dha[:len(rbf_X),3],linewidths=0)


Out[245]:
<matplotlib.collections.PathCollection at 0x14e97d110>

In [197]:
df = featurizer.DihedralFeaturizer()
dha = df.transform(t)

In [202]:
dha = np.reshape(np.array(dha),(9999,4))

In [252]:
kpca_rbf = KernelPCA(2,kernel='rbf')
rbf_X = kpca_rbf.fit_transform(dha)

In [253]:
pl.scatter(rbf_X[:,0],rbf_X[:,1],c=dha[:len(rbf_X),3],linewidths=0)


Out[253]:
<matplotlib.collections.PathCollection at 0x14f16a810>

In [216]:
dha[:,3].shape


Out[216]:
(9999,)

In [217]:
kpca_X[:len(dha),0].shape


Out[217]:
(5000,)

In [241]:
pl.scatter(dha[:,3],dha[:,1],c=range(len(dha)),linewidths=0)


Out[241]:
<matplotlib.collections.PathCollection at 0x14e1b8e50>

In [ ]:

Ideas:

  • Kernel choice:

    • 1 - RMSD score
    • Binet-Cauchy kernel
  • Scaling up:

    • Look at "Large-Scale Kernel Machines"
    • Ask Thorsten on Friday
  • Kernel learning:

  • Compare tICA with CCA? The tICA(X,lag_time=t) objective function looks very similar to CCA(X=X[:-t],Y=X[t:])

In [526]:
from sklearn.cross_decomposition import CCA

In [589]:
cca = CCA(n_components=2)

In [590]:
df_megatraj[0].shape


Out[590]:
(9352, 84)

In [624]:
t = 1
length=1000
cca.fit(df_megatraj[0][:length-t],df_megatraj[0][t:length])


Out[624]:
CCA(copy=True, max_iter=500, n_components=2, scale=True, tol=1e-06)

In [625]:
cca.score(df_megatraj[0][:1000-t],df_megatraj[0][t:1000])


Out[625]:
0.33480367035714098

In [626]:
transformed = cca.transform(df_megatraj[0][:1000-t])

In [627]:
pl.scatter(transformed[:,0],transformed[:,1],c=range(len(transformed)),linewidths=0)


Out[627]:
<matplotlib.collections.PathCollection at 0x1bd6aec50>

In [628]:
transformed = cca.transform(df_megatraj_full[0])

In [629]:
pl.scatter(transformed[:,0],transformed[:,1],c=range(len(transformed)),alpha=0.5,linewidths=0)


Out[629]:
<matplotlib.collections.PathCollection at 0x1a166a250>

In [632]:
tica = tICA(2,1)
tica.fit([df_megatraj[0][:2000]])
transformed = tica.transform(df_megatraj_full)[0]

In [633]:
pl.scatter(transformed[:,0],transformed[:,1],c=range(len(transformed)),alpha=0.5,linewidths=0)


Out[633]:
<matplotlib.collections.PathCollection at 0x1c069b590>
  • 'Doubly-kernelized tICA?'
    • ICA finds linear combinations of the original parameters that form ICs, which are linearly mixed in the final output signal
    • ktICA kernelizes the combination of parameters that form tICs
    • Can we also search for nonlinear mixtures?
  • Optimize something other than autocorrelation, e.g. some information measure?

In [2]:
from msmbuilder.example_datasets import AlanineDipeptide
ala = AlanineDipeptide().get()

In [5]:
print(ala.DESCR)


The dataset consists of ten 10ns trajectories of of alanine dipeptide,
simulated using OpenMM 6.0.1 (CUDA platform, NVIDIA GTX660) with the
AMBER99SB-ILDN force field at 300K (langevin dynamics, friction coefficient
of 91/ps, timestep of 2fs) with GBSA implicit solvent. The coordinates are
saved every 1ps. Each trajectory contains 9,999 snapshots.

The dataset, including the script used to generate the dataset
is available on figshare at

http://dx.doi.org/10.6084/m9.figshare.1026131


In [8]:
ala.trajectories


Out[8]:
[<mdtraj.Trajectory with 9999 frames, 22 atoms, 3 residues, without unitcells at 0x10a0eead0>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0eeb90>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0eec50>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0eed10>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0eedd0>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0eee90>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0eef50>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0ee790>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0ee810>,
 <mdtraj.Trajectory with 10000 frames, 22 atoms, 3 residues, without unitcells at 0x10a0ee8d0>]

In [ ]: