Discussion:
PCA
Find the orthogonal linear projection $f: X \mapsto M^T X$ that minimizes squared reconstruction error, i.e. $$ \text{argmin}_{M \in \mathcal{O}^{d \times r}} \|X - M M^T X \|^2_F$$
MAF
Find the orthonormal linear projection $M^T X$ that maximizes the correlation between time-adjacent points, i.e. $$ \text{argmax}_{M \in \mathcal{O}^{d \times r}} \frac{\text{tr}(M^T \Sigma_{\tau} M)}{\text{tr}(M^T \Sigma M)}$$
where $\Sigma$ is the covariance matrix of $X$ and $\Sigma_{\tau}$ is the symmetrized time-lagged covariance matrix at lag $\tau$.
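As a concrete illustration, here is a minimal sketch (a toy example of my own, not part of the analysis below) that estimates $\Sigma$ and $\Sigma_\tau$ from an $n \times d$ array of time-ordered samples and evaluates both objectives at a random orthonormal $M$:
In [ ]:
import numpy as np
n, d, r, tau = 1000, 10, 2, 1
X = np.random.randn(n, d)
X -= X.mean(0)                                          # center the data
Sigma = np.dot(X.T, X) / n                              # covariance
Sigma_tau = 0.5 * (np.dot(X[tau:].T, X[:-tau]) +
                   np.dot(X[:-tau].T, X[tau:])) / (n - tau)  # symmetrized lagged covariance
M, _ = np.linalg.qr(np.random.randn(d, r))              # random point on O^{d x r}
pca_obj = np.sum((X.T - np.dot(M, np.dot(M.T, X.T))) ** 2)   # squared reconstruction error
maf_obj = np.trace(M.T.dot(Sigma_tau).dot(M)) / np.trace(M.T.dot(Sigma).dot(M))  # autocorrelation ratio
print(pca_obj, maf_obj)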
Comments:
In [145]:
import autograd
from autograd import grad
from autograd import numpy as np
from autograd.numpy import random as npr
from autograd.numpy import linalg as la
In [5]:
from sklearn.decomposition import PCA
In [2]:
def proj_stiefel(M, Z):
    ''' M is a d-by-r matrix on the Stiefel manifold;
    Z is the d-by-r matrix we want to project onto the tangent space at M '''
    # skew-symmetric part of M^T Z
    skew_MTZ = 0.5 * (np.dot(M.T, Z) - np.dot(Z.T, M))
    MMT = np.dot(M, M.T)
    I = np.eye(len(MMT))
    return np.dot(M, skew_MTZ) + np.dot(I - MMT, Z)
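A quick sanity check (a sketch using a random `M_check`): any tangent vector $\xi \in T_M \mathcal{O}^{d \times r}$ satisfies $M^T \xi + \xi^T M = 0$, so projecting an arbitrary matrix should give a skew-symmetric $M^T \xi$:
In [ ]:
M_check, _ = la.qr(npr.randn(10, 2))           # random point on the Stiefel manifold
xi = proj_stiefel(M_check, npr.randn(10, 2))   # project an arbitrary matrix onto T_M
MTxi = np.dot(M_check.T, xi)
print(np.allclose(MTxi + MTxi.T, 0))           # True: M^T xi is skew-symmetric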
In [ ]:
def proj_grad_descent_stiefel(d, r, obj_func, initial_step_size=1.0,
                              convergence_threshold=1e-6, max_iter=100):
    obj_grad = grad(obj_func)
    # initialize M in the set of orthonormal matrices of desired dimension, M ∈ O^{d×r}
    M, _ = la.qr(npr.randn(d, r))  # the reduced QR factor has orthonormal columns
    # iterate until obj_func(M) has converged or we hit max_iter
    converged = False
    timed_out = False
    n_iter = 0
    while (not converged) and (not timed_out):
        old_M = M
        # calculate free gradient of the objective in R^{d×r}
        free_grad = obj_grad(M)
        # calculate search direction: project the negative free gradient
        # onto the tangent space T_M O^{d×r} (eq. 34)
        search_dir = proj_stiefel(M, -free_grad)
        # linesearch: shrink the step until the retraction of the projected
        # step decreases the objective (eq. 35)
        step_size = initial_step_size
        new_M = retr_stiefel_closed_form(M, step_size * search_dir)
        while obj_func(new_M) > obj_func(M) and step_size > 1e-10:
            step_size *= 0.5
            new_M = retr_stiefel_closed_form(M, step_size * search_dir)
        # update M
        M = new_M
        n_iter += 1
        if np.mean((new_M - old_M) ** 2) < convergence_threshold:
            converged = True
        if n_iter >= max_iter:
            timed_out = True
    # return M
    return M
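A hypothetical usage sketch (it assumes the later cells defining `PCA_obj`, `proj_stiefel`, and `retr_stiefel_closed_form` have also been run), fitting a 2-D subspace of random 10-D data by minimizing reconstruction error:
In [ ]:
X_demo = npr.randn(1000, 10)
obj = lambda W: PCA_obj(W, X_demo)             # objective as a function of the projection alone
M_hat = proj_grad_descent_stiefel(10, 2, obj, max_iter=200)
print(np.allclose(np.dot(M_hat.T, M_hat), np.eye(2)))  # True: iterates stay (numerically) on the manifold
print(obj(M_hat))                              # reconstruction error of the learned 2-D subspace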
Notes from "optimization on manifolds" (http://press.princeton.edu/chapters/absil/)
Chapter 1: Intro
Two prototypical structures for problems to be considered:
Goal: efficiently translate iterative optimization procedures from their original context (Euclidean vector space) to nonlinear manifolds.
Key concepts from diff-geom --> numerical relaxation:
Chapter 2: Motivations and Applications
We'll look at a case study, the eigenvalue problem. It is important in many areas, and since eigenvectors are only defined up to scale, they nicely illustrate the manifold approach taken in this book.
Review:
The kernel $\text{ker}(B)$ of a matrix $B$ is the subspace formed by the vectors $x$ such that $Bx = 0$.
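A tiny numerical illustration (a sketch with a made-up rank-deficient `B`), recovering a basis for the kernel from the SVD:
In [ ]:
B = np.array([[1., 2.], [2., 4.]])                 # rank 1, so ker(B) is one-dimensional
U, s, VT = np.linalg.svd(B)
kernel_basis = VT[s < 1e-10]                       # right singular vectors with zero singular value
print(np.allclose(np.dot(B, kernel_basis.T), 0))   # True: B x = 0 for every x in ker(B)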
... incomplete notes ...
In [23]:
X = npr.randn(1000,10)
from sklearn.decomposition import PCA
pca = PCA(2)
pca.fit(X)
M = pca.components_.T
M.shape
Out[23]:
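Since PCA loadings are orthonormal, this `M` is a point on the Stiefel manifold $\mathcal{O}^{10 \times 2}$; a quick check:
In [ ]:
print(np.allclose(np.dot(M.T, M), np.eye(2)))  # True: M has orthonormal columns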
In [15]:
def cov(X):
    ''' compute covariance matrix of X (rows of X are samples) '''
    return np.dot(X.T, X) / len(X)

def t_cov(X, tau=1):
    ''' compute symmetrized time-lagged covariance matrix of X '''
    E1 = np.dot(X[tau:].T, X[:-tau]) / (len(X) - tau)
    E2 = np.dot(X[:-tau].T, X[tau:]) / (len(X) - tau)
    return 0.5 * (E1 + E2)
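A quick shape/symmetry check on these estimators (using the `X` defined above):
In [ ]:
cov_X, t_cov_X = cov(X), t_cov(X)
print(cov_X.shape, t_cov_X.shape)          # both (10, 10)
print(np.allclose(t_cov_X, t_cov_X.T))     # True: the lagged covariance is symmetrized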
In [16]:
def MAF_obj(M, cov_X, t_cov_X):
    ''' negated MAF ratio: minimize this to maximize the autocorrelation ratio '''
    return -np.trace(M.T.dot(t_cov_X.dot(M))) / np.trace(M.T.dot(cov_X.dot(M)))
In [24]:
def PCA_obj(M, X):
    ''' squared reconstruction error ||X^T - M M^T X^T||_F^2 (minimize this) '''
    return np.sum((X.T - M.dot(M.T.dot(X.T))) ** 2)
In [28]:
opt = PCA_obj(M,X)
opt
Out[28]:
In [43]:
M,M.shape,(M**2).sum(0),M / (M**2).sum(0)
Out[43]:
In [72]:
winners = []
for trial in range(100):
    # randomly perturb M, then rescale each column to unit norm
    # (note: rescaling does not restore mutual orthogonality, so M_ is generally slightly off the Stiefel manifold)
    M_ = M + npr.randn(*M.shape) * 0.001
    for i in range(M_.shape[1]):
        M_[:, i] = M_[:, i] / np.sqrt(np.sum(M_[:, i] ** 2))
    M_val = PCA_obj(M_, X)
    if M_val < opt:
        print('omg nooo', M_val)
        winners.append((M_, M_val))
In [76]:
M_ = winners[0][0]
(M_**2).sum(0)
M_[:,0].dot(M_[:,1]),M[:,0].dot(M[:,1])
Out[76]:
In [56]:
M_[0]
Out[56]:
In [71]:
(M_[:,0]**2).sum(),((M_[:,0] / (M_[:,0]**2).sum())**2).sum()
Out[71]:
In [82]:
np.linalg.norm(M_.T.dot(M_) - np.eye(len(M_.T)))
Out[82]:
In [83]:
np.allclose(M_.T.dot(M_),np.eye(len(M_.T)))
Out[83]:
(notes from [CG15] Appendix A)
$\newcommand{\odr}{\mathcal{O}^{d\times r}}$ $\newcommand{\rdr}{\mathbb{R}^{d \times r}}$
Overview: We want to take a free gradient in $\rdr$, project it onto the tangent space $T_M \odr$, then retract from the tangent space to the manifold $\odr$
Tangent space: linear approximation to the manifold at a particular point.
Projection from $\rdr$ onto the tangent space $T_M \odr$
Retraction from $T_M \odr$ to $\odr$
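Concretely (these are the formulas the cells below implement; in the retraction, $Z$ is assumed to already lie in the tangent space):
$$\pi_{T_M \odr}(Z) = M\,\text{skew}(M^T Z) + (I - M M^T)\,Z, \qquad \text{skew}(A) = \tfrac{1}{2}(A - A^T)$$
$$R_M(Z) = (M + Z)\,(I + Z^T Z)^{-1/2}$$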
In [ ]:
dot = np.dot
In [153]:
M,M.shape
Out[153]:
In [149]:
nabla = np.zeros(M.shape)
nabla[0,0] = 1
nabla[1,1] = 1
In [154]:
M + 0.01*proj_stiefel(M,-nabla)
Out[154]:
In [136]:
M.T.dot(M)
Out[136]:
In [128]:
def skew(M, Z):
    ''' return the skew-symmetric part of $M^T Z$ '''
    return 0.5 * (dot(M.T, Z) - dot(Z.T, M))

def proj_stiefel(M, Z):
    ''' M is a d-by-r point on the Stiefel manifold, defining a tangent
    space $T_M \mathcal{O}^{d \times r}$;
    $Z \in \mathbb{R}^{d\times r}$ is an arbitrary point
    we would like to project onto that tangent space '''
    MskewMTZ = np.dot(M, skew(M, Z))
    IMMTZ = dot(np.eye(len(M)) - dot(M, M.T), Z)
    return MskewMTZ + IMMTZ
In [160]:
def retr_stiefel_closed_form(M, Z):
    ''' closed-form retraction: R_M(Z) = (M + Z)(I + Z^T Z)^{-1/2} '''
    ZTZ = dot(Z.T, Z)
    # inverse matrix square root of the symmetric matrix I + Z^T Z via its eigendecomposition
    vals, vecs = la.eigh(np.eye(len(ZTZ)) + ZTZ)
    inv_sqrt = dot(vecs / np.sqrt(vals), vecs.T)
    return dot(M + Z, inv_sqrt)

def retr_stiefel_svd(M, Z):
    ''' SVD retraction: if M + Z = U S V^T, return U V^T '''
    U, s, VT = la.svd(M + Z, full_matrices=False)
    return dot(U, VT)
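A quick check (a sketch) that both retractions land back on the manifold when given a tangent-space step at the `M` from the PCA cell above:
In [ ]:
xi = proj_stiefel(M, npr.randn(*M.shape))       # a tangent vector at M
for retr in (retr_stiefel_closed_form, retr_stiefel_svd):
    R = retr(M, 0.1 * xi)
    print(np.allclose(np.dot(R.T, R), np.eye(R.shape[1])))  # True for both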
In [156]:
retr_stiefel_svd(M,M + 0.01*proj_stiefel(M,-nabla))
In [124]:
X.shape, np.dot(X,X.T).shape
Out[124]:
In [125]:
def proj(X,U):
skew_XTU = 0.5 * (np.dot(X.T,U) - np.dot(U.T,X))
XXT = np.dot(X,X.T)
I = np.eye(len(XXT))
return np.dot(X,skew_XTU) + np.dot(I - XXT,U)
In [126]:
X = M
Z = M_
In [144]:
proj_stiefel(X,X)
Out[144]:
In [135]:
proj(X,Z)
Out[135]:
In [99]:
X.shape
Out[99]:
In [103]:
X.T.shape
Out[103]:
In [106]:
np.dot(X.T,X).shape,np.dot(X,X.T).shape
Out[106]:
In [107]:
X.T.dot(X).shape,(X.dot(X.T)).shape
Out[107]:
In [108]:
np.linalg.norm(X),np.trace(np.dot(X,X.T)),np.trace(np.dot(X.T,X))
Out[108]:
In [89]:
X
Out[89]: