A significant reduction in complexity can be achieved if the matrix has a certain structure, e.g. it is sparse. In the following task we consider an important example: the $LU$ decomposition of tridiagonal matrices.
Some details in the lecture proofs about $LU$ were omitted. Let us complete them.
In some cases the $LU$ decomposition might fail, hence in practice a pivoting strategy is used. Consider these problems on the simple example of the matrix $A = \begin{pmatrix} \varepsilon & 1\\ 1 & 1 \end{pmatrix}.$
In [ ]:
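A small numerical experiment (just an illustration; it assumes NumPy and SciPy are available, and `scipy.linalg.lu` performs LU with partial pivoting) shows what goes wrong for a tiny $\varepsilon$ and how row permutation fixes it:

```python
import numpy as np
from scipy.linalg import lu

eps = 1e-16
A = np.array([[eps, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0])          # exact solution is approximately (1, 1)

# LU without pivoting, written out explicitly for the 2x2 case:
# L = [[1, 0], [1/eps, 1]], U = [[eps, 1], [0, 1 - 1/eps]]
L = np.array([[1.0, 0.0], [1.0 / eps, 1.0]])
U = np.array([[eps, 1.0], [0.0, 1.0 - 1.0 / eps]])
x_naive = np.linalg.solve(U, np.linalg.solve(L, b))

# partial pivoting: rows are permuted so that the largest pivot comes first
P, L_p, U_p = lu(A)               # A = P @ L_p @ U_p
x_pivot = np.linalg.solve(A, b)   # the library solver pivots internally

print("without pivoting:", x_naive)
print("with pivoting:   ", x_pivot)
print("permutation P:\n", P)
```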
Our goal now is to orthogonalize a system of linearly independent vectors $v_1,\dots,v_n$. The standard algorithm for this task is the Gram-Schmidt algorithm: $$ \begin{split} u_1 &= v_1, \\ u_2 &= v_2 - \frac{(v_2, u_1)}{(u_1, u_1)} u_1, \\ u_3 &= v_3 - \frac{(v_3, u_1)}{(u_1, u_1)} u_1 - \frac{(v_3, u_2)}{(u_2, u_2)} u_2, \\ \dots \\ u_n &= v_n - \frac{(v_n, u_1)}{(u_1, u_1)} u_1 - \frac{(v_n, u_2)}{(u_2, u_2)} u_2 - \dots - \frac{(v_n, u_{n-1})}{(u_{n-1}, u_{n-1})} u_{n-1}. \end{split} $$ Now $u_1, \dots, u_n$ are orthogonal vectors in exact arithmetic. Then, to get an orthonormal system, divide each of the vectors by its norm: $u_i := u_i/\|u_i\|$.
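A minimal NumPy sketch of this classical Gram-Schmidt procedure (the function name `gram_schmidt` is ours, not part of the assignment):

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: orthonormalize the columns of V."""
    m, n = V.shape
    U = np.zeros((m, n))
    for i in range(n):
        u = V[:, i].astype(float).copy()
        # subtract the projections of v_i onto all previously computed u_j
        for j in range(i):
            u -= (V[:, i] @ U[:, j]) / (U[:, j] @ U[:, j]) * U[:, j]
        U[:, i] = u
    return U / np.linalg.norm(U, axis=0)   # normalize each column

V = np.random.rand(50, 10)
Q = gram_schmidt(V)
print(np.linalg.norm(Q.T @ Q - np.eye(10)))   # small for a well-conditioned V
```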
The observed loss of orthogonality is a problem of this particular algorithm. To avoid it, one can use the modified Gram-Schmidt algorithm, or QR via Householder reflections or Givens rotations.
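For comparison, a sketch of the modified Gram-Schmidt variant (again, our naming); the only difference is that each remaining vector is updated immediately with the already computed $u_i$:

```python
import numpy as np

def modified_gram_schmidt(V):
    """Modified Gram-Schmidt: numerically more stable orthonormalization."""
    U = np.array(V, dtype=float)
    n = U.shape[1]
    for i in range(n):
        U[:, i] /= np.linalg.norm(U[:, i])
        # remove the u_i component from all the remaining vectors right away
        for j in range(i + 1, n):
            U[:, j] -= (U[:, j] @ U[:, i]) * U[:, i]
    return U
```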
Remember that multiplying a vector by a suitably chosen Householder matrix $H$ eliminates all of its elements but the first one; thus, Householder transformations can be used to compute the QR decomposition. We will implement a simple version of the algorithm by successively applying $H_j$ to the matrix $A$, where $j$ is the index of the column whose subdiagonal elements we want to zero out: $$H_nH_{n-1}\dots H_1A = R$$
Multiplying $A$ by a Householder matrix $H_1$ reflects its first column $x = A_{:,1}$ about a hyperplane with normal vector $v_1$. We set $v_1 = x + \text{sgn}(x_1) \|x\|_2e_1$ so that $x$ is reflected onto the direction of the standard basis vector $e_1 = \begin{bmatrix}1&0&0&\dots\end{bmatrix}^T$. We also need to normalize $v_1$. This gives us everything needed to form $H_1$; applying it eliminates the subdiagonal elements in the first column: $$ H_1 A = \begin{bmatrix} \times & \times & \dots & \times \\ 0 & \star & \star & \star \\ 0 & \star & \star & \star \\ \dots& \dots&\dots &\dots \\ 0 & \star & \star & \star \end{bmatrix} $$
The second step now involves the submatrix marked by $\star$s above. So we would like to apply some $H^\star_2$ to it and zero all the elements but the first two in the second column of $A$. $H^\star_2$ is computed analogously to the first step and is then embedded into $$ H_2 = \begin{bmatrix} 1 & 0 \\ 0 & H^\star_2 \end{bmatrix}. $$
After $n$ steps we arrive at $R$ and $Q = \prod_{i=1}^n H_i = H_1 H_2 \cdots H_n$ (each $H_i$ is its own inverse, so $A = H_1 \cdots H_n R$).
Write a function `householder_qr(A)` which outputs `Q, R`. Apply it to the matrix $B$ created above. Print out the error.
In [ ]:
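One possible sketch of `householder_qr` following the description above; treat it as an illustration rather than a reference solution (the random matrix at the end is only a stand-in for $B$):

```python
import numpy as np

def householder_qr(A):
    """QR decomposition via Householder reflections: A = Q @ R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(min(m, n)):
        x = R[j:, j]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)  # v = x + sgn(x_1)||x||_2 e_1
        v /= np.linalg.norm(v)
        # H_j = I - 2 v v^T acts only on the trailing submatrix
        R[j:, j:] -= 2.0 * np.outer(v, v @ R[j:, j:])
        Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)   # accumulate Q = H_1 H_2 ... H_n
    return Q, R

B = np.random.rand(6, 4)          # stand-in for the matrix B created above
Q, R = householder_qr(B)
print(np.linalg.norm(Q @ R - B))  # should be around machine precision
```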
Prove that for any Hermitian matrix the singular values are equal to the absolute values of its eigenvalues. Does this hold for a general matrix? Prove it or give a counterexample.
Find a skeleton decomposition of the matrix $\begin{pmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9\end{pmatrix}$.
Prove that $\|AB\|_F \leq \|A\|_2 \|B\|_F$.
The aim of this task is to build a face classifier. There are 40 persons in the database. Every person is represented by 10 photos with slightly different facial expressions.
Download the database of faces from here.
Create a training set:
Import the first 9 images for each face ($360$ images in total). Represent these pictures as a matrix $F$ with $9\times 40$ columns, where each column is a reshaped 2D picture. Note: use `np.reshape` to reshape a matrix into a column.
Calculate and plot the mean face. Subtract it from each column of the matrix $F$.
Calculate the SVD of the shifted matrix $F$ and truncate the obtained representation: $F_r = U_r S_r V_r^T$.
Here $U_r$ is a matrix with $r$ columns, a basis of the space of faces. $W_r = S_r V_r^T$ is the matrix of coefficients in the basis $U_r$. Note that now every image is represented by a small number of coefficients in the basis $U_r$.
Plot the first 4 vectors in $U_r$ using subplots, as is done in the illustration for this problem. Now you know what eigenfaces are =)
Import the testing set, which consists of the remaining photos. Find their coefficients in the basis $U_r$. Note: the coefficients can be found as projections onto the columns of $U_r$.
Compare the obtained coefficient vectors to the vectors in $W_r$ using cosine similarity and classify the testing faces. As an output, give the indices of the faces that were misclassified for $r=5$.
In [ ]:
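A rough sketch of the whole pipeline is given below. It is only an outline: the image size, the placeholder data, and all variable names are our assumptions, and loading the actual database is left out.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: replace with the real images from the downloaded database
# (each column = one reshaped photo, 10 photos per person, 40 persons).
h, w = 112, 92                                   # assumed image size
faces = np.random.rand(h * w, 400)

idx = np.arange(400)
train = faces[:, idx % 10 < 9]                   # first 9 photos of each person (360 columns)
test = faces[:, idx % 10 == 9]                   # remaining photo of each person (40 columns)

mean_face = train.mean(axis=1, keepdims=True)    # plot with imshow(mean_face.reshape(h, w))
F = train - mean_face

U, s, Vt = np.linalg.svd(F, full_matrices=False)
r = 5
U_r = U[:, :r]                                   # basis of the face space
W_r = np.diag(s[:r]) @ Vt[:r, :]                 # coefficients of the training faces

# plot the first 4 eigenfaces
fig, ax = plt.subplots(1, 4, figsize=(12, 3))
for i in range(4):
    ax[i].imshow(U_r[:, i].reshape(h, w), cmap='gray')
plt.show()

# project the test photos onto the basis and classify by cosine similarity
coeffs = U_r.T @ (test - mean_face)              # r x 40 matrix of coefficients
sim = (coeffs.T @ W_r) / np.outer(np.linalg.norm(coeffs, axis=0),
                                  np.linalg.norm(W_r, axis=0))
pred = np.argmax(sim, axis=1) // 9               # best-matching training column -> person id
print("misclassified:", np.nonzero(pred != np.arange(40))[0])
```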
Let $T$ be an upper triangular matrix. Prove that $TT^*=T^*T$ iff $T$ is diagonal.
Prove that a normal matrix is Hermitian iff its eigenvalues are real. Prove that a normal matrix is unitary iff its eigenvalues satisfy $|\lambda| = 1$.
The following problem illustrates the instability of the Jordan form. Find analytically the eigenvalues of the perturbed Jordan block $$ J(\varepsilon) = \begin{bmatrix} \lambda & 1 & & & 0 \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ \varepsilon & & & & \lambda \\ \end{bmatrix}_{n\times n} $$
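The theoretical answer can be checked numerically; a small experiment (our choice of $n$, $\lambda$ and $\varepsilon$) illustrates how strongly the spectrum reacts to a tiny perturbation:

```python
import numpy as np

n, lam, eps = 16, 1.0, 1e-8
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)  # Jordan block
J[-1, 0] = eps                                      # perturbation in the corner
eigs = np.linalg.eigvals(J)
# the eigenvalues move away from lambda by roughly eps**(1/n), not eps
print(np.max(np.abs(eigs - lam)), eps ** (1.0 / n))
```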
Implement a function `power_method(A, x0, num_iter)` that outputs an approximation to the eigenvector $x$, the eigenvalue $\lambda$, and the history of errors $\{\|Ax_k - \lambda_k x_k\|_2\}$. Make sure that the method converges to the correct solution on the matrix $\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$, which is known to have the largest eigenvalue equal to $3$.

Run the power method with `num_iter=100` and a random initial guess `x0`. Explain why you observe the absence of convergence. In order to avoid this problem, Larry Page and Sergey Brin proposed the following regularization technique: $$ A_d = dA + \frac{1-d}{N} \begin{pmatrix} 1 & \dots & 1 \\ \vdots & & \vdots \\ 1 & \dots & 1 \end{pmatrix}, $$ where $A$ is of size $N\times N$ and $d$ is a parameter (typically $d=0.85$) called the damping factor. Now $A_d$ is a matrix whose largest eigenvalue has multiplicity 1. Recall that computing the eigenvector of the PageRank matrix corresponding to the largest eigenvalue has the following interpretation: consider a person who starts at a random node of the graph (i.e. opens a random web page); at each step he follows one of the outgoing edges uniformly at random (i.e. opens one of the links). So the person randomly walks through the graph, and the eigenvector we are looking for is exactly his stationary distribution: for each node it gives the probability of visiting that particular node. Therefore, if the person starts from a part of the graph which is not connected to the rest, he will never get there. In the regularized model, the person at each step follows one of the outgoing links with probability $d$ OR visits a random node of the whole graph with probability $(1-d)$.
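A possible sketch of `power_method` and of building $A_d$ (the Rayleigh-quotient estimate of $\lambda$ and the helper `damp` are our choices, not prescribed by the task):

```python
import numpy as np

def power_method(A, x0, num_iter):
    """Power iteration: returns eigenvector, eigenvalue and error history."""
    x = x0 / np.linalg.norm(x0)
    errors = []
    lam = 0.0
    for _ in range(num_iter):
        x = A.dot(x)
        x /= np.linalg.norm(x)
        lam = x @ A.dot(x)                          # Rayleigh quotient
        errors.append(np.linalg.norm(A.dot(x) - lam * x))
    return x, lam, errors

# sanity check on the 2x2 matrix from the task (largest eigenvalue is 3)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
x, lam, errs = power_method(A, np.random.rand(2), 100)
print(lam)

def damp(A, d=0.85):
    """Regularized matrix A_d = d*A + (1-d)/N * ones (dense; fine for small examples)."""
    N = A.shape[0]
    return d * A + (1.0 - d) / N * np.ones((N, N))
```

For the large sparse Wikipedia matrix, forming the dense rank-one correction explicitly is wasteful; it can instead be applied on the fly inside the matrix-vector product.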
Apply the power method to the regularized matrix $A_d$ with `num_iter=100` and a random initial guess `x0`.

Let us now find the most significant articles of the Simple English Wikipedia according to the PageRank model. We provide you with the adjacency matrix of all Simple Wikipedia articles (file `simple_wiki_matrix.mat`; the matrix can be accessed by the key `'W'`) and the dictionary that maps an article id to its name on Wikipedia (file `simple_wiki_dict.pickle`).
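A sketch of how the provided files might be loaded (assuming they sit in the working directory and use the standard `scipy.io`/`pickle` formats):

```python
import pickle
from scipy.io import loadmat

W = loadmat('simple_wiki_matrix.mat')['W']          # sparse adjacency matrix
with open('simple_wiki_dict.pickle', 'rb') as f:
    id_to_name = pickle.load(f)                     # maps article id -> article name
print(W.shape, len(id_to_name))
```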
Note that the matrix `W` is stored in a sparse format, so it cannot be multiplied by a vector via `np.dot(A, x)`. However, sparse matrices have the method `.dot()`, so `A.dot(x)` will work.

Apply `QR_alg(A, num_iters)` to $A$. Use `plt.spy(A, precision=1e-7)` to plot the matrix $A_k$ (see the lecture notation). Does $A_k$ converge to an upper-triangular matrix? How does this correspond to the claim about the convergence of the QR algorithm? Make sure you debug your implementation first on the matrix $B = \mathrm{tridiag}(-1, 2, -1)$:
In [ ]:
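A minimal sketch of the unshifted QR iteration behind `QR_alg(A, num_iters)` (the interface is our guess), debugged on the tridiagonal matrix $B$ as suggested:

```python
import numpy as np
import matplotlib.pyplot as plt

def QR_alg(A, num_iters):
    """Unshifted QR algorithm: A_{k+1} = R_k Q_k, where A_k = Q_k R_k."""
    Ak = np.array(A, dtype=float)
    for _ in range(num_iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

# debug on the tridiagonal matrix B = tridiag(-1, 2, -1)
n = 20
B = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Ak = QR_alg(B, 200)
plt.spy(Ak, precision=1e-7)   # show which entries are numerically nonzero
plt.show()
```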