Problem 1 (LU decomposition)

30 pts

  • Find the exact number of operations required to compute the LU decomposition of an arbitrary dense matrix.
  • Is it possible to solve a $1\,000\,000 \times 1\,000\,000$ dense linear system on your laptop within one month via LU decomposition? Assume that the $LU$ decomposition is not given in advance (a rough estimate is sketched below).
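For the second question, a back-of-the-envelope estimate like the one below may help; the $\sim\frac{2}{3}n^3$ flop count and the assumed sustained rate of $10^{11}$ flop/s are illustrative assumptions, not givens of the task.

```python
# Rough feasibility estimate. Assumptions: dense LU costs ~(2/3) n^3 flops and
# a laptop sustains about 1e11 flop/s (~100 GFLOP/s) -- adjust to your hardware.
n = 1_000_000
flops = 2 / 3 * n**3
rate = 1e11                                       # flop/s, hypothetical throughput
print(f"time: {flops / rate / 86400:.0f} days")
print(f"memory for the dense matrix: {8 * n**2 / 1e12:.0f} TB")   # 8 bytes per double
```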

A significant reduction in complexity can be achieved if the matrix has a certain structure, e.g. if it is sparse. In the following task we consider an important example: $LU$ decomposition of tridiagonal matrices.

  • Find the number of operations needed to compute the $LU$ decomposition of a tridiagonal matrix, taking into account only non-zero elements. How many nonzero elements are there in the factors $L$ and $U$, and where are they located? Conclude what the complexity of solving a linear system with a tridiagonal matrix is. A minimal sketch of the factorization follows below.
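For the operation count it may help to see the elimination written out per row. Below is a minimal sketch of tridiagonal $LU$ without pivoting, storing only the three diagonals; the names `a`, `b`, `c` for the sub-, main and superdiagonal are my own convention, not lecture notation.

```python
import numpy as np

def tridiag_lu(a, b, c):
    """LU of a tridiagonal matrix given by its sub- (a), main (b) and super- (c) diagonals.
    Returns the subdiagonal of the unit lower bidiagonal L and the main diagonal of U;
    the superdiagonal of U coincides with c. No pivoting is performed."""
    n = len(b)
    l = np.empty(n - 1)
    u = np.empty(n)
    u[0] = b[0]
    for i in range(1, n):
        l[i - 1] = a[i - 1] / u[i - 1]        # 1 division per row
        u[i] = b[i] - l[i - 1] * c[i - 1]     # 1 multiplication + 1 subtraction per row
    return l, u
```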

Some details in the lecture proofs about $LU$ were omitted. Let us complete them.

  • Prove that if the $LU$ decomposition exists, then the matrix is strictly regular.
  • Prove that if $A$ is strictly regular, then $A_1 = D - \frac{1}{a} b c^T$ (see the lectures for notation) is also strictly regular.

In some cases the $LU$ decomposition might fail, hence in practice a pivoting strategy is used. Consider this problem on the simple example of the matrix $A = \begin{pmatrix} \varepsilon & 1\\ 1 & 1 \end{pmatrix}.$

  • Find analytically the $LU$ decomposition of $A$ with and without pivoting. Why does $LU$ without pivoting fail when $\varepsilon$ is small? A numerical illustration is sketched after this list.
  • Find the condition number of $A$ in the norm $\|\cdot\|_2$. For what $\varepsilon$ is this matrix ill-conditioned?
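The failure without pivoting is easy to see numerically. The sketch below evaluates the no-pivoting factors $L=\begin{pmatrix}1&0\\ \varepsilon^{-1}&1\end{pmatrix}$, $U=\begin{pmatrix}\varepsilon&1\\0&1-\varepsilon^{-1}\end{pmatrix}$ in double precision for one tiny $\varepsilon$; it is an illustration of the effect, not the requested derivation.

```python
import numpy as np

eps = 1e-18
A = np.array([[eps, 1.0], [1.0, 1.0]])

# LU without pivoting, evaluated in double precision: the huge multiplier 1/eps
# swallows the 1 in the (2,2) entry, so the product L @ U does not recover A.
L = np.array([[1.0, 0.0], [1.0 / eps, 1.0]])
U = np.array([[eps, 1.0], [0.0, 1.0 - 1.0 / eps]])
print("no pivoting:   ||A - LU|| =", np.linalg.norm(A - L @ U))

# With partial pivoting the rows are swapped first, so all multipliers are <= 1.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
Lp = np.array([[1.0, 0.0], [eps, 1.0]])
Up = np.array([[1.0, 1.0], [0.0, 1.0 - eps]])
print("with pivoting: ||PA - LU|| =", np.linalg.norm(P @ A - Lp @ Up))
```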

In [ ]:

Problem 2 (QR decomposition)

30 pts

Standard Gram-Schmidt algorithm (10 pts)

Our goal now is to orthogonalize a system of linearly independent vectors $v_1,\dots,v_n$. The standard algorithm for this task is the Gram-Schmidt algorithm: $$ \begin{split} u_1 &= v_1, \\ u_2 &= v_2 - \frac{(v_2, u_1)}{(u_1, u_1)} u_1, \\ u_3 &= v_3 - \frac{(v_3, u_1)}{(u_1, u_1)} u_1 - \frac{(v_3, u_2)}{(u_2, u_2)} u_2, \\ \dots \\ u_n &= v_n - \frac{(v_n, u_1)}{(u_1, u_1)} u_1 - \frac{(v_n, u_2)}{(u_2, u_2)} u_2 - \dots - \frac{(v_n, u_{n-1})}{(u_{n-1}, u_{n-1})} u_{n-1}. \end{split} $$ Now $u_1, \dots, u_n$ are orthogonal vectors in exact arithmetic. Then, to get an orthonormal system, divide each of the vectors by its norm: $u_i := u_i/\|u_i\|$.
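For reference, the recurrence above translates almost literally into code. Below is a minimal sketch (the function name `gram_schmidt` and the Gram-matrix check are my own choices), not a tuned implementation.

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: orthonormalize the columns of V (sketch, no re-orthogonalization)."""
    U = V.astype(float).copy()
    for k in range(V.shape[1]):
        for j in range(k):
            U[:, k] -= (U[:, j] @ V[:, k]) * U[:, j]   # subtract projection of v_k onto u_j
        U[:, k] /= np.linalg.norm(U[:, k])             # normalize u_k
    return U

B = np.random.randn(100, 100)
Q = gram_schmidt(B)
print("||G - I||_F =", np.linalg.norm(Q.T @ Q - np.eye(100)))  # Gram-matrix error
```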

  • Write out the matrices $Q$ and $R$ obtained in the process described.
  • Implement the described Gram-Schmidt algorithm and check it on a random $100\times 100$ matrix $B$. Print out the error. Note: to check orthogonality, calculate the matrix of scalar products $G_{ij} = (u_i, u_j)$ (called the Gram matrix), which should be equal to the identity matrix $I$. The error $\|G - I\|_F$ shows how far the system $u_i$ is from being orthonormal.
  • Create a Hilbert matrix $A$ of size $100\times 100$ without using loops. Orthogonalize its columns by the described Gram-Schmidt algorithm. Is the Gram matrix close to the identity matrix in this case? Why?

The observed loss of orthogonality is a problem of this particular algorithm. To avoid it, the modified Gram-Schmidt algorithm, or QR via Householder reflections or Givens rotations, can be used.

Householder QR (20 pts)

Recall that multiplying a vector by a suitably chosen Householder matrix $H$ eliminates all of its elements except the first one; thus, Householder transformations can be used to compute the QR decomposition. We will implement a simple version of the algorithm by successively applying $H_j$ to the matrix $A$, where $j$ is the index of the column in which we want to zero out the subdiagonal elements: $$H_nH_{n-1}\dots H_1A = R$$

Multiplying the first column of $A$ by a Householder matrix $H_1$ reflects $A_{:,1}$ about a hyperplane given by its normal vector $v_1$. We set $v_1$ equal to $x + \text{sgn}(x_1) \|x\|_2e_1$, where $x = A_{:,1}$, so that the reflection maps $x$ onto a multiple of the standard basis vector $e_1 = \begin{bmatrix}1&0&0&\dots\end{bmatrix}^T$. We also need to normalize $v_1$. This gives everything needed to form $H_1 = I - 2v_1v_1^T$. Applying it eliminates the subdiagonal elements in the first column: $$ H_1 A = \begin{bmatrix} \times & \times & \dots & \times \\ 0 & \star & \star & \star \\ 0 & \star & \star & \star \\ \dots & \dots & \dots & \dots \\ 0 & \star & \star & \star \end{bmatrix} $$

The second step involves the submatrix marked by $\star$s above: we apply some $H^\star_2$ to it to zero all elements of the second column of $A$ below the diagonal. $H^\star_2$ is computed analogously to the first step and is then embedded into $$ H_2 = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & & & \\ \vdots & & H^\star_2 & \\ 0 & & & \end{bmatrix} $$

After $n$ steps we arrive at $R$, and $Q = H_1 H_2 \dots H_n$ (each $H_i$ is symmetric and orthogonal, so $H_i^{-1} = H_i$).
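A minimal sketch of this procedure is given below; the outer-product update and the column-block accumulation of $Q$ are implementation choices, not the only way to organize it.

```python
import numpy as np

def householder_vector(x):
    """Unit normal v of the reflection sending x to a multiple of e_1 (v = x + sgn(x_1)||x|| e_1)."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])
    return v / np.linalg.norm(v)

def householder_qr(A):
    """Householder QR sketch: returns Q, R with A = Q @ R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        v = householder_vector(R[j:, j])
        R[j:, j:] -= 2.0 * np.outer(v, v @ R[j:, j:])   # apply H_j to the trailing block
        Q[:, j:] -= 2.0 * np.outer(Q[:, j:] @ v, v)     # accumulate Q = H_1 H_2 ... H_n
    return Q, R

B = np.random.randn(100, 100)
Q, R = householder_qr(B)
print(np.linalg.norm(Q.T @ Q - np.eye(100)), np.linalg.norm(Q @ R - B))
```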

  • Implement the Householder QR described above as a function householder_qr(A) which outputs Q, R. Apply it to the matrix $B$ created above. Print out the error.
  • Apply it to the Hilbert matrix $A$ created in the first part of the problem and print out the error. How stable is Householder QR compared to Gram-Schmidt?
  • Derive the number of operations needed for Householder QR. Do the same for QR via Givens rotations. Compare the constants: which one is faster?

In [ ]:

Problem 3

45 pts

Theoretical tasks (15 pts)

  • Prove that for any Hermitian matrix the singular values equal the absolute values of its eigenvalues. Does this hold for a general matrix? Prove it or give a counterexample. (A quick numerical sanity check is sketched after this list.)

  • Find the skeleton decomposition of the matrix $\begin{pmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9\end{pmatrix}$.

  • Prove that $\|AB\|_F \leq \|A\|_2 \|B\|_F$.
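Not a substitute for the proofs, but a quick numerical sanity check of the first claim (a random Hermitian matrix versus a simple non-normal matrix) that you may find useful while working on it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (X + X.conj().T) / 2                          # random Hermitian matrix
print(np.sort(np.linalg.svd(H, compute_uv=False)))   # singular values
print(np.sort(np.abs(np.linalg.eigvalsh(H))))        # absolute values of eigenvalues

N = np.array([[0.0, 1.0], [0.0, 0.0]])            # non-normal: compare the two lists now
print(np.linalg.svd(N, compute_uv=False), np.abs(np.linalg.eigvals(N)))
```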

Eigenfaces (30 pts)

The aim of this task is to build a face classifier. There are 40 persons in the database. Every person is represented by 10 photos with slightly different facial expressions.

  • Download the database of faces from here.

  • Create training set:

    Import the first 9 images for each person ($360$ images in total). Represent these pictures as a matrix $F$ with $9\times 40$ columns, where each column is a reshaped 2D picture. Note: use np.reshape to reshape a picture into a column.

  • Calculate and plot the mean face. Subtract it from each column of the matrix $F$.

  • Calculate the SVD of the shifted matrix $F$ and truncate the obtained representation: $F_r = U_r S_r V_r^T$.

    Here $U_r$ is a matrix with $r$ columns, a basis set in the space of faces. $W_r = S_r V_r^T$ is the matrix of coefficients in the basis $U_r$. Note that now every image is represented by a small number of coefficients in the basis $U_r$.

  • Plot the first 4 vectors in $U_r$ using subplots as it is done in the illustration to this problem. Now you know what eigenfaces are =)

  • Import the testing set, which consists of the remaining photos. Find their coefficients in the basis $U_r$. Note: the coefficients can be found as a projection onto the columns of $U_r$.

  • Compare the obtained vectors of coefficients to the vectors in $W_r$ using cosine similarity and classify the testing faces. As the output, give the indices of the faces that were misclassified for $r=5$. A minimal sketch of these steps follows the list.
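A minimal sketch of the truncation / projection / cosine-similarity steps referred to above; `F` and `T` below are random placeholders standing in for the real mean-subtracted training and test matrices, and the column ordering (9 consecutive images per person) is an assumption about how you build $F$.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((4096, 360))          # placeholder: mean-subtracted training images
T = rng.standard_normal((4096, 40))           # placeholder: mean-subtracted test images

r = 5
U, s, Vt = np.linalg.svd(F, full_matrices=False)
U_r = U[:, :r]                                # basis of the "face space"
W_r = np.diag(s[:r]) @ Vt[:r, :]              # training coefficients, shape (r, 360)

C = U_r.T @ T                                 # test coefficients via projection, shape (r, 40)
# cosine similarity between every test image and every training image
sim = (C / np.linalg.norm(C, axis=0)).T @ (W_r / np.linalg.norm(W_r, axis=0))
predicted_person = sim.argmax(axis=1) // 9    # assumes columns of F are grouped 9 per person
print(predicted_person)
```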


In [ ]:

Problem 4 (eigenvalues)

55 pts

Theoretical tasks (15 pts)

  • Let $T$ be an upper triangular matrix. Prove that $TT^*=T^*T$ iff $T$ is diagonal.

  • Prove that a normal matrix is Hermitian iff its eigenvalues are real. Prove that a normal matrix is unitary iff its eigenvalues satisfy $|\lambda| = 1$.

  • The following problem illustrates the instability of the Jordan form. Find theoretically the eigenvalues of the perturbed Jordan block $$ J(\varepsilon) = \begin{bmatrix} \lambda & 1 & & & 0 \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ \varepsilon & & & & \lambda \\ \end{bmatrix}_{n\times n} $$
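The derivation is expected to be analytical, but a quick numerical experiment (size and $\varepsilon$ chosen arbitrarily here) is a handy way to check your answer and to see the instability:

```python
import numpy as np

n, lam, eps = 10, 2.0, 1e-10
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)   # Jordan block
J[-1, 0] = eps                                        # tiny perturbation in the corner
print(np.linalg.eigvals(J))                           # compare with your analytical result
```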

PageRank (30 pts)

Damping factor importance

  • Write the PageRank matrix $A$ that corresponds to the following graph. What is its largest eigenvalue? What multiplicity does it have?
  • Implement the power method for a given matrix $A$ and initial guess $x_0$. It should be organized as a function power_method(A, x0, num_iter) that outputs an approximation to the eigenvector $x$, the eigenvalue $\lambda$ and the history of errors $\{\|Ax_k - \lambda_k x_k\|_2\}$; a minimal sketch is given after this list. Make sure that the method converges to the correct solution on the matrix $\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$, which is known to have the largest eigenvalue equal to $3$.
  • Run the power method for the graph presented above and plot the errors $\|Ax_k - \lambda_k x_k\|_2$ as a function of $k$ for num_iter=100 and a random initial guess x0. Explain why you observe the absence of convergence.
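A minimal sketch of such a function; using the Rayleigh quotient as the eigenvalue estimate is one possible choice, not the only one.

```python
import numpy as np

def power_method(A, x0, num_iter):
    """Power method sketch: returns eigenvector, eigenvalue and history of residual errors."""
    x = x0 / np.linalg.norm(x0)
    errors = []
    for _ in range(num_iter):
        y = A.dot(x)                      # .dot() also works for scipy sparse matrices
        lam = x @ y                       # Rayleigh quotient estimate of the eigenvalue
        errors.append(np.linalg.norm(y - lam * x))
        x = y / np.linalg.norm(y)
    return x, lam, errors

A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # sanity check: largest eigenvalue is 3
x, lam, err = power_method(A, np.random.rand(2), 50)
print(lam)
```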

In order to avoid this problem, Larry Page and Sergey Brin proposed to use the following regularization technique: $$ A_d = dA + \frac{1-d}{N} \begin{pmatrix} 1 & \dots & 1 \\ \vdots & & \vdots \\ 1 & \dots & 1 \end{pmatrix}, $$ where $d \in (0,1)$ is a parameter called the damping factor (typically $d=0.85$) and $A$ is of size $N\times N$. Now $A_d$ is a matrix whose largest eigenvalue has multiplicity 1. Recall that computing the eigenvector of the PageRank matrix corresponding to the largest eigenvalue has the following interpretation: consider a person who starts at a random node of the graph (i.e. opens a random web page); at each step he follows one of the outgoing edges uniformly at random (i.e. opens one of the links). So the person randomly walks through the graph, and the eigenvector we are looking for is exactly his stationary distribution: for each node it gives the probability of visiting this particular node. Therefore, if the person has started from a part of the graph which is not connected with the other part, he will never get there. In the regularized model the person at each step follows one of the outgoing links with probability $d$ OR jumps to a random node of the whole graph with probability $(1-d)$.
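A practical remark: the dense matrix of ones never needs to be formed, since its action on a vector is just the sum of the entries times the vector of all ones. Below is a minimal sketch of the implicit matrix-vector product with $A_d$ (useful later for the sparse Wikipedia matrix; the function name is my own).

```python
import numpy as np

def pagerank_matvec(A, x, d):
    """Compute A_d @ x without forming the dense N x N matrix of ones."""
    N = x.shape[0]
    return d * A.dot(x) + (1.0 - d) / N * x.sum() * np.ones(N)
```

You can plug this product into the power method in place of the plain A.dot(x).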

  • Now run the power method with $A_d$ and plot the errors $\|A_d x_k - \lambda_k x_k\|_2$ as a function of $k$ for $d=0.99$, num_iter=100 and a random initial guess x0.

Simple English Wiki

Let us now find the most significant articles on the Simple English Wikipedia according to the PageRank model. We provide you with the adjacency matrix of all Simple English Wikipedia articles (file simple_wiki_matrix.mat; the matrix can be accessed by the key 'W') and the dictionary that maps an article id to its name on Wikipedia (file simple_wiki_dict.pickle).

  • Load the adjacency matrix and the dictionary into Python. Normalize each column of the adjacency matrix to get a matrix from the PageRank model. Check that the obtained matrix is stochastic (a loading and normalization sketch follows this list). Note that your matrix contains a lot of zeros and is therefore stored in a sparse format. Sparse matrices do not support matrix-vector multiplication via np.dot(A, x); however, they have the method .dot(), so A.dot(x) will work.
  • Run the power method starting from the vector of all ones and plot the errors $\|Ax_k - \lambda_k x_k\|_2$ as a function of $k$ for $d=0.85$.
  • Print the names of the top-5 articles for $d=0.85$.
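A minimal sketch of the loading and column normalization; the use of scipy.io.loadmat and of the CSC sparse format is an assumption about how the .mat file is stored, and dangling columns that sum to zero are simply left untouched here.

```python
import pickle
import numpy as np
from scipy.io import loadmat
from scipy.sparse import csc_matrix, diags

W = csc_matrix(loadmat('simple_wiki_matrix.mat')['W'])
with open('simple_wiki_dict.pickle', 'rb') as f:
    id2name = pickle.load(f)

col_sums = np.asarray(W.sum(axis=0)).ravel()
safe = np.where(col_sums > 0, col_sums, 1.0)   # avoid division by zero for dangling pages
A = W @ diags(1.0 / safe)                      # scale each column by its number of links
check = np.asarray(A.sum(axis=0)).ravel()
print("column-stochastic (ignoring dangling columns):", np.allclose(check[col_sums > 0], 1.0))
```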

QR algorithm (10 pts)

  • Create a tridiagonal matrix $A$ = tridiag(1,2,-2) of size $5\times5$. Implement the QR algorithm as QR_alg(A, num_iters) and apply it to $A$. Use plt.spy(A, precision=1e-7) to plot the matrix $A_k$ (see the lecture notation). Does $A_k$ converge to an upper triangular matrix? How does this correspond to the claim about convergence of the QR algorithm?

Make sure you first debug your implementation on the matrix $B$ = tridiag($-1,2,-1$); a minimal sketch is given below:
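The sketch below uses my own helper `tridiag` and the basic unshifted QR iteration; it is a starting point, not a definitive implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

def tridiag(a, b, c, n=5):
    """n x n tridiagonal matrix with a on the sub-, b on the main and c on the superdiagonal."""
    return a * np.diag(np.ones(n - 1), -1) + b * np.eye(n) + c * np.diag(np.ones(n - 1), 1)

def QR_alg(A, num_iters):
    """Unshifted QR algorithm: A_{k+1} = R_k Q_k, where A_k = Q_k R_k."""
    Ak = A.astype(float).copy()
    for _ in range(num_iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

B = tridiag(-1, 2, -1)
plt.spy(QR_alg(B, 200), precision=1e-7)
plt.show()
```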


In [ ]: