Any $n \times m$ matrix $A$ can be decomposed into a product
$$
A = U \Sigma V^*,
$$
where $U$ is an $n \times k$ matrix with orthonormal columns, $V$ is an $m \times k$ matrix with orthonormal columns, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ is a diagonal matrix with $\sigma_1 \geq \ldots \geq \sigma_k \geq 0$, and $k = \min(m, n)$.
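A minimal numpy illustration of this (thin) decomposition; the matrix below is random and serves only as an example.
In [ ]:
import numpy as np

np.random.seed(0)
n, m = 5, 3
A = np.random.randn(n, m)

# full_matrices=False returns the "thin" SVD: U is n x k, Vh is k x m, k = min(n, m)
U, s, Vh = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vh.shape)            # (5, 3) (3,) (3, 3)
print(np.allclose(A, U @ np.diag(s) @ Vh))   # A = U Sigma V^*
print(np.allclose(U.T @ U, np.eye(m)))       # the columns of U are orthonormal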
One of the main uses of the SVD is to solve the low-rank approximation problem
$$
\Vert A - A_r \Vert \rightarrow \min \quad \text{over all } A_r \text{ with } \mathrm{rank}(A_r) \leq r,
$$
i.e. find a closest rank-$r$ matrix to a given matrix (in the spectral or the Frobenius norm).
The solution is given as
$$
A_r = U_r \Sigma_r V^*_r,
$$
where $U_r$ consists of the first $r$ columns of $U$, $V_r$ of the first $r$ columns of $V$, and $\Sigma_r = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ contains the first $r$ singular values.
Proof (sketch). Recall that $\sigma = \Vert A \Vert_2$ is an operator norm, so there exist unit vectors $x$ and $y$ such that $Ay = \sigma x$. Complete them to two unitary matrices
$$U = [x, U_1], \quad V = [y, V_1].$$
Then,
$$
U^* A V = \begin{bmatrix}
\sigma & w^* \\
0 & B
\end{bmatrix},
$$
and applying this matrix to the vector $\begin{bmatrix} \sigma \\ w \end{bmatrix}$, whose squared norm equals $\sigma^2 + w^* w$, gives
$$
\left\Vert U^* A V \begin{bmatrix}
\sigma \\
w
\end{bmatrix}\right\Vert^2_2 \geq (\sigma^2 + w^* w)^2
\quad \Longrightarrow \quad
\Vert U^* A V \Vert^2_2 \geq \sigma^2 + w^* w.
$$
Since the spectral norm is unitarily invariant, $\Vert U^* A V \Vert_2 = \Vert A \Vert_2 = \sigma$, so we must have $w = 0$; the same construction is then repeated on the smaller block $B$, which completes the argument by induction.
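A quick numerical check of the approximation statement above (the matrix is random and used only for illustration): the spectral-norm error of the truncated SVD coincides with the first discarded singular value $\sigma_{r+1}$.
In [ ]:
import numpy as np

np.random.seed(0)
A = np.random.randn(50, 40)
U, s, Vh = np.linalg.svd(A, full_matrices=False)

r = 10
Ar = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]   # best rank-r approximation

print(np.linalg.norm(A - Ar, 2))             # spectral norm of the error ...
print(s[r])                                  # ... coincides with sigma_{r+1}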
From $A = U \Sigma V^*$ we get $$ A A^* = U \Sigma^2 U^*, \quad A^* A = V \Sigma^2 V^*, $$ i.e. one can compute the singular values as the square roots of the eigenvalues of $A^*A$ (or $A A^*$). Numerically this is not a good idea: the maximal precision you can get this way is about $\sqrt{\varepsilon}$, so you lose half of the digits!
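A small experiment illustrating this loss (the matrix is synthetic, built with prescribed singular values): the eigenvalue route cannot resolve singular values below roughly $\sqrt{\varepsilon}\,\sigma_1 \approx 10^{-8}$, while the direct SVD recovers them to about $\varepsilon\,\sigma_1$ absolute accuracy.
In [ ]:
import numpy as np

np.random.seed(0)
n = 20
sigma_exact = np.logspace(0, -10, n)                 # prescribed singular values, 1 down to 1e-10
Q1 = np.linalg.qr(np.random.randn(n, n))[0]
Q2 = np.linalg.qr(np.random.randn(n, n))[0]
A = Q1 @ np.diag(sigma_exact) @ Q2.T

s_svd = np.linalg.svd(A, compute_uv=False)                  # direct SVD
s_eig = np.sqrt(np.abs(np.linalg.eigvalsh(A.T @ A)[::-1]))  # via eigenvalues of A^* A

print(sigma_exact[-3:])   # exact smallest singular values
print(s_svd[-3:])         # close to the exact ones (absolute error ~ eps * sigma_1)
print(s_eig[-3:])         # stagnate around sqrt(eps) * sigma_1 ~ 1e-8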
Practical SVD computations are done in two steps: the matrix is first reduced to bidiagonal form $B$, and then the SVD of the bidiagonal matrix is computed.
The Golub-Kahan bidiagonalization-based algorithm was the first, and is still one of the most popular, methods for the SVD problem. It is basically the QR algorithm, implicitly applied to the matrix $B^* B$.
The SVD can be used to compute the pseudoinverse of a matrix, which is the tool of choice for ill-conditioned or rectangular systems: $$A^{\dagger} = V \Sigma^{\dagger} U^*,$$ where in the regularized variant $\Sigma^{\dagger}_{ii} = \min\left(\frac{1}{\sigma_i}, \frac{1}{\delta}\right)$ for some threshold $\delta$, i.e. small singular values are smoothed rather than inverted exactly.
The SVD also gives full information about the nullspace and the range of a matrix: the left singular vectors corresponding to nonzero singular values span the range, and the right singular vectors corresponding to zero singular values span the nullspace.
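A sketch of both statements in numpy, assuming a rank-deficient test matrix built on the spot; here the truncated variant of the pseudoinverse is used (small singular values are dropped rather than capped at $1/\delta$), which matches numpy's pinv.
In [ ]:
import numpy as np

np.random.seed(0)
A = np.random.randn(6, 2) @ np.random.randn(2, 4)    # rank-2 matrix of size 6 x 4
U, s, Vh = np.linalg.svd(A, full_matrices=False)

delta = 1e-8 * s[0]                                  # relative truncation threshold
s_inv = np.where(s > delta, 1.0 / np.maximum(s, delta), 0.0)
A_pinv = Vh.T @ np.diag(s_inv) @ U.T                 # A^+ = V Sigma^+ U^*
print(np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-8)))

rank = int(np.sum(s > delta))
range_basis = U[:, :rank]                            # orthonormal basis of the range of A
null_basis = Vh[rank:, :].T                          # orthonormal basis of the nullspace of A
print(np.linalg.norm(A @ null_basis))                # ~ 1e-15: A annihilates the nullspace basis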
One of the most famous applications is latent semantic indexing (LSI), see, for example,
Deerwester, Scott C., et al. "Indexing by latent semantic analysis." Journal of the American Society for Information Science 41.6 (1990): 391-407.
The problem setup: we have a set of text documents $D_1, \ldots, D_N$ and want to solve the search problem: given a query, i.e. a set of words, find the documents that match it best.
The data is processed as follows: for each document we create the list of words it contains and count the frequency of each word. This is the bag-of-words model (the document is treated as an unordered set of words).
This is how the term-document matrix $A$ is obtained. Its number of rows is the size of the dictionary and its number of columns is the number of documents; the element $A_{ij}$ is the frequency of occurrence of the $i$-th word in the $j$-th document.
To do the search, we can just multiply the term-document matrix by a query vector, i.e. the indicator vector of the words in the query.
The problem: a document is returned only if there is an exact word match. However, we might search for "Samuel Clemens" and hope to get results for "Mark Twain" as well, even though there is no exact match.
How can the SVD help?
Compute a low-rank approximation $A_r$ of the term-document matrix $A$,
$$A \approx A_r,$$
and we do not care about the approximation error (i.e., we do not require it to be small).
The matrix $A_r$ can be then used to do queries.
We project the documents onto a low-dimensional subspace: given a query $q$, its projection is
$$q_r = \Sigma^{-1}_r U^{\top}_r q.$$
The documents are projected in the same way,
$$\widehat{d}_i = \Sigma^{-1}_r U^{\top}_r d_i,$$
and the similarity between the query and a document is the cosine of the angle between $q_r$ and $\widehat{d}_i$.
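A self-contained sketch of these two formulas on a toy matrix (the function name lsi_scores and the tiny data are placeholders; the actual term-document matrix is built in the next cell):
In [ ]:
import numpy as np

def lsi_scores(A, q, r):
    # Project the query and the documents into the rank-r subspace
    # and return the cosine similarities between them.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    Ur, sr = U[:, :r], s[:r]
    q_r = (Ur.T @ q) / sr                 # q_r = Sigma_r^{-1} U_r^T q
    D_r = (Ur.T @ A) / sr[:, None]        # projected documents, one per column
    return (q_r @ D_r) / (np.linalg.norm(q_r) * np.linalg.norm(D_r, axis=0) + 1e-12)

# Toy example: 4 terms, 3 documents
A = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])
q = np.array([1., 0., 0., 0.])            # a query containing only the first term
print(lsi_scores(A, q, r=2))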
In [3]:
import numpy as np
import pandas as pd
import re #Regular expressions
rows = ['human', 'interface', 'computer', 'user', 'system', 'response', 'time', 'EPS', 'survey', 'trees', 'graph', 'minors']
nterms = len(rows)
docs = []
docs += ['Human machine interface for Lab ABC computer applications']
docs += ['A survey of user opinions of computer system response time']
docs += ['The EPS user interfaces management system']
docs += ['System and human system engineering testing of EPS']
docs += ['Relation of user-perceived response time on user management']
docs += ['The generation of random, binary, unordered trees']
docs += ['The intersection graph of paths in trees']
docs += ['Graph minors IV: Width of trees and well-quasi-ordering']
docs += ['Graph minors: A survey']
ndocs = len(docs)
term_doc = np.zeros((nterms, ndocs))
nonletters = re.compile('[^a-zA-Z]+') # split by anything that is not a letter
docs = [set(nonletters.split(i.lower())) for i in docs]
rows = [i.lower() for i in rows]
for i in range(nterms):
    for j in range(ndocs):
        if rows[i] in docs[j]:
            term_doc[i, j] = 1
#Use pandas to display the term-document matrix
pd.DataFrame(data=term_doc,index=rows)
Out[3]:
Now we can compare the results of the ordinary matvec and the low-rank matvec.
In [5]:
query = 'Human computer interaction'
qv = np.zeros((nterms))
for i in range(nterms):
    if re.search(rows[i], query, re.IGNORECASE):
        qv[i] = 1
res1 = qv.dot(term_doc) #Non-compressed search result
u, s, v = np.linalg.svd(term_doc)
r = 5
u = u[:, :r]
s = s[:r]
v = v[:r, :] #numpy's svd returns V^T, so we keep its first r rows
appr1 = u.dot(np.diag(s)).dot(v)
res2 = qv.dot(appr1)
res_all = np.vstack((res1, res2)).T #To make two columns
print('The query is:', query, ', the scores are:')
pd.DataFrame(res_all, index=docs, columns=['No SVD', 'SVD'])
Out[5]:
Another important (and similar) application comes from recommender systems.
Suppose we have a user-product matrix: each user gives ratings to some of the products.
The matrix is of size (number of users) $\times$ (number of products). The goal is to recommend additional products to a particular user.
The scheme is the same: we compute a low-rank approximation via the SVD, and the recommendation for a user is the corresponding row of the approximated matrix.
In [6]:
data_read = np.loadtxt('task2_transact_eval.txt',dtype=np.int32, skiprows=1,delimiter='|')
In [8]:
#Read the data
columns = ['SessionNo', 'ItemNo', 'TransType']
ds = pd.DataFrame(data=data_read,columns=columns)
ds
Out[8]:
In [10]:
ds_buy = ds[ds.TransType==2][['SessionNo', 'ItemNo']]
arr = np.array(ds_buy)
users, inv_users = np.unique(arr[:, 0], return_inverse=True)
products, inv_products = np.unique(arr[:, 1], return_inverse=True)
print('Unique sessions:', len(users), 'Unique products:', len(products))
#A SciPy sparse matrix is built in the next cell
In [17]:
import scipy.sparse
import scipy.sparse.linalg
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
nnz = len(inv_users)
spmat = scipy.sparse.coo_matrix((np.ones(nnz), (inv_users, inv_products)))
r = 2 #Approximation rank
u, s, v = scipy.sparse.linalg.svds(spmat, r)
n = spmat.shape[0]
m = spmat.shape[1]
q = np.zeros(n)
user_id = 5
q[user_id] = 1.0
qrec = q.T.dot(u).dot(np.diag(s)).dot(v)
qrec = qrec / np.max(abs(qrec))
qb = spmat.T.dot(q)
plt.plot(qrec)
plt.plot(qb)
plt.xlabel('Product ID')
plt.ylabel('Recommendation')
Out[17]:
In [22]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0 / np.sqrt(i + 3 * j + 0.5) for i in range(n)] for j in range(n)]
a = np.array(a)
u, s, v = np.linalg.svd(a)
plt.semilogy(s/s[0])
plt.title('Singular values decay for a Hilbert matrix')
Out[22]:
We can try to test on a more realistic matrix, since solving linear systems with the Hilbert matrix has little practical sense. Instead, we solve a linear system with the matrix
$$A_{ij} = \frac{1}{i - j + \frac{1}{2}},$$
which corresponds to an integral equation
$$
\int \frac{q(y)dy}{x - y } = f(x).
$$
In real life, the equation
$$
\int_{\Omega} \frac{q(y)dy}{\Vert x - y\Vert } = f(x),
$$
is solved, where $\Omega$ is a surface in 3D. This is used, for example, in the modelling of integrated circuits.
Let us see what happens with the singular values.
In [29]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0 / (i - j + 0.5) for i in range(n)] for j in range(n)]
a = np.array(a)
u, s, v = np.linalg.svd(a)
plt.plot(s)
plt.title('Singular values decay for a Cauchy-Hilbert matrix')
plt.tight_layout()
What to do?
The idea is to break the matrix into blocks
$$
A = \begin{bmatrix}
A_{11} & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}
$$
and the blocks $A_{12}$ and $A_{21}$ will be of low rank! Let us try that.
In [30]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0 / (i - j + 0.5) for i in range(n)] for j in range(n)]
a = np.array(a)
a12 = a[:n // 2, n // 2:]
a21 = a[n // 2:, :n // 2]
s12 = np.linalg.svd(a12)[1]
s21 = np.linalg.svd(a21)[1]
plt.semilogy(s12)
plt.semilogy(s21)#Actually, they are the same
Out[30]:
The whole idea is then applied recursively, leading to $\mathcal{O}(N r \log N)$ storage.
A good question is how to compute the approximation without computing all the entries -- here the skeleton decomposition comes into play: to get a rank-$r$ approximation, it is enough to compute $r$ (suitably chosen) columns and $r$ rows.
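A naive sketch of the skeleton idea for the off-diagonal block above; the columns and rows here are picked uniformly, which is only for illustration (a real cross-approximation algorithm chooses them adaptively, e.g. by maxvol-type pivoting):
In [ ]:
import numpy as np

n = 256
a = np.array([[1.0 / (i - j + 0.5) for i in range(n)] for j in range(n)])
a12 = a[:n // 2, n // 2:]                       # the off-diagonal block from the previous cell

r = 10
cols = np.linspace(0, a12.shape[1] - 1, r, dtype=int)       # r column indices ...
rows_skel = np.linspace(0, a12.shape[0] - 1, r, dtype=int)  # ... and r row indices
C = a12[:, cols]
R = a12[rows_skel, :]
Ahat = a12[np.ix_(rows_skel, cols)]             # r x r intersection submatrix

a12_skel = C @ np.linalg.pinv(Ahat) @ R         # skeleton approximation C Ahat^{-1} R
print(np.linalg.norm(a12 - a12_skel) / np.linalg.norm(a12))  # relative error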
You will have this task in your HW. The problem setup is similar: you have a database of faces,
which you write down as a face-pixel matrix (it can also be stored as a 3D array). The goal is, given a new face, to find the best match in the database. The scheme is the same: you compute the projection onto a low-dimensional subspace using the SVD, and then compare everything in this low-dimensional subspace. The leading left singular vectors are often called "eigenfaces".
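A minimal sketch of that matching scheme on synthetic data (the "faces" below are random vectors; the sizes, the rank and the noise level are arbitrary placeholders):
In [ ]:
import numpy as np

np.random.seed(0)
npixels, nfaces, r = 1024, 50, 10
faces = np.random.randn(npixels, nfaces)       # columns are (synthetic) vectorized faces

U, s, Vh = np.linalg.svd(faces, full_matrices=False)
Ur = U[:, :r]                                  # basis of the low-dimensional face subspace

coords = Ur.T @ faces                          # r-dimensional coordinates of each face
new_face = faces[:, 3] + 0.1 * np.random.randn(npixels)   # a noisy copy of face number 3
new_coords = Ur.T @ new_face

best = np.argmin(np.linalg.norm(coords - new_coords[:, None], axis=0))
print(best)                                    # nearest neighbour in the subspace: 3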
There are several ways to generalize the SVD to multidimensional arrays (tensors). In data mining this is very natural. For example, in a real recommender system the data are organized as
user-product-action
where the action is either "watch", "put into the bucket", or "buy".
It is natural to organize such data into a 3D tensor of size
number of users $\times$ number of products $\times$ number of actions.
This is where the Higher Order SVD (HOSVD) is used.
The idea is to decompose a tensor in the form
$$
A_{ijk} \approx \sum_{\alpha \beta \gamma} U_{i \alpha} V_{j \beta} W_{k \gamma} g_{\alpha \beta \gamma}.
$$
This can be done by considering the three unfoldings of the tensor and computing the left singular vectors of each unfolding.
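A hedged numpy sketch of this procedure (the tensor is random and the ranks are not truncated, so the reconstruction is exact; the helper names unfold and hosvd are placeholders):
In [ ]:
import numpy as np

def unfold(T, mode):
    # Mode-k unfolding: mode-k fibers become the columns of a matrix
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    # Factor U_k = leading left singular vectors of the mode-k unfolding;
    # the core is T contracted with all U_k^T
    factors = []
    for k, r in enumerate(ranks):
        Uk = np.linalg.svd(unfold(T, k), full_matrices=False)[0]
        factors.append(Uk[:, :r])
    core = T
    for k, Uk in enumerate(factors):
        core = np.moveaxis(np.tensordot(Uk.T, core, axes=(1, k)), 0, k)
    return core, factors

np.random.seed(0)
T = np.random.randn(6, 7, 8)
core, (U, V, W) = hosvd(T, ranks=(6, 7, 8))

Trec = core
for k, F in enumerate((U, V, W)):
    Trec = np.moveaxis(np.tensordot(F, Trec, axes=(1, k)), 0, k)
print(np.allclose(T, Trec))                    # A_ijk = sum_{abc} U_ia V_jb W_kc g_abc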