Lecture 7: Singular value decomposition

Syllabus

Week 1: Python intro
Week 2: Matrices, vectors, norms, ranks
Week 3: Linear systems, eigenvectors, eigenvalues
Week 4: Singular value decomposition + test + homework seminar

Recap of the previous lecture

  • LU decomposition = Solving linear systems
  • QR decomposition = Solving linear systems (can be more accurate in the ill-conditioned case, slower than LU, also used for orthogonalization)

Today's lecture

Today we will talk about the singular value decomposition (SVD)

  • SVD and best low-rank approximation
  • Applications

Singular value decomposition: definition

Any $n \times m$ matrix $A$ can be decomposed into a product
$$ A = U \Sigma V^*, $$ where $U$ is an $n \times k$ matrix with orthonormal columns, $V$ is an $m \times k$ matrix with orthonormal columns, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ is a diagonal matrix with $\sigma_1 \geq \ldots \geq \sigma_k \geq 0$ and $k = \min(m, n)$.
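A quick NumPy check of this "thin" form on random data (np.linalg.svd with full_matrices=False returns exactly the $n \times k$, $k$ and $k \times m$ factors):

import numpy as np

A = np.random.rand(5, 3)
u, s, vt = np.linalg.svd(A, full_matrices=False)   # vt is V^*
print(u.shape, s.shape, vt.shape)                  # (5, 3) (3,) (3, 3)
print(np.allclose(A, u @ np.diag(s) @ vt))         # A = U Sigma V^*
print(np.allclose(u.T @ u, np.eye(3)), np.allclose(vt @ vt.T, np.eye(3)))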

SVD and low-rank approximation

One of the main features of the SVD is that it solves the minimization problem
$$ \Vert A - A_r \Vert \rightarrow \min, $$ i.e. it finds the closest rank-$r$ matrix to a given matrix, in both the spectral and the Frobenius norm (the Eckart-Young theorem).
The solution is given by
$$ A_r = U_r \Sigma_r V^*_r, $$ where $U_r$ consists of the first $r$ columns of $U$, $V_r$ consists of the first $r$ columns of $V$, and $\Sigma_r$ is the diagonal matrix of the first $r$ singular values.
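A minimal NumPy sketch on random data; the spectral-norm error of the best rank-$r$ approximation equals $\sigma_{r+1}$:

import numpy as np

A = np.random.rand(100, 60)
r = 10
u, s, vt = np.linalg.svd(A, full_matrices=False)
Ar = u[:, :r] @ np.diag(s[:r]) @ vt[:r, :]   # best rank-r approximation
# The spectral-norm error of the best approximation is the (r+1)-st singular value
print(np.linalg.norm(A - Ar, 2), s[r])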

Existence of the SVD

Theorem. Any matrix has an SVD decomposition.

Proof. Recall the definition of $\sigma = \Vert A \Vert_2$ as the operator norm. Then there exist unit vectors $x$, $y$ such that $Ay = \sigma x$. Now introduce two unitary matrices
$$U = [x, U_1], \quad V = [y, V_1].$$ Then, $$ U^* A V = \begin{bmatrix} \sigma & w^* \\ 0 & B \end{bmatrix}, $$ and $$ \left\Vert U^* A V \begin{bmatrix} \sigma \\ w \end{bmatrix}\right\Vert^2_2 \geq (\sigma^2 + w^* w)^2. $$ Dividing by $\left\Vert \begin{bmatrix} \sigma \\ w \end{bmatrix}\right\Vert_2^2 = \sigma^2 + w^*w$ gives $\Vert U^* A V \Vert^2_2 \geq \sigma^2 + w^* w$. Since $\Vert U^* A V \Vert_2 = \Vert A \Vert_2 = \sigma$, we get $w = 0$. Applying the same argument to the block $B$ completes the proof by induction.

SVD and eigenvalues

From $A = U \Sigma V^*$ we get $$ A A^* = U \Sigma^2 U^*, \quad A^* A = V \Sigma^2 V^*, $$ i.e. you can compute the singular values as the square roots of the eigenvalues of $A^*A$ (or $A A^*$). This is not a good idea, however: forming $A^* A$ squares the condition number, so the smallest singular values you can resolve this way are of order $\sqrt{\varepsilon}\,\Vert A \Vert_2$. You just lose half of the digits!
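A small illustration of this loss of accuracy (the matrix with singular values $1$ and $10^{-10}$ is a synthetic test case):

import numpy as np

# Build a matrix with exactly known singular values: 1, 1e-10, and zeros
q1, _ = np.linalg.qr(np.random.rand(10, 10))
q2, _ = np.linalg.qr(np.random.rand(10, 10))
sigma = np.zeros(10); sigma[:2] = [1.0, 1e-10]
A = q1 @ np.diag(sigma) @ q2.T

s_svd = np.linalg.svd(A, compute_uv=False)    # direct SVD
eigs = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigenvalues of A^T A, descending
s_eig = np.sqrt(np.abs(eigs))                 # "singular values" via the eigenvalue route
# The true value 1e-10 lies far below sqrt(eps) ~ 1e-8, so the eigenvalue route loses it
print(s_svd[1], s_eig[1])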

How the SVD is actually computed

All SVD computations are done in two steps.

  1. Reduction of the given matrix to bidiagonal form, $B = U^* A V$, by means of Householder/Givens transformations; this costs $\mathcal{O}(n^3)$ operations.
  2. Solving the SVD problem for a bidiagonal matrix.

The Golub-Kahan bidiagonalization algorithm was the first (and is still one of the most popular) algorithms for solving the SVD problem. It is basically the QR algorithm, implicitly applied to the matrix $B^* B$.
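A minimal sketch of the first step for a square matrix: Householder bidiagonalization that also accumulates $U$ and $V$ (the production LAPACK routines are, of course, far more careful):

import numpy as np

def householder(x):
    """Return a unit vector v such that (I - 2 v v^T) x is a multiple of e_1."""
    v = x.astype(float)
    sign = np.sign(x[0]) if x[0] != 0 else 1.0
    v[0] += sign * np.linalg.norm(x)
    nv = np.linalg.norm(v)
    return v / nv if nv > 0 else v

def bidiagonalize(A):
    """Golub-Kahan reduction of a square matrix to upper bidiagonal form B = U^T A V."""
    B = A.astype(float)
    n = B.shape[0]
    U, V = np.eye(n), np.eye(n)
    for k in range(n):
        # Left reflection: zero out column k below the diagonal
        v = householder(B[k:, k])
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U[:, k:] -= 2.0 * np.outer(U[:, k:] @ v, v)
        if k < n - 2:
            # Right reflection: zero out row k to the right of the superdiagonal
            w = householder(B[k, k+1:])
            B[:, k+1:] -= 2.0 * np.outer(B[:, k+1:] @ w, w)
            V[:, k+1:] -= 2.0 * np.outer(V[:, k+1:] @ w, w)
    return U, B, V

A = np.random.rand(6, 6)
U, B, V = bidiagonalize(A)
print(np.allclose(U.T @ A @ V, B))                             # B is the bidiagonal reduction
print(np.allclose(np.linalg.svd(A)[1], np.linalg.svd(B)[1]))   # same singular values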

Some other SVD properties

  1. The SVD can be used to compute the pseudoinverse of a matrix, which is useful for solving ill-conditioned or rectangular systems: $$A^{\dagger} = V \Sigma^{\dagger} U^*,$$ where $\Sigma^{\dagger} = \mathrm{diag}\left(\min\left(\frac{1}{\sigma_i}, \frac{1}{\delta}\right)\right)$, i.e. small singular values are smoothed (a sketch follows this list).

  2. The SVD gives full information about the nullspace and the range of the matrix (also illustrated in the sketch below).
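A minimal sketch of both properties; the cutoff $\delta$ and the test matrices are hypothetical (note that np.linalg.pinv truncates small singular values via its rcond parameter rather than capping them):

import numpy as np

def regularized_pinv(A, delta=1e-8):
    # delta is a hypothetical cutoff; small singular values are smoothed (capped at 1/delta)
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.minimum(1.0 / s, 1.0 / delta)
    return vt.T @ (s_inv[:, None] * u.T)

# Ill-conditioned rectangular least-squares problem (a Vandermonde matrix as a test case)
A = np.vander(np.linspace(0, 1, 50), 12, increasing=True)
b = np.random.rand(50)
x = regularized_pinv(A) @ b
print(np.linalg.norm(A @ x - b))                   # residual of the regularized solution

# Range and nullspace of a rank-deficient matrix from its (full) SVD
B = np.random.rand(8, 3) @ np.random.rand(3, 6)    # an 8 x 6 matrix of rank 3
u, s, vt = np.linalg.svd(B)
rank = np.sum(s > 1e-12 * s[0])
range_basis = u[:, :rank]      # orthonormal basis of the range of B
null_basis = vt[rank:].T       # orthonormal basis of the nullspace of B
print(rank, np.linalg.norm(B @ null_basis))        # 3, ~1e-15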

Application 1: Latent semantic analysis

One of the most famous applications is latent semantic indexing; see, for example,
Deerwester, Scott C., et al. "Indexing by latent semantic analysis." (1990)

The problem setup: we have a set of text documents $D_1, \ldots, D_N.$ We want to solve the search problem: given a query as a set of words, find the most relevant documents.
Our data is processed as follows: for each document we create a list of the words it contains and count the frequency of each word. This is called the bag-of-words model (i.e., a document is treated as an unordered set of words).

Term-document matrix

This is how the term-document matrix $A$ is obtained. Its number of rows is the size of the dictionary,
and its number of columns is the number of documents. The element $A_{ij}$ is the frequency of occurrence of the $i$-th word in the $j$-th document.

To do the search, we can just multiply the term-document matrix by a query vector, i.e. an indicator vector of the query words.

The problem: a document will be returned only if there is an exact word match. However, we might search for "Samuel Clemens" and hope to get the results for "Mark Twain" as well, but there is no exact match!
How can the SVD help?

Idea of LSI

Compute a low-rank approximation $A_r$ of the term-document matrix $A$:
$$A \approx A_r,$$ and we do not care about the approximation error (i.e., we do not require it to be small).
The matrix $A_r$ can then be used to answer queries.

We project the documents onto a low-dimensional subspace: given a query $q$, its projection is
$$q_r = \Sigma^{-1}_r U^{\top}_r q.$$ Now we can compute the similarity between $q_r$ and the projected documents
$$\widehat{d}_i = \Sigma^{-1}_r U^{\top}_r d_i$$ as the cosine of the angle between the projected query and each projected document.
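A minimal sketch of this projection-and-cosine scheme on placeholder data (the demo below instead scores queries directly with the low-rank approximation $A_r$):

import numpy as np

# Placeholder data: term_doc is a (terms x documents) matrix, q is a 0/1 query vector
term_doc = np.random.rand(12, 9)
q = np.zeros(12); q[:2] = 1.0

r = 5
u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
u_r, s_r = u[:, :r], s[:r]

q_r = (u_r.T @ q) / s_r                      # project the query: Sigma_r^{-1} U_r^T q
d_r = (u_r.T @ term_doc) / s_r[:, None]      # project all documents at once
cos = (q_r @ d_r) / (np.linalg.norm(q_r) * np.linalg.norm(d_r, axis=0))
print(np.argsort(-cos))                      # documents ranked by cosine similarity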

Demo

Now we can test a demo database


In [3]:
import numpy as np
import pandas as pd
import re #Regular expressions
rows = ['human', 'interface', 'computer', 'user', 'system', 'response', 'time', 'EPS', 'survey', 'trees', 'graph', 'minors']
nterms = len(rows)
docs = []
docs += ['Human machine interface for Lab ABC computer applications']
docs += ['A survey of user opinions of computer system response time']
docs += ['The EPS user interfaces management system']
docs += ['System and human system engineering testing of EPS']
docs += ['Relation of user-perceived response time on user management']
docs += ['The generation of random, binary, unordered trees']
docs += ['The intersection graph of paths in trees']
docs += ['Graph minors IV: Width of trees and well-quasi-ordering']
docs += ['Graph minors: A survey']
ndocs = len(docs)
term_doc = np.zeros((nterms, ndocs))

nonletters = re.compile('[^a-zA-Z]+')  # split on any non-letter characters
doc_words = [set(nonletters.split(d.lower())) for d in docs]  # keep `docs` itself for labels later
rows = [w.lower() for w in rows]
for i in range(nterms):
    for j in range(ndocs):
        if rows[i] in doc_words[j]:
            term_doc[i, j] = 1

# Use pandas to display the term-document matrix
pd.DataFrame(data=term_doc, index=rows)


Out[3]:
0 1 2 3 4 5 6 7 8
human 1 0 0 1 0 0 0 0 0
interface 1 0 0 0 0 0 0 0 0
computer 1 1 0 0 0 0 0 0 0
user 0 1 1 0 1 0 0 0 0
system 0 1 1 1 0 0 0 0 0
response 0 1 0 0 1 0 0 0 0
time 0 1 0 0 1 0 0 0 0
eps 0 0 1 1 0 0 0 0 0
survey 0 1 0 0 0 0 0 0 1
trees 0 0 0 0 0 1 1 1 0
graph 0 0 0 0 0 0 1 1 1
minors 0 0 0 0 0 0 0 1 1

Now we can compare the results between ordinary matvec and low-rank matvec.


In [5]:
query = 'Human computer interaction'
qv = np.zeros(nterms)
for i in range(nterms):
    if re.search(rows[i], query, re.IGNORECASE):
        qv[i] = 1
res1 = qv.dot(term_doc)  # Non-compressed search result

u, s, v = np.linalg.svd(term_doc)
r = 5
u = u[:, :r]
s = s[:r]
v = v[:r, :]  # NumPy returns V^* (already transposed)
appr1 = u.dot(np.diag(s)).dot(v)
res2 = qv.dot(appr1)
res_all = np.vstack((res1, res2)).T  # To make two columns
print('The query is:', query, ', the scores are:')
pd.DataFrame(res_all, index=docs, columns=['No SVD', 'SVD'])


The query is: Human computer interaction , the scores are:
Out[5]:
No SVD SVD
Human machine interface for Lab ABC computer applications 2 2.087969
A survey of user opinions of computer system response time 1 0.927427
The EPS user interfaces management system 0 0.329092
System and human system engineering testing of EPS 1 0.605656
Relation of user-perceived response time on user management 0 -0.043570
The generation of random, binary, unordered trees 0 -0.008597
The intersection graph of paths in trees 0 -0.027086
Graph minors IV: Width of trees and well-quasi-ordering 0 -0.044239
Graph minors: A survey 0 0.097481

Application 2: Collaborative filtering

Another important (and similar) application comes from recommender systems.
Suppose we have a user-product matrix: each user gives ratings to some of the products.
The matrix is of size (number of users) $\times$ (number of products). The goal is to recommend additional products to a particular user.
The scheme is the same: we compute a low-rank approximation via the SVD, and the recommendation for each user is just the corresponding row of the approximated matrix.


In [6]:
data_read = np.loadtxt('task2_transact_eval.txt',dtype=np.int32, skiprows=1,delimiter='|')

In [8]:
#Read the data
columns = ['SessionNo', 'ItemNo', 'TransType']
ds = pd.DataFrame(data=data_read,columns=columns)
ds


Out[8]:
SessionNo ItemNo TransType
0 0 15029 0
1 0 12103 0
2 0 11755 0
3 0 2246 0
4 1 6498 0
5 1 6498 1
6 1 6498 0
7 1 3726 0
8 1 3726 1
9 1 6498 0
10 1 6720 0
11 1 11396 0
12 1 11396 1
13 1 6498 2
14 1 3726 2
15 1 11396 2
16 3 5335 0
17 3 724 0
18 3 9436 0
19 4 10003 0
20 4 5166 0
21 4 1962 0
22 4 15235 0
23 4 1608 0
24 4 5214 0
25 4 11358 0
26 4 7566 0
27 4 2825 0
28 4 5214 0
29 4 1962 0
... ... ... ...
985115 145683 3901 2
985116 145683 329 2
985117 145685 13867 0
985118 145685 13805 0
985119 145685 11800 0
985120 145685 14161 0
985121 145685 9266 0
985122 145685 4710 0
985123 145685 4710 1
985124 145685 5208 0
985125 145685 1179 0
985126 145685 7555 0
985127 145685 7555 1
985128 145685 14690 0
985129 145685 14690 1
985130 145687 9378 0
985131 145687 1027 0
985132 145687 3671 0
985133 145687 9378 0
985134 145688 8197 0
985135 145688 4079 0
985136 145688 4842 0
985137 145688 14759 0
985138 145688 15259 0
985139 145688 13197 0
985140 145689 14168 0
985141 145689 5373 0
985142 145690 3781 0
985143 145690 5025 0
985144 145690 5025 1

985145 rows × 3 columns


In [10]:
ds_buy = ds[ds.TransType==2][['SessionNo', 'ItemNo']]
arr = np.array(ds_buy)
users, inv_users = np.unique(arr[:, 0], return_inverse=True)
products, inv_products = np.unique(arr[:, 1], return_inverse=True)
print('Unique sessions:', len(users), 'Unique products', len(products))
# SciPy sparse matrices are used below


Unique sessions: 14164 Unique products 9679

In [17]:
import scipy.sparse
import scipy.sparse.linalg
%matplotlib inline 
import matplotlib.pyplot as plt
plt.xkcd()
nnz = len(inv_users)
spmat = scipy.sparse.coo_matrix((np.ones(nnz), (inv_users, inv_products)))
r = 2  # Approximation rank
u, s, v = scipy.sparse.linalg.svds(spmat, r)
n = spmat.shape[0]
m = spmat.shape[1]
q = np.zeros(n)
user_id = 5
q[user_id] = 1.0

qrec = q.T.dot(u).dot(np.diag(s)).dot(v)  # row of the rank-r approximation for this user
qrec = qrec / np.max(abs(qrec))           # normalize the scores
qb = spmat.T.dot(q)                       # products actually bought in this session
plt.plot(qrec)
plt.plot(qb)
plt.xlabel('Product ID')
plt.ylabel('Recommendation')


Out[17]:
<matplotlib.text.Text at 0x1143df050>

Application 3: Dense matrix compression

Dense matrices typically require $N^2$ elements to be stored. For $N \sim 10^4 - 10^5$ the memory requirements become prohibitive. A low-rank approximation reduces the storage to $\mathcal{O}(Nr)$ elements.
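A back-of-the-envelope check of the savings (the sizes are illustrative):

N, r = 10**4, 20
full_storage = N * N                    # dense storage: N^2 numbers
low_rank_storage = 2 * N * r + r        # U_r, V_r and the r singular values
print(full_storage / low_rank_storage)  # ~250x compression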


In [22]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0/(np.sqrt(i + 3 * j + 0.5)) for i in range(n)] for j in range(n)]
a = np.array(a)
u, s, v = np.linalg.svd(a)
plt.semilogy(s/s[0])
plt.title('Singular values decay for a Hilbert matrix')


Out[22]:
<matplotlib.text.Text at 0x1157e4bd0>

A more realistic example

We can test a more realistic matrix, since solving linear systems with the Hilbert matrix has little practical sense. Instead, we consider a matrix $$A_{ij} = \frac{1}{i - j + \frac{1}{2}},$$ which corresponds to the integral equation $$ \int \frac{q(y)\,dy}{x - y} = f(x). $$ In real life, the equation
$$ \int_{\Omega} \frac{q(y)\,dy}{\Vert x - y\Vert } = f(x) $$ is solved, where $\Omega$ is a surface in 3D. This is used, for example, in modelling integrated circuits. Let us see what happens with the singular values.


In [29]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0/(i - j + 0.5) for i in range(n)] for j in range(n)]
a = np.array(a)
u, s, v = np.linalg.svd(a)
plt.plot(s)
plt.title('Singular values decay for a Cauchy-Hilbert matrix')
plt.tight_layout()


What to do?
The idea is to split the matrix into blocks
$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, $$ and the off-diagonal blocks $A_{12}$ and $A_{21}$ will be of low rank! Let us try that.


In [30]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.xkcd()
n = 256
a = [[1.0/(i - j + 0.5) for i in range(n)] for j in range(n)]
a = np.array(a)

a12 = a[:n//2, n//2:]
a21 = a[n//2:, :n//2]
s12 = np.linalg.svd(a12)[1]
s21 = np.linalg.svd(a21)[1]
plt.semilogy(s12)
plt.semilogy(s21)  # Actually, they are the same


Out[30]:
[<matplotlib.lines.Line2D at 0x115c44850>]

The whole idea is then applied recursively, leading to $\mathcal{O}(N r \log N)$ storage.
A good question is how to compute the approximation without computing all the entries -- here the skeleton decomposition comes into play: to get a rank-$r$ approximation, it is enough to compute $r$ columns and $r$ rows.
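A minimal skeleton (cross) sketch on the off-diagonal block from above; the $r$ rows and columns are chosen uniformly purely for illustration (in practice they are selected adaptively, e.g. by the maxvol algorithm):

import numpy as np

n, r = 256, 12
a = np.array([[1.0/(i - j + 0.5) for i in range(n)] for j in range(n)])
a12 = a[:n//2, n//2:]                                   # off-diagonal block with fast singular value decay

col_idx = np.linspace(0, a12.shape[1] - 1, r, dtype=int)  # r columns (uniform choice, illustration only)
row_idx = np.linspace(0, a12.shape[0] - 1, r, dtype=int)  # r rows
C = a12[:, col_idx]
R = a12[row_idx, :]
Ahat = C @ np.linalg.pinv(a12[np.ix_(row_idx, col_idx)]) @ R   # skeleton approximation
print(np.linalg.norm(a12 - Ahat) / np.linalg.norm(a12))        # relative error; compare with the decay plotted above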

Application 4: Eigenfaces

You will have this task in your HW. The problem setup is similar: you have a database of faces, which you write down as a face-pixel matrix (you can also store it as a 3D array). The goal is, given a new face, to find the best match in the database. The scheme is the same: you compute the projection onto a low-dimensional subspace using the SVD, and then compare everything in this low-dimensional subspace. The leading "singular vectors", reshaped back into images, are the so-called eigenfaces.
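A minimal sketch of the scheme on random placeholder data (the names faces and new_face are hypothetical; in the HW they come from the face database):

import numpy as np

faces = np.random.rand(100, 64 * 64)        # placeholder: (n_faces x n_pixels) matrix
new_face = np.random.rand(64 * 64)          # placeholder query image, flattened

mean_face = faces.mean(axis=0)
u, s, vt = np.linalg.svd(faces - mean_face, full_matrices=False)
r = 20
basis = vt[:r]                              # "eigenfaces": top right singular vectors

coords = (faces - mean_face) @ basis.T      # database in the low-dimensional subspace
query = (new_face - mean_face) @ basis.T
best = np.argmin(np.linalg.norm(coords - query, axis=1))
print('Best match: face', best)             # nearest face in the subspace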

Higher dimensions

There are several ways to generalize the SVD to multidimensional arrays (tensors). In data mining this is very natural: for example, in a real recommender system the data are organized as

user-product-action

where the action is either "watch", "put into the cart" or "buy".
It is natural to organize such data into a 3D tensor of size

number of users $\times$ number of products $\times$ number of actions.

This is where the higher-order SVD (HOSVD) is used.
The idea is to decompose a tensor in the form
$$ A_{ijk} \approx \sum_{\alpha \beta \gamma} U_{i \alpha} V_{j \beta} W_{k \gamma} g_{\alpha \beta \gamma}. $$ This can be done by considering the three unfoldings of the tensor and computing their left singular vectors.
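A minimal HOSVD sketch on a random tensor with hypothetical multilinear ranks $(r_1, r_2, r_3)$:

import numpy as np

A = np.random.rand(20, 30, 40)
r1, r2, r3 = 5, 5, 5

def left_factor(unfolding, r):
    # leading left singular vectors of a mode unfolding
    return np.linalg.svd(unfolding, full_matrices=False)[0][:, :r]

U = left_factor(A.reshape(A.shape[0], -1), r1)                       # mode-1 unfolding
V = left_factor(np.moveaxis(A, 1, 0).reshape(A.shape[1], -1), r2)    # mode-2 unfolding
W = left_factor(np.moveaxis(A, 2, 0).reshape(A.shape[2], -1), r3)    # mode-3 unfolding

g = np.einsum('ijk,ia,jb,kc->abc', A, U, V, W)        # core tensor
A_hosvd = np.einsum('abc,ia,jb,kc->ijk', g, U, V, W)  # truncated reconstruction
# For random data this relative error is large; for structured data it decays with the ranks
print(np.linalg.norm(A - A_hosvd) / np.linalg.norm(A))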

Questions?
