This code performs tensor decomposition given the third order moment M3 in the following form:
$\text{Triplets}_{\alpha_0} := \mathbb{E}[\bf{x}_1 \otimes \bf{x}_2 \otimes \bf{x}_3] \\$ $-\frac{\alpha_0}{\alpha_0+2}\big( \mathbb{E}[\bf{x}_1 \otimes \bf{x}_2 \otimes \mu] + \mathbb{E}[\bf{x}_1 \otimes \mu \otimes \bf{x}_3] + \mathbb{E}[\mu \otimes \bf{x}_2 \otimes \bf{x}_3] \big) \\$ $+\frac{2\alpha_0^2}{(\alpha_0+2)(\alpha_0+1)} \mu \otimes \mu \otimes \mu \in \mathbb{R}^{d \times d \times d}$
where:
$\mu := \mathbb{E}[\bf{x}_1] \in \mathbb{R}^d$
The M3 tensor $\in \mathbb{R}^{k \times k \times k}$ can be represented as a matrix m3 $\in \mathbb{R}^{k \times k^2}$. m3 can be written as a $k \times k$ block matrix in which each $k \times k$ block represents one slice of M3 with reserved order. The code assumes that M3 is given in m3 format and is written in the sparse matrix representation in a text file.
Respresenting M3 as m3 is called tensor matricization. for a more detailed description of how this works take a look at section 2.4 of "Tensor Decomposition and Applications" by Tamara G. Kolda and Brett W. Bader. We are considering the mode-1 matricization in our code.
Note that this code might be slow for large scale M3 since the tensor is directly being input.
This document will walk you through the compilation steps and teach you how to define the inputs. We will then generate a set of synthetic data and pass it on to the code and see the output
First clone the following repository and pull the contents:
In [ ]:
git clone https://megaDataLab@bitbucket.org/megaDataLab/tensormethodsforml.git
The source files are contained in the TopicModelingSingleNodeALS/TopicModel/M3decomp folder.
We will show you how to compile the source code:
In [ ]:
cd (PATH_TO_topicModling)/TopicModelingSingleNodeALS/TopicModel/TopicModel
Then you have to compile the files using the following command which will give the following output. Note that while making the executable you should have a folder named dependency that contains the Eigen library in the following path: "../../dependency/Eigen"
In [ ]:
make exe-tensor
This will creat an executable file named "exe-M3decomp" in your current directory which is the directory of the source code. Now you need to specify the input arguments to the executable file which will give you the desired outputs. There are a total number of 8 input arguments to the executable listed below in order. (The last four arguments - Arguments 12 through 15 - are arbitrary, leave blank in case you do not wish to give those inputs).
InputArgument 1: (NA) is the vocabulary size
InputArgument 2: (KHID) is the number of topics you want to learn
InputArgument 3: (DATATYPE) denotes the index convention. Set it to 1 if the data is compatible with MATLAB (array indexing starts from 1) and set it to 0 if the data is compatible with C++ (array indexing starts from 0)
InputArgument 4: (M3_filename) filename/filepath to the third order moment
InputArgument 5: (outputEigValPATH) denotes the filename/filepath for the eigenvalues
InputArgument 6: (outputEigVectPATH) denotes the filename/filepath for eigenvectors
You can now run the executable by typing the following in the command line:
In [ ]:
make runtensor
You should see the following printed on the screen:
The code then performs batch Alternating Least Squares to perform tensor decomposition for our synthetic example. This is done by function tensorDecom_batchALS whose syntax is given below.
In [ ]:
void tensorDecom_batchALS(MatrixXd T, VectorXd &lambda, MatrixXd &A_new)
Let M3 have the following factorization. The goal of ALS is to perform this tensor decomposition and return $\bf{A}$ and $\Lambda$ which is a vector whose entries are the $\lambda_i$'s.
$M3 = \sum_{i \in [k]} \lambda_i \bf{A}_i \otimes \bf{A}_i \otimes \bf{A}_i$
The ALS algorithm is a iterative procedure, where in each iteration it updates $\bf{A}$ using the previous value of $\bf{A}$. The update equations are given below (t indicates the iteration index). Note that $\bf{y}_i$ indicates the whitened version of $\bf{x}_i$.
$\bf{A}^{t+1} \longleftarrow \bf{A}^t + \beta \frac{1}{\vert \bf{X} \vert} \bf{y}_1 \bigg(\bf{y_3} ^\top \bf{A}^t * \bf{y}_2 ^\top \bf{A}^{t} \bigg) \Bigg( \bigg( (\bf{A}^t)^\top \bf{A}^t * (\bf{A}^t)^\top \bf{A}^t \bigg)^\top \Bigg) ^\dagger$
$\Lambda^{t+1} \longleftarrow \| \bf{A}^{t+1} \| \| \bf{A}^{t+1} \| \| \bf{A}^{t+1} \|$
$\bf{A}$ and $\Lambda = [\lambda_i] \in \mathbb{R}^k$ will be written in the specified output path and are the desired components of the input tensor.
Let us give an example of a simple matrix decomposition result for the M3 matrix in the results folder called Tsparse.txt. cd into the M3decomp directory and make the executable. Then run the programming by typing the following in the command line:
./exe-tensor 3 3 1 ../datasets/synthetic/Tensor.txt ../datasets/synthetic/result/tensor_eigenvalue.txt ../datasets/synthetic/result/tensor_eigenvector.txt (1) Reading data---------------- reading third order moment input_tensor: 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 3 Exec Time reading matrices before preproc = 1.9400000000e-04 (Seconds) (2) Tensor decomposition---------------- Running ALS 0 replace current eigenvalue and eigenvectors with this run Running ALS 1 Running ALS 2 FINAL ERROR (0-th run) : 0 eigenvalues: 3 2 1 eigenvectors: 0 0 1 0 1 0 1 0 0 (3) Writing results---------- (4) Program over------------
___ $_{Document}$ $_{prepared}$ $_{by}$ $_{Forough}$ $_{Arabshahi}$ $_{and}$ $_{edited}$ $_{by}$ $_{Animashree}$ $_{Anandkumar}$ - $_{Fall}$ $_{2014}$ __