You can open this page in an interactive mode via Google Colaboratory.
In this notebook we provide an example of how to build a simple Tensor Net (see https://arxiv.org/abs/1509.06569).
The main ingredient is the so-called TT-Matrix, a generalization of Kronecker product matrices, i.e. matrices of the form $$A = A_1 \otimes A_2 \otimes \cdots \otimes A_n$$
In t3f, TT-Matrices are represented using the TensorTrain class.
In [1]:
# Import TF 2.
%tensorflow_version 2.x
import tensorflow as tf
import numpy as np
import tensorflow.keras.backend as K

# Fix seeds so that the results are reproducible.
tf.random.set_seed(0)
np.random.seed(0)

try:
    import t3f
except ImportError:
    # Install T3F if it's not already installed.
    !git clone https://github.com/Bihaqo/t3f.git
    !cd t3f; pip install .
    import t3f
In [3]:
# Create a random 784 x 625 TT-Matrix: the row modes factor 784 as 4 * 7 * 4 * 7,
# the column modes factor 625 as 5 * 5 * 5 * 5, and all TT-ranks equal 2.
W = t3f.random_matrix([[4, 7, 4, 7], [5, 5, 5, 5]], tt_rank=2)
print(W)
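As a quick sanity check of the Kronecker-product analogy above (an illustrative cell, not required for the rest of the notebook): a Kronecker product of two matrices is a TT-Matrix whose TT-ranks all equal 1, so converting one with t3f.to_tt_matrix (used the same way later in this notebook) and max_tt_rank=1 should reproduce it up to floating-point error.
In [0]:
# A Kronecker product A1 x A2 is exactly a TT-Matrix with all TT-ranks equal to 1.
A1 = np.random.rand(4, 5).astype(np.float32)
A2 = np.random.rand(7, 5).astype(np.float32)
A = np.kron(A1, A2)  # dense matrix of shape (28, 25)
A_tt = t3f.to_tt_matrix(tf.constant(A), shape=[[4, 7], [5, 5]], max_tt_rank=1)
# The reconstruction error should be on the order of float32 round-off.
print(np.max(np.abs(t3f.full(A_tt).numpy() - A)))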
Using TT-Matrices we can compactly represent densely connected layers in neural networks, which allows us to greatly reduce the number of parameters. Matrix multiplication is handled by the t3f.matmul method, which also supports multiplying dense (ordinary) matrices by TT-Matrices.
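For instance, here is a quick shape check (an illustrative cell using the random TT-Matrix W from above):
In [0]:
# Multiply a batch of two dense row-vectors by the 784 x 625 TT-Matrix W.
x = tf.random.normal((2, 784))
print(t3f.matmul(x, W).shape)  # (2, 625)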
A very simple neural network could look as follows (for initialization, several options such as t3f.glorot_initializer, t3f.he_initializer, and t3f.random_matrix are available):
In [0]:
class Learner:
    def __init__(self):
        initializer = t3f.glorot_initializer([[4, 7, 4, 7], [5, 5, 5, 5]], tt_rank=2)
        # First layer: a 784 x 625 TT-Matrix and a dense bias.
        self.W1 = t3f.get_variable('W1', initializer=initializer)
        self.b1 = tf.Variable(tf.zeros([625]))
        # Second layer: an ordinary dense 625 x 10 matrix and bias.
        self.W2 = tf.Variable(tf.random.normal([625, 10]))
        self.b2 = tf.Variable(tf.random.normal([10]))

    def predict(self, x):
        h1 = t3f.matmul(x, self.W1) + self.b1
        h1 = tf.nn.relu(h1)
        return tf.matmul(h1, self.W2) + self.b2

    def loss(self, x, y):
        y_ = tf.one_hot(y, 10)
        logits = self.predict(x)
        return tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
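A minimal usage sketch (illustrative only, with random tensors standing in for a real MNIST batch):
In [0]:
learner = Learner()
# Random stand-ins for a batch of 8 flattened 28x28 images and their labels.
x_batch = tf.random.normal((8, 784))
y_batch = tf.constant([0, 1, 2, 3, 4, 5, 6, 7], dtype=tf.int64)
print(learner.loss(x_batch, y_batch))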
For convenience we have implemented a layer analogous to the Keras Dense layer, but with a TT-Matrix instead of an ordinary matrix. An example of a fully trainable net is provided below.
In [0]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import optimizers
In [9]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Some preprocessing...
In [0]:
# Rescale pixel values from [0, 255] to [-1, 1] and one-hot encode the labels.
x_train = x_train / 127.5 - 1.0
x_test = x_test / 127.5 - 1.0
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
In [0]:
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
tt_layer = t3f.nn.KerasDense(input_dims=[7, 4, 7, 4], output_dims=[5, 5, 5, 5],
                             tt_rank=4, activation='relu',
                             bias_initializer=1e-3)
model.add(tt_layer)
model.add(Dense(10))
model.add(Activation('softmax'))
In [68]:
model.summary()
Note that in the TT layer we only have $1725$ parameters instead of the $784 \times 625 = 490000$ that an ordinary dense layer of the same size would need.
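Where the $1725$ comes from (a rough breakdown; we assume each TT-core has shape (left rank, input mode, output mode, right rank), plus a dense bias of length 625):
In [0]:
# Parameter count of the TT layer: four TT-cores plus the bias vector.
core_shapes = [(1, 7, 5, 4), (4, 4, 5, 4), (4, 7, 5, 4), (4, 4, 5, 1)]
core_params = sum(np.prod(s) for s in core_shapes)  # 140 + 320 + 560 + 80 = 1100
bias_params = 5 * 5 * 5 * 5                         # 625
print(core_params + bias_params)                    # 1725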
In [0]:
optimizer = optimizers.Adam(learning_rate=1e-2)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
In [70]:
model.fit(x_train, y_train, epochs=3, batch_size=64, validation_data=(x_test, y_test))
Out[70]:
Let us now train an ordinary DNN (without TT-Matrices) and show how we can compress it using the TT decomposition (in contrast to training a TT layer from scratch, as in the example above).
In [0]:
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(625, activation='relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
In [72]:
model.summary()
In [0]:
optimizer = optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
In [74]:
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_data=(x_test, y_test))
Out[74]:
Let us convert the matrix used in the Dense layer to a TT-Matrix with TT-ranks equal to 16 (since we trained the network without any low-rank structure assumption, we may wish to start with high rank values).
In [75]:
# The 784 x 625 weight matrix of the first Dense layer.
W = model.trainable_weights[0]
print(W)
# Compress it into TT-format with all TT-ranks capped at 16.
Wtt = t3f.to_tt_matrix(W, shape=[[7, 4, 7, 4], [5, 5, 5, 5]], max_tt_rank=16)
print(Wtt)
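Optionally, we can check how well the rank-16 TT-Matrix approximates the trained weight matrix (an illustrative check, not required for the rest of the notebook):
In [0]:
# Relative Frobenius-norm error of the rank-16 approximation.
rel_err = tf.norm(t3f.full(Wtt) - W) / tf.norm(W)
print(rel_err)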
We now extract the TT-cores of Wtt. We also store the remaining parameters for later (the biases and the second dense layer).
In [0]:
cores = Wtt.tt_cores
# Everything except the first Dense kernel: its bias plus the second Dense layer.
other_params = model.get_weights()[1:]
Now we can construct a tensor network with the first Dense layer replaced by Wtt, initialized using the previously computed cores.
In [0]:
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
tt_layer = t3f.nn.KerasDense(input_dims=[7, 4, 7, 4], output_dims=[5, 5, 5, 5],
                             tt_rank=16, activation='relu')
model.add(tt_layer)
model.add(Dense(10))
model.add(Activation('softmax'))
In [0]:
optimizer = optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
In [0]:
# Initialize the TT layer with the computed cores and restore the remaining weights.
model.set_weights(list(cores) + other_params)
In [97]:
print("new accuracy: ", model.evaluate(x_test, y_test)[1])
In [98]:
model.summary()
We see that even though we now have only about 5% of the original number of parameters, we still achieve a relatively high accuracy.
In [99]:
model.fit(x_train, y_train, epochs=2, batch_size=64, validation_data=(x_test, y_test))
Out[99]:
We see that we were able to achieve a higher validation accuracy than with the plain DNN, while keeping the number of parameters extremely small (21845 vs 496885 parameters in the uncompressed model).
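For reference, the 21845 figure can be reproduced by the same counting as before (a sketch assuming the same TT-core layout):
In [0]:
# TT layer with tt_rank=16 (four cores plus a length-625 bias), then Dense(10).
tt_cores = [(1, 7, 5, 16), (16, 4, 5, 16), (16, 7, 5, 16), (16, 4, 5, 1)]
tt_params = sum(np.prod(s) for s in tt_cores) + 625  # 14960 + 625 = 15585
dense_params = 625 * 10 + 10                         # 6260
print(tt_params + dense_params)                      # 21845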