Matrix Factorization

In a recommendation system, there is a group of users and a set of items. Given that each users have rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, such that we can make recommendations to the users.

Matrix factorization is one of the mainly used algorithm in recommendation systems. It can be used to discover latent features underlying the interactions between two different kinds of entities.

Assume we assign a k-dimensional vector to each user and a k-dimensional vector to each item such that the dot product of these two vectors gives the user's rating of that item. We can learn the user and item vectors directly, which is essentially performing SVD on the user-item matrix. We can also try to learn the latent features using multi-layer neural networks.

In this tutorial, we will work though the steps to implement these ideas in MXNet.

Prepare Data

We use the MovieLens data here, but it can apply to other datasets as well. Each row of this dataset contains a tuple of user id, movie id, rating, and time stamp, we will only use the first three items. We first define the a batch which contains n tuples. It also provides name and shape information to MXNet about the data and label.


In [1]:
class Batch(object):
    def __init__(self, data_names, data, label_names, label):
        self.data = data
        self.label = label
        self.data_names = data_names
        self.label_names = label_names
        
    @property
    def provide_data(self):
        return [(n, x.shape) for n, x in zip(self.data_names, self.data)]
    
    @property
    def provide_label(self):
        return [(n, x.shape) for n, x in zip(self.label_names, self.label)]

Then we define a data iterator, which returns a batch of tuples each time.


In [2]:
import mxnet as mx
import random

class Batch(object):
    def __init__(self, data_names, data, label_names, label):
        self.data = data
        self.label = label
        self.data_names = data_names
        self.label_names = label_names

    @property
    def provide_data(self):
        return [(n, x.shape) for n, x in zip(self.data_names, self.data)]

    @property
    def provide_label(self):
        return [(n, x.shape) for n, x in zip(self.label_names, self.label)]

class DataIter(mx.io.DataIter):
    def __init__(self, fname, batch_size):
        super(DataIter, self).__init__()
        self.batch_size = batch_size
        self.data = []
        for line in file(fname):
            tks = line.strip().split('\t')
            if len(tks) != 4:
                continue
            self.data.append((int(tks[0]), int(tks[1]), float(tks[2])))
        self.provide_data = [('user', (batch_size, )), ('item', (batch_size, ))]
        self.provide_label = [('score', (self.batch_size, ))]

    def __iter__(self):
        for k in range(len(self.data) / self.batch_size):
            users = []
            items = []
            scores = []
            for i in range(self.batch_size):
                j = k * self.batch_size + i
                user, item, score = self.data[j]
                users.append(user)
                items.append(item)
                scores.append(score)

            data_all = [mx.nd.array(users), mx.nd.array(items)]
            label_all = [mx.nd.array(scores)]
            data_names = ['user', 'item']
            label_names = ['score']

            data_batch = Batch(data_names, data_all, label_names, label_all)
            yield data_batch

    def reset(self):
        random.shuffle(self.data)

Now we download the data and provide a function to obtain the data iterator:


In [3]:
import os
import urllib
import zipfile
if not os.path.exists('ml-100k.zip'):
    urllib.urlretrieve('http://files.grouplens.org/datasets/movielens/ml-100k.zip', 'ml-100k.zip')
with zipfile.ZipFile("ml-100k.zip","r") as f:
    f.extractall("./")
def get_data(batch_size):
    return (DataIter('./ml-100k/u1.base', batch_size), DataIter('./ml-100k/u1.test', batch_size))

Finally we calculate the numbers of users and items for later use.


In [4]:
def max_id(fname):
    mu = 0
    mi = 0
    for line in file(fname):
        tks = line.strip().split('\t')
        if len(tks) != 4:
            continue
        mu = max(mu, int(tks[0]))
        mi = max(mi, int(tks[1]))
    return mu + 1, mi + 1
max_user, max_item = max_id('./ml-100k/u.data')
(max_user, max_item)


Out[4]:
(944, 1683)

Optimization

We first implement the RMSE (root-mean-square error) measurement, which is commonly used by matrix factorization.


In [5]:
import math
def RMSE(label, pred):
    ret = 0.0
    n = 0.0
    pred = pred.flatten()
    for i in range(len(label)):
        ret += (label[i] - pred[i]) * (label[i] - pred[i])
        n += 1.0
    return math.sqrt(ret / n)

Then we define a general training module, which is borrowed from the image classification application.


In [6]:
def train(network, batch_size, num_epoch, learning_rate):
    batch_size = batch_size
    train, test = get_data(batch_size)
    model = mx.mod.Module(symbol = network, 
                          data_names=[x[0] for x in train.provide_data],
                          label_names=[y[0] for y in train.provide_label],
                          context=[mx.gpu(0)])

    import logging
    head = '%(asctime)-15s %(message)s'
    logging.basicConfig(level=logging.DEBUG)

    model.fit(train_data = train, 
              eval_data = test,
              num_epoch=num_epoch,
              optimizer='sgd',
              optimizer_params={'learning_rate':learning_rate, 'momentum':0.9, 'wd':0.0001},
              eval_metric = RMSE,
              batch_end_callback=mx.callback.Speedometer(batch_size, 20000/batch_size),)

Networks

Now we try various networks. We first learn the latent vectors directly.


In [7]:
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
def plain_net(k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user feature lookup
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k) 
    # item feature lookup
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    # predict by the inner product, which is elementwise product and then sum
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred

train(plain_net(64), batch_size=64, num_epoch=10, learning_rate=.05)


INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [312]	Speed: 41428.85 samples/sec	Train-RMSE=3.699369
INFO:root:Epoch[0] Batch [624]	Speed: 42144.73 samples/sec	Train-RMSE=3.702055
INFO:root:Epoch[0] Batch [936]	Speed: 42175.25 samples/sec	Train-RMSE=3.694050
INFO:root:Epoch[0] Batch [1248]	Speed: 42307.85 samples/sec	Train-RMSE=3.698962
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.174
INFO:root:Epoch[0] Validation-RMSE=3.714925
INFO:root:Epoch[1] Batch [312]	Speed: 42458.83 samples/sec	Train-RMSE=3.680307
INFO:root:Epoch[1] Batch [624]	Speed: 42960.69 samples/sec	Train-RMSE=3.627375
INFO:root:Epoch[1] Batch [936]	Speed: 42684.44 samples/sec	Train-RMSE=3.283429
INFO:root:Epoch[1] Batch [1248]	Speed: 42694.95 samples/sec	Train-RMSE=2.586398
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=1.905
INFO:root:Epoch[1] Validation-RMSE=2.448542
INFO:root:Epoch[2] Batch [312]	Speed: 43118.68 samples/sec	Train-RMSE=2.025510
INFO:root:Epoch[2] Batch [624]	Speed: 42748.51 samples/sec	Train-RMSE=1.712786
INFO:root:Epoch[2] Batch [936]	Speed: 42972.15 samples/sec	Train-RMSE=1.505154
INFO:root:Epoch[2] Batch [1248]	Speed: 42779.65 samples/sec	Train-RMSE=1.376185
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=1.897
INFO:root:Epoch[2] Validation-RMSE=1.432550
INFO:root:Epoch[3] Batch [312]	Speed: 43192.26 samples/sec	Train-RMSE=1.266802
INFO:root:Epoch[3] Batch [624]	Speed: 43051.19 samples/sec	Train-RMSE=1.218756
INFO:root:Epoch[3] Batch [936]	Speed: 42522.01 samples/sec	Train-RMSE=1.169181
INFO:root:Epoch[3] Batch [1248]	Speed: 42781.29 samples/sec	Train-RMSE=1.142398
INFO:root:Epoch[3] Resetting Data Iterator
INFO:root:Epoch[3] Time cost=1.898
INFO:root:Epoch[3] Validation-RMSE=1.202146
INFO:root:Epoch[4] Batch [312]	Speed: 42951.35 samples/sec	Train-RMSE=1.094332
INFO:root:Epoch[4] Batch [624]	Speed: 42675.15 samples/sec	Train-RMSE=1.085629
INFO:root:Epoch[4] Batch [936]	Speed: 42919.15 samples/sec	Train-RMSE=1.073905
INFO:root:Epoch[4] Batch [1248]	Speed: 42915.65 samples/sec	Train-RMSE=1.056360
INFO:root:Epoch[4] Resetting Data Iterator
INFO:root:Epoch[4] Time cost=1.898
INFO:root:Epoch[4] Validation-RMSE=1.116253
INFO:root:Epoch[5] Batch [312]	Speed: 43046.65 samples/sec	Train-RMSE=1.031064
INFO:root:Epoch[5] Batch [624]	Speed: 43046.10 samples/sec	Train-RMSE=1.031983
INFO:root:Epoch[5] Batch [936]	Speed: 42901.89 samples/sec	Train-RMSE=1.024316
INFO:root:Epoch[5] Batch [1248]	Speed: 43107.51 samples/sec	Train-RMSE=1.028240
INFO:root:Epoch[5] Resetting Data Iterator
INFO:root:Epoch[5] Time cost=1.892
INFO:root:Epoch[5] Validation-RMSE=1.073396
INFO:root:Epoch[6] Batch [312]	Speed: 43002.51 samples/sec	Train-RMSE=1.010509
INFO:root:Epoch[6] Batch [624]	Speed: 42726.47 samples/sec	Train-RMSE=1.003917
INFO:root:Epoch[6] Batch [936]	Speed: 42738.72 samples/sec	Train-RMSE=1.002835
INFO:root:Epoch[6] Batch [1248]	Speed: 42982.32 samples/sec	Train-RMSE=1.000915
INFO:root:Epoch[6] Resetting Data Iterator
INFO:root:Epoch[6] Time cost=1.898
INFO:root:Epoch[6] Validation-RMSE=1.049511
INFO:root:Epoch[7] Batch [312]	Speed: 43566.30 samples/sec	Train-RMSE=0.985503
INFO:root:Epoch[7] Batch [624]	Speed: 43287.67 samples/sec	Train-RMSE=0.995254
INFO:root:Epoch[7] Batch [936]	Speed: 43351.22 samples/sec	Train-RMSE=0.993624
INFO:root:Epoch[7] Batch [1248]	Speed: 43160.14 samples/sec	Train-RMSE=0.988817
INFO:root:Epoch[7] Resetting Data Iterator
INFO:root:Epoch[7] Time cost=1.877
INFO:root:Epoch[7] Validation-RMSE=1.035829
INFO:root:Epoch[8] Batch [312]	Speed: 43220.97 samples/sec	Train-RMSE=0.984125
INFO:root:Epoch[8] Batch [624]	Speed: 43146.06 samples/sec	Train-RMSE=0.982134
INFO:root:Epoch[8] Batch [936]	Speed: 43274.74 samples/sec	Train-RMSE=0.978188
INFO:root:Epoch[8] Batch [1248]	Speed: 43234.06 samples/sec	Train-RMSE=0.976167
INFO:root:Epoch[8] Resetting Data Iterator
INFO:root:Epoch[8] Time cost=1.885
INFO:root:Epoch[8] Validation-RMSE=1.025111
INFO:root:Epoch[9] Batch [312]	Speed: 43331.84 samples/sec	Train-RMSE=0.965266
INFO:root:Epoch[9] Batch [624]	Speed: 43185.53 samples/sec	Train-RMSE=0.984249
INFO:root:Epoch[9] Batch [936]	Speed: 43261.89 samples/sec	Train-RMSE=0.972151
INFO:root:Epoch[9] Batch [1248]	Speed: 43151.76 samples/sec	Train-RMSE=0.972135
INFO:root:Epoch[9] Resetting Data Iterator
INFO:root:Epoch[9] Time cost=1.883
INFO:root:Epoch[9] Validation-RMSE=1.016984

Next we try to use 2 layers neural network to learn the latent variables, which stack a fully connected layer above the embedding layers:


In [8]:
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
def get_one_layer_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    user = mx.symbol.Activation(data = user, act_type="relu")
    user = mx.symbol.FullyConnected(data = user, num_hidden = hidden)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    item = mx.symbol.Activation(data = item, act_type="relu")
    item = mx.symbol.FullyConnected(data = item, num_hidden = hidden)
    # predict by the inner product
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred

train(get_one_layer_mlp(64, 64), batch_size=64, num_epoch=10, learning_rate=.05)


INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [312]	Speed: 29358.84 samples/sec	Train-RMSE=1.338036
INFO:root:Epoch[0] Batch [624]	Speed: 29040.21 samples/sec	Train-RMSE=1.030033
INFO:root:Epoch[0] Batch [936]	Speed: 30795.04 samples/sec	Train-RMSE=1.005132
INFO:root:Epoch[0] Batch [1248]	Speed: 29430.40 samples/sec	Train-RMSE=0.987999
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.733
INFO:root:Epoch[0] Validation-RMSE=0.990343
INFO:root:Epoch[1] Batch [312]	Speed: 29327.44 samples/sec	Train-RMSE=0.970059
INFO:root:Epoch[1] Batch [624]	Speed: 29056.96 samples/sec	Train-RMSE=0.963783
INFO:root:Epoch[1] Batch [936]	Speed: 28939.17 samples/sec	Train-RMSE=0.966315
INFO:root:Epoch[1] Batch [1248]	Speed: 29150.33 samples/sec	Train-RMSE=0.970925
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=2.781
INFO:root:Epoch[1] Validation-RMSE=0.978711
INFO:root:Epoch[2] Batch [312]	Speed: 29850.39 samples/sec	Train-RMSE=0.949423
INFO:root:Epoch[2] Batch [624]	Speed: 29131.78 samples/sec	Train-RMSE=0.950519
INFO:root:Epoch[2] Batch [936]	Speed: 29222.54 samples/sec	Train-RMSE=0.953821
INFO:root:Epoch[2] Batch [1248]	Speed: 29144.28 samples/sec	Train-RMSE=0.957096
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=2.762
INFO:root:Epoch[2] Validation-RMSE=0.967527
INFO:root:Epoch[3] Batch [312]	Speed: 31087.31 samples/sec	Train-RMSE=0.938336
INFO:root:Epoch[3] Batch [624]	Speed: 30903.51 samples/sec	Train-RMSE=0.940157
INFO:root:Epoch[3] Batch [936]	Speed: 30788.64 samples/sec	Train-RMSE=0.961417
INFO:root:Epoch[3] Batch [1248]	Speed: 29309.31 samples/sec	Train-RMSE=0.957611
INFO:root:Epoch[3] Resetting Data Iterator
INFO:root:Epoch[3] Time cost=2.654
INFO:root:Epoch[3] Validation-RMSE=0.965186
INFO:root:Epoch[4] Batch [312]	Speed: 30882.11 samples/sec	Train-RMSE=0.941631
INFO:root:Epoch[4] Batch [624]	Speed: 30760.84 samples/sec	Train-RMSE=0.944820
INFO:root:Epoch[4] Batch [936]	Speed: 30836.56 samples/sec	Train-RMSE=0.947041
INFO:root:Epoch[4] Batch [1248]	Speed: 30889.28 samples/sec	Train-RMSE=0.958895
INFO:root:Epoch[4] Resetting Data Iterator
INFO:root:Epoch[4] Time cost=2.625
INFO:root:Epoch[4] Validation-RMSE=1.021695
INFO:root:Epoch[5] Batch [312]	Speed: 31184.36 samples/sec	Train-RMSE=0.939983
INFO:root:Epoch[5] Batch [624]	Speed: 31011.36 samples/sec	Train-RMSE=0.942205
INFO:root:Epoch[5] Batch [936]	Speed: 30927.93 samples/sec	Train-RMSE=0.948742
INFO:root:Epoch[5] Batch [1248]	Speed: 31253.58 samples/sec	Train-RMSE=0.946025
INFO:root:Epoch[5] Resetting Data Iterator
INFO:root:Epoch[5] Time cost=2.604
INFO:root:Epoch[5] Validation-RMSE=0.971389
INFO:root:Epoch[6] Batch [312]	Speed: 28975.54 samples/sec	Train-RMSE=0.944098
INFO:root:Epoch[6] Batch [624]	Speed: 29132.85 samples/sec	Train-RMSE=0.948457
INFO:root:Epoch[6] Batch [936]	Speed: 29212.85 samples/sec	Train-RMSE=0.939584
INFO:root:Epoch[6] Batch [1248]	Speed: 29189.70 samples/sec	Train-RMSE=0.949756
INFO:root:Epoch[6] Resetting Data Iterator
INFO:root:Epoch[6] Time cost=2.781
INFO:root:Epoch[6] Validation-RMSE=0.979447
INFO:root:Epoch[7] Batch [312]	Speed: 29228.83 samples/sec	Train-RMSE=0.933935
INFO:root:Epoch[7] Batch [624]	Speed: 29092.05 samples/sec	Train-RMSE=0.944123
INFO:root:Epoch[7] Batch [936]	Speed: 29902.46 samples/sec	Train-RMSE=0.943008
INFO:root:Epoch[7] Batch [1248]	Speed: 31193.96 samples/sec	Train-RMSE=0.952697
INFO:root:Epoch[7] Resetting Data Iterator
INFO:root:Epoch[7] Time cost=2.714
INFO:root:Epoch[7] Validation-RMSE=0.966694
INFO:root:Epoch[8] Batch [312]	Speed: 29643.79 samples/sec	Train-RMSE=0.941655
INFO:root:Epoch[8] Batch [624]	Speed: 29277.61 samples/sec	Train-RMSE=0.932646
INFO:root:Epoch[8] Batch [936]	Speed: 29298.30 samples/sec	Train-RMSE=0.948542
INFO:root:Epoch[8] Batch [1248]	Speed: 29119.72 samples/sec	Train-RMSE=0.936999
INFO:root:Epoch[8] Resetting Data Iterator
INFO:root:Epoch[8] Time cost=2.760
INFO:root:Epoch[8] Validation-RMSE=0.964616
INFO:root:Epoch[9] Batch [312]	Speed: 29232.17 samples/sec	Train-RMSE=0.934462
INFO:root:Epoch[9] Batch [624]	Speed: 29045.67 samples/sec	Train-RMSE=0.946235
INFO:root:Epoch[9] Batch [936]	Speed: 29202.37 samples/sec	Train-RMSE=0.943892
INFO:root:Epoch[9] Batch [1248]	Speed: 29250.80 samples/sec	Train-RMSE=0.939333
INFO:root:Epoch[9] Resetting Data Iterator
INFO:root:Epoch[9] Time cost=2.779
INFO:root:Epoch[9] Validation-RMSE=1.057628

Adding dropout layers to relief the over-fitting.


In [9]:
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
def get_one_layer_dropout_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    user = mx.symbol.Activation(data = user, act_type="relu")
    user = mx.symbol.FullyConnected(data = user, num_hidden = hidden)
    user = mx.symbol.Dropout(data=user, p=0.5)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    item = mx.symbol.Activation(data = item, act_type="relu")
    item = mx.symbol.FullyConnected(data = item, num_hidden = hidden)
    item = mx.symbol.Dropout(data=item, p=0.5)    
    # predict by the inner product
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred
train(get_one_layer_mlp(256, 512), batch_size=64, num_epoch=10, learning_rate=.05)


INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [312]	Speed: 31588.74 samples/sec	Train-RMSE=1.284921
INFO:root:Epoch[0] Batch [624]	Speed: 32668.93 samples/sec	Train-RMSE=1.007235
INFO:root:Epoch[0] Batch [936]	Speed: 32648.85 samples/sec	Train-RMSE=0.988519
INFO:root:Epoch[0] Batch [1248]	Speed: 32748.70 samples/sec	Train-RMSE=0.971257
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.507
INFO:root:Epoch[0] Validation-RMSE=1.019258
INFO:root:Epoch[1] Batch [312]	Speed: 32408.79 samples/sec	Train-RMSE=0.950525
INFO:root:Epoch[1] Batch [624]	Speed: 32447.41 samples/sec	Train-RMSE=0.953668
INFO:root:Epoch[1] Batch [936]	Speed: 32325.17 samples/sec	Train-RMSE=0.948842
INFO:root:Epoch[1] Batch [1248]	Speed: 32282.63 samples/sec	Train-RMSE=0.958244
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=2.504
INFO:root:Epoch[1] Validation-RMSE=0.988164
INFO:root:Epoch[2] Batch [312]	Speed: 32210.61 samples/sec	Train-RMSE=0.950271
INFO:root:Epoch[2] Batch [624]	Speed: 32531.03 samples/sec	Train-RMSE=0.945151
INFO:root:Epoch[2] Batch [936]	Speed: 32286.91 samples/sec	Train-RMSE=0.945528
INFO:root:Epoch[2] Batch [1248]	Speed: 32432.80 samples/sec	Train-RMSE=0.951414
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=2.506
INFO:root:Epoch[2] Validation-RMSE=0.971108
INFO:root:Epoch[3] Batch [312]	Speed: 32375.11 samples/sec	Train-RMSE=0.949024
INFO:root:Epoch[3] Batch [624]	Speed: 32215.49 samples/sec	Train-RMSE=0.948121
INFO:root:Epoch[3] Batch [936]	Speed: 32142.47 samples/sec	Train-RMSE=0.931569
INFO:root:Epoch[3] Batch [1248]	Speed: 32337.05 samples/sec	Train-RMSE=0.946255
INFO:root:Epoch[3] Resetting Data Iterator
INFO:root:Epoch[3] Time cost=2.512
INFO:root:Epoch[3] Validation-RMSE=0.979659
INFO:root:Epoch[4] Batch [312]	Speed: 32381.31 samples/sec	Train-RMSE=0.937760
INFO:root:Epoch[4] Batch [624]	Speed: 32408.33 samples/sec	Train-RMSE=0.939204
INFO:root:Epoch[4] Batch [936]	Speed: 32315.08 samples/sec	Train-RMSE=0.948780
INFO:root:Epoch[4] Batch [1248]	Speed: 32315.90 samples/sec	Train-RMSE=0.946596
INFO:root:Epoch[4] Resetting Data Iterator
INFO:root:Epoch[4] Time cost=2.508
INFO:root:Epoch[4] Validation-RMSE=0.989020
INFO:root:Epoch[5] Batch [312]	Speed: 32452.73 samples/sec	Train-RMSE=0.931686
INFO:root:Epoch[5] Batch [624]	Speed: 33032.10 samples/sec	Train-RMSE=0.936187
INFO:root:Epoch[5] Batch [936]	Speed: 32601.67 samples/sec	Train-RMSE=0.938291
INFO:root:Epoch[5] Batch [1248]	Speed: 31272.81 samples/sec	Train-RMSE=0.940872
INFO:root:Epoch[5] Resetting Data Iterator
INFO:root:Epoch[5] Time cost=2.510
INFO:root:Epoch[5] Validation-RMSE=0.959602
INFO:root:Epoch[6] Batch [312]	Speed: 31634.43 samples/sec	Train-RMSE=0.920564
INFO:root:Epoch[6] Batch [624]	Speed: 31489.65 samples/sec	Train-RMSE=0.942716
INFO:root:Epoch[6] Batch [936]	Speed: 31589.54 samples/sec	Train-RMSE=0.943799
INFO:root:Epoch[6] Batch [1248]	Speed: 31413.52 samples/sec	Train-RMSE=0.942387
INFO:root:Epoch[6] Resetting Data Iterator
INFO:root:Epoch[6] Time cost=2.570
INFO:root:Epoch[6] Validation-RMSE=1.002785
INFO:root:Epoch[7] Batch [312]	Speed: 31559.69 samples/sec	Train-RMSE=0.931971
INFO:root:Epoch[7] Batch [624]	Speed: 31461.23 samples/sec	Train-RMSE=0.936018
INFO:root:Epoch[7] Batch [936]	Speed: 31464.00 samples/sec	Train-RMSE=0.935134
INFO:root:Epoch[7] Batch [1248]	Speed: 31442.45 samples/sec	Train-RMSE=0.939548
INFO:root:Epoch[7] Resetting Data Iterator
INFO:root:Epoch[7] Time cost=2.573
INFO:root:Epoch[7] Validation-RMSE=0.997704
INFO:root:Epoch[8] Batch [312]	Speed: 31483.89 samples/sec	Train-RMSE=0.913341
INFO:root:Epoch[8] Batch [624]	Speed: 31470.20 samples/sec	Train-RMSE=0.932365
INFO:root:Epoch[8] Batch [936]	Speed: 31448.09 samples/sec	Train-RMSE=0.937986
INFO:root:Epoch[8] Batch [1248]	Speed: 31527.63 samples/sec	Train-RMSE=0.941808
INFO:root:Epoch[8] Resetting Data Iterator
INFO:root:Epoch[8] Time cost=2.578
INFO:root:Epoch[8] Validation-RMSE=0.965740
INFO:root:Epoch[9] Batch [312]	Speed: 31604.54 samples/sec	Train-RMSE=0.923620
INFO:root:Epoch[9] Batch [624]	Speed: 31263.40 samples/sec	Train-RMSE=0.926735
INFO:root:Epoch[9] Batch [936]	Speed: 31391.19 samples/sec	Train-RMSE=0.919276
INFO:root:Epoch[9] Batch [1248]	Speed: 31518.78 samples/sec	Train-RMSE=0.931024
INFO:root:Epoch[9] Resetting Data Iterator
INFO:root:Epoch[9] Time cost=2.579
INFO:root:Epoch[9] Validation-RMSE=0.950934

Acknowledgement

This tutorial is based on examples from xlvector/github.