A Simple Autoencoder

We'll start off by building a simple autoencoder to compress the MNIST dataset. With autoencoders, we pass input data through an encoder that makes a compressed representation of the input. Then, this representation is passed through a decoder to reconstruct the input data. Generally the encoder and decoder will be built with neural networks, then trained on example data.

In this notebook, we'll build a simple network architecture for the encoder and decoder. Let's get started by importing our libraries and getting the dataset.


In [1]:
%matplotlib inline

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', validation_size=0)


Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Below I'm plotting an example image from the MNIST dataset. These are 28x28 grayscale images of handwritten digits.


In [3]:
img = mnist.train.images[2]
plt.imshow(img.reshape((28, 28)), cmap='Greys_r')


Out[3]:
<matplotlib.image.AxesImage at 0x7fe24d8a0588>

We'll train an autoencoder with these images by flattening them into 784-length vectors. The images from this dataset are already normalized so the values are between 0 and 1. Let's start by building basically the simplest autoencoder possible: a single ReLU hidden layer, which will serve as the compressed representation. The encoder is then just the input layer and the hidden layer, and the decoder is the hidden layer and the output layer. Since the images are normalized between 0 and 1, we need to use a sigmoid activation on the output layer to get values in the same range as the input.

Exercise: Build the graph for the autoencoder in the cell below. The input images will be flattened into 784-length vectors. The targets are the same as the inputs. There should be one hidden layer with a ReLU activation and an output layer with a sigmoid activation. The loss should be calculated with the cross-entropy loss; there is a convenient TensorFlow function for this, tf.nn.sigmoid_cross_entropy_with_logits (documentation). Note that tf.nn.sigmoid_cross_entropy_with_logits takes the logits, but to get the reconstructed images you'll need to pass the logits through the sigmoid function.
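As a quick aside (not part of the exercise), it can help to see what tf.nn.sigmoid_cross_entropy_with_logits actually computes. For a single logit x and target z it's the usual binary cross-entropy, evaluated in a numerically stable form, which is why you pass in the raw logits rather than sigmoid outputs. A small NumPy check of the equivalence, using arbitrary example values:

# For one logit x and target z in [0, 1], sigmoid cross-entropy is
#     -z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x))
# which TensorFlow evaluates in the equivalent, numerically stable form
#     max(x, 0) - x * z + log(1 + exp(-abs(x)))
x, z = 2.0, 0.7                                   # arbitrary example values
sig = 1 / (1 + np.exp(-x))
naive = -z * np.log(sig) - (1 - z) * np.log(1 - sig)
stable = max(x, 0) - x * z + np.log(1 + np.exp(-abs(x)))
print(naive, stable)                              # both ≈ 0.7269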


In [4]:
tf.reset_default_graph()

In [5]:
# Size of the encoding layer (the hidden layer)
encoding_dim = 32 # feel free to change this value

image_size = mnist.train.images.shape[1]
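# Glorot/Xavier uniform initialization limit: sqrt(6 / (fan_in + fan_out))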
glorot_lim = np.sqrt(6 / (image_size+encoding_dim))

inputs_ = tf.placeholder(tf.float32, (None, image_size), name='inputs')
targets_ = tf.placeholder(tf.float32, (None, image_size), name='targets')

# Output of hidden layer
# encoded = tf.layers.dense(inputs_, encoding_dim, activation=tf.nn.relu)
weights1 = tf.Variable(
    tf.random_uniform([image_size, encoding_dim], -glorot_lim, glorot_lim))
biases1 = tf.Variable(tf.zeros(encoding_dim))
encoded = tf.nn.relu(tf.matmul(inputs_, weights1) + biases1)

# Output layer logits
# logits = tf.layers.dense(encoded, image_size, activation=None)
weights2 = tf.Variable(
    tf.random_uniform([encoding_dim, image_size], -glorot_lim, glorot_lim))
biases2 = tf.Variable(tf.zeros(image_size))
logits = tf.matmul(encoded, weights2) + biases2

# Sigmoid output from logits
decoded = tf.sigmoid(logits, name='output')

# Sigmoid cross-entropy loss
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets_, logits=logits)
# Mean of the loss
cost = tf.reduce_mean(loss)

# Adam optimizer, minimizing the mean (scalar) cost
opt = tf.train.AdamOptimizer(0.001).minimize(cost)

Training


In [6]:
# Create the session
sess = tf.Session()

Here I'll write a bit of code to train the network. I'm not too interested in validation here, so I'll just monitor the training loss.

Calling mnist.train.next_batch(batch_size) will return a tuple of (images, labels). We're not concerned with the labels here; we just need the images. Otherwise this is pretty straightforward training with TensorFlow. We initialize the variables with sess.run(tf.global_variables_initializer()). Then, we run the optimizer and get the loss with batch_cost, _ = sess.run([cost, opt], feed_dict=feed).


In [7]:
epochs = 20
batch_size = 200
show_after = 100

sess.run(tf.global_variables_initializer())
for e in range(epochs):
    for ii in range(mnist.train.num_examples//batch_size):
        batch = mnist.train.next_batch(batch_size)
        feed = {inputs_: batch[0], targets_: batch[0]}
        batch_cost, _ = sess.run([cost, opt], feed_dict=feed)

        if ii % show_after == 0:
            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Training loss: {:.4f}".format(batch_cost))


Epoch: 1/20... Training loss: 0.6941
Epoch: 1/20... Training loss: 0.2800
Epoch: 1/20... Training loss: 0.2132
Epoch: 2/20... Training loss: 0.1880
Epoch: 2/20... Training loss: 0.1761
Epoch: 2/20... Training loss: 0.1605
Epoch: 3/20... Training loss: 0.1614
Epoch: 3/20... Training loss: 0.1454
Epoch: 3/20... Training loss: 0.1406
Epoch: 4/20... Training loss: 0.1354
Epoch: 4/20... Training loss: 0.1292
Epoch: 4/20... Training loss: 0.1318
Epoch: 5/20... Training loss: 0.1223
Epoch: 5/20... Training loss: 0.1163
Epoch: 5/20... Training loss: 0.1123
Epoch: 6/20... Training loss: 0.1161
Epoch: 6/20... Training loss: 0.1073
Epoch: 6/20... Training loss: 0.1083
Epoch: 7/20... Training loss: 0.1079
Epoch: 7/20... Training loss: 0.1071
Epoch: 7/20... Training loss: 0.1071
Epoch: 8/20... Training loss: 0.1065
Epoch: 8/20... Training loss: 0.1046
Epoch: 8/20... Training loss: 0.0981
Epoch: 9/20... Training loss: 0.1018
Epoch: 9/20... Training loss: 0.0979
Epoch: 9/20... Training loss: 0.1011
Epoch: 10/20... Training loss: 0.0978
Epoch: 10/20... Training loss: 0.0968
Epoch: 10/20... Training loss: 0.0994
Epoch: 11/20... Training loss: 0.0977
Epoch: 11/20... Training loss: 0.0953
Epoch: 11/20... Training loss: 0.0950
Epoch: 12/20... Training loss: 0.0973
Epoch: 12/20... Training loss: 0.0946
Epoch: 12/20... Training loss: 0.0964
Epoch: 13/20... Training loss: 0.0953
Epoch: 13/20... Training loss: 0.0958
Epoch: 13/20... Training loss: 0.0973
Epoch: 14/20... Training loss: 0.0983
Epoch: 14/20... Training loss: 0.0954
Epoch: 14/20... Training loss: 0.0961
Epoch: 15/20... Training loss: 0.0938
Epoch: 15/20... Training loss: 0.0944
Epoch: 15/20... Training loss: 0.0993
Epoch: 16/20... Training loss: 0.0965
Epoch: 16/20... Training loss: 0.0912
Epoch: 16/20... Training loss: 0.0939
Epoch: 17/20... Training loss: 0.0951
Epoch: 17/20... Training loss: 0.0936
Epoch: 17/20... Training loss: 0.0924
Epoch: 18/20... Training loss: 0.0909
Epoch: 18/20... Training loss: 0.0935
Epoch: 18/20... Training loss: 0.0936
Epoch: 19/20... Training loss: 0.0922
Epoch: 19/20... Training loss: 0.0999
Epoch: 19/20... Training loss: 0.0912
Epoch: 20/20... Training loss: 0.0960
Epoch: 20/20... Training loss: 0.0905
Epoch: 20/20... Training loss: 0.0968

Checking out the results

Below I've plotted some of the test images along with their reconstructions. For the most part these look pretty good, aside from some blurriness in places.


In [8]:
fig, axes = plt.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(20,4))
in_imgs = mnist.test.images[:10]
reconstructed, compressed = sess.run([decoded, encoded], feed_dict={inputs_: in_imgs})

for images, row in zip([in_imgs, reconstructed], axes):
    for img, ax in zip(images, row):
        ax.imshow(img.reshape((28, 28)), cmap='Greys_r')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

fig.tight_layout(pad=0.1)
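The compressed representations are also fetched above as compressed, but not displayed. As a quick sanity check (a sketch, not part of the original plots), you could look at the 32-dimensional codes directly; the 4x8 layout here is arbitrary and assumes encoding_dim is still 32:

# Sketch: visualize each 32-value code as a small 4x8 grid of activations.
print(compressed.shape)  # (10, 32): one code per input image
fig, axes = plt.subplots(nrows=1, ncols=10, figsize=(20, 2))
for code, ax in zip(compressed, axes):
    ax.imshow(code.reshape((4, 8)), cmap='Greys_r')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
fig.tight_layout(pad=0.1)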



In [9]:
sess.close()
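As an aside, the session could also be managed with a context manager so it's closed automatically; a minimal sketch of that pattern:

# Sketch only: the same workflow with the session as a context manager,
# so the session is closed automatically when the block exits.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training and plotting code from above ...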

Up Next

We're dealing with images here, so we can (usually) get better performance using convolutional layers. So, next we'll build a better autoencoder with convolutional layers.

In practice, autoencoders aren't actually better at compression than typical methods like JPEG and MP3. But they are used for noise reduction, which you'll also build.