In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
In this notebook, we will create a vanilla autoencoder model using the TensorFlow subclassing API. We are going to use the popular MNIST dataset (grayscale images of hand-written digits from 0 to 9).
We deal with huge amounts of data in machine learning, which naturally leads to more computation. However, we can pick out the parts of the data that contribute the most to a model's learning, thus requiring less computation. The process of choosing the important parts of the data is known as feature selection, which is among the use cases of an autoencoder.
But what exactly is an autoencoder? Well, let's first recall that a neural network is a computational model used for finding a function that describes the relationship between data features $x$ and their values or labels $y$, i.e. $y = f(x)$.
Now, an autoencoder is also a neural network. But instead of finding the function mapping the features $x$ to their corresponding values or labels $y$, it aims to find the function mapping the features $x$ to themselves. Wait, what? Why would we do that?
Well, what's interesting is what happens inside the autoencoder.
In [0]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
assert tf.__version__.startswith('2')
from tensorflow.keras.datasets import mnist
print('TensorFlow version:', tf.__version__)
print('Is Executing Eagerly?', tf.executing_eagerly())
An autoencoder consists of two components: (1) an encoder which learns the data representation $z$, i.e. the important features of a given data $x$ (I like to describe it as what makes something something), and (2) a decoder which reconstructs the data $\hat{x}$ based on its idea $z$ of how it is structured. $$ z = f\big(h_{e}(x)\big)$$ $$ \hat{x} = f\big(h_{d}(z)\big)$$ where $z$ is the learned data representation by encoder $h_{e}$, and $\hat{x}$ is the reconstructed data by decoder $h_{d}$ based on $z$.
Let's further dissect the model below.
The first component, the encoder, is similar to a conventional feed-forward network. However, it is not tasked with predicting values (a regression task) or categories (a classification task). Instead, it is tasked with learning how the data is structured, i.e. the data representation $z$. We can implement the encoder layer as follows,
In [0]:
class Encoder(tf.keras.layers.Layer):
  def __init__(self, intermediate_dim):
    super(Encoder, self).__init__()
    self.hidden_layer = tf.keras.layers.Dense(units=intermediate_dim, activation=tf.nn.relu)
    self.output_layer = tf.keras.layers.Dense(units=intermediate_dim, activation=tf.nn.relu)

  def call(self, input_features):
    activation = self.hidden_layer(input_features)
    return self.output_layer(activation)
The encoding is done by passing data input $x$ to the encoder's hidden layer $h$ in order to learn the data representation $z = f(h(x))$.
We first create an Encoder class that inherits the tf.keras.layers.Layer class to define it as a layer. So, why a layer instead of a model? Recall that the encoder is a component of the autoencoder model.
Analyzing the code, the Encoder layer is defined to have a single hidden layer of neurons (self.hidden_layer) to learn the input features. Then, we connect the hidden layer to a layer (self.output_layer) that encodes the learned activations into a lower dimension consisting of what it thinks are the important features. Hence, the "output" of the Encoder layer is the what makes something something $z$ of the data $x$.
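As a quick sanity check (this snippet is illustrative only; the batch size and the intermediate dimension of 64 are arbitrary choices for demonstration), we can pass a dummy batch through the encoder and inspect the shape of the resulting code:
In [0]:
# Illustrative shape check: a batch of 2 flattened 28x28 images goes in,
# and a 64-dimensional code comes out (dimensions chosen arbitrarily).
sanity_encoder = Encoder(intermediate_dim=64)
dummy_batch = tf.zeros((2, 784))
print(sanity_encoder(dummy_batch).shape)  # expected: (2, 64)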
The second component, the decoder, is also similar to a feed-forward network. However, instead of reducing the data to a lower dimension, it attempts to reverse the process, i.e. reconstruct the data $\hat{x}$ from its lower-dimensional representation $z$ back to its original dimension.
The decoding is done by passing the lower-dimensional representation $z$ to the decoder's hidden layer $h$ in order to reconstruct the data to its original dimension, $\hat{x} = f(h(z))$. We can implement the decoder layer as follows,
In [0]:
class Decoder(tf.keras.layers.Layer):
  def __init__(self, intermediate_dim, original_dim):
    super(Decoder, self).__init__()
    self.hidden_layer = tf.keras.layers.Dense(units=intermediate_dim, activation=tf.nn.relu)
    self.output_layer = tf.keras.layers.Dense(units=original_dim, activation=tf.nn.relu)

  def call(self, code):
    activation = self.hidden_layer(code)
    return self.output_layer(activation)
We now create a Decoder class that also inherits the tf.keras.layers.Layer class.
The Decoder layer is also defined to have a single hidden layer of neurons to reconstruct the input features $\hat{x}$ from the representation $z$ learned by the encoder, $f\big(h_{e}(x)\big)$. Then, we connect its hidden layer to a layer that decodes the data representation from the lower dimension $z$ back to its original dimension $\hat{x}$. Hence, the "output" of the Decoder layer is the reconstructed data $\hat{x}$ from the data representation $z$.
Ultimately, the output of the decoder is the autoencoder's output.
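To see the full round trip in shapes before wiring the layers into a model, here is a small sketch (the dimensions below are arbitrary assumptions, used only to illustrate the flow):
In [0]:
# Illustrative round trip: 784 -> 64 -> 784 (dimensions are arbitrary).
enc = Encoder(intermediate_dim=64)
dec = Decoder(intermediate_dim=64, original_dim=784)
code = enc(tf.zeros((2, 784)))
print(code.shape)       # (2, 64): the compressed representation z
print(dec(code).shape)  # (2, 784): the reconstruction x_hat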
Now that we have defined the components of our autoencoder, we can finally build our model.
In [0]:
class Autoencoder(tf.keras.Model):
  def __init__(self, intermediate_dim, original_dim):
    super(Autoencoder, self).__init__()
    # track the loss per epoch; named loss_history to avoid clashing with
    # the built-in loss attribute of tf.keras.Model
    self.loss_history = []
    self.encoder = Encoder(intermediate_dim=intermediate_dim)
    self.decoder = Decoder(intermediate_dim=intermediate_dim, original_dim=original_dim)

  def call(self, input_features):
    code = self.encoder(input_features)
    reconstructed = self.decoder(code)
    return reconstructed
As discussed above, the encoder's output is the input to the decoder (reconstructed = self.decoder(code)).
We have only discussed and built the model so far, but we haven't talked about how it actually learns. All we know up to this point is the flow of learning: the encoder learns the data representation from the input, and that representation is used as input to the decoder, which reconstructs the original data. Like "simple" neural networks, an autoencoder learns through backpropagation. However, instead of comparing the model's predicted values or labels against targets, we compare the reconstructed data with the original data. Let's call this comparison the reconstruction error function, and it is given by the following equation, $$ L = \dfrac{1}{n} \sum_{i=0}^{n-1} \big(\hat{x}_{i} - x_{i}\big)^{2}$$ where $\hat{x}$ is the reconstructed data while $x$ is the original data.
In [0]:
def loss(preds, real):
  # mean squared error between the reconstruction and the original data
  return tf.reduce_mean(tf.square(tf.subtract(preds, real)))
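As a quick check on the loss function (purely illustrative), a perfect reconstruction should give an error of exactly zero, while an all-zero "reconstruction" should give the mean of the squared pixel values:
In [0]:
# Illustrative check of the reconstruction error on tiny hand-made tensors.
x = tf.constant([[0., 1.], [1., 0.]])
print(loss(x, x).numpy())                 # 0.0: identical tensors
print(loss(tf.zeros_like(x), x).numpy())  # 0.5: mean of the squared entries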
In [0]:
def train(loss, model, opt, original):
  with tf.GradientTape() as tape:
    preds = model(original)
    reconstruction_error = loss(preds, original)
  gradients = tape.gradient(reconstruction_error, model.trainable_variables)
  gradient_variables = zip(gradients, model.trainable_variables)
  opt.apply_gradients(gradient_variables)
  return reconstruction_error
In [0]:
def train_loop(model, opt, loss, dataset, epochs):
  for epoch in range(epochs):
    epoch_loss = 0
    for step, batch_features in enumerate(dataset):
      loss_values = train(loss, model, opt, batch_features)
      epoch_loss += loss_values
    model.loss_history.append(epoch_loss)
    print('Epoch {}/{}. Loss: {}'.format(epoch + 1, epochs, epoch_loss.numpy()))
Now that we have defined our Autoencoder class, the loss function, and the training loop, let's import the dataset. We will normalize the pixel values of each example by dividing by the maximum pixel value (255), and flatten the examples from 28-by-28 arrays to 784-dimensional vectors.
In [0]:
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train / 255.
x_train = x_train.astype(np.float32)
x_train = np.reshape(x_train, (x_train.shape[0], 784))
# apply the same preprocessing to the test set so it can be fed to the model later
x_test = x_test / 255.
x_test = x_test.astype(np.float32)
x_test = np.reshape(x_test, (x_test.shape[0], 784))
training_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(256)
In [0]:
model = Autoencoder(intermediate_dim=128, original_dim=784)
opt = tf.keras.optimizers.Adam(learning_rate=1e-2)
train_loop(model, opt, loss, training_dataset, 20)
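Since we applied the same preprocessing to the test set, we can also measure the reconstruction error on data the model has never seen (the exact value will vary from run to run):
In [0]:
# Reconstruction error on the held-out test set.
test_preds = model(x_test)
print('Test reconstruction error:', loss(test_preds, x_test).numpy())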
In [0]:
plt.plot(range(20), model.loss_history)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
In [0]:
number = 10  # how many digits we will display
reconstructed = model(x_test)  # reconstruct all test images in a single pass
plt.figure(figsize=(20, 4))
for index in range(number):
  # display original
  ax = plt.subplot(2, number, index + 1)
  plt.imshow(x_test[index].reshape(28, 28))
  plt.gray()
  ax.get_xaxis().set_visible(False)
  ax.get_yaxis().set_visible(False)

  # display reconstruction
  ax = plt.subplot(2, number, index + 1 + number)
  plt.imshow(reconstructed[index].numpy().reshape(28, 28))
  plt.gray()
  ax.get_xaxis().set_visible(False)
  ax.get_yaxis().set_visible(False)
plt.show()
As you can see after training this model, the reconstructed images are quite blurry. A number of things could be done to move forward from this point, e.g. adding more layers, using a convolutional neural network architecture as the basis of the autoencoder, or using a different kind of autoencoder.
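For instance, a deeper encoder might look like the sketch below (a minimal illustration, not a tuned architecture; the layer sizes of 256 and 128 are arbitrary assumptions):
In [0]:
# Sketch of a deeper encoder: two hidden layers before the code layer.
# Layer sizes are arbitrary; treat this as a starting point, not a recipe.
class DeepEncoder(tf.keras.layers.Layer):
  def __init__(self, intermediate_dim):
    super(DeepEncoder, self).__init__()
    self.hidden_layer_1 = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)
    self.hidden_layer_2 = tf.keras.layers.Dense(units=128, activation=tf.nn.relu)
    self.output_layer = tf.keras.layers.Dense(units=intermediate_dim, activation=tf.nn.relu)

  def call(self, input_features):
    activation = self.hidden_layer_1(input_features)
    activation = self.hidden_layer_2(activation)
    return self.output_layer(activation)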