(Edit) I fixed this notebook so that it can be run top to bottom to reproduce everything. To get the ipython notebook file itself, change "html" to "ipynb" at the end of the URL above.
In this blog series we are going to build towards a rather complicated deep learning model that can be applied to an interesting bioinformatics application. I don't want to explain the end goal right now, since I want to keep that confidential until it is published. However, the end goal is complicated enough, and I am unfamiliar enough with both the deep learning models and the tensorflow framework, that we aren't going to try to implement it in one shot. Instead we are going to implement a series of smaller, simpler tensorflow models, building up pieces that can be assembled later.
In this series we will investigate deep generative models such as Variational Auto-Encoders (VAE) and Generative Adversarial Networks (GAN). We will also develop recurrent neural networks, including LSTM and especially seq2seq models.
To start off, in this notebook we look at a VAE on the MNIST dataset, to get familiar with VAEs and generative models in general (a piece that we will need later), and to get familiar with tensorflow.
A lot of this notebook is taken and modified from https://jmetzen.github.io/2015-11-27/vae.html
In [1]:
import sys
import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(0)
tf.set_random_seed(0)
# get the input_data.py script below from
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/input_data.py
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
n_samples = mnist.train.num_examples
In [2]:
# We have training images of 28x28=784 gray scale pixels (a list of length-784 numpy vectors):
print(mnist.train.images[0].shape) # 784
print(len(mnist.train.images)) # 55000
print(sys.getsizeof(mnist.train.images)) # 172 million bytes (fits in memory, yay!)
# the labels are stored as one-hot encoding (10 dimensional numpy vectors)
print(mnist.train.labels[0].shape) # 10
print(len(mnist.train.labels)) # 55000
# We also have 5000 validation image-label pairs
print(len(mnist.validation.labels)) # 5000
# and 10000 testing image-label pairs
print(len(mnist.test.labels)) # 10000
Let me briefly explain in words how a VAE works. First, like all generative models, VAEs are inherently unsupervised: here we aren't going to use the "labels" at all, and we aren't going to use the validation or testing sets for training. A VAE takes an input, in this case an MNIST image, maps it onto an internal encoding/embedding, and then uses that encoding to reconstruct the original input image.
In the specific case of a deep VAE, we have a neural network (the encoder/recognition network) that transforms the input, x, into a set of parameters that determine a probability distribution over the latent space. That distribution is then sampled from to get a vector z, which is fed into a second network (the decoder/generator network) that attempts to reconstruct the original image, x, as accurately as possible. The whole thing can be trained iteratively with simple gradient descent via backpropagation.
Note that one n_z-dimensional sample of the latent distribution will be drawn for each input image.
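To make the description above concrete, here is the objective that the code in the following cells minimizes per input x (this is the standard VAE loss, i.e. the negative evidence lower bound; the notation here is mine, not part of the original notebook). The encoder predicts a mean $\mu(x)$ and variance $\sigma^2(x)$ of a diagonal Gaussian over z, sampling is done with the reparameterization trick so gradients can flow through it, and the decoder defines $p(x|z)$:

$$
z = \mu(x) + \sigma(x) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

$$
\mathcal{L}(x) = \underbrace{-\,\mathbb{E}_{q(z|x)}\big[\log p(x|z)\big]}_{\text{reconstruction loss}} \;+\; \underbrace{D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big)}_{\text{latent loss}}
$$

with prior $p(z) = \mathcal{N}(0, I)$.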
In [3]:
# hyper params:
n_hidden_recog_1=500 # 1st layer encoder neurons
n_hidden_recog_2=500 # 2nd layer encoder neurons
n_hidden_gener_1=500 # 1st layer decoder neurons
n_hidden_gener_2=500 # 2nd layer decoder neurons
n_input=784 # MNIST data input (img shape: 28*28)
n_z=20 # dimensionality of the latent space z
transfer_fct=tf.nn.softplus # activation function for the hidden layers
learning_rate=0.001
batch_size=100
# CREATE NETWORK
# 1) input placeholder
x = tf.placeholder(tf.float32, [None, n_input])
# 2) weights and biases variables
def xavier_init(fan_in, fan_out, constant=1):
""" Xavier initialization of network weights"""
# https://stackoverflow.com/questions/33640581/how-to-do-xavier-initialization-on-tensorflow
low = -constant*np.sqrt(6.0/(fan_in + fan_out))
high = constant*np.sqrt(6.0/(fan_in + fan_out))
return tf.random_uniform((fan_in, fan_out),
minval=low, maxval=high,
dtype=tf.float32)
In [4]:
wr_h1 = tf.Variable(xavier_init(n_input, n_hidden_recog_1))
wr_h2 = tf.Variable(xavier_init(n_hidden_recog_1, n_hidden_recog_2))
wr_out_mean = tf.Variable(xavier_init(n_hidden_recog_2, n_z))
wr_out_log_sigma = tf.Variable(xavier_init(n_hidden_recog_2, n_z))
br_b1 = tf.Variable(tf.zeros([n_hidden_recog_1], dtype=tf.float32))
br_b2 = tf.Variable(tf.zeros([n_hidden_recog_2], dtype=tf.float32))
br_out_mean = tf.Variable(tf.zeros([n_z], dtype=tf.float32))
br_out_log_sigma = tf.Variable(tf.zeros([n_z], dtype=tf.float32))
wg_h1 = tf.Variable(xavier_init(n_z, n_hidden_gener_1))
wg_h2 = tf.Variable(xavier_init(n_hidden_gener_1, n_hidden_gener_2))
wg_out_mean = tf.Variable(xavier_init(n_hidden_gener_2, n_input))
# wg_out_log_sigma = tf.Variable(xavier_init(n_hidden_gener_2, n_input))
bg_b1 = tf.Variable(tf.zeros([n_hidden_gener_1], dtype=tf.float32))
bg_b2 = tf.Variable(tf.zeros([n_hidden_gener_2], dtype=tf.float32))
bg_out_mean = tf.Variable(tf.zeros([n_input], dtype=tf.float32))
# 3) recognition network
# use recognition network to predict mean and (log) variance of (latent) Gaussian distribution z (n_z dimensional)
r_layer_1 = transfer_fct(tf.add(tf.matmul(x, wr_h1), br_b1))
r_layer_2 = transfer_fct(tf.add(tf.matmul(r_layer_1, wr_h2), br_b2))
z_mean = tf.add(tf.matmul(r_layer_2, wr_out_mean), br_out_mean)
z_sigma = tf.add(tf.matmul(r_layer_2, wr_out_log_sigma), br_out_log_sigma) # note: despite the name, z_sigma holds log(sigma^2), the log variance
# 4) do sampling on recognition network to get latent variables
# draw one n_z dimensional sample (for each input in batch), from normal distribution
eps = tf.random_normal((batch_size, n_z), 0, 1, dtype=tf.float32)
# shift and scale those samples by the predicted mu and sigma to get samples of z, the latent variable
# z = mu + sigma*epsilon
z = tf.add(z_mean, tf.mul(tf.sqrt(tf.exp(z_sigma)), eps))
# 5) use generator network to predict mean of Bernoulli distribution of reconstructed input
g_layer_1 = transfer_fct(tf.add(tf.matmul(z, wg_h1), bg_b1))
g_layer_2 = transfer_fct(tf.add(tf.matmul(g_layer_1, wg_h2), bg_b2))
x_reconstr_mean = tf.nn.sigmoid(tf.add(tf.matmul(g_layer_2, wg_out_mean), bg_out_mean))
In [5]:
# DEFINE LOSS AND OPTIMIZER
# The loss is composed of two terms:
# 1.) The reconstruction loss (the negative log probability
# of the input under the reconstructed Bernoulli distribution
# induced by the decoder in the data space).
# This can be interpreted as the number of "nats" required
# for reconstructing the input when the latent activation
# is given.
# Adding 1e-10 to avoid evaluation of log(0.0)
reconstr_loss = -tf.reduce_sum(x * tf.log(1e-10 + x_reconstr_mean) + (1-x) * tf.log(1e-10 + 1 - x_reconstr_mean), 1)
# 2.) The latent loss, which is defined as the Kullback Leibler divergence
## between the distribution in latent space induced by the encoder on
# the data and some prior. This acts as a kind of regularizer.
# This can be interpreted as the number of "nats" required
# for transmitting the latent space distribution given
# the prior.
latent_loss = -0.5 * tf.reduce_sum(1 + z_sigma - tf.square(z_mean) - tf.exp(z_sigma), 1)
# Since reconstr_loss and latent_loss are in terms of "nats" they
# should be on similar scales. So we can add them together.
cost = tf.reduce_mean(reconstr_loss + latent_loss) # average over batch
# 3) set up optimizer (use ADAM optimizer)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
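For reference, the latent_loss line above is the closed-form KL divergence between the diagonal Gaussian $q(z|x) = \mathcal{N}(\mu, \sigma^2 I)$ produced by the encoder and the standard normal prior (a standard identity; recall that z_sigma holds $\log \sigma^2$, so tf.exp(z_sigma) is $\sigma^2$):

$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \sigma^2 I)\,\|\,\mathcal{N}(0, I)\big) \;=\; -\tfrac{1}{2} \sum_{j=1}^{n_z} \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big),
$$

which matches the code term by term.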
The above cells define the tensorflow "graph", i.e. the graph consisting of op-nodes connected by directed tensor-edges. But the graph above cannot be run yet: it hasn't been initialized into a "working graph" with stateful variables. The tensorflow tutorials refer to the graph as something like a "blueprint".
The session contains the state for a specific graph, including the current values of all of the tensors in the graph. Once the session is created, the variables are initialized. From now on, the variables have state, and can be read/viewed, or updated iteratively according to the data and the optimization op.
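As a minimal illustration of the blueprint vs. session distinction (a toy sketch of mine, using the same old-style tensorflow API as the rest of this notebook):

In [ ]:
# building ops only describes a computation, it does not execute anything
toy_a = tf.constant(2.0, name='toy_a')
toy_b = tf.constant(3.0, name='toy_b')
toy_c = tf.add(toy_a, toy_b)  # toy_c is a node in the graph, it has no value yet
# only a session, which holds the state, can actually evaluate the node
with tf.Session() as toy_sess:
    print(toy_sess.run(toy_c))  # prints 5.0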
In [6]:
# INITIALIZE VARIABLES AND TF SESSION
init = tf.initialize_all_variables()
sess = tf.InteractiveSession()
sess.run(init)
Because this is an ipython notebook and we are doing exploratory machine learning, we want the ability to interact with the tensorflow model interactively. Note the line above, sess = tf.InteractiveSession(), which allows us to do things like some_tensor.eval(feed_dict={x: batch_xs}). Some tensors can simply be evaluated without passing in an input, but only if they do not depend on an input; remember, tensorflow is a directed graph of op-nodes and tensor-edges that denote dependencies. Variables like weights and biases do not depend on any input, but something like the reconstructed image does depend on the input.
In [7]:
# INTERACTIVE TESTING
# get a batch of inputs
batch_xs, _ = mnist.train.next_batch(batch_size)
# get the reconstructed mean for those inputs
# (on a completely untrained model)
# below is the "interactive version" to "read" a tensor (given an input in the feed_dict)
print(x_reconstr_mean.eval(feed_dict={x: batch_xs}).shape)
print(x_reconstr_mean.eval(feed_dict={x: batch_xs}))
print("----------------------------------------------------")
# below is the same but using explicit reference to the session state
print(tf.get_default_session().run(x_reconstr_mean, feed_dict={x: batch_xs}).shape)
print(tf.get_default_session().run(x_reconstr_mean, feed_dict={x: batch_xs}))
# and this is also the same thing:
print(sess.run(x_reconstr_mean, feed_dict={x: batch_xs}).shape)
print(sess.run(x_reconstr_mean, feed_dict={x: batch_xs}))
Note that the three printed reconstructions are not the same! This isn't due to some bug: the model is inherently stochastic (the latent z is sampled each time), so given the same input the reconstructions will differ, especially here with an untrained model.
Also note that the tensors we printed above are 100 by 784. The first dimension, 100, is the batch size. The second dimension, 784, is the flattened pixels of the reconstructed image.
This interactive testing can be useful for tensorflow noobs who want to make sure that their tensors and ops are compatible as they build their evaluation graph one tensor/op at a time.
Remember, when using an interactive session you have access to the current state of the model at any time: just use eval. Even without interactive mode it is still easy; you just have to keep a reference to the session, which holds the graph's state.
The session contains state, namely the current values of all of the variables (parameters) in the model. We can save this state to a file periodically, which can be used to restart training after an interruption, or to reload the session/model for any reason (including perhaps deploying a trained model to be evaluated on another machine).
In [8]:
# make a saver object
saver = tf.train.Saver()
# save the current state of the session to file "model.ckpt"
save_path = saver.save(sess, "model.ckpt")
# a binding to a session can be restored from a file with the following
restored_session = tf.Session()
saver.restore(restored_session, "model.ckpt")
# prove the two sessions are equal:
# here we evaluate a weight variable that does not depend on input, hence no feed_dict is necessary
print(sess.run(wr_h1))
print("-------------------------------------------------------------------")
print(restored_session.run(wr_h1))
To prepare for graph visualization, we can group ops using with tf.name_scope('hidden') as scope: blocks, and name tensors with the name argument, like a = tf.constant(5, name='alpha').
Let's reorganize our computational graph with scopes and names, and add some summaries:
We will add three summaries based on the cost: its mean over the batch, its standard deviation over the batch, and a histogram of the per-example costs.
Below is the whole graph from above again. Lots of copying and pasting.
In [9]:
# remake the whole graph with scopes, names, summaries
wr_h1 = tf.Variable(xavier_init(n_input, n_hidden_recog_1))
wr_h2 = tf.Variable(xavier_init(n_hidden_recog_1, n_hidden_recog_2))
wr_out_mean = tf.Variable(xavier_init(n_hidden_recog_2, n_z))
wr_out_log_sigma = tf.Variable(xavier_init(n_hidden_recog_2, n_z))
br_b1 = tf.Variable(tf.zeros([n_hidden_recog_1], dtype=tf.float32))
br_b2 = tf.Variable(tf.zeros([n_hidden_recog_2], dtype=tf.float32))
br_out_mean = tf.Variable(tf.zeros([n_z], dtype=tf.float32))
br_out_log_sigma = tf.Variable(tf.zeros([n_z], dtype=tf.float32))
wg_h1 = tf.Variable(xavier_init(n_z, n_hidden_gener_1))
wg_h2 = tf.Variable(xavier_init(n_hidden_gener_1, n_hidden_gener_2))
wg_out_mean = tf.Variable(xavier_init(n_hidden_gener_2, n_input))
# wg_out_log_sigma = tf.Variable(xavier_init(n_hidden_gener_2, n_input))
bg_b1 = tf.Variable(tf.zeros([n_hidden_gener_1], dtype=tf.float32))
bg_b2 = tf.Variable(tf.zeros([n_hidden_gener_2], dtype=tf.float32))
bg_out_mean = tf.Variable(tf.zeros([n_input], dtype=tf.float32))
# 3) recognition network
# use recognition network to predict mean and (log) variance of (latent) Gaussian distribution z (n_z dimensional)
with tf.name_scope('recognition-encoding'):
r_layer_1 = transfer_fct(tf.add(tf.matmul(x, wr_h1), br_b1))
r_layer_2 = transfer_fct(tf.add(tf.matmul(r_layer_1, wr_h2), br_b2))
z_mean = tf.add(tf.matmul(r_layer_2, wr_out_mean), br_out_mean)
z_sigma = tf.add(tf.matmul(r_layer_2, wr_out_log_sigma), br_out_log_sigma)
# 4) do sampling on recognition network to get latent variables
# draw one n_z dimensional sample (for each input in batch), from normal distribution
eps = tf.random_normal((batch_size, n_z), 0, 1, dtype=tf.float32)
# scale that set of samples by predicted mu and epsilon to get samples of z, the latent distribution
# z = mu + sigma*epsilon
z = tf.add(z_mean, tf.mul(tf.sqrt(tf.exp(z_sigma)), eps))
# 5) use generator network to predict mean of Bernoulli distribution of reconstructed input
with tf.name_scope('generator-decoding'):
g_layer_1 = transfer_fct(tf.add(tf.matmul(z, wg_h1), bg_b1))
g_layer_2 = transfer_fct(tf.add(tf.matmul(g_layer_1, wg_h2), bg_b2))
x_reconstr_mean = tf.nn.sigmoid(tf.add(tf.matmul(g_layer_2, wg_out_mean), bg_out_mean))
reconstr_loss = -tf.reduce_sum(x * tf.log(1e-10 + x_reconstr_mean) + (1-x) * tf.log(1e-10 + 1 - x_reconstr_mean), 1)
latent_loss = -0.5 * tf.reduce_sum(1 + z_sigma - tf.square(z_mean) - tf.exp(z_sigma), 1)
batch_cost = reconstr_loss + latent_loss
cost = tf.reduce_mean(batch_cost) # average over batch
stdev_cost = tf.sqrt(tf.reduce_mean(tf.square(batch_cost - cost))) # standard deviation of the per-example cost over the batch
tf.scalar_summary("mean_cost", cost)
tf.scalar_summary("stdev_cost", stdev_cost)
tf.histogram_summary("histo_cost", batch_cost)
# 3) set up optimizer (use ADAM optimizer)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# this is the "summary op"
merged = tf.merge_all_summaries()
# INITIALIZE VARIABLES AND TF SESSION
sess = tf.Session()
sess.run(tf.initialize_all_variables())
# this is the summary writer
train_writer = tf.train.SummaryWriter("summaries/train", sess.graph)
Now we finally run the training loop. We loop over all batches, passing each batch into the session run command with feed_dict. Every display_step epochs, we print the average loss, save a model checkpoint, and write out the summaries.
In [11]:
# DO TRAINING
learning_rate=0.001
batch_size=100
training_epochs=10
display_step=1
save_ckpt_dir="model/"
if not os.path.exists(save_ckpt_dir):
    os.mkdir(save_ckpt_dir)  # don't crash if the checkpoint directory already exists
saver = tf.train.Saver()
# Training cycle
for epoch in range(training_epochs):
avg_cost = 0
total_batch = int(n_samples / batch_size)
# Loop over all batches
for i in range(total_batch):
batch_xs, _ = mnist.train.next_batch(batch_size)
        # Fit training using batch data (we don't want to eval summaries every time during training)
        _, loss = sess.run([optimizer, cost], feed_dict={x: batch_xs})
        # accumulate the average cost for this epoch
        avg_cost += loss / n_samples * batch_size
if epoch % display_step == 0:
# At the end of every epoch print loss, save model, and save summaries
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# run op to get summaries (don't want to be running this everytime during training)
summary = sess.run(merged,
feed_dict={x: batch_xs},
options=run_options,
run_metadata=run_metadata)
# write summaries
train_writer.add_run_metadata(run_metadata, 'epoch%03d' % epoch)
train_writer.add_summary(summary, epoch)
# write ckpt
save_path = saver.save(sess, save_ckpt_dir + "model_" + str(epoch) + ".ckpt")
        # print the average cost for this epoch
        print("Epoch:", '%04d' % (epoch+1), "average loss:", "{:.9f}".format(avg_cost))
One thing we can do to evaluate training is to plot reconstructions from a new (untrained) session and compare them visually to the reconstructions achieved with our trained model. See the plots of those reconstructions below. The difference is immediately apparent!
In [12]:
# DO RECONSTRUCTION / PLOTTING
def do_reconstruction(sess):
x_sample = mnist.test.next_batch(100)[0]
x_reconstruct = sess.run(x_reconstr_mean, feed_dict={x: x_sample})
plt.figure(figsize=(8, 12))
examples_to_plot = 3
for i in range(examples_to_plot):
plt.subplot(examples_to_plot, 2, 2*i + 1)
plt.imshow(x_sample[i].reshape(28, 28), vmin=0, vmax=1)
plt.title("Test input")
plt.colorbar()
plt.subplot(examples_to_plot, 2, 2*i + 2)
plt.imshow(x_reconstruct[i].reshape(28, 28), vmin=0, vmax=1)
plt.title("Reconstruction")
plt.colorbar()
plt.tight_layout()
In [13]:
new_sess = tf.Session()
new_sess.run(tf.initialize_all_variables())
# do reconstruction before training
do_reconstruction(new_sess)
In [14]:
# do reconstruction after training
do_reconstruction(sess)
If everything worked correctly, a directory called summaries should have been created during training. We can use the tensorboard tool to visualize our computation graph and our summaries/events over time. Run
tensorboard --logdir=summaries
and point your browser at the specified address. You should see a web gui. Under the "events" tab you will see graphs of the mean loss and stdev loss evaluated at the end of each epoch. The full computation graph is under "graph", complete with the provided tensor names (or the default generic names), and scopes are clustered together. There is no guarantee it looks good (my graph looks sort of messy), but nevertheless it can be a good way of visualizing your model. Finally, some histogram plots of the loss are available under the "histograms" tab.
OK, in this notebook we got started learning about some of the important features of tensorflow, and we implemented an example of a VAE on the MNIST dataset. In future notebooks we will continue to explore tensorflow features, as well as seq2seq and recurrent neural nets. Again, we will eventually build towards a model that is going to be really cool and novel, and bioinformatics related, but more on that later.