Denoising Autoencoders

Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion - Vincent et al. 2010

Use this code with no warranty and please respect the accompanying license.

In [8]:
# Imports
%reload_ext autoreload
%autoreload 1

import os, sys

from tools_general import tf, np
from IPython.display import Image
from tools_train import vis_square
from tools_config import data_dir
from tools_train import get_train_params, plot_latent_variable
import matplotlib.pyplot as plt
import imageio
from tensorflow.examples.tutorials.mnist import input_data
from tools_train import get_demo_data

In [2]:
# define parameters
networktype = 'CDAE_MNIST'

work_dir = '../trained_models/%s/' %networktype
if not os.path.exists(work_dir): os.makedirs(work_dir)

Network definitions

In [3]:
from CDAE import create_encoder, create_decoder, create_cdae_trainer

Training CDAE

You can either download the fully trained models from Google Drive or train your own models using the training script.
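The training script itself is not reproduced here. As a rough sketch of the denoising criterion it optimizes, the input is corrupted and the reconstruction is scored against the clean input. The additive Gaussian corruption, the `noise_std` value, the mean squared error loss, and the helper names `corrupt` and `denoising_loss` are assumptions for illustration and may differ from what `CDAE.py` actually does.

```python
import numpy as np

def corrupt(x, noise_std=0.3, rng=None):
    """Additive Gaussian corruption, clipped back to the [0, 1] pixel range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    return np.clip(noisy, 0.0, 1.0).astype(x.dtype)

def denoising_loss(x_clean, x_rec):
    """Mean squared error measured against the clean input, not the noisy one."""
    return float(np.mean((x_rec - x_clean) ** 2))

rng = np.random.default_rng(0)
x_clean = rng.random((4, 28, 28, 1)).astype(np.float32)  # stand-in for an MNIST batch
x_noisy = corrupt(x_clean, rng=rng)

# A trained network would map x_noisy back towards x_clean. Using the noisy
# batch itself as the "reconstruction" shows the loss such a network must reduce.
loss_identity = denoising_loss(x_clean, x_noisy)
loss_perfect = denoising_loss(x_clean, x_clean)
```

In an actual training loop, the noisy batch would be fed to the encoder while the clean batch serves as the reconstruction target.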


Create demo networks and restore weights

In [5]:
iter_num = 30030
best_model = work_dir + "Model_Iter_%.3d.ckpt"%iter_num
best_img = work_dir + 'Rec_Iter_%d.jpg'%iter_num


In [6]:
latentD = 2
batch_size = 128

demo_sess = tf.InteractiveSession()

is_training = tf.placeholder(tf.bool, [], 'is_training')

Xph = tf.placeholder(tf.float32, [None, 28, 28, 1])

Xenc_op = create_encoder(Xph, is_training, latentD, reuse=False, networktype=networktype + '_Enc') 
Xrec_op = create_decoder(Xenc_op, is_training, latentD, reuse=False, networktype=networktype + '_Dec')

Enc_varlist = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=networktype + '_Enc')    
Dec_varlist = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=networktype + '_Dec')
saver = tf.train.Saver(var_list=Enc_varlist+Dec_varlist)
saver.restore(demo_sess, best_model)

Organization of the data on the latent space

Here we encode all the test set data and plot the corresponding 2D values. The color represents the respective digit.

In [11]:
#Get uniform samples over the labels
spl = 800  # sample_per_label
data = input_data.read_data_sets(data_dir, one_hot=False, reshape=False)
Xdemo, Xdemo_labels = get_demo_data(data, spl)
Zdemo = np.random.normal(size=[spl * 10, latentD], loc=0.0, scale=1.).astype(np.float32)

# Run the encoder to map the demo images to their 2D latent codes
encoded_data = demo_sess.run(Xenc_op, feed_dict={Xph:Xdemo, is_training:False})
plot_latent_variable(encoded_data, Xdemo_labels)

Extracting ../data/train-images-idx3-ubyte.gz
Extracting ../data/train-labels-idx1-ubyte.gz
Extracting ../data/t10k-images-idx3-ubyte.gz
Extracting ../data/t10k-labels-idx1-ubyte.gz
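The helper `get_demo_data` comes from `tools_train` and is not shown in this notebook. One plausible way to draw the same number of samples for every label is sketched below. The function name `sample_per_label`, its signature, and the synthetic data are hypothetical and only illustrate the idea.

```python
import numpy as np

def sample_per_label(images, labels, spl, num_classes=10, rng=None):
    """Pick `spl` examples of each class without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    picked = []
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)  # positions of class c
        # rng.choice raises ValueError if a class has fewer than spl examples
        picked.append(rng.choice(idx, size=spl, replace=False))
    picked = np.concatenate(picked)
    return images[picked], labels[picked]

# Synthetic stand-in for MNIST: 500 examples per digit, shuffled
rng = np.random.default_rng(0)
labels = rng.permutation(np.repeat(np.arange(10), 500))
images = rng.random((labels.size, 28, 28, 1)).astype(np.float32)

X, y = sample_per_label(images, labels, spl=50, rng=rng)
```

Balanced sampling keeps every digit equally represented in the scatter plot, so no class dominates the picture simply because it is more frequent.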

Generate new data

A CDAE is not a generative model per se, but more complex sampling methods exist that enable generating new data from its latent code; cf. Generalized Denoising Auto-Encoders as Generative Models - Bengio et al. 2013
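The idea behind such sampling schemes can be sketched as a Markov chain that alternates corruption and reconstruction. The code below is a simplified illustration, not the procedure from the referenced paper: it takes the deterministic reconstruction at each step instead of sampling from a learned reconstruction distribution, and `toy_denoise` is a hypothetical stand-in for a trained network.

```python
import numpy as np

def corrupt(x, noise_std, rng):
    """Additive Gaussian corruption (assumed noise model)."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def toy_denoise(x_noisy):
    """Hypothetical stand-in for a trained denoiser: snap each pixel to 0 or 1."""
    return np.round(np.clip(x_noisy, 0.0, 1.0))

def sample_chain(x0, corrupt_fn, denoise_fn, n_steps):
    """Alternate corrupt -> denoise and record every visited state."""
    x = x0
    states = [x0]
    for _ in range(n_steps):
        x = denoise_fn(corrupt_fn(x))
        states.append(x)
    return np.stack(states)

rng = np.random.default_rng(0)
x0 = rng.random((1, 28, 28, 1))  # arbitrary starting image
chain = sample_chain(x0, lambda x: corrupt(x, 0.5, rng), toy_denoise, n_steps=20)
```

With the restored networks from this notebook, `denoise_fn` could be something like `lambda x: demo_sess.run(Xrec_op, feed_dict={Xph: x, is_training: False})`, with the corruption matched to the noise used during training.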