Generative Adversarial Networks (GANs)

So far in CS231N, all the applications of neural networks that we have explored have been discriminative models that take an input and are trained to produce a labeled output. This has ranged from straightforward classification of image categories to sentence generation (which was still phrased as a classification problem: our labels were in vocabulary space and we’d learned a recurrence to capture multi-word labels). In this notebook, we will expand our repertoire and build generative models using neural networks. Specifically, we will learn how to build models which generate novel images that resemble a set of training images.

What is a GAN?

In 2014, Goodfellow et al. presented a method for training generative models called Generative Adversarial Networks (GANs for short). In a GAN, we build two different neural networks. Our first network is a traditional classification network, called the discriminator. We will train the discriminator to take images, and classify them as being real (belonging to the training set) or fake (not present in the training set). Our other network, called the generator, will take random noise as input and transform it using a neural network to produce images. The goal of the generator is to fool the discriminator into thinking the images it produced are real.

We can think of this back and forth process of the generator ($G$) trying to fool the discriminator ($D$), and the discriminator trying to correctly classify real vs. fake as a minimax game: $$\underset{G}{\text{minimize}}\; \underset{D}{\text{maximize}}\; \mathbb{E}_{x \sim p_\text{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log \left(1-D(G(z))\right)\right]$$ where $x \sim p_\text{data}$ are samples from the input data, $z \sim p(z)$ are the random noise samples, $G(z)$ are the generated images using the neural network generator $G$, and $D$ is the output of the discriminator, specifying the probability of an input being real. In Goodfellow et al., they analyze this minimax game and show how it relates to minimizing the Jensen-Shannon divergence between the training data distribution and the generated samples from $G$.
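
To make the objective concrete, here is a minimal sketch that estimates the value of the minimax game from a handful of discriminator outputs. The helper name `gan_value` and the probabilities are made up for illustration; only the formula comes from the equation above.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator probabilities D(x) on real samples
    d_fake: discriminator probabilities D(G(z)) on generated samples
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

# A discriminator that separates the two sets well pushes the value toward 0.
confident = gan_value([0.99, 0.98, 0.97], [0.01, 0.02, 0.03])

print(confused)   # equals -2 * log(2), roughly -1.386
print(confident)
```

The discriminator tries to push this value up toward 0, while the generator tries to pull it down. When the discriminator outputs 0.5 everywhere, the value is $-\log 4$, which is the value Goodfellow et al. derive for the game at its optimum.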

To optimize this minimax game, we will alternate between taking gradient descent steps on the objective for $G$, and gradient ascent steps on the objective for $D$:

  1. Update the generator ($G$) to minimize the probability of the discriminator making the correct choice.
  2. Update the discriminator ($D$) to maximize the probability of the discriminator making the correct choice.

While these updates are useful for analysis, they do not perform well in practice. Instead, we will use a different objective when we update the generator: maximize the probability of the discriminator making the incorrect choice. This small change helps to alleviate problems with the generator gradient vanishing when the discriminator is confident. This is the standard update used in most GAN papers, and was used in the original paper from Goodfellow et al.
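
The vanishing-gradient argument can be checked with a little calculus. Writing $D(G(z)) = \sigma(s)$ for a discriminator logit $s$, the derivative of $\log(1-\sigma(s))$ with respect to $s$ is $-\sigma(s)$, while the derivative of $-\log\sigma(s)$ is $-(1-\sigma(s))$. The sketch below (function names are illustrative, not part of the assignment) evaluates both for a few logits.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def saturating_grad(s):
    """d/ds of log(1 - sigmoid(s)), the generator term of the minimax objective."""
    return -sigmoid(s)

def non_saturating_grad(s):
    """d/ds of -log(sigmoid(s)), the generator loss used in practice."""
    return -(1.0 - sigmoid(s))

# s is the discriminator logit on a generated sample.
# A very negative s means the discriminator is confident the sample is fake.
for s in (-6.0, 0.0, 6.0):
    print(s, saturating_grad(s), non_saturating_grad(s))
```

At $s=-6$ the original objective passes back a gradient of magnitude below 0.01, whereas the alternative objective passes back a gradient of magnitude close to 1, so the generator still receives a learning signal early in training.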

In this assignment, we will alternate the following updates:

  1. Update the generator ($G$) to maximize the probability of the discriminator making the incorrect choice on generated data: $$\underset{G}{\text{maximize}}\; \mathbb{E}_{z \sim p(z)}\left[\log D(G(z))\right]$$
  2. Update the discriminator ($D$), to maximize the probability of the discriminator making the correct choice on real and generated data: $$\underset{D}{\text{maximize}}\; \mathbb{E}_{x \sim p_\text{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log \left(1-D(G(z))\right)\right]$$

What else is there?

Since 2014, GANs have exploded into a huge research area, with massive workshops, and hundreds of new papers. Compared to other approaches for generative models, they often produce the highest quality samples but are some of the most difficult and finicky models to train (see this GitHub repo that contains a set of 17 hacks that are useful for getting models working). Improving the stability and robustness of GAN training is an open research question, with new papers coming out every day! For a more recent tutorial on GANs, see here. There is also some even more recent exciting work that changes the objective function to Wasserstein distance and yields much more stable results across model architectures: WGAN, WGAN-GP.

GANs are not the only way to train a generative model! For other approaches to generative modeling check out the deep generative model chapter of the Deep Learning book. Another popular way of training neural networks as generative models is Variational Autoencoders (co-discovered here and here). Variational autoencoders combine neural networks with variational inference to train deep generative models. These models tend to be far more stable and easier to train but currently don't produce samples that are as pretty as GANs.

Example pictures of what you should expect (yours might look slightly different):

Setup


In [1]:
from __future__ import print_function, division
import tensorflow as tf
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# A bunch of utility functions

def show_images(images):
    images = np.reshape(images, [images.shape[0], -1])  # images reshape to (batch_size, D)
    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))
    sqrtimg = int(np.ceil(np.sqrt(images.shape[1])))

    fig = plt.figure(figsize=(sqrtn, sqrtn))
    gs = gridspec.GridSpec(sqrtn, sqrtn)
    gs.update(wspace=0.05, hspace=0.05)

    for i, img in enumerate(images):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(img.reshape([sqrtimg,sqrtimg]))
    return

def preprocess_img(x):
    return 2 * x - 1.0

def deprocess_img(x):
    return (x + 1.0) / 2.0

def rel_error(x,y):
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

def count_params():
    """Count the number of parameters in the current TensorFlow graph """
    param_count = np.sum([np.prod(x.get_shape().as_list()) for x in tf.global_variables()])
    return param_count


def get_session():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)
    return session

answers = np.load('gan-checks-tf.npz')

Dataset

GANs are notoriously finicky with hyperparameters, and also require many training epochs. In order to make this assignment approachable without a GPU, we will be working on the MNIST dataset, which contains 60,000 training and 10,000 test images. Each picture contains a centered image of a white digit (0 through 9) on a black background. This was one of the first datasets used to train convolutional neural networks and it is fairly easy -- a standard CNN model can easily exceed 99% accuracy.

To simplify our code here, we will use the TensorFlow MNIST wrapper, which downloads and loads the MNIST dataset. See the documentation for more information about the interface. The default parameters will take 5,000 of the training examples and place them into a validation dataset. The data will be saved into a folder called MNIST_data.

Heads-up: The TensorFlow MNIST wrapper returns images as vectors. That is, they have shape (batch, 784). If we want to treat them as images, we have to reshape them to (batch,28,28) or (batch,28,28,1). They are also of type np.float32 and bounded in [0,1].
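
As a rough illustration of these shapes, the sketch below uses a random NumPy array as a stand-in for a real MNIST batch (an assumption, so that it runs without downloading the dataset) and applies the same scaling as `preprocess_img` and `deprocess_img` from the setup cell.

```python
import numpy as np

# Stand-in for mnist.train.next_batch(16)[0]: 16 flattened images in [0, 1].
rng = np.random.RandomState(0)
batch = rng.rand(16, 784).astype(np.float32)

images = batch.reshape(-1, 28, 28)          # (batch, 28, 28)
images_nhwc = batch.reshape(-1, 28, 28, 1)  # (batch, 28, 28, 1), channels last

# Same arithmetic as preprocess_img / deprocess_img in the setup cell.
scaled = 2 * batch - 1.0          # [0, 1] -> [-1, 1]
restored = (scaled + 1.0) / 2.0   # [-1, 1] -> [0, 1]

print(images.shape, images_nhwc.shape)
print(scaled.min() >= -1.0, scaled.max() <= 1.0)
```

The rescaling to [-1, 1] matters later because the generator ends in a TanH, so real and generated images reach the discriminator in the same range.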


In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('./cs231n/datasets/MNIST_data', one_hot=False)

# show a batch
show_images(mnist.train.next_batch(16)[0])


Extracting ./cs231n/datasets/MNIST_data/train-images-idx3-ubyte.gz
Extracting ./cs231n/datasets/MNIST_data/train-labels-idx1-ubyte.gz
Extracting ./cs231n/datasets/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ./cs231n/datasets/MNIST_data/t10k-labels-idx1-ubyte.gz

LeakyReLU

In the cell below, you should implement a LeakyReLU. See the class notes (where alpha is a small number) or equation (3) in this paper. LeakyReLUs keep ReLU units from dying and are often used in GAN methods (as are maxout units, however those increase model size and therefore are not used in this notebook).

HINT: You should be able to use tf.maximum
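
To see why a maximum is enough, here is a small NumPy sketch (NumPy rather than TensorFlow so it runs standalone; `leaky_relu_np` is an illustrative name). It assumes alpha lies between 0 and 1, in which case the element-wise maximum of `x` and `alpha * x` picks `x` for non-negative inputs and `alpha * x` for negative ones.

```python
import numpy as np

def leaky_relu_np(x, alpha=0.01):
    # For 0 <= alpha <= 1, max(x, alpha * x) is x when x >= 0 and alpha * x otherwise.
    return np.maximum(x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = leaky_relu_np(x)

# The explicit branch form gives the same values.
y_where = np.where(x > 0, x, 0.01 * x)

print(y)
```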


In [3]:
def leaky_relu(x, alpha=0.01):
    """Compute the leaky ReLU activation function.
    
    Inputs:
    - x: TensorFlow Tensor with arbitrary shape
    - alpha: leak parameter for leaky ReLU
    
    Returns:
    TensorFlow Tensor with the same shape as x
    """
    # TODO: implement leaky ReLU
    x = tf.where(x > 0, x, alpha * x)
    return x

Test your leaky ReLU implementation. You should get errors < 1e-10


In [4]:
def test_leaky_relu(x, y_true):
    tf.reset_default_graph()
    with get_session() as sess:
        y_tf = leaky_relu(tf.constant(x))
        y = sess.run(y_tf)
        print('Maximum error: %g'%rel_error(y_true, y))

test_leaky_relu(answers['lrelu_x'], answers['lrelu_y'])


Maximum error: 0

Random Noise

Generate a TensorFlow Tensor containing uniform noise from -1 to 1 with shape [batch_size, dim].


In [5]:
def sample_noise(batch_size, dim):
    """Generate random uniform noise from -1 to 1.
    
    Inputs:
    - batch_size: integer giving the batch size of noise to generate
    - dim: integer giving the dimension of the noise to generate
    
    Returns:
    TensorFlow Tensor containing uniform noise in [-1, 1] with shape [batch_size, dim]
    """
    # TODO: sample and return noise
    z = tf.random_uniform([batch_size, dim], -1, 1)
    return z

Make sure noise is the correct shape and type:


In [6]:
def test_sample_noise():
    batch_size = 3
    dim = 4
    tf.reset_default_graph()
    with get_session() as sess:
        z = sample_noise(batch_size, dim)
        # Check z has the correct shape
        assert z.get_shape().as_list() == [batch_size, dim]
        # Make sure z is a Tensor and not a numpy array
        assert isinstance(z, tf.Tensor)
        # Check that we get different noise for different evaluations
        z1 = sess.run(z)
        z2 = sess.run(z)
        assert not np.array_equal(z1, z2)
        # Check that we get the correct range
        assert np.all(z1 >= -1.0) and np.all(z1 <= 1.0)
        print("All tests passed!")
    
test_sample_noise()


All tests passed!

Discriminator

Our first step is to build a discriminator. You should use the layers in tf.layers to build the model. All fully connected layers should include bias terms.

Architecture:

  • Fully connected layer from size 784 to 256
  • LeakyReLU with alpha 0.01
  • Fully connected layer from 256 to 256
  • LeakyReLU with alpha 0.01
  • Fully connected layer from 256 to 1

The output of the discriminator should have shape [batch_size, 1], and contain real numbers corresponding to the scores that each of the batch_size inputs is a real image.


In [7]:
def discriminator(x):
    """Compute discriminator score for a batch of input images.
    
    Inputs:
    - x: TensorFlow Tensor of flattened input images, shape [batch_size, 784]
    
    Returns:
    TensorFlow Tensor with shape [batch_size, 1], containing the score 
    for an image being real for each input image.
    """
    with tf.variable_scope("discriminator"):
        # TODO: implement architecture
        fc1 = tf.layers.dense(x, 256, activation=leaky_relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        fc2 = tf.layers.dense(fc1, 256, activation=leaky_relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        logits = tf.layers.dense(fc2, 1, kernel_initializer=tf.contrib.layers.xavier_initializer())
        return logits

Test to make sure the number of parameters in the discriminator is correct:


In [8]:
def test_discriminator(true_count=267009):
    tf.reset_default_graph()
    with get_session() as sess:
        y = discriminator(tf.ones((2, 784)))
        cur_count = count_params()
        if cur_count != true_count:
            print('Incorrect number of parameters in discriminator. {0} instead of {1}. Check your architecture.'.format(cur_count,true_count))
        else:
            print('Correct number of parameters in discriminator.')
        
test_discriminator()


Correct number of parameters in discriminator.
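
The expected count of 267009 can be reproduced by hand. The sketch below assumes each fully connected layer contributes one weight per input-output pair plus one bias per output, which matches the architecture listed above; `dense_params` is an illustrative helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [784, 256, 256, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
```

The three layers contribute 200960, 65792 and 257 parameters. The same helper applied to the sizes [4, 1024, 1024, 784] gives 1858320, the count used in the generator check with a 4-dimensional noise input.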

Generator

Now to build a generator. You should use the layers in tf.layers to construct the model. All fully connected layers should include bias terms.

Architecture:

  • Fully connected layer from tf.shape(z)[1] (the number of noise dimensions) to 1024
  • ReLU
  • Fully connected layer from 1024 to 1024
  • ReLU
  • Fully connected layer from 1024 to 784
  • TanH (To restrict the output to be [-1,1])

In [9]:
def generator(z):
    """Generate images from a random noise vector.
    
    Inputs:
    - z: TensorFlow Tensor of random noise with shape [batch_size, noise_dim]
    
    Returns:
    TensorFlow Tensor of generated images, with shape [batch_size, 784].
    """
    with tf.variable_scope("generator"):
        # TODO: implement architecture
        fc1 = tf.layers.dense(z, 1024, activation=tf.nn.relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        fc2 = tf.layers.dense(fc1, 1024, activation=tf.nn.relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        img = tf.layers.dense(fc2, 784, activation=tf.tanh, kernel_initializer=tf.contrib.layers.xavier_initializer())
        return img

Test to make sure the number of parameters in the generator is correct:


In [10]:
def test_generator(true_count=1858320):
    tf.reset_default_graph()
    with get_session() as sess:
        y = generator(tf.ones((1, 4)))
        cur_count = count_params()
        if cur_count != true_count:
            print('Incorrect number of parameters in generator. {0} instead of {1}. Check your architecture.'.format(cur_count,true_count))
        else:
            print('Correct number of parameters in generator.')
        
test_generator()


Correct number of parameters in generator.

GAN Loss

Compute the generator and discriminator loss. The generator loss is: $$\ell_G = -\mathbb{E}_{z \sim p(z)}\left[\log D(G(z))\right]$$ and the discriminator loss is: $$ \ell_D = -\mathbb{E}_{x \sim p_\text{data}}\left[\log D(x)\right] - \mathbb{E}_{z \sim p(z)}\left[\log \left(1-D(G(z))\right)\right]$$ Note that these are negated from the equations presented earlier as we will be minimizing these losses.

HINTS: Use tf.ones_like and tf.zeros_like to generate labels for your discriminator. Use sigmoid_cross_entropy loss to help compute your loss function. Instead of computing the expectation, we will be averaging over elements of the minibatch, so make sure to combine the loss by averaging instead of summing.
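
As a sanity check for the hints above, here is a plain-Python reference of the two losses computed from raw logits. It is a sketch, not the graded solution: the function names are illustrative, and the cross-entropy uses the numerically stable form `max(x, 0) - x * z + log(1 + exp(-|x|))` for logit `x` and label `z`.

```python
import math

def sigmoid_xent(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def gan_loss_reference(logits_real, logits_fake):
    n_real, n_fake = len(logits_real), len(logits_fake)
    # Discriminator: real images get label 1, generated images get label 0.
    d_loss = (sum(sigmoid_xent(s, 1.0) for s in logits_real) / n_real
              + sum(sigmoid_xent(s, 0.0) for s in logits_fake) / n_fake)
    # Generator: generated images are scored against label 1.
    g_loss = sum(sigmoid_xent(s, 1.0) for s in logits_fake) / n_fake
    return d_loss, g_loss

# A discriminator that outputs logit 0 (probability 0.5) for everything.
d_loss, g_loss = gan_loss_reference([0.0, 0.0], [0.0, 0.0])
print(d_loss, g_loss)
```

For an undecided discriminator this gives a discriminator loss of $2\log 2 \approx 1.386$ and a generator loss of $\log 2 \approx 0.693$, which are useful reference points when reading the training log.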


In [11]:
def gan_loss(logits_real, logits_fake):
    """Compute the GAN loss.
    
    Inputs:
    - logits_real: Tensor, shape [batch_size, 1], output of discriminator
        Unnormalized score (logit) that the image is real for each real image
    - logits_fake: Tensor, shape [batch_size, 1], output of discriminator
        Unnormalized score (logit) that the image is real for each fake image
    
    Returns:
    - D_loss: discriminator loss scalar
    - G_loss: generator loss scalar
    """
    # TODO: compute D_loss and G_loss
    D_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.ones_like(logits_real), logits=logits_real)
                            + tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.zeros_like(logits_fake), logits=logits_fake))
    G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.ones_like(logits_fake), logits=logits_fake))
    return D_loss, G_loss

Test your GAN loss. Make sure both the generator and discriminator loss are correct. You should see errors less than 1e-5.


In [12]:
def test_gan_loss(logits_real, logits_fake, d_loss_true, g_loss_true):
    tf.reset_default_graph()
    with get_session() as sess:
        d_loss, g_loss = sess.run(gan_loss(tf.constant(logits_real), tf.constant(logits_fake)))
    print("Maximum error in d_loss: %g"%rel_error(d_loss_true, d_loss))
    print("Maximum error in g_loss: %g"%rel_error(g_loss_true, g_loss))

test_gan_loss(answers['logits_real'], answers['logits_fake'],
              answers['d_loss_true'], answers['g_loss_true'])


Maximum error in d_loss: 0
Maximum error in g_loss: 0

Optimizing our loss

Make an AdamOptimizer with a 1e-3 learning rate and beta1=0.5 to minimize G_loss and D_loss separately. The trick of decreasing beta1 was shown to be effective in helping GANs converge in the Improved Techniques for Training GANs paper. In fact, with our current hyperparameters, if you set beta1 to the TensorFlow default of 0.9, there's a good chance your discriminator loss will go to zero and the generator will fail to learn entirely. This is a common failure mode in GANs: if your D(x) learns too fast (e.g. its loss goes near zero), your G(z) is never able to learn. Often D(x) is trained with SGD with Momentum or RMSProp instead of Adam, but here we'll use Adam for both D(x) and G(z).
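
To get a feel for what beta1 controls, here is a scalar sketch of the textbook Adam update. This is an assumption-laden illustration: it follows the form in the Adam paper rather than TensorFlow's exact internals, and `adam_step` and `momentum_after_sign_flip` are hypothetical helpers.

```python
import math

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) average
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment average
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

def momentum_after_sign_flip(beta1):
    """First-moment average after 20 gradients of +1 followed by one of -1."""
    m = 0.0
    for g in [1.0] * 20 + [-1.0]:
        m = beta1 * m + (1.0 - beta1) * g
    return m

x1, m1, v1 = adam_step(x=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(x1)
print(momentum_after_sign_flip(0.5), momentum_after_sign_flip(0.9))
```

With beta1=0.5 the momentum term is essentially zero right after the gradient changes sign, while with beta1=0.9 it still points in the old direction. One plausible reading is that a shorter memory suits GAN training, where the loss landscape for each network shifts every time the other network is updated.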


In [13]:
# TODO: create an AdamOptimizer for D_solver and G_solver
def get_solvers(learning_rate=1e-3, beta1=0.5):
    """Create solvers for GAN training.
    
    Inputs:
    - learning_rate: learning rate to use for both solvers
    - beta1: beta1 parameter for both solvers (first moment decay)
    
    Returns:
    - D_solver: instance of tf.train.AdamOptimizer with correct learning_rate and beta1
    - G_solver: instance of tf.train.AdamOptimizer with correct learning_rate and beta1
    """
    D_solver = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1)
    G_solver = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta1)
    return D_solver, G_solver

Putting it all together

Now for a bit of Lego construction. Read this section over carefully to understand how we'll be composing the generator and discriminator.


In [14]:
tf.reset_default_graph()

# number of images for each batch
batch_size = 128
# our noise dimension
noise_dim = 96

# placeholder for images from the training dataset
x = tf.placeholder(tf.float32, [None, 784])
# random noise fed into our generator
z = sample_noise(batch_size, noise_dim)
# generated images
G_sample = generator(z)

with tf.variable_scope("") as scope:
    #scale images to be -1 to 1
    logits_real = discriminator(preprocess_img(x))
    # Re-use discriminator weights on new inputs
    scope.reuse_variables()
    logits_fake = discriminator(G_sample)

# Get the list of variables for the discriminator and generator
D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'discriminator')
G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'generator') 

# get our solver
D_solver, G_solver = get_solvers()

# get our loss
D_loss, G_loss = gan_loss(logits_real, logits_fake)

# setup training steps
D_train_step = D_solver.minimize(D_loss, var_list=D_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)
D_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'discriminator')
G_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'generator')

Training a GAN!

Well that wasn't so hard, was it? In the iterations in the low 100s you should see black backgrounds, fuzzy shapes as you approach iteration 1000, and decent shapes, about half of which will be sharp and clearly recognizable as we pass 3000. In our case, we'll simply train D(x) and G(z) with one batch each every iteration. However, papers often experiment with different schedules of training D(x) and G(z), sometimes doing one for more steps than the other, or even training each one until the loss gets "good enough" and then switching to training the other.


In [15]:
# a giant helper function
def run_a_gan(sess, G_train_step, G_loss, D_train_step, D_loss, G_extra_step, D_extra_step,\
              show_every=250, print_every=50, batch_size=128, num_epoch=10):
    """Train a GAN for a certain number of epochs.
    
    Inputs:
    - sess: A tf.Session that we want to use to run our data
    - G_train_step: A training step for the Generator
    - G_loss: Generator loss
    - D_train_step: A training step for the Discriminator
    - D_loss: Discriminator loss
    - G_extra_step: A collection of tf.GraphKeys.UPDATE_OPS for generator
    - D_extra_step: A collection of tf.GraphKeys.UPDATE_OPS for discriminator
    Returns:
        Nothing
    """
    # compute the number of iterations we need
    max_iter = int(mnist.train.num_examples*num_epoch/batch_size)
    for it in range(max_iter):
        # every so often, show a sample result
        if it % show_every == 0:
            samples = sess.run(G_sample)
            fig = show_images(samples[:16])
            plt.show()
            print()
        # run a batch of data through the network
        minibatch, _ = mnist.train.next_batch(batch_size)  # labels are not needed
        _, D_loss_curr = sess.run([D_train_step, D_loss], feed_dict={x: minibatch})
        _, G_loss_curr = sess.run([G_train_step, G_loss])

        # print loss every so often.
        # We want to make sure D_loss doesn't go to 0
        if it % print_every == 0:
            print('Iter: {}, D: {:.4}, G:{:.4}'.format(it,D_loss_curr,G_loss_curr))
    print('Final images')
    samples = sess.run(G_sample)

    fig = show_images(samples[:16])
    plt.show()

Train your GAN! This should take about 10 minutes on a CPU, or less than a minute on GPU.


In [16]:
with get_session() as sess:
    sess.run(tf.global_variables_initializer())
    run_a_gan(sess,G_train_step,G_loss,D_train_step,D_loss,G_extra_step,D_extra_step)


Iter: 0, D: 1.125, G:0.7955
Iter: 50, D: 0.3383, G:1.68
Iter: 100, D: 1.243, G:2.678
Iter: 150, D: 1.487, G:1.205
Iter: 200, D: 1.24, G:1.073
Iter: 250, D: 0.7256, G:2.039
Iter: 300, D: 1.321, G:0.5497
Iter: 350, D: 1.105, G:1.312
Iter: 400, D: 0.8634, G:1.69
Iter: 450, D: 0.9237, G:1.37
Iter: 500, D: 1.114, G:1.543
Iter: 550, D: 0.8725, G:1.376
Iter: 600, D: 1.296, G:1.066
Iter: 650, D: 2.942, G:1.321
Iter: 700, D: 0.9217, G:1.259
Iter: 750, D: 1.277, G:0.9742
Iter: 800, D: 1.18, G:1.243
Iter: 850, D: 1.471, G:2.51
Iter: 900, D: 1.133, G:1.537
Iter: 950, D: 1.232, G:1.381
Iter: 1000, D: 1.037, G:1.554
Iter: 1050, D: 1.203, G:1.042
Iter: 1100, D: 1.297, G:0.6495
Iter: 1150, D: 1.403, G:0.7953
Iter: 1200, D: 1.216, G:1.944
Iter: 1250, D: 1.085, G:0.9763
Iter: 1300, D: 1.303, G:1.355
Iter: 1350, D: 1.355, G:0.2764
Iter: 1400, D: 1.222, G:1.473
Iter: 1450, D: 1.151, G:1.003
Iter: 1500, D: 1.278, G:0.9864
Iter: 1550, D: 1.1, G:1.123
Iter: 1600, D: 1.136, G:1.054
Iter: 1650, D: 1.195, G:0.8604
Iter: 1700, D: 1.318, G:0.8552
Iter: 1750, D: 1.159, G:0.8724
Iter: 1800, D: 1.299, G:0.8135
Iter: 1850, D: 1.433, G:1.002
Iter: 1900, D: 1.266, G:0.9244
Iter: 1950, D: 1.334, G:1.008
Iter: 2000, D: 1.247, G:0.9392
Iter: 2050, D: 1.308, G:0.8412
Iter: 2100, D: 1.311, G:0.8677
Iter: 2150, D: 1.301, G:0.7867
Iter: 2200, D: 1.236, G:0.876
Iter: 2250, D: 1.282, G:0.8435
Iter: 2300, D: 1.264, G:0.9069
Iter: 2350, D: 1.295, G:0.8718
Iter: 2400, D: 1.375, G:0.7659
Iter: 2450, D: 1.39, G:0.7857
Iter: 2500, D: 1.331, G:0.8643
Iter: 2550, D: 1.417, G:0.8645
Iter: 2600, D: 1.299, G:0.9348
Iter: 2650, D: 1.391, G:0.8281
Iter: 2700, D: 1.332, G:0.8005
Iter: 2750, D: 1.297, G:0.868
Iter: 2800, D: 1.256, G:0.8507
Iter: 2850, D: 1.274, G:0.8872
Iter: 2900, D: 1.322, G:0.8064
Iter: 2950, D: 1.337, G:0.8275
Iter: 3000, D: 1.265, G:0.8494
Iter: 3050, D: 1.288, G:0.7733
Iter: 3100, D: 1.222, G:0.9034
Iter: 3150, D: 1.32, G:1.864
Iter: 3200, D: 1.353, G:0.9187
Iter: 3250, D: 1.294, G:0.7789
Iter: 3300, D: 1.254, G:0.7609
Iter: 3350, D: 1.323, G:0.7842
Iter: 3400, D: 1.313, G:0.8699
Iter: 3450, D: 1.332, G:0.7987
Iter: 3500, D: 1.253, G:0.8597
Iter: 3550, D: 1.293, G:0.7841
Iter: 3600, D: 1.297, G:0.8415
Iter: 3650, D: 1.307, G:0.8018
Iter: 3700, D: 1.364, G:0.7813
Iter: 3750, D: 1.339, G:0.8195
Iter: 3800, D: 1.312, G:0.7576
Iter: 3850, D: 1.403, G:0.6472
Iter: 3900, D: 1.346, G:0.8113
Iter: 3950, D: 1.261, G:0.8743
Iter: 4000, D: 1.297, G:0.8311
Iter: 4050, D: 1.276, G:0.772
Iter: 4100, D: 1.291, G:0.8803
Iter: 4150, D: 1.293, G:0.7718
Iter: 4200, D: 1.306, G:0.7689
Iter: 4250, D: 1.343, G:0.8302
Final images
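
The log stops at iteration 4250 because of how `run_a_gan` computes its iteration budget. The arithmetic below is a sketch that assumes the default split described in the Dataset section (60,000 training images minus 5,000 held out for validation) together with the default arguments of `run_a_gan`.

```python
num_train = 60000 - 5000   # MNIST training images minus the default validation split
num_epoch = 10
batch_size = 128
print_every = 50

# Same expression as max_iter in run_a_gan.
max_iter = int(num_train * num_epoch / batch_size)

# Iterations run from 0 to max_iter - 1; the loss is printed on multiples of print_every.
last_printed = ((max_iter - 1) // print_every) * print_every

print(max_iter, last_printed)
```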

Least Squares GAN

We'll now look at Least Squares GAN, a newer, more stable alternative to the original GAN loss function. For this part, all we have to do is change the loss function and retrain the model. We'll implement equation (9) in the paper, with the generator loss: $$\ell_G = \frac{1}{2}\mathbb{E}_{z \sim p(z)}\left[\left(D(G(z))-1\right)^2\right]$$ and the discriminator loss: $$ \ell_D = \frac{1}{2}\mathbb{E}_{x \sim p_\text{data}}\left[\left(D(x)-1\right)^2\right] + \frac{1}{2}\mathbb{E}_{z \sim p(z)}\left[ \left(D(G(z))\right)^2\right]$$

HINTS: Instead of computing the expectation, we will be averaging over elements of the minibatch, so make sure to combine the loss by averaging instead of summing. When plugging in for $D(x)$ and $D(G(z))$ use the direct output from the discriminator (score_real and score_fake).
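
For comparison with the TensorFlow version, here is a plain-Python reference of the two LSGAN losses above. The function name and the example scores are illustrative only.

```python
def lsgan_loss_reference(score_real, score_fake):
    n_real, n_fake = len(score_real), len(score_fake)
    # Discriminator: push real scores toward 1 and fake scores toward 0.
    d_loss = (0.5 * sum((s - 1.0) ** 2 for s in score_real) / n_real
              + 0.5 * sum(s ** 2 for s in score_fake) / n_fake)
    # Generator: push fake scores toward 1.
    g_loss = 0.5 * sum((s - 1.0) ** 2 for s in score_fake) / n_fake
    return d_loss, g_loss

perfect_d = lsgan_loss_reference([1.0, 1.0], [0.0, 0.0])
fooled_d = lsgan_loss_reference([1.0, 1.0], [1.0, 1.0])

print(perfect_d, fooled_d)
```

A discriminator that scores real images at 1 and fakes at 0 has zero loss while the generator pays 0.5; when the fakes are also scored at 1 the roles swap. Unlike the sigmoid cross-entropy loss, the penalty keeps growing quadratically with the distance from the target score.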


In [17]:
def lsgan_loss(score_real, score_fake):
    """Compute the Least Squares GAN loss.
    
    Inputs:
    - score_real: Tensor, shape [batch_size, 1], output of discriminator
        score for each real image
    - score_fake: Tensor, shape[batch_size, 1], output of discriminator
        score for each fake image    
          
    Returns:
    - D_loss: discriminator loss scalar
    - G_loss: generator loss scalar
    """
    # TODO: compute D_loss and G_loss
    D_loss = .5 * tf.reduce_mean(tf.pow(score_real - 1, 2) + tf.pow(score_fake, 2))
    G_loss = .5 * tf.reduce_mean(tf.pow(score_fake - 1, 2))
    return D_loss, G_loss

Test your LSGAN loss. You should see errors less than 1e-7.


In [18]:
def test_lsgan_loss(score_real, score_fake, d_loss_true, g_loss_true):
    with get_session() as sess:
        d_loss, g_loss = sess.run(
            lsgan_loss(tf.constant(score_real), tf.constant(score_fake)))
    print("Maximum error in d_loss: %g"%rel_error(d_loss_true, d_loss))
    print("Maximum error in g_loss: %g"%rel_error(g_loss_true, g_loss))

test_lsgan_loss(answers['logits_real'], answers['logits_fake'],
                answers['d_loss_lsgan_true'], answers['g_loss_lsgan_true'])


Maximum error in d_loss: 5.91479e-17
Maximum error in g_loss: 0

Create new training steps so we instead minimize the LSGAN loss:


In [19]:
D_loss, G_loss = lsgan_loss(logits_real, logits_fake)
D_train_step = D_solver.minimize(D_loss, var_list=D_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)

In [20]:
with get_session() as sess:
    sess.run(tf.global_variables_initializer())
    run_a_gan(sess, G_train_step, G_loss, D_train_step, D_loss, G_extra_step, D_extra_step)


Iter: 0, D: 1.176, G:0.3162
Iter: 50, D: 0.01627, G:0.7769
Iter: 100, D: 0.162, G:1.517
Iter: 150, D: 0.1555, G:0.4253
Iter: 200, D: 0.1043, G:0.7053
Iter: 250, D: 0.197, G:1.213
Iter: 300, D: 0.1333, G:0.6251
Iter: 350, D: 0.2211, G:0.6541
Iter: 400, D: 0.07317, G:0.4406
Iter: 450, D: 0.1072, G:0.5151
Iter: 500, D: 0.1342, G:0.7475
Iter: 550, D: 0.08195, G:0.4665
Iter: 600, D: 0.153, G:0.3617
Iter: 650, D: 0.09536, G:0.4049
Iter: 700, D: 0.2136, G:0.2555
Iter: 750, D: 0.1226, G:0.5376
Iter: 800, D: 0.06731, G:0.732
Iter: 850, D: 0.1122, G:0.4479
Iter: 900, D: 0.2991, G:0.2705
Iter: 950, D: 0.101, G:0.4012
Iter: 1000, D: 0.1394, G:0.3156
Iter: 1050, D: 0.176, G:0.2478
Iter: 1100, D: 0.2161, G:0.01675
Iter: 1150, D: 0.1395, G:0.2821
Iter: 1200, D: 0.2272, G:0.2852
Iter: 1250, D: 0.1888, G:0.3278
Iter: 1300, D: 0.1605, G:0.2085
Iter: 1350, D: 0.2186, G:0.1953
Iter: 1400, D: 0.2097, G:0.1997
Iter: 1450, D: 0.2247, G:0.2201
Iter: 1500, D: 0.1504, G:0.2666
Iter: 1550, D: 0.1701, G:0.2019
Iter: 1600, D: 0.1742, G:0.2465
Iter: 1650, D: 0.1911, G:0.2298
Iter: 1700, D: 0.1793, G:0.2149
Iter: 1750, D: 0.1649, G:0.2521
Iter: 1800, D: 0.2003, G:0.2441
Iter: 1850, D: 0.2124, G:0.2345
Iter: 1900, D: 0.232, G:0.2077
Iter: 1950, D: 0.194, G:0.2217
Iter: 2000, D: 0.243, G:0.1897
Iter: 2050, D: 0.1954, G:0.2113
Iter: 2100, D: 0.2055, G:0.09931
Iter: 2150, D: 0.2242, G:0.1733
Iter: 2200, D: 0.2009, G:0.2001
Iter: 2250, D: 0.2344, G:0.2038
Iter: 2300, D: 0.2372, G:0.1673
Iter: 2350, D: 0.2496, G:0.1749
Iter: 2400, D: 0.2403, G:0.1891
Iter: 2450, D: 0.2066, G:0.1749
Iter: 2500, D: 0.2271, G:0.1736
Iter: 2550, D: 0.216, G:0.195
Iter: 2600, D: 0.2294, G:0.1952
Iter: 2650, D: 0.2508, G:0.1663
Iter: 2700, D: 0.2282, G:0.1632
Iter: 2750, D: 0.2201, G:0.1642
Iter: 2800, D: 0.2239, G:0.1799
Iter: 2850, D: 0.2242, G:0.1555
Iter: 2900, D: 0.2482, G:0.1437
Iter: 2950, D: 0.2392, G:0.1651
Iter: 3000, D: 0.2107, G:0.1782
Iter: 3050, D: 0.2264, G:0.1976
Iter: 3100, D: 0.2083, G:0.2
Iter: 3150, D: 0.2312, G:0.1616
Iter: 3200, D: 0.2256, G:0.1858
Iter: 3250, D: 0.2297, G:0.1704
Iter: 3300, D: 0.2449, G:0.1778
Iter: 3350, D: 0.2194, G:0.1631
Iter: 3400, D: 0.2134, G:0.1806
Iter: 3450, D: 0.2422, G:0.1778
Iter: 3500, D: 0.2094, G:0.1809
Iter: 3550, D: 0.2276, G:0.1869
Iter: 3600, D: 0.227, G:0.173
Iter: 3650, D: 0.212, G:0.2
Iter: 3700, D: 0.2194, G:0.1599
Iter: 3750, D: 0.2192, G:0.1589
Iter: 3800, D: 0.2492, G:0.2088
Iter: 3850, D: 0.2399, G:0.1701
Iter: 3900, D: 0.2206, G:0.1902
Iter: 3950, D: 0.2099, G:0.1781
Iter: 4000, D: 0.2103, G:0.1717
Iter: 4050, D: 0.2337, G:0.1995
Iter: 4100, D: 0.2271, G:0.1691
Iter: 4150, D: 0.2236, G:0.1508
Iter: 4200, D: 0.2304, G:0.17
Iter: 4250, D: 0.2215, G:0.1689
Final images

INLINE QUESTION 1:

Describe how the visual quality of the samples changes over the course of training. Do you notice anything about the distribution of the samples? How do the results change across different training runs?

At first, the distribution of samples is just a bunch of scattered points. Over the course of training, the samples cluster in specific locations and take on the sharper, clearer shape of digits.

Deep Convolutional GANs

In the first part of the notebook, we implemented an almost direct copy of the original GAN network from Ian Goodfellow. However, this network architecture allows no real spatial reasoning. It is unable to reason about things like "sharp edges" in general because it lacks any convolutional layers. Thus, in this section, we will implement some of the ideas from DCGAN, where we use convolutional networks as our discriminators and generators.

Discriminator

We will use a discriminator inspired by the TensorFlow MNIST classification tutorial, which is able to get above 99% accuracy on the MNIST dataset fairly quickly. Be sure to check the dimensions of x and reshape when needed: fully connected blocks expect [N,D] Tensors while conv2d blocks expect [N,H,W,C] Tensors.

Architecture:

  • 32 Filters, 5x5, Stride 1, Leaky ReLU(alpha=0.01)
  • Max Pool 2x2, Stride 2
  • 64 Filters, 5x5, Stride 1, Leaky ReLU(alpha=0.01)
  • Max Pool 2x2, Stride 2
  • Flatten
  • Fully Connected size 4 x 4 x 64, Leaky ReLU(alpha=0.01)
  • Fully Connected size 1

In [21]:
def discriminator(x):
    """Compute discriminator score for a batch of input images.
    
    Inputs:
    - x: TensorFlow Tensor of flattened input images, shape [batch_size, 784]
    
    Returns:
    TensorFlow Tensor with shape [batch_size, 1], containing the score 
    for an image being real for each input image.
    """
    with tf.variable_scope("discriminator"):
        # TODO: implement architecture
        batch_size = tf.shape(x)[0]
        x = tf.reshape(x, [batch_size, 28, 28, 1])
        conv1 = tf.layers.conv2d(x, 32, 5, 1, activation=leaky_relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        pool1 = tf.layers.max_pooling2d(conv1, 2, 2)
        conv2 = tf.layers.conv2d(pool1, 64, 5, 1, activation=leaky_relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        pool2 = tf.layers.max_pooling2d(conv2, 2, 2)
        flatten = tf.reshape(pool2, [batch_size, 4 * 4 * 64])
        fc = tf.layers.dense(flatten, 4 * 4 * 64, activation=leaky_relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        # The final layer outputs raw logits, so it has no activation ("Fully Connected size 1").
        logits = tf.layers.dense(fc, 1, kernel_initializer=tf.contrib.layers.xavier_initializer())
        return logits
test_discriminator(1102721)


Correct number of parameters in discriminator.

Generator

For the generator, we will copy the architecture exactly from the InfoGAN paper (see Appendix C.1, MNIST). See also the documentation for tf.nn.conv2d_transpose. In a GAN we are always in "training" mode, so the batch normalization layers should always be built with training=True.
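
To make the "training" mode concrete: in training mode, batch normalization normalizes each feature with the mean and variance of the current batch rather than with stored moving averages. A minimal NumPy sketch of that computation follows; the function name and the eps value are assumptions for illustration, not the TensorFlow implementation.

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature with the statistics of the current batch."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Hypothetical activations: 128 samples, 4 features, far from zero mean / unit variance.
x = rng.normal(loc=5.0, scale=3.0, size=(128, 4))
out = batchnorm_train(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```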

Architecture:

  • Fully connected of size 1024, ReLU
  • BatchNorm
  • Fully connected of size 7 x 7 x 128, ReLU
  • BatchNorm
  • Resize into Image Tensor
  • 64 conv2d^T (transpose) filters of 4x4, stride 2, ReLU
  • BatchNorm
  • 1 conv2d^T (transpose) filter of 4x4, stride 2, TanH
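
The spatial sizes in this architecture can be checked with a few lines of Python. The sketch assumes 'same' padding for both transposed convolutions, in which case each one multiplies the width and height by its stride; the helper name is ours.

```python
def conv_transpose_same(size, stride):
    """Output width of a transposed convolution with 'same' padding."""
    return size * stride

h, c = 7, 128                           # fully connected output resized to 7 x 7 x 128
shapes = [(h, h, c)]
h, c = conv_transpose_same(h, 2), 64    # 64 transposed filters of 4x4, stride 2
shapes.append((h, h, c))
h, c = conv_transpose_same(h, 2), 1     # 1 transposed filter of 4x4, stride 2
shapes.append((h, h, c))

print(shapes)      # [(7, 7, 128), (14, 14, 64), (28, 28, 1)]
print(h * h * c)   # 784
```

The final 28 x 28 x 1 image flattens to the 784 values that the generator returns.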

In [22]:
def generator(z):
    """Generate images from a random noise vector.
    
    Inputs:
    - z: TensorFlow Tensor of random noise with shape [batch_size, noise_dim]
    
    Returns:
    TensorFlow Tensor of generated images, with shape [batch_size, 784].
    """
    with tf.variable_scope("generator"):
        # TODO: implement architecture
        batch_size = tf.shape(z)[0]
        fc1 = tf.layers.dense(z, 1024, activation=tf.nn.relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn1 = tf.layers.batch_normalization(fc1, training=True)
        fc2 = tf.layers.dense(bn1, 7 * 7 * 128, activation=tf.nn.relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn2 = tf.layers.batch_normalization(fc2, training=True)
        # Resize the flat activations into an image tensor of shape [N, 7, 7, 128].
        image = tf.reshape(bn2, [batch_size, 7, 7, 128])
        convT1 = tf.layers.conv2d_transpose(image, 64, 4, 2, padding='same', activation=tf.nn.relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn3 = tf.layers.batch_normalization(convT1, training=True)
        convT2 = tf.layers.conv2d_transpose(bn3, 1, 4, 2, padding='same', activation=tf.tanh, kernel_initializer=tf.contrib.layers.xavier_initializer())
        img = tf.reshape(convT2, [batch_size, 784])
        return img
test_generator(6595521)


Correct number of parameters in generator.

We have to recreate our network since we've changed our functions.


In [23]:
tf.reset_default_graph()

batch_size = 128
# our noise dimension
noise_dim = 96

# placeholders for images from the training dataset
x = tf.placeholder(tf.float32, [None, 784])
z = sample_noise(batch_size, noise_dim)
# generated images
G_sample = generator(z)

with tf.variable_scope("") as scope:
    #scale images to be -1 to 1
    logits_real = discriminator(preprocess_img(x))
    # Re-use discriminator weights on new inputs
    scope.reuse_variables()
    logits_fake = discriminator(G_sample)

# Get the list of variables for the discriminator and generator
D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,'discriminator')
G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,'generator') 

D_solver,G_solver = get_solvers()
D_loss, G_loss = gan_loss(logits_real, logits_fake)
D_train_step = D_solver.minimize(D_loss, var_list=D_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)
D_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS,'discriminator')
G_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS,'generator')

Train and evaluate a DCGAN

This is the one part of A3 that significantly benefits from using a GPU. The requested five epochs take about 3 minutes on a GPU, or about 50 minutes on a dual-core laptop CPU (feel free to use 3 epochs if you train on a CPU).


In [24]:
with get_session() as sess:
    sess.run(tf.global_variables_initializer())
    run_a_gan(sess,G_train_step,G_loss,D_train_step,D_loss,G_extra_step,D_extra_step,num_epoch=5)


Iter: 0, D: 1.386, G:0.6941
Iter: 50, D: 1.318, G:0.6941
Iter: 100, D: 1.273, G:0.7292
Iter: 150, D: 1.218, G:0.6247
Iter: 200, D: 1.374, G:0.718
Iter: 250, D: 1.198, G:0.6326
Iter: 300, D: 1.384, G:0.7219
Iter: 350, D: 1.378, G:0.7186
Iter: 400, D: 1.383, G:0.7103
Iter: 450, D: 1.386, G:0.7166
Iter: 500, D: 1.383, G:0.7094
Iter: 550, D: 1.203, G:0.6883
Iter: 600, D: 1.195, G:0.6157
Iter: 650, D: 1.327, G:0.6993
Iter: 700, D: 1.313, G:0.6794
Iter: 750, D: 1.329, G:0.691
Iter: 800, D: 1.268, G:0.692
Iter: 850, D: 1.38, G:0.7167
Iter: 900, D: 1.207, G:0.6099
Iter: 950, D: 1.217, G:0.6129
Iter: 1000, D: 1.281, G:0.6883
Iter: 1050, D: 1.289, G:0.6788
Iter: 1100, D: 1.177, G:0.5949
Iter: 1150, D: 1.194, G:0.6262
Iter: 1200, D: 1.149, G:0.5674
Iter: 1250, D: 1.169, G:0.6049
Iter: 1300, D: 1.178, G:0.6306
Iter: 1350, D: 1.164, G:0.6248
Iter: 1400, D: 1.158, G:0.5483
Iter: 1450, D: 1.202, G:0.6747
Iter: 1500, D: 1.181, G:0.669
Iter: 1550, D: 1.218, G:0.6396
Iter: 1600, D: 1.261, G:0.7025
Iter: 1650, D: 1.173, G:0.6533
Iter: 1700, D: 1.17, G:0.5939
Iter: 1750, D: 1.285, G:0.6884
Iter: 1800, D: 1.157, G:0.6276
Iter: 1850, D: 1.356, G:0.7101
Iter: 1900, D: 1.155, G:0.5715
Iter: 1950, D: 1.164, G:0.6266
Iter: 2000, D: 1.175, G:0.6692
Iter: 2050, D: 1.378, G:0.7221
Iter: 2100, D: 1.136, G:0.6777
Final images

INLINE QUESTION 2:

What differences do you see between the DCGAN results and the original GAN results?

The edges of the characters are much sharper and clearer, without scattered points around them, and the training process converges faster.
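
A side note on reading the training log above: the first line reports D: 1.386 and G: 0.6941, which is roughly what an untrained discriminator produces. The sketch below assumes that gan_loss (defined earlier in the notebook and not shown in this section) is the usual sigmoid cross-entropy form of the minimax objective, and that all logits are near zero at initialization; both are assumptions, and the helper is ours.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Numerically stable binary cross-entropy on raw logits."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

# Assumption: an untrained discriminator outputs logits of about 0 (probability 0.5).
logits_real = np.zeros(128)
logits_fake = np.zeros(128)

d_loss = float(sigmoid_cross_entropy(logits_real, 1.0).mean()
               + sigmoid_cross_entropy(logits_fake, 0.0).mean())
g_loss = float(sigmoid_cross_entropy(logits_fake, 1.0).mean())
print(round(d_loss, 3), round(g_loss, 4))  # 1.386 0.6931
```

Under these assumptions the discriminator loss is 2 log 2 and the generator loss is log 2, so values near 1.386 and 0.693 indicate a discriminator that is close to guessing.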


Extra Credit

Be sure you don't destroy your results above, but feel free to copy and paste code to get results below.

  • For a small amount of extra credit, you can implement additional new GAN loss functions below, provided they converge. See AFI, BiGAN, Softmax GAN, Conditional GAN, InfoGAN, etc. They should converge to get credit.
  • Likewise for an improved architecture or using a convolutional GAN (or even implement a VAE)
  • For a bigger chunk of extra credit, load the CIFAR10 data (see last assignment) and train a compelling generative model on CIFAR-10
  • Demonstrate the value of GANs in building semi-supervised models. In a semi-supervised example, only some fraction of the input data has labels; we can simulate this in MNIST by only training on a few dozen or hundred labeled examples. This was first described in Improved Techniques for Training GANs.
  • Something new/cool.

Describe what you did here

WGAN-GP (Small Extra Credit)

Please only attempt after you have completed everything above.

We'll now look at Improved Wasserstein GAN as a newer, more stable alternative to the original GAN loss function. For this part, all we have to do is change the loss function and retrain the model. We'll implement Algorithm 1 in the paper.
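
For reference, the losses used in that algorithm can be written in the notation of the minimax game above, where $D$ now outputs an unconstrained score rather than a probability. This is our transcription, so check it against the paper before relying on it:

$$L_D = \mathbb{E}_{z \sim p(z)}\left[D(G(z))\right] - \mathbb{E}_{x \sim p_\text{data}}\left[D(x)\right] + \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1\right)^2\right], \qquad L_G = -\mathbb{E}_{z \sim p(z)}\left[D(G(z))\right]$$

where $\hat{x} = \epsilon x + (1-\epsilon)\, G(z)$ with $\epsilon \sim U[0,1]$ is a random point between a real and a generated sample, and $\lambda$ weights the gradient penalty.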

You'll also need to use a discriminator and corresponding generator without max-pooling, so we cannot use the discriminator we currently have from DCGAN. Pair the DCGAN generator (from InfoGAN) with the discriminator from InfoGAN Appendix C.1 MNIST. We don't use Q; simply implement the network up to D. You're also welcome to define a new generator and discriminator in this notebook, in case you want to use the fully-connected pair of D(x) and G(z) you used at the top of this notebook.

Architecture:

  • 64 Filters of 4x4, stride 2, LeakyReLU
  • 128 Filters of 4x4, stride 2, LeakyReLU
  • BatchNorm
  • Flatten
  • Fully connected 1024, LeakyReLU
  • Fully connected size 1
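
The flattened size and parameter count for this architecture can be worked out the same way as before. The sketch assumes 'valid' padding for both convolutions and, as an assumption about the checker, that it counts all four batch-norm vectors per channel (scale, offset, moving mean, moving variance).

```python
def conv_out(size, kernel, stride):
    """Output width of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1

size = conv_out(28, 4, 2)      # 64 filters of 4x4, stride 2 -> 13
size = conv_out(size, 4, 2)    # 128 filters of 4x4, stride 2 -> 5
flat_dim = size * size * 128   # 5 * 5 * 128 = 3200

conv1 = 4 * 4 * 1 * 64 + 64
conv2 = 4 * 4 * 64 * 128 + 128
# Assumption: scale, offset, moving mean and moving variance are all counted.
bn = 4 * 128
fc = flat_dim * 1024 + 1024
out = 1024 * 1 + 1
print(flat_dim, conv1 + conv2 + bn + fc + out)  # 3200 3411649
```

With that assumption the total agrees with the value passed to test_discriminator in the cell below.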

In [25]:
def discriminator(x):
    with tf.variable_scope('discriminator'):
        # TODO: implement architecture
        batch_size = tf.shape(x)[0]
        x = tf.reshape(x, [batch_size, 28, 28, 1])
        conv1 = tf.layers.conv2d(x, 64, 4, 2, activation=leaky_relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        conv2 = tf.layers.conv2d(conv1, 128, 4, 2, activation=leaky_relu, kernel_initializer=tf.contrib.layers.xavier_initializer())
        bn = tf.layers.batch_normalization(conv2, training=True)
        flatten = tf.reshape(bn, [-1, 5 * 5 * 128])
        fc = tf.layers.dense(flatten, 1024, activation=leaky_relu)
        logits = tf.layers.dense(fc, 1)
        return logits
test_discriminator(3411649)


Correct number of parameters in discriminator.

In [26]:
tf.reset_default_graph()

batch_size = 128
# our noise dimension
noise_dim = 96

# placeholders for images from the training dataset
x = tf.placeholder(tf.float32, [None, 784])
z = sample_noise(batch_size, noise_dim)
# generated images
G_sample = generator(z)

with tf.variable_scope("") as scope:
    #scale images to be -1 to 1
    logits_real = discriminator(preprocess_img(x))
    # Re-use discriminator weights on new inputs
    scope.reuse_variables()
    logits_fake = discriminator(G_sample)

# Get the list of variables for the discriminator and generator
D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,'discriminator')
G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,'generator')

D_solver, G_solver = get_solvers()

In [27]:
def wgangp_loss(logits_real, logits_fake, batch_size, x, G_sample):
    """Compute the WGAN-GP loss.
    
    Inputs:
    - logits_real: Tensor, shape [batch_size, 1], output of discriminator
        Unnormalized critic score for each real image
    - logits_fake: Tensor, shape [batch_size, 1], output of discriminator
        Unnormalized critic score for each fake image
    - batch_size: The number of examples in this batch
    - x: the input (real) images for this batch
    - G_sample: the generated (fake) images for this batch
    
    Returns:
    - D_loss: discriminator loss scalar
    - G_loss: generator loss scalar
    """
    # TODO: compute D_loss and G_loss
    D_loss = tf.reduce_mean(logits_fake - logits_real)
    G_loss = -tf.reduce_mean(logits_fake)

    # lambda from the paper
    lam = 10
    
    # random sample of batch_size (tf.random_uniform)
    eps = tf.random_uniform([batch_size, 1], minval=0, maxval=1)
    x_hat = eps * x + (1 - eps) * G_sample

    # Gradients of Gradients is kind of tricky!
    with tf.variable_scope('',reuse=True) as scope:
        grad_D_x_hat = tf.gradients(discriminator(x_hat), x_hat)[0]

    grad_norm = tf.norm(grad_D_x_hat, axis=1)
    grad_pen = lam * tf.reduce_mean(tf.pow(grad_norm - 1, 2))

    D_loss += grad_pen

    return D_loss, G_loss

D_loss, G_loss = wgangp_loss(logits_real, logits_fake, batch_size, x, G_sample)
D_train_step = D_solver.minimize(D_loss, var_list=D_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)
D_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS,'discriminator')
G_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS,'generator')
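
As a sanity check on the gradient penalty term, it helps to work through a case where the gradient is known in closed form. The NumPy sketch below uses a hypothetical linear critic D(x) = x @ w, whose gradient with respect to every input is simply w; the function name and the toy data are ours, not part of the assignment.

```python
import numpy as np

def gradient_penalty_linear(w, x_real, x_fake, lam=10.0, seed=0):
    """Gradient penalty for a hypothetical linear critic D(x) = x @ w."""
    rng = np.random.default_rng(seed)
    eps = rng.uniform(0.0, 1.0, size=(x_real.shape[0], 1))
    # Random points on the segments between real and fake samples.
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    # For a linear critic the gradient at every x_hat row is w, so the
    # location of x_hat only matters for nonlinear critics.
    grads = np.tile(w, (x_hat.shape[0], 1))
    grad_norm = np.linalg.norm(grads, axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)

x_real = np.ones((4, 3))
x_fake = np.zeros((4, 3))
print(gradient_penalty_linear(np.array([1.0, 0.0, 0.0]), x_real, x_fake))  # 0.0
print(gradient_penalty_linear(np.array([3.0, 4.0, 0.0]), x_real, x_fake))  # 160.0
```

A critic whose gradient has unit norm pays no penalty, while one with gradient norm 5 pays 10 * (5 - 1)^2 = 160.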

In [28]:
with get_session() as sess:
    sess.run(tf.global_variables_initializer())
    run_a_gan(sess,G_train_step,G_loss,D_train_step,D_loss,G_extra_step,D_extra_step,batch_size=128,num_epoch=5)


Iter: 0, D: 15.48, G:-0.1789
Iter: 50, D: -2.097, G:-4.444
Iter: 100, D: -1.002, G:-3.004
Iter: 150, D: -1.538, G:-2.803
Iter: 200, D: -2.074, G:0.005068
Iter: 250, D: 0.3441, G:-1.859
Iter: 300, D: -0.09554, G:2.699
Iter: 350, D: 0.008937, G:0.4969
Iter: 400, D: -0.4918, G:4.466
Iter: 450, D: -0.1861, G:0.3008
Iter: 500, D: 0.1378, G:-0.951
Iter: 550, D: -0.2116, G:2.587
Iter: 600, D: 0.212, G:-2.206
Iter: 650, D: -0.7784, G:1.703
Iter: 700, D: -0.3046, G:-3.621
Iter: 750, D: -0.4267, G:-2.787
Iter: 800, D: -0.4423, G:-1.107
Iter: 850, D: -1.075, G:-0.1405
Iter: 900, D: -1.101, G:-0.977
Iter: 950, D: -2.373, G:-5.488
Iter: 1000, D: -0.9281, G:-1.945
Iter: 1050, D: -1.689, G:-2.093
Iter: 1100, D: -3.557, G:1.936
Iter: 1150, D: -4.37, G:-5.942
Iter: 1200, D: -0.996, G:-7.309
Iter: 1250, D: -1.148, G:-2.124
Iter: 1300, D: -1.837, G:-4.927
Iter: 1350, D: -3.932, G:-0.9184
Iter: 1400, D: 0.3364, G:-1.729
Iter: 1450, D: -0.4071, G:-0.9541
Iter: 1500, D: 0.3319, G:3.435
Iter: 1550, D: 0.3128, G:-1.526
Iter: 1600, D: -0.3123, G:2.782
Iter: 1650, D: -0.06422, G:0.1207
Iter: 1700, D: 0.1828, G:0.6551
Iter: 1750, D: 0.1192, G:-3.53
Iter: 1800, D: 0.1141, G:1.699
Iter: 1850, D: -0.03114, G:-0.4791
Iter: 1900, D: -0.2606, G:-0.131
Iter: 1950, D: 0.08934, G:3.095
Iter: 2000, D: -0.097, G:0.1247
Iter: 2050, D: -0.6976, G:-3.974
Iter: 2100, D: -1.23, G:-5.535
Final images