DCGAN: Deep Convolutional GAN

In this lab you are going to code a Deep Convolutional GAN (DCGAN), i.e. a GAN built from convolutional layers. Since we are trying to generate images, convolutions are a natural choice of layer.

This lab is quite RAM & GPU intensive, thus we recommend running it on Google Colab.


In [ ]:
WORK_ON_COLAB = True

if WORK_ON_COLAB:
    try:
        # %tensorflow_version only exists in Colab.
        %tensorflow_version 2.x
    except Exception:
        pass

In [ ]:
import tensorflow as tf
tf.__version__

Data

We download our grayscale dataset (it can be either MNIST or Fashion-MNIST). Note that we resize the images to a larger size (64×64) to match the architecture of the original paper.


In [ ]:
(images, labels), (_, _) = tf.keras.datasets.mnist.load_data()

In [ ]:
from skimage.transform import resize

# Resize the 28x28 digits to 64x64 and rescale pixels from [0, 255] to [-1, 1]
images = resize(images, (images.shape[0], 64, 64, 1), preserve_range=True).astype("float32")
images = (images - 127.5) / 127.5

images.shape

In [ ]:
# Shuffle the full dataset and serve it in fixed-size batches of 100
data_generator = tf.data.Dataset.from_tensor_slices(
    images).shuffle(60000).batch(100, drop_remainder=True)

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt

def display(images, row=2, col=10):
    """Show a grid of images, undoing the [-1, 1] pixel scaling."""
    fig = plt.figure(figsize=(20, 3))
    it = 0
    for r in range(row):
        for c in range(col):
            ax = plt.subplot(row, col, it + 1)
            ax.set_axis_off()
            ax.imshow(images[it, :, :, 0] * 127.5 + 127.5, cmap='gray')
            it += 1
    return fig

In [ ]:
display(images);

Exercise

Code the generator and the discriminator:

1. Generator:

  • Input noise of shape (1, 1, 100)
  • Transpose convolution of 1024 channels, kernel of 4, stride 1 and valid padding; followed by a BatchNorm and a LeakyReLU of 0.2
  • Transpose convolution with kernel of 4, stride 2 and same padding; followed by a BatchNorm and a LeakyReLU of 0.2. This block is repeated 3 times with 512, 256, and 128 channels respectively.
  • A final transpose convolution with a kernel of 4, a stride of 2, and same padding.

Questions: Which activation for the last conv? How many channels?

2. Discriminator:

  • Image input
  • Four blocks of Conv (kernel 4, stride 2, same padding) + BatchNorm + LeakyReLU, with respectively 128, 256, 512, and 1024 channels.

Questions: What additional layer is needed to have a scalar output? Which activation to use?

Finally, why do we use LeakyReLU instead of ReLU?


In [ ]:
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import BatchNormalization, Input
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import # TODO: one more layer is needed at the end of the discriminator
from tensorflow.keras.models import Model


def get_generator():
    # TODO
    
    return Model(inputs=input_noise, outputs=x)


def get_discriminator():
    # TODO

    return Model(inputs=input_image, outputs=x)


generator = get_generator()
discriminator = get_discriminator()

In [ ]:
# %load solutions/dcgan.py
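
If you are stuck, here is one possible implementation of both models following the specification above. This is a sketch, not necessarily identical to solutions/dcgan.py: in particular, one way to get a scalar discriminator output is a Flatten followed by a Dense(1).

from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import BatchNormalization, Input
from tensorflow.keras.layers import LeakyReLU, Flatten, Dense
from tensorflow.keras.models import Model


def get_generator():
    input_noise = Input(shape=(1, 1, 100))
    # (1, 1, 100) -> (4, 4, 1024): stride 1, valid padding
    x = Conv2DTranspose(1024, 4, strides=1, padding="valid")(input_noise)
    x = BatchNormalization()(x)
    x = LeakyReLU(0.2)(x)
    # Three upsampling blocks: (4, 4) -> (8, 8) -> (16, 16) -> (32, 32)
    for channels in (512, 256, 128):
        x = Conv2DTranspose(channels, 4, strides=2, padding="same")(x)
        x = BatchNormalization()(x)
        x = LeakyReLU(0.2)(x)
    # (32, 32, 128) -> (64, 64, 1); tanh matches the [-1, 1] pixel scaling
    x = Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh")(x)
    return Model(inputs=input_noise, outputs=x)


def get_discriminator():
    input_image = Input(shape=(64, 64, 1))
    x = input_image
    # Four downsampling blocks: (64, 64) -> (32, 32) -> (16, 16) -> (8, 8) -> (4, 4)
    for channels in (128, 256, 512, 1024):
        x = Conv2D(channels, 4, strides=2, padding="same")(x)
        x = BatchNormalization()(x)
        x = LeakyReLU(0.2)(x)
    # Flatten + Dense(1) produces a scalar logit; no sigmoid here because
    # the loss below is computed with from_logits=True
    x = Flatten()(x)
    x = Dense(1)(x)
    return Model(inputs=input_image, outputs=x)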

Now that our models are created, let's run a sanity check on the output dimensions.


In [ ]:
import numpy as np

def get_noise(batch_size, nz=100):
    return tf.random.normal([batch_size, 1, 1, nz])

noise = get_noise(20)

print("init", noise.shape)
fake_images = generator(noise)
print("Fake images", fake_images.shape)  # Should be (_, 64, 64, 1)
preds = discriminator(fake_images)
print("Predictions", preds.shape)  # Should be (_, 1)

In [ ]:
# generator.summary()
# discriminator.summary()

Two losses are needed:

  1. The discriminator must classify original images as real and generated images as fake.
  2. The generator must force the discriminator to classify its generated images as real.

Note that we use binary_crossentropy with the option from_logits=True, so that the loss itself applies the sigmoid activation.

Fill in the following losses:


In [ ]:
from tensorflow.keras.losses import binary_crossentropy

def discriminator_loss(preds_real, preds_fake):
    loss_real = # TODO
    loss_fake = # TODO
    return loss_real + loss_fake


def generator_loss(preds_fake):
    return # TODO

In [ ]:
# %load solutions/gan_losses.py
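
For reference, a possible implementation of the two losses (it may differ from solutions/gan_losses.py):

def discriminator_loss(preds_real, preds_fake):
    # Real images should be classified as 1, generated images as 0
    loss_real = tf.reduce_mean(binary_crossentropy(
        tf.ones_like(preds_real), preds_real, from_logits=True))
    loss_fake = tf.reduce_mean(binary_crossentropy(
        tf.zeros_like(preds_fake), preds_fake, from_logits=True))
    return loss_real + loss_fake


def generator_loss(preds_fake):
    # The generator wants its images to be classified as real (label 1)
    return tf.reduce_mean(binary_crossentropy(
        tf.ones_like(preds_fake), preds_fake, from_logits=True))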

We create two optimizers, one for the discriminator and one for the generator. Remember that Adam keeps internal state (moment estimates) for each model variable, and thus cannot be shared between the two models the way a plain SGD could.


In [ ]:
from tensorflow.keras.optimizers import Adam

optimizer_d = Adam(learning_rate=1e-4, beta_1=0.5)
optimizer_g = Adam(learning_rate=1e-4, beta_1=0.5)

We define our train step for a single batch. Fill in the missing parts.

Note that we use the tf.function decorator to "compile" the function and speed up training a bit. This decorator should always be placed on top of computationally intensive functions.


In [ ]:
@tf.function
def train_step(images):
    noise = get_noise(images.shape[0])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = # TODO
        
        real_output = # TODO
        fake_output = # TODO

        gen_loss = # TODO
        disc_loss = # TODO

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    optimizer_g.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    optimizer_d.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

    return disc_loss, gen_loss

In [ ]:
# %load solutions/gan_step.py
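
For reference, one way of filling the TODOs above (it may differ from solutions/gan_step.py):

@tf.function
def train_step(images):
    noise = get_noise(images.shape[0])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        # Generate a batch of fake images from random noise
        generated_images = generator(noise, training=True)

        # Score both the real batch and the fake batch
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    optimizer_g.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    optimizer_d.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

    return disc_loss, gen_loss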

We can now train our DCGAN.

Monitor the losses of both models. Beware of one loss converging to 0 while the other diverges: it would mean that the "adversarial" part of the training is biased in favor of one of the two models.

In less than 5 epochs you should see realistic digits. After 30 epochs your model should have converged.


In [ ]:
from tensorflow.keras.metrics import Mean
from IPython.core.display import display as jupy_display
import numpy as np

epochs = 10

fixed_noise = get_noise(20)

print("Base noise:")
fake_images = generator(fixed_noise, training=False).numpy()
jupy_display(display(fake_images))

for epoch in range(epochs):
    print("====== Epoch {:2d} ======".format(epoch))

    epoch_loss_d = Mean()
    epoch_loss_g = Mean()
    
    for i, real_images in enumerate(data_generator):
        loss_d, loss_g = train_step(real_images)
        epoch_loss_d(loss_d)
        epoch_loss_g(loss_g)
        
        if i % 50 == 0 and i > 0:
            print(i, end=" ... ")
            
    print("\nDiscriminator: {}, Generator: {}".format(
        epoch_loss_d.result(), epoch_loss_g.result()))
    fake_images = generator(fixed_noise, training=False).numpy()
    jupy_display(display(fake_images))
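
To visualize the adversarial balance over the whole run, you can also record the epoch losses and plot them afterwards. A minimal sketch, assuming you append the two epoch results inside the loop above (history_d and history_g are names introduced here):

history_d, history_g = [], []
# ... inside the epoch loop, after computing the epoch means:
#     history_d.append(float(epoch_loss_d.result()))
#     history_g.append(float(epoch_loss_g.result()))

plt.plot(history_d, label="discriminator")
plt.plot(history_g, label="generator")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend();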

Exercise

  • Interpret your results
  • What are the two main problems the model has?

Homework

  • Try more epochs to have crisper images.
  • Try other datasets, like CelebA. Beware of the number of channels.
  • Modify this network to make a cDCGAN (a possible starting point is sketched below).
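
For the cDCGAN, a common conditioning scheme is to feed the class label to both networks. A minimal sketch of the conditioned inputs only (n_classes and the variable names are ours; the rest of both architectures stays as before):

from tensorflow.keras.layers import Concatenate, Input

n_classes = 10  # hypothetical: digit classes for MNIST

# Generator side: append a one-hot label to the noise channels,
# turning the input into shape (1, 1, 100 + n_classes)
noise = Input(shape=(1, 1, 100))
label = Input(shape=(1, 1, n_classes))
g_input = Concatenate(axis=-1)([noise, label])

# Discriminator side: broadcast the one-hot label to a (64, 64, n_classes)
# map and stack it on the image channels
image = Input(shape=(64, 64, 1))
label_map = Input(shape=(64, 64, n_classes))
d_input = Concatenate(axis=-1)([image, label_map])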
