DCGAN: Deep Convolutional GAN

In this lab you are going to code a Deep Convolutional GAN (DCGAN), i.e. a GAN built from convolutional layers. Since we are trying to generate images, convolutions are a natural choice of layer.

This lab is quite RAM & GPU intensive, so we recommend running it on Google Colab.


In [1]:
WORK_ON_COLAB = True

if WORK_ON_COLAB:
  try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
  except Exception:
    pass


TensorFlow 2.x selected.

In [2]:
import tensorflow as tf
tf.__version__


Out[2]:
'2.1.0-rc1'

Data

We download our grayscale dataset (either MNIST or Fashion-MNIST would work). Note that we resize the images from 28x28 to a larger 64x64 scale to match the original paper.


In [0]:
(images, labels), (_, _) = tf.keras.datasets.mnist.load_data()

In [4]:
from skimage.transform import resize

# Resize to 64x64 and rescale pixels from [0, 255] to [-1, 1],
# the output range of the generator's tanh activation.
images = resize(images, (images.shape[0], 64, 64, 1), preserve_range=True).astype("float32")
images = (images - 127.5) / 127.5

images.shape


Out[4]:
(60000, 64, 64, 1)

In [0]:
# Shuffle the full dataset and serve batches of 100; drop_remainder keeps batch shapes static.
data_generator = tf.data.Dataset.from_tensor_slices(
    images).shuffle(60000).batch(100, drop_remainder=True)
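
If you want to check the pipeline before moving on, a quick sanity check (a minimal sketch, not part of the original lab) is to pull a single batch and look at its shape:


In [0]:
# One batch from the pipeline: should be (100, 64, 64, 1).
first_batch = next(iter(data_generator))
print(first_batch.shape)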

In [0]:
%matplotlib inline
import matplotlib.pyplot as plt

def display(images, row=2, col=10):
    fig = plt.figure(figsize=(20, 3))
    it = 0
    for r in range(row):
        for c in range(col):
            ax = plt.subplot(row, col, it + 1)
            ax.set_axis_off()
            # De-normalize from [-1, 1] back to [0, 255] for display.
            ax.imshow(images[it, :, :, 0] * 127.5 + 127.5, cmap='gray')
            it += 1
    return fig

In [7]:
display(images);


Exercise

Code the generator and the discriminator:

1. Generator:

  • Input noise of (1, 1, 100)
  • Transpose convolution with 1024 channels, a kernel of 4, a stride of 1, and valid padding; followed by a BatchNorm and a LeakyReLU of 0.2
  • Transpose convolution with a kernel of 4, a stride of 2, and same padding; followed by a BatchNorm and a LeakyReLU of 0.2. This block is repeated 3 times with 512, 256, and 128 channels respectively.
  • A final transpose convolution with a kernel of 4, a stride of 2, and same padding.

Questions: Which activation for the last conv? How many channels?

2. Discriminator:

  • Image input
  • Four blocks of Conv (kernel 4, stride 2, same padding) + BatchNorm + LeakyReLU, with 128, 256, 512, and 1024 channels respectively.

Questions: What additional layer is needed to have a scalar output? Which activation to use?
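
Before coding, it helps to verify that these settings produce 64x64 outputs. A minimal sketch of the size arithmetic, using the standard transposed-convolution output formulas (valid: out = (in - 1) * stride + kernel; same: out = in * stride):


In [0]:
# Generator spatial sizes, computed from the layer settings above.
size = (1 - 1) * 1 + 4   # valid padding, kernel 4, stride 1: 1x1 -> 4x4
for _ in range(4):       # four stride-2 "same" transpose convolutions
    size *= 2            # 4 -> 8 -> 16 -> 32 -> 64
print(size)              # 64, matching the training images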


In [0]:
from tensorflow.keras.layers import Conv2D, Conv2DTranspose
from tensorflow.keras.layers import BatchNormalization, Flatten, Input
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.models import Model


def get_generator():
    input_noise = Input(shape=(1, 1, 100))

    # (1, 1, 100) -> (4, 4, 1024): valid transpose conv, out = (in - 1) * stride + kernel
    x = Conv2DTranspose(1024, kernel_size=4, strides=1, padding="valid")(input_noise)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (4, 4, 1024) -> (8, 8, 512): stride-2 "same" transpose convs double the spatial size
    x = Conv2DTranspose(512, kernel_size=4, strides=2, padding="same")(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (8, 8, 512) -> (16, 16, 256)
    x = Conv2DTranspose(256, kernel_size=4, strides=2, padding="same")(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (16, 16, 256) -> (32, 32, 128)
    x = Conv2DTranspose(128, kernel_size=4, strides=2, padding="same")(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (32, 32, 128) -> (64, 64, 1): tanh to match the images normalized to [-1, 1]
    x = Conv2DTranspose(1, kernel_size=4, strides=2, padding="same", activation="tanh")(x)

    return Model(inputs=input_noise, outputs=x)


def get_discriminator():
    input_image = Input(shape=(64, 64, 1))

    # (64, 64, 1) -> (32, 32, 128): stride-2 "same" convs halve the spatial size
    x = Conv2D(128, kernel_size=4, strides=2, padding="same")(input_image)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (32, 32, 128) -> (16, 16, 256)
    x = Conv2D(256, kernel_size=4, strides=2, padding="same")(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (16, 16, 256) -> (8, 8, 512)
    x = Conv2D(512, kernel_size=4, strides=2, padding="same")(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (8, 8, 512) -> (4, 4, 1024)
    x = Conv2D(1024, kernel_size=4, strides=2, padding="same")(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.2)(x)

    # (4, 4, 1024) -> (1, 1, 1): a valid conv with kernel 4 collapses the map to a scalar,
    # then Flatten yields one logit per image (no sigmoid: the losses use from_logits=True)
    x = Conv2D(1, kernel_size=4, strides=1, padding="valid")(x)
    x = Flatten()(x)

    return Model(inputs=input_image, outputs=x)

generator = get_generator()
discriminator = get_discriminator()

Now that our models are created, let's sanity check their output dimensions.


In [10]:

def get_noise(batch_size, nz=100):
    return tf.random.normal([batch_size, 1, 1, nz])

noise = get_noise(20)

print("init", noise.shape)
fake_images = generator(noise)
print("Fake images", fake_images.shape)  # Should be (_, 64, 64, 1)
preds = discriminator(fake_images)
print("Predictions", preds.shape)  # Should be (_, 1)


init (20, 1, 1, 100)
Fake images (20, 64, 64, 1)
Predictions (20, 1)

In [0]:
# generator.summary()
# discriminator.summary()

Two losses are needed:

  1. The discriminator must classify original images as real and generated images as fake.
  2. The generator must force the discriminator to classify its generated images as real.

Fill in the following losses:


In [0]:
from tensorflow.keras.losses import binary_crossentropy

def discriminator_loss(preds_real, preds_fake):
    loss_real = binary_crossentropy(tf.ones_like(preds_real), preds_real, from_logits=True)
    loss_fake = binary_crossentropy(tf.zeros_like(preds_fake), preds_fake, from_logits=True)
    return loss_real + loss_fake


def generator_loss(preds_fake):
    return binary_crossentropy(tf.ones_like(preds_fake), preds_fake, from_logits=True)
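
As a quick numeric check (a minimal sketch), a discriminator that is confident and correct, i.e. large positive logits on real images and large negative logits on fakes, should get a loss near 0; likewise for a generator whose fakes receive large positive logits:


In [0]:
# Confident, correct logits (+10 real, -10 fake) -> discriminator loss near 0.
print(discriminator_loss(tf.fill((4, 1), 10.0), tf.fill((4, 1), -10.0)).numpy())
# Fakes classified as real (+10) -> generator loss near 0.
print(generator_loss(tf.fill((4, 1), 10.0)).numpy())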

We create two optimizers, one for the discriminator and one for the generator. Remember that Adam keeps per-variable state (moment estimates) about the model it optimizes, and thus cannot be shared between models the way a plain SGD could.


In [0]:
from tensorflow.keras.optimizers import Adam

optimizer_d = Adam(learning_rate=1e-4, beta_1=0.5)
optimizer_g = Adam(learning_rate=1e-4, beta_1=0.5)
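
To see this statefulness concretely, here is a minimal sketch on a toy variable (it relies on the Keras optimizer slot API, get_slot_names/get_slot; the slots are only created on the first apply_gradients call):


In [0]:
# Adam keeps a first- and second-moment estimate ('m' and 'v') per variable.
opt = Adam(learning_rate=1e-4, beta_1=0.5)
w = tf.Variable([1.0, 2.0])
opt.apply_gradients([(tf.constant([0.1, 0.1]), w)])
print(opt.get_slot_names())          # ['m', 'v']
print(opt.get_slot(w, "m").numpy())  # moment estimate tied to this specific variable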

We define our train step for a single batch. Fill in the missing part.

Note that we use the tf.function decorator to "compile" the function into a graph and speed up training a bit. This decorator should always be placed on top of computationally intensive functions.
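
To get an intuition for what the decorator buys, here is a minimal timing sketch (the actual speed-up depends on the hardware and on how much Python overhead the function carries):


In [0]:
import timeit

def chain_matmuls(x):
    for _ in range(10):
        x = tf.matmul(x, x)
    return x

compiled = tf.function(chain_matmuls)
x = tf.random.normal([64, 64])
compiled(x)  # the first call traces the Python function into a graph

print("eager   :", timeit.timeit(lambda: chain_matmuls(x), number=100))
print("compiled:", timeit.timeit(lambda: compiled(x), number=100))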


In [0]:
@tf.function
def train_step(images):
    noise = get_noise(images.shape[0])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
      generated_images = generator(noise, training=True)

      real_output = discriminator(images, training=True)
      fake_output = discriminator(generated_images, training=True)

      gen_loss = generator_loss(fake_output)
      disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    optimizer_g.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    optimizer_d.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

    return disc_loss, gen_loss

We can now train our DCGAN.

Monitor the losses of both models. Beware of one loss converging to 0 while the other diverges: it would mean that the "adversarial" part of the training is biased in favor of one of the two models.


In [21]:
from tensorflow.keras.metrics import Mean
from IPython.core.display import display as jupy_display

epochs = 10

fixed_noise = get_noise(20)

print("Base noise:")
fake_images = generator(fixed_noise, training=False).numpy()
jupy_display(display(fake_images))

for epoch in range(epochs):
    print("====== Epoch {:2d} ======".format(epoch))

    epoch_loss_d = Mean()
    epoch_loss_g = Mean()
    
    for i, real_images in enumerate(data_generator):
        loss_d, loss_g = train_step(real_images)
        epoch_loss_d(loss_d)
        epoch_loss_g(loss_g)
        
        if i % 50 == 0 and i > 0:
            print(i, end=" ... ")
            
    print("\nDiscriminator: {}, Generator: {}".format(
        epoch_loss_d.result(), epoch_loss_g.result()))
    fake_images = generator(fixed_noise, training=False).numpy()
    jupy_display(display(fake_images))


Base noise:
====== Epoch  0 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.4786030650138855, Generator: 5.168595790863037
====== Epoch  1 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.4876248240470886, Generator: 2.9606661796569824
====== Epoch  2 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.4462273120880127, Generator: 3.0399672985076904
====== Epoch  3 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.3929280936717987, Generator: 3.503840923309326
====== Epoch  4 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.24137373268604279, Generator: 4.099499225616455
====== Epoch  5 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.5583713054656982, Generator: 3.237457036972046
====== Epoch  6 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.3487876057624817, Generator: 3.8014731407165527
====== Epoch  7 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.33307182788848877, Generator: 3.9503493309020996
====== Epoch  8 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.29243534803390503, Generator: 3.9895596504211426
====== Epoch  9 ======
50 ... 100 ... 150 ... 200 ... 250 ... 300 ... 350 ... 400 ... 450 ... 500 ... 550 ... 
Discriminator: 0.1485026329755783, Generator: 5.0309576988220215
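
To visualize the balance between the two models over the whole run, you can plot the per-epoch means; a minimal sketch (the values below are the rounded per-epoch losses printed above; in your own runs, append epoch_loss_d.result() and epoch_loss_g.result() to two lists inside the loop instead):


In [0]:
# Per-epoch mean losses, copied (rounded) from the run above.
history_d = [0.48, 0.49, 0.45, 0.39, 0.24, 0.56, 0.35, 0.33, 0.29, 0.15]
history_g = [5.17, 2.96, 3.04, 3.50, 4.10, 3.24, 3.80, 3.95, 3.99, 5.03]

plt.plot(history_d, label="discriminator")
plt.plot(history_g, label="generator")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend();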

Homework

  • Train for more epochs to get crisper images.
  • Try other datasets, like CelebA. Beware of the number of channels.
  • Modify this network into a conditional DCGAN (cDCGAN); a sketch of the conditioning idea follows below.
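
For the cDCGAN, the core idea is to condition both networks on the class label. A minimal sketch of the input side (assuming MNIST's 10 classes; concatenating a one-hot label is one common choice among several):


In [0]:
from tensorflow.keras.layers import Concatenate

NUM_CLASSES = 10  # hypothetical, for MNIST

# Generator side: stack a one-hot label onto the noise "pixel".
input_noise = Input(shape=(1, 1, 100))
input_label_g = Input(shape=(1, 1, NUM_CLASSES))
g_in = Concatenate(axis=-1)([input_noise, input_label_g])  # (1, 1, 110)

# Discriminator side: broadcast the label to one 64x64 map per class
# and stack it on the image channels, e.g. per batch:
#   tf.one_hot(y, NUM_CLASSES)[:, None, None, :] * tf.ones((1, 64, 64, 1))
input_image = Input(shape=(64, 64, 1))
input_label_d = Input(shape=(64, 64, NUM_CLASSES))
d_in = Concatenate(axis=-1)([input_image, input_label_d])  # (64, 64, 11)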
