Dcgan Faces Project - Notebook by Asutosh Patnaik (1507177)


Updated a year ago

%matplotlib inline

DCGAN for Fake Face Generation

Introduction

Generative Adversarial Networks

What is a GAN?


GANs are a framework for teaching a deep learning model to capture the
training data’s distribution so we can generate new data from that same
distribution. GANs were invented by Ian Goodfellow in 2014 and first
described in the paper `Generative Adversarial
Nets <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`__.
They are made of two distinct models, a *generator* and a
*discriminator*. The job of the generator is to produce ‘fake’ images
that look like the training images. The job of the discriminator is to
look at an image and output whether it is a real training image or a
fake image from the generator. During training, the generator is
constantly trying to outsmart the discriminator by generating better and
better fakes, while the discriminator is working to become a better
detective and correctly classify the real and fake images. The
equilibrium of this game is reached when the generator produces perfect
fakes that look as if they came directly from the training data, and the
discriminator is left to guess, with 50% confidence, whether a given
image is real or fake.

Now, let’s define some notation to be used throughout the tutorial,
starting with the discriminator. Let x be data representing an image.
D(x) is the discriminator network, which outputs the (scalar)
probability that x came from the training data rather than the
generator. Here, since we are dealing with images, the input to
D is an image of CHW size 3x64x64. Intuitively, D(x)
should be HIGH when x comes from the training data and LOW when
x comes from the generator. D(x) can also be thought of
as a traditional binary classifier.
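To illustrate the binary-classifier view, here is a minimal sketch of the binary cross-entropy loss for a single prediction. The helper name `bce` and the label convention (1 = real, 0 = fake) are assumptions made for this example, not definitions from the notebook.

```python
import math

def bce(prediction, label):
    """Binary cross-entropy for one prediction in (0, 1) and a label of 0 or 1."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

# Assumed label convention: 1 = real, 0 = fake.
# A high D(x) on a real image costs less than a low one.
print(bce(0.9, 1) < bce(0.1, 1))  # True
# A low D(G(z)) on a fake image costs less than a high one.
print(bce(0.1, 0) < bce(0.9, 0))  # True
```

Minimizing this loss pushes D(x) toward 1 on real images and toward 0 on generated ones, which matches the HIGH/LOW intuition above.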

For the generator’s notation, let z be a latent space vector
sampled from a standard normal distribution. G(z) represents the
generator function which maps the latent vector z to data-space.
The goal of G is to estimate the distribution that the training
data comes from (Pdata) so it can generate fake samples from
that estimated distribution (Pg).

So, D(G(z)) is the probability (scalar) that the output of the
generator G is a real image. As described in `Goodfellow’s
paper <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`__,
D and G play a minimax game in which D tries to
maximize the probability it correctly classifies reals and fakes
(log D(x)), and G tries to minimize the probability that
D will predict its outputs are fake (log(1 - D(G(z)))).
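For reference, these two terms combine into the value function given in the GAN paper. The symbols p_data (the data distribution) and p_z (the prior over latent vectors) follow the paper's notation; in LaTeX:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first expectation is taken over real images and the second over latent vectors fed through the generator.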

In theory, the solution to this minimax game is where
Pg = Pdata, and the discriminator guesses randomly if the
inputs are real or fake. However, the convergence theory of GANs is
still being actively researched and in reality models do not always
train to this point.
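As a rough numerical check of the theoretical solution, the sketch below assumes the result from the GAN paper that, for a fixed generator, the optimal discriminator is D*(x) = Pdata(x) / (Pdata(x) + Pg(x)). The function names and the toy density values are hypothetical.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    return p_data / (p_data + p_g)

def value_at_constant_d(d):
    """log D(x) + log(1 - D(G(z))) when D outputs the same value d everywhere."""
    return math.log(d) + math.log(1.0 - d)

# Where the two densities agree, D* cannot tell real from fake.
d_star = optimal_discriminator(p_data=0.3, p_g=0.3)
print(d_star)  # 0.5

# The value of the game at that point is -log(4), roughly -1.386.
print(math.isclose(value_at_constant_d(d_star), -math.log(4)))  # True
```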

What is a DCGAN?

A DCGAN is a direct extension of the GAN described above, except that it
explicitly uses convolutional and convolutional-transpose layers in the
discriminator and generator, respectively. It was first described by
Radford et al. in the paper `Unsupervised Representation Learning With
Deep Convolutional Generative Adversarial
Networks <https://arxiv.org/pdf/1511.06434.pdf>`__. The discriminator
is made up of strided
`convolution <https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d>`__
layers, `batch norm <https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm2d>`__
layers, and
`LeakyReLU <https://pytorch.org/docs/stable/nn.html#torch.nn.LeakyReLU>`__
activations. The input is a 3x64x64 image and the output is a
scalar probability that the input is from the real data distribution.
The generator is composed of
`convolutional-transpose <https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d>`__
layers, batch norm layers, and
`ReLU <https://pytorch.org/docs/stable/nn.html#relu>`__ activations. The
input is a latent vector, z, that is drawn from a standard
normal distribution and the output is a 3x64x64 RGB image. The strided
conv-transpose layers allow the latent vector to be transformed into a
volume with the same shape as an image. In the paper, the authors also
give some tips about how to set up the optimizers, how to calculate the
loss functions, and how to initialize the model weights, which
will be explained in the coming projects.
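To make the shape transformation concrete, the sketch below applies the output-size formulas from the PyTorch Conv2d and ConvTranspose2d documentation (with dilation 1 and no output padding). The layer settings used here (kernel size 4, stride 2, padding 1, plus one layer with stride 1 and padding 0 at the latent end) are an assumption based on the usual DCGAN reference setup, not something this notebook has stated so far, and the helper names are hypothetical.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of ConvTranspose2d (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel, stride, padding):
    """Output size of Conv2d (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed generator layers as (kernel, stride, padding)
gen_layers = [(4, 1, 0)] + [(4, 2, 1)] * 4
size = 1  # latent vector z viewed as an nz x 1 x 1 volume
gen_sizes = [size]
for k, s, p in gen_layers:
    size = conv_transpose_out(size, k, s, p)
    gen_sizes.append(size)
print(gen_sizes)  # [1, 4, 8, 16, 32, 64]

# Assumed discriminator layers mirror the generator
disc_layers = [(4, 2, 1)] * 4 + [(4, 1, 0)]
size = 64  # spatial size of the input image
disc_sizes = [size]
for k, s, p in disc_layers:
    size = conv_out(size, k, s, p)
    disc_sizes.append(size)
print(disc_sizes)  # [64, 32, 16, 8, 4, 1]
```

Under these assumed settings, each stride-2 layer doubles (generator) or halves (discriminator) the spatial size, which is how a 1x1 latent volume grows to 64x64 and a 64x64 image shrinks to a single scalar.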

from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set random seed for reproducibility
manualSeed = 999
#manualSeed = random.randint(1, 10000) # use if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)

Random Seed:  999

<torch._C.Generator at 0x7f2fad0741c8>

Inputs

Let’s define some inputs for the run:

- dataroot - the path to the root of the dataset folder. We will
  talk more about the dataset in the next section
- workers - the number of worker processes for loading the data with
  the DataLoader
- batch_size - the batch size used in training. The DCGAN paper
  uses a batch size of 128
- image_size - the spatial size of the images used for training.
  This implementation defaults to 64x64. If another size is desired,
  the structures of D and G must be changed. See
  `here <https://github.com/pytorch/examples/issues/70>`__ for more
  details
- nc - number of color channels in the input images. For color
  images this is 3
- nz - length of the latent vector
- ngf - relates to the depth of feature maps carried through the
  generator
- ndf - sets the depth of feature maps propagated through the
  discriminator
- num_epochs - number of training epochs to run. Training for
  longer will probably lead to better results but will also take much
  longer
- lr - learning rate for training. As described in the DCGAN paper,
  this number should be 0.0002
- beta1 - beta1 hyperparameter for Adam optimizers. As described in
  the paper, this number should be 0.5
- ngpu - number of GPUs available. If this is 0, code will run in
  CPU mode. If this number is greater than 0 it will run on that number
  of GPUs
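One possible way to keep these inputs together is a small dataclass, sketched below. The class name `RunConfig` and the method `device_name` are hypothetical. Only batch_size, image_size, nc, lr and beta1 take values stated in the list above; every other default is a placeholder chosen for illustration and should be replaced with the values used in the actual run.

```python
from dataclasses import dataclass

@dataclass
class RunConfig:
    # Values stated in the list of inputs
    batch_size: int = 128
    image_size: int = 64
    nc: int = 3
    lr: float = 0.0002
    beta1: float = 0.5
    # Placeholder values (assumptions, not given in the text)
    dataroot: str = "data/faces"
    workers: int = 2
    nz: int = 100
    ngf: int = 64
    ndf: int = 64
    num_epochs: int = 5
    ngpu: int = 0

    def device_name(self) -> str:
        """Return 'cuda:0' when GPUs are requested, otherwise 'cpu'.

        A real run should also confirm that CUDA is actually available.
        """
        return "cuda:0" if self.ngpu > 0 else "cpu"

cfg = RunConfig()
print(cfg.device_name())  # cpu
print(RunConfig(ngpu=1).device_name())  # cuda:0
```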