
Module 10

Unsupervised learning

Generative Adversarial Networks



Marc Lelarge

1/21       

Overview of the course:

1- Course overview: machine learning pipeline

2- PyTorch tensors and automatic differentiation

3- Classification with deep learning

4- Convolutional neural networks

5- Embedding layers and dataloaders

6- Unsupervised learning: auto-encoders and generative adversarial networks

  • Deep learning architectures - in PyTorch: recap!
1/21       

Generative Adversarial Networks (GAN)

practicals: implementing conditional GAN and (simplified) InfoGAN

2/21       

Generative Adversarial Networks (GAN)

Learning high-dimensional generative models

The idea behind GANs is to train two networks jointly:

  • a discriminator $\mathbf{D}$ to classify samples as "real" or "fake"
  • a generator $\mathbf{G}$ to map a fixed distribution to samples that fool $\mathbf{D}$

Goodfellow et al. Generative adversarial nets. 2014.

3/21       

GAN learning

The discriminator $\mathbf{D}$ is a classifier and $\mathbf{D}(x)$ is interpreted as the probability for $x$ to be a real sample.

The generator $\mathbf{G}$ takes as input a Gaussian random variable $z$ and produces a fake sample $\mathbf{G}(z)$.

The discriminator and the generator are learned alternately, i.e. when the parameters of $\mathbf{D}$ are learned, $\mathbf{G}$ is fixed, and vice versa.

When $\mathbf{G}$ is fixed, learning $\mathbf{D}$ is the standard learning process of a binary classifier (Sigmoid layer + BCE loss).

The learning of $\mathbf{G}$ is more subtle. The performance of $\mathbf{G}$ is evaluated through the discriminator $\mathbf{D}$, i.e. the generator maximizes the loss of the discriminator.

5/21       

Learning of $\mathbf{D}$

The task of $\mathbf{D}$ is to distinguish real points $x_1,\dots,x_N$ from generated points $\mathbf{G}(z_1),\dots,\mathbf{G}(z_N)$.

The last layer of $\mathbf{D}$ is a Sigmoid layer, so the learning of $\mathbf{D}$ is done with the binary cross-entropy loss:

$$\mathcal{L}(\mathbf{D},\mathbf{G}) = -\sum_{n=1}^N \left[ \log \mathbf{D}(x_n) + \log\left(1-\mathbf{D}(\mathbf{G}(z_n))\right) \right].$$

For a fixed generator $\mathbf{G}$, the optimal discriminator is

$$\mathbf{D}^* = \arg\min_{\mathbf{D}} \mathcal{L}(\mathbf{D},\mathbf{G}).$$
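
As a side note (not on the slide), this loss is exactly what nn.BCELoss computes when real samples get target 1 and generated samples get target 0; a minimal sketch, with hypothetical tensor names:

import torch
import torch.nn as nn

bce = nn.BCELoss(reduction='sum')  # sum over the batch, matching the formula above

def discriminator_loss(D_scores_on_real, D_scores_on_fake):
    # BCE(p, 1) = -log p and BCE(p, 0) = -log(1 - p), so the two terms
    # reproduce -sum_n [ log D(x_n) + log(1 - D(G(z_n))) ]
    real_targets = torch.ones_like(D_scores_on_real)
    fake_targets = torch.zeros_like(D_scores_on_fake)
    return bce(D_scores_on_real, real_targets) + bce(D_scores_on_fake, fake_targets)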

6/21       

Learning of $\mathbf{G}$

The task of $\mathbf{G}$ is to fool the discriminator.

For a fixed discriminator $\mathbf{D}$, the optimal generator is

$$\mathbf{G}^* = \color{red}{\arg\max_{\mathbf{G}}}\, \mathcal{L}(\mathbf{D},\mathbf{G}) = \arg\max_{\mathbf{G}} \left(-\sum_{n=1}^N \log\left(1-\mathbf{D}(\mathbf{G}(z_n))\right)\right).$$

In practice, the loss for $\mathbf{G}$ is often replaced by:

$$\mathbf{G}^* = \arg\max_{\mathbf{G}} \sum_{n=1}^N \log\left(\mathbf{D}(\mathbf{G}(z_n))\right).$$

7/21       

Loss function for $\mathbf{G}$

When the generator is weak compared to the discriminator, i.e. when $\mathbf{D}(\mathbf{G}(z)) \ll 1$, the modified loss boosts the learning of the generator thanks to the high slope of $\log$ around zero.
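
A quick numerical illustration of this claim (my own sketch, not from the slides), comparing the gradients of the two generator objectives when $\mathbf{D}(\mathbf{G}(z))$ is small:

import torch

# discriminator score on a fake sample, close to 0 (weak generator)
p = torch.tensor(0.01, requires_grad=True)

# original objective for G: minimize log(1 - D(G(z)))
g_original, = torch.autograd.grad(torch.log(1 - p), p)
# modified (non-saturating) objective for G: minimize -log(D(G(z)))
g_modified, = torch.autograd.grad(-torch.log(p), p)

print(g_original.item())  # -1/(1-p) ≈ -1.01 : almost flat, weak learning signal
print(g_modified.item())  # -1/p     = -100  : steep slope, strong learning signal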

8/21       

GAN for 2-d point cloud

Creating the generator and discriminator.

import torch
import torch.nn as nn

z_dim = 32
hidden_dim = 128
# generator: maps a z_dim-dimensional Gaussian code to a 2-d point
net_G = nn.Sequential(nn.Linear(z_dim, hidden_dim),
                      nn.ReLU(),
                      nn.Linear(hidden_dim, 2))
# discriminator: maps a 2-d point to the probability of being real
net_D = nn.Sequential(nn.Linear(2, hidden_dim),
                      nn.ReLU(),
                      nn.Linear(hidden_dim, 1),
                      nn.Sigmoid())

The point cloud will be given as a numpy array X.
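
The slides do not show how X is built; a minimal sketch, assuming the double-moons data used later (make_moons and its parameters are my choice, not the slides'):

import numpy as np
from sklearn.datasets import make_moons

# hypothetical 2-d point cloud playing the role of the real distribution
X, _ = make_moons(n_samples=2000, noise=0.05)
X = X.astype(np.float32)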

9/21       

import numpy as np

batch_size, lr = 50, 1e-3
nb_epochs = 500
optimizer_G = torch.optim.Adam(net_G.parameters(), lr=lr)
optimizer_D = torch.optim.Adam(net_D.parameters(), lr=lr)
for e in range(nb_epochs):
    np.random.shuffle(X)
    real_samples = torch.from_numpy(X).type(torch.FloatTensor)
    for real_batch in real_samples.split(batch_size):
        # improving D: maximize log D(x) + log(1 - D(G(z)))
        z = torch.empty(batch_size, z_dim).normal_()
        fake_batch = net_G(z)
        D_scores_on_real = net_D(real_batch)
        D_scores_on_fake = net_D(fake_batch)
        loss = -torch.mean(torch.log(1 - D_scores_on_fake) + torch.log(D_scores_on_real))
        optimizer_D.zero_grad()
        loss.backward()
        optimizer_D.step()
        # improving G: maximize log D(G(z)) (non-saturating loss)
        z = torch.empty(batch_size, z_dim).normal_()
        fake_batch = net_G(z)
        D_scores_on_fake = net_D(fake_batch)
        loss = -torch.mean(torch.log(D_scores_on_fake))
        optimizer_G.zero_grad()
        loss.backward()
        optimizer_G.step()
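
Once trained (a usage sketch, not from the slides), fresh Gaussian codes can be pushed through net_G to produce generated 2-d points, e.g. for plotting against X:

# sample from the trained generator; no gradients needed here
with torch.no_grad():
    z = torch.empty(1000, z_dim).normal_()
    generated = net_G(z).numpy()   # shape (1000, 2), to compare with the real cloud X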
10/21       

Loss curves

Contrary to standard loss minimization, we have no guarantee here that the networks will stabilize: training can very well oscillate without converging.

11/21       

GAN in action

A GAN fitting double moons.

12/21       

Mode collapse

The goal of a GAN is to find the best generator against any discriminator, i.e.

$$\mathbf{G}^* = \arg\max_{\mathbf{G}} \min_{\mathbf{D}} \mathcal{L}(\mathbf{D},\mathbf{G}).$$

But optimization alternates between finding the best generator against the current discriminator and finding the best discriminator against the current generator. In particular, we do not solve the $\max\min$ problem but alternate between the two problems.

As a result, a possible equilibrium of the game observed in practice is for the generator to generate only 'easy' samples, i.e. those that are the most difficult for the discriminator to classify.

14/21       

Mode collapse on MNIST

These generated digits clearly do not follow the original distribution of the MNIST dataset; ones, for example, seem over-represented.

15/21       

Conditional GAN

When labels are available, mode collapse can be mitigated (more details in practicals).
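
One standard way to condition, sketched here under my own assumptions (the practicals may differ): append a one-hot label to the inputs of both networks, so the generator is asked to produce a sample of a given class and the discriminator judges (sample, label) pairs.

import torch
import torch.nn as nn

n_classes = 2                  # assumption: e.g. the two moons of the toy example
z_dim, hidden_dim = 32, 128

# generator takes [z, one-hot label] and produces a 2-d point of that class
cond_G = nn.Sequential(nn.Linear(z_dim + n_classes, hidden_dim),
                       nn.ReLU(),
                       nn.Linear(hidden_dim, 2))
# discriminator judges a [2-d point, one-hot label] pair
cond_D = nn.Sequential(nn.Linear(2 + n_classes, hidden_dim),
                       nn.ReLU(),
                       nn.Linear(hidden_dim, 1),
                       nn.Sigmoid())

z = torch.randn(50, z_dim)
labels = torch.randint(0, n_classes, (50,))
one_hot = nn.functional.one_hot(labels, n_classes).float()
fake = cond_G(torch.cat([z, one_hot], dim=1))
score = cond_D(torch.cat([fake, one_hot], dim=1))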

Mirza et al. Conditional Generative Adversarial Nets. 2014.

16/21       

InfoGAN

Even when labels are not available, mode collapse can be mitigated (more details in practicals).
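
A rough sketch of the simplified version (my reading of the paper, not necessarily the practicals' code): the generator input contains a categorical code c, and an auxiliary network Q is trained jointly with G to recover c from G(z, c), which pushes the generator to use different codes for different modes.

import torch
import torch.nn as nn

n_codes, z_dim, hidden_dim = 2, 32, 128   # assumptions for the 2-d toy setting

# generator takes [z, one-hot code] as input (hypothetical, mirroring the conditional sketch)
info_G = nn.Sequential(nn.Linear(z_dim + n_codes, hidden_dim),
                       nn.ReLU(),
                       nn.Linear(hidden_dim, 2))
# Q tries to recover the categorical code from a generated sample
net_Q = nn.Sequential(nn.Linear(2, hidden_dim),
                      nn.ReLU(),
                      nn.Linear(hidden_dim, n_codes))   # logits over the codes

info_criterion = nn.CrossEntropyLoss()

z = torch.randn(50, z_dim)
c = torch.randint(0, n_codes, (50,))
fake = info_G(torch.cat([z, nn.functional.one_hot(c, n_codes).float()], dim=1))
info_loss = info_criterion(net_Q(fake), c)   # minimized by both G and Q, on top of the GAN losses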

Chen et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. 2016.

17/21       

InfoGAN in action

An InfoGAN fitting double moons.

18/21       

Deep Convolutional GAN

"Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. (...) We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models."

Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. 2015.

19/21       

DC-GAN rules

  • replace pooling layers with strided convolutions in $\mathbf{D}$ and strided transposed convolutions in $\mathbf{G}$.

  • use batchnorm in both $\mathbf{D}$ and $\mathbf{G}$.

  • remove fully connected hidden layers.

  • use ReLU in $\mathbf{G}$ except for the output, which uses Tanh.

  • use LeakyReLU in $\mathbf{D}$ for all layers.
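
A minimal sketch following these rules, assuming 1×28×28 images (e.g. MNIST) and a 100-d latent code; the exact layer sizes are my choice, not the paper's:

import torch.nn as nn

nz = 100   # assumed latent dimension; input to the generator is (N, nz, 1, 1)

# generator: strided transposed convolutions, batchnorm, ReLU, Tanh output
dc_G = nn.Sequential(
    nn.ConvTranspose2d(nz, 128, 7, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),   # 7x7
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 14x14
    nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh())                          # 1x28x28 in [-1, 1]

# discriminator: strided convolutions, batchnorm, LeakyReLU, no fully connected hidden layer
dc_D = nn.Sequential(
    nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),                           # 14x14
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),    # 7x7
    nn.Conv2d(128, 1, 7, 1, 0), nn.Sigmoid())                               # (N, 1, 1, 1) score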

20/21       

Generative Adversarial Networks (GAN)

practicals: implementing conditional GAN and (simplified) InfoGAN

21/21       

The end.

21/21       

