Marc Lelarge
1- Course overview: machine learning pipeline
2- PyTorch tensors and automatic differentiation
3- Classification with deep learning
4- Convolutional neural networks
5- Embedding layers and dataloaders
6- Unsupervised learning: auto-encoders and generative adversarial networks
The idea behind GANs is to train two networks jointly:
Goodfellow et al. Generative adversarial nets. 2014.
The discriminator D is a classifier and D(x) is interpreted as the probability for x to be a real sample.
The generator G takes as input a Gaussian random variable z and produces a fake sample G(z).
The discriminator and the generator are learned alternately, i.e. when the parameters of D are learned, G is fixed, and vice versa.
When G is fixed, the learning of D is the standard learning process of a binary classifier (Sigmoid layer + BCE loss).
The learning of G is more subtle. The performance of G is evaluated through the discriminator D, i.e. the generator maximizes the loss of the discriminator.
The task of D is to distinguish real points x1,…,xN from generated points G(z1),…,G(zN).
The last layer of D is a Sigmoid layer, so the learning of D is done with the binary cross-entropy loss:
$$\mathcal{L}(D,G) = -\sum_{n=1}^{N} \Big[\log D(x_n) + \log\big(1-D(G(z_n))\big)\Big].$$
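This is the same criterion as PyTorch's nn.BCELoss with label 1 for real points and label 0 for generated ones; a minimal self-contained sketch (the network and the batches here are placeholders, not the course code):

import torch
import torch.nn as nn

# minimal sketch: D with a Sigmoid output trained with the BCE loss (labels: real = 1, fake = 0)
net_D = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
bce = nn.BCELoss()                          # expects probabilities in (0, 1)

real_batch = torch.randn(50, 2)             # placeholder for real samples x_1, ..., x_N
fake_batch = torch.randn(50, 2)             # placeholder for generated samples G(z_1), ..., G(z_N)
D_real, D_fake = net_D(real_batch), net_D(fake_batch)
loss_D = bce(D_real, torch.ones_like(D_real)) + bce(D_fake, torch.zeros_like(D_fake))
# loss_D corresponds to (1/N) * L(D, G) above (mean instead of sum)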
For a fixed generator G, the optimal discriminator is
$$D^* = \arg\min_D \mathcal{L}(D,G).$$
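For reference, the corresponding pointwise optimum derived in Goodfellow et al. (2014), writing $p_{\text{data}}$ for the distribution of the real samples and $p_G$ for the distribution of $G(z)$:

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}.$$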
The task of G is to fool the discriminator.
For a fixed discriminator D, the optimal generator is
$$G^* = \arg\max_G \mathcal{L}(D,G) = \arg\max_G \; -\sum_{n=1}^{N} \log\big(1-D(G(z_n))\big).$$
In practice, the loss for G is often replaced by:
$$G^* = \arg\max_G \sum_{n=1}^{N} \log\big(D(G(z_n))\big).$$
When the generator is weak compared to the discriminator, i.e. when $D(G(z)) \ll 1$, the modified loss boosts the learning of the generator thanks to the high slope of $\log$ around zero.
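A quick check of this slope argument, writing $u = D(G(z)) \in (0,1)$ for the discriminator score on a fake sample:

$$\frac{d}{du}\log(1-u) = -\frac{1}{1-u} \xrightarrow[u \to 0]{} -1,
\qquad
\frac{d}{du}\log(u) = \frac{1}{u} \xrightarrow[u \to 0]{} +\infty.$$

With the original objective the generator receives an almost flat gradient when its samples are easily rejected, whereas the modified objective keeps the gradient large.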
Creating the generator and discriminator.
import torch
import torch.nn as nn

z_dim = 32
hidden_dim = 128

net_G = nn.Sequential(nn.Linear(z_dim, hidden_dim), nn.ReLU(),
                      nn.Linear(hidden_dim, 2))
net_D = nn.Sequential(nn.Linear(2, hidden_dim), nn.ReLU(),
                      nn.Linear(hidden_dim, 1), nn.Sigmoid())
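A quick shape check (a minimal usage sketch of the networks above): the generator maps latent vectors to 2D points, and the discriminator maps 2D points to probabilities in (0, 1).

z = torch.randn(5, z_dim)          # a batch of 5 latent vectors
print(net_G(z).shape)              # torch.Size([5, 2]): 5 fake 2D points
print(net_D(net_G(z)).shape)       # torch.Size([5, 1]): 5 probabilities of being real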
The point cloud will be given as a numpy array X.
import numpy as np

batch_size, lr = 50, 1e-3
nb_epochs = 500
optimizer_G = torch.optim.Adam(net_G.parameters(), lr=lr)
optimizer_D = torch.optim.Adam(net_D.parameters(), lr=lr)

for e in range(nb_epochs):
    np.random.shuffle(X)
    real_samples = torch.from_numpy(X).type(torch.FloatTensor)
    # assumes len(X) is a multiple of batch_size so that every real_batch has batch_size points
    for real_batch in real_samples.split(batch_size):
        # improving D: minimize the BCE loss on real (label 1) and fake (label 0) samples
        z = torch.empty(batch_size, z_dim).normal_()
        fake_batch = net_G(z)
        D_scores_on_real = net_D(real_batch)
        D_scores_on_fake = net_D(fake_batch)
        loss = -torch.mean(torch.log(1 - D_scores_on_fake) + torch.log(D_scores_on_real))
        optimizer_D.zero_grad()
        loss.backward()
        optimizer_D.step()
        # improving G: non-saturating loss, maximize log D(G(z))
        z = torch.empty(batch_size, z_dim).normal_()
        fake_batch = net_G(z)
        D_scores_on_fake = net_D(fake_batch)
        loss = -torch.mean(torch.log(D_scores_on_fake))
        optimizer_G.zero_grad()
        loss.backward()
        optimizer_G.step()
Contrary to standard loss minimization, we have no guarantee here that the network will stabilize. It can very well oscillate without convergence.
A GAN fitting double moons.
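A plot like this one can be obtained by sampling the trained generator; a minimal sketch, assuming matplotlib and the net_G, z_dim and X defined above:

import torch
import matplotlib.pyplot as plt

with torch.no_grad():
    fake = net_G(torch.randn(1000, z_dim)).numpy()    # 1000 generated 2D points
plt.scatter(X[:, 0], X[:, 1], s=5, label='real')      # the double-moons point cloud
plt.scatter(fake[:, 0], fake[:, 1], s=5, label='fake')
plt.legend()
plt.show()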
The goal of a GAN is to find the best generator against any discriminator, i.e.
$$G^* = \arg\max_G \min_D \mathcal{L}(D,G).$$
But optimization alternates between finding the best generator against the current discriminator and finding the best discriminator against the current generator. In particular, we do not solve the maxmin problem but alternate between the two problems.
As a result, a possible equilibrium of the game observed in practice is for the generator to generate only 'easy' samples, i.e. those that are the most difficult for the discriminator to classify.
These generated digits clearly do not follow the original distribution of the MNIST dataset: ones, for example, seem over-represented.
When labels are available, mode collapse can be mitigated (more details in practicals).
Mirza et al. Conditional Generative Adversarial Nets. 2014.
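For instance, a conditional generator can take the label as an extra input concatenated with the noise; a minimal sketch (the layer sizes and the one-hot encoding are illustrative assumptions, not the practicals' code):

import torch
import torch.nn as nn
import torch.nn.functional as F

nb_classes, z_dim, hidden_dim = 10, 32, 128

# conditional generator: noise z concatenated with a one-hot label
cond_G = nn.Sequential(nn.Linear(z_dim + nb_classes, hidden_dim), nn.ReLU(),
                       nn.Linear(hidden_dim, 2))

z = torch.randn(50, z_dim)
labels = torch.randint(0, nb_classes, (50,))
fake = cond_G(torch.cat([z, F.one_hot(labels, nb_classes).float()], dim=1))
# the discriminator would receive the label in the same way, concatenated with its input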
Even when labels are not available, mode collapse can be mitigated (more details in practicals).
Chen et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. 2016.
An InfoGAN fitting double moons.
"Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. (...) We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models."
Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. 2015.
replace pooling layers with strided convolutions in D and strided transposed convolutions in G.
use batchnorm in both D and G.
remove fully connected hidden layers.
use ReLU in G except for the output, which uses Tanh.
use LeakyReLU in D for all layers.
(A sketch of networks following these guidelines is given below.)
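A minimal sketch of a generator and discriminator following these guidelines, for 64x64 RGB images (the latent dimension and channel widths are illustrative assumptions, not the exact architecture of the paper):

import torch
import torch.nn as nn

nz, ngf, ndf, nc = 100, 64, 64, 3      # latent dim, G/D feature maps, image channels (assumptions)

# generator: strided transposed convolutions + batchnorm + ReLU, Tanh on the output, no hidden FC layer
dcgan_G = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False), nn.BatchNorm2d(ngf * 8), nn.ReLU(),       # 4x4
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 4), nn.ReLU(),  # 8x8
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 2), nn.ReLU(),  # 16x16
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf), nn.ReLU(),          # 32x32
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False), nn.Tanh())                                    # 64x64 in [-1, 1]

# discriminator: strided convolutions + batchnorm + LeakyReLU, Sigmoid on the output
dcgan_D = nn.Sequential(
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2),                                     # 32x32
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2),       # 16x16
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2),   # 8x8
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2),   # 4x4
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid())                                       # 1x1 probability

fake = dcgan_G(torch.randn(16, nz, 1, 1))      # (16, 3, 64, 64)
scores = dcgan_D(fake)                         # (16, 1, 1, 1): probabilities of being real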
The end.