0:00

In this video, we're going to learn what Generative Adversarial Networks, or GANs, are.

GANs are neural networks that learn to create

synthetic data similar to some known input data.

They were first introduced in 2014

by a group of researchers at the University of Montreal,

led by Ian Goodfellow, who later joined OpenAI.

After that, GANs quickly gained

popularity among deep learning researchers.

Recently, much research has been devoted to applying

GANs to machine learning tasks and to stabilizing their training.

You can compare a GAN to a forger and a police officer.

The forger, or generator,

falsifies money and wants to deceive the police officer, or discriminator,

who in turn tries to distinguish real bills from forged ones.

Unlike in standard approaches, not only the model is

trained but so is the loss function, that is, the discriminator.

A GAN is a generative machine learning model in

which two neural networks compete against each other.

Instead of using a standard fixed cost function,

we learn the cost function with a neural network.

We alternate between training the different parts of the network.
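The alternation can be sketched as a loop that runs k discriminator updates for every generator update, as in the training algorithm of the original GAN paper (which used k = 1). This is a minimal sketch: `d_step` and `g_step` are hypothetical callables standing in for one parameter update of each network.

```python
from typing import Callable


def train_gan(d_step: Callable[[], None], g_step: Callable[[], None],
              num_iterations: int, k: int = 1) -> list[str]:
    """Alternate k discriminator updates with one generator update.

    Returns the order in which the updates ran, for illustration only.
    """
    schedule = []
    for _ in range(num_iterations):
        for _ in range(k):
            d_step()              # update D while G is held fixed
            schedule.append("D")
        g_step()                  # update G while D is held fixed
        schedule.append("G")
    return schedule


# Placeholder steps that do nothing, just to show the update schedule.
order = train_gan(lambda: None, lambda: None, num_iterations=2, k=2)
print(order)  # ['D', 'D', 'G', 'D', 'D', 'G']
```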

With GANs, researchers have generated convincing images of everything,

from bedrooms to album covers, and these models display

a remarkable ability to reflect higher-order semantic logic.

For example, which of these photos are real?

Spoiler: none of them.

Let's take a closer look at the structure of a GAN.

The discriminator takes an image, of a bear

for example, and its task is to determine

whether it was produced by the generator or is a real picture of a bear.

That is, the output of the discriminator is the probability that the picture is real.
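To make "outputs a probability" concrete, here is a toy discriminator: a single logistic unit that maps an input vector to a value in (0, 1). This is a minimal sketch, not the lecture's architecture; the 3-value "image", the weights, and the bias are made-up numbers, and a real discriminator would be a deep network.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function, mapping a score to (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator(x: list[float], weights: list[float], bias: float) -> float:
    """Toy discriminator: one logistic unit returning P(x is real)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)


# A hypothetical 3-"pixel" image with hand-picked parameters.
p_real = discriminator([0.2, 0.7, 0.1], weights=[1.5, -0.5, 2.0], bias=0.1)
print(f"P(real) = {p_real:.3f}")
```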

The GAN training framework

works as if two adversaries were playing against each other in a game.

Each player is represented by

a differentiable function controlled by a set of parameters,

typically a deep neural network.

The game plays out in two scenarios.

In one scenario, training examples x are randomly sampled from

the training set and used as input for the first player, the discriminator,

represented by the function D. The goal of the discriminator is

to output the probability that its input is real rather than fake,

under the assumption that half of the inputs it is ever shown are real and half are fake.

In this first scenario,

the goal of the discriminator is for D(x) to be near one.

In the second scenario,

inputs z to the generator are randomly

sampled from the model's prior over the latent variables, and the discriminator receives G(z), a fake sample created by the generator.

The discriminator strives to make D(G(z)) approach

zero, while the generator strives to make the same quantity approach one.
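Both scenarios can be folded into one labelled discriminator minibatch: half real examples x with target 1, half fakes G(z) with target 0. A minimal sketch on 1-D data; the generator `z + theta`, the standard normal prior, and the Gaussian stand-in "training set" are all hypothetical choices for illustration.

```python
import random


def generator(z: float, theta: float) -> float:
    """Hypothetical 1-D generator: shifts prior noise by a learnable offset."""
    return z + theta


def sample_discriminator_batch(training_set: list[float], theta: float,
                               batch_size: int, rng: random.Random):
    """Half real examples x (target 1), half fakes G(z) (target 0)."""
    half = batch_size // 2
    real = [(rng.choice(training_set), 1) for _ in range(half)]
    # z is drawn from an assumed standard normal prior over the latent variable.
    fake = [(generator(rng.gauss(0.0, 1.0), theta), 0) for _ in range(half)]
    return real + fake


rng = random.Random(0)
training_set = [rng.gauss(3.0, 1.0) for _ in range(1000)]  # stand-in for real data
batch = sample_discriminator_batch(training_set, theta=0.0, batch_size=8, rng=rng)
```

The discriminator is trained to push its outputs toward these labels, while the generator is trained to push D(G(z)) for the fake half toward 1.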

If both models have sufficient capacity,

then the Nash equilibrium of this game corresponds to G(z) being drawn from

the same distribution as the training data and D(x) = 1/2 for all x.

In the learning process,

the parameters of the discriminator are adjusted so as to

minimize the cross-entropy between its responses and the true classes of the pictures (real or generated).
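The discriminator's training signal is ordinary binary cross-entropy with real images labelled 1 and generated images labelled 0. A minimal sketch, assuming `d_real` and `d_fake` hold discriminator outputs D(x) and D(G(z)) for a minibatch; the clipping constant is an arbitrary guard against log(0).

```python
import math


def binary_cross_entropy(p: float, label: int, eps: float = 1e-12) -> float:
    """Cross-entropy between predicted P(real) = p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """Mean cross-entropy: real pictures labelled 1, generated ones labelled 0."""
    losses = [binary_cross_entropy(p, 1) for p in d_real]
    losses += [binary_cross_entropy(p, 0) for p in d_fake]
    return sum(losses) / len(losses)


print(discriminator_loss([0.5], [0.5]))            # ≈ 0.693 (= log 2): cannot tell them apart
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))  # lower: confident and correct
```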

The second part, the generator,

takes as input a random vector drawn, for example,

from a normal distribution and produces an image of a bear.

The task of the generator is to deceive the discriminator, so its parameters are adjusted

to maximize the discriminator's error on the generated pictures.
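The generator's objective can be written in two common ways. A minimal sketch, assuming `d_fake` holds discriminator outputs D(G(z)) on generated images: the minimax form directly maximizes the discriminator's error, while the "non-saturating" form, which the original GAN paper suggests as a practical alternative, maximizes log D(G(z)) instead and gives stronger gradients early in training, when the discriminator easily rejects fakes.

```python
import math

EPS = 1e-12  # guards against log(0)


def generator_loss_minimax(d_fake: list[float]) -> float:
    """Mean log(1 - D(G(z))); minimizing it maximizes D's error on fakes."""
    return sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)


def generator_loss_nonsaturating(d_fake: list[float]) -> float:
    """Mean -log D(G(z)); also minimized as D(G(z)) approaches 1."""
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


# Fooling the discriminator more (higher D(G(z))) lowers both losses.
fooled, caught = [0.9, 0.8], [0.1, 0.05]
print(generator_loss_minimax(fooled) < generator_loss_minimax(caught))              # True
print(generator_loss_nonsaturating(fooled) < generator_loss_nonsaturating(caught))  # True
```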

Since the discriminator and the generator are trained at the same time,

GAN training can be

viewed as a search for a saddle point of a minimax game.
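For reference, the minimax game from the original GAN paper (Goodfellow et al., 2014) can be written with a single value function V, where p_data is the data distribution and p_z is the prior over the latent variable z. The discriminator maximizes V, which is the same as minimizing its cross-entropy on real versus generated samples, and the generator minimizes it. At the Nash equilibrium, where the generator matches the data distribution and D(x) = 1/2 for all x, the value works out to -log 4.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]

% At equilibrium D(x) = 1/2 for all x, so
V(D^*, G^*) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4
```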

Generative Adversarial Nets are trained by simultaneously updating

the discriminator (in blue) so that it

distinguishes samples from the data-generating distribution,

depicted in black, from samples of the generative distribution, in green.

The latent variable z is

uniformly sampled noise, and x represents samples in the target domain,

say images of bears.

After the discriminator is updated, in image (b),

it is good at distinguishing between real and generated data.

Then, after an update to the generator, in image (c),

the gradient of the discriminator has guided G(z) to

flow toward regions that are more likely to be classified as data.

After several steps of training,

they reach a point at which neither can

improve, because the generator's distribution and

the data distribution are the same, and

the discriminator is unable to differentiate between the two distributions, so D(x) = 1/2 everywhere.

Note that, for a fixed discriminator,

the optimal generator is the one that, for any noise vector, produces the x

for which the discriminator's probability of being real is maximal.
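A small illustration of this best response, assuming a hypothetical frozen discriminator on 1-D data and a grid of candidate outputs: the best generator ignores the noise vector entirely and sends every z to the single most "real-looking" point. This hints at why the two networks are updated in alternation rather than each being optimized to completion against a frozen opponent.

```python
import math


def fixed_discriminator(x: float) -> float:
    """A hypothetical, frozen discriminator whose P(real) peaks at x = 2."""
    return math.exp(-(x - 2.0) ** 2)


def best_response_generator(candidates: list[float]):
    """For a frozen D, return a generator mapping every z to argmax D(x)."""
    best_x = max(candidates, key=fixed_discriminator)
    return lambda z: best_x  # the noise z is ignored entirely


candidates = [i / 10 for i in range(-50, 51)]  # grid over [-5, 5]
g = best_response_generator(candidates)
print([g(z) for z in (-1.0, 0.0, 1.0)])  # [2.0, 2.0, 2.0]
```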

It is also easy to prove that,

with a fixed generator,

the optimal discriminator outputs a ratio of densities: D*(x) = p_data(x) / (p_data(x) + p_g(x)).

Estimating this ratio is the key approximation mechanism used by GANs.
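The ratio can be checked numerically. For a fixed generator, the discriminator's objective at a single point x is a·log y + b·log(1 - y) with a = p_data(x) and b = p_g(x), which peaks at y = a / (a + b). A minimal sketch, assuming two 1-D Gaussians as hypothetical stand-ins for the data and generator densities:

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def p_data(x: float) -> float:
    return gaussian_pdf(x, 0.0, 1.0)  # assumed data density


def p_g(x: float) -> float:
    return gaussian_pdf(x, 1.0, 1.0)  # assumed generator density


def optimal_discriminator(x: float, pd, pg) -> float:
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    a, b = pd(x), pg(x)
    return a / (a + b)


# Pointwise, D(x) should maximize a*log(y) + b*log(1 - y); check by grid search.
x = 0.3
a, b = p_data(x), p_g(x)
grid = [i / 1000 for i in range(1, 1000)]
y_best = max(grid, key=lambda y: a * math.log(y) + b * math.log(1 - y))
print(optimal_discriminator(x, p_data, p_g), y_best)
```

When the generator density equals the data density, the formula gives exactly 1/2, which is the equilibrium value of D.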

Most GANs today are at least loosely based on the DCGAN architecture from Alec Radford and colleagues.

DCGAN stands for Deep Convolutional GAN.

GANs were both deep and convolutional before DCGAN,

but the name DCGAN is useful to refer to this specific style of architecture.

Some of the key insights of the DCGAN architecture were, first, to use batch normalization in most layers of

both the discriminator and the generator, with

the two minibatches for the discriminator (real and generated) normalized separately.

The last layer of the generator and the first layer of the discriminator are not

batch-normalized, so that the model can

learn the correct mean and scale of the data distribution.
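A minimal sketch of normalizing the two discriminator minibatches separately, assuming scalar activations and omitting batch normalization's learnable scale and shift and its running statistics: each batch is standardized with its own mean and variance, so the real batch's statistics never enter the normalization of the generated batch, and vice versa. The activation values are made up.

```python
import math


def batch_norm(batch: list[float], eps: float = 1e-5) -> list[float]:
    """Standardize a minibatch to zero mean and (nearly) unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [(v - mean) / math.sqrt(var + eps) for v in batch]


real_batch = [2.9, 3.1, 3.4, 2.6]   # hypothetical activations on real images
fake_batch = [0.1, -0.2, 0.3, 0.0]  # hypothetical activations on generated images

# Each minibatch uses only its own statistics.
real_normed = batch_norm(real_batch)
fake_normed = batch_norm(fake_batch)
```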

Second, the overall network structure is mostly borrowed from the all-convolutional net.

This architecture contains neither pooling nor

unpooling layers and consists only of convolutional layers.

When the generator needs to increase the spatial dimension of a representation,

it uses transposed convolution with a stride greater than one.
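A 1-D sketch of how a strided transposed convolution increases spatial size, using plain lists: each input value "stamps" a scaled copy of the kernel into the output, with copies placed `stride` positions apart, and padding crops the ends. The kernel size 4, stride 2, padding 1 setting, which doubles the length, is an assumed example. The output length is (n - 1)·stride - 2·padding + kernel_size, the same formula PyTorch documents for its transposed convolution layers with default dilation and output padding.

```python
def conv_transpose_1d(x: list[float], kernel: list[float],
                      stride: int, padding: int) -> list[float]:
    """1-D transposed convolution (no dilation, no output padding)."""
    k = len(kernel)
    full_len = (len(x) - 1) * stride + k
    full = [0.0] * full_len
    for i, xi in enumerate(x):
        for j, wj in enumerate(kernel):
            full[i * stride + j] += xi * wj  # overlapping stamps are summed
    # Padding removes `padding` elements from each end of the full output.
    return full[padding:full_len - padding]


out = conv_transpose_1d([1, 2, 3, 4], [1, 1, 1, 1], stride=2, padding=1)
print(len(out), out)  # 8 [1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0, 4.0]
```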

Third, they used the Adam optimizer rather than SGD with momentum.
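For reference, here is a scalar sketch of the Adam update (running first- and second-moment estimates with bias correction), applied to a toy quadratic with its minimum at 3. The DCGAN paper reports a learning rate of 0.0002 and β1 = 0.5. The example keeps β1 = 0.5 but uses a much larger learning rate, an assumption made only so the toy problem converges within a few hundred steps.

```python
import math


def adam_minimize(grad, x0: float, lr: float = 0.1, beta1: float = 0.9,
                  beta2: float = 0.999, eps: float = 1e-8, steps: int = 500) -> float:
    """Minimize a 1-D function given its gradient, using Adam updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# f(x) = (x - 3)^2 has gradient 2(x - 3); beta1 = 0.5 follows the DCGAN setting.
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0, beta1=0.5)
print(round(x_min, 3))
```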

To summarize, GANs are an interesting and fairly recent idea in

deep learning, based on two adversarially posed optimization problems.

Training yields both a generative and a discriminative model.

GANs are fairly hard to train because they suffer from

unstable gradients, and nowadays

many efforts are targeted at stabilizing their training.

DCGAN laid the foundation for the use of GANs in computer vision.