0:00

[MUSIC]

So the model we're going to study now

is called generative adversarial networks.

The idea is that we want to train a model specifically to tell us whether

a generated image is good enough or not.

So it won't use the kind of pre-trained model plus mean

squared error metric; it's just a specific model that has only one task.

It says whether an image is good enough or not.

So now we have two networks.

The first network, which is called the generator, takes some kind of seed,

maybe random noise or some condition describing what kind of face or

chair it has to generate, and it has to output some image.

Of course, at the very beginning those images won't be good;

they won't be realistic ones.

The second part, this kind of specially trained metric, is your discriminator.

It's called that because it discriminates between real

reference images from a dataset and the images the generator generates.

At first this problem is going to be very easy, because the generator is generating

rubbish, since it usually starts from just a random initialization.

And your discriminator will very easily tell the difference between

the noisy, nonsensical images from the generator and actual images.

But this task is not solved for its own sake.

You train the discriminator in order to train the generator.

Now you want to tune your generative network, the guy on the left,

in a way that tries to fool the discriminator.

So if the discriminator tries to say whether your image is real or not,

you then try to adjust the image, that is, adjust the generative model,

so that this image becomes more real in the eyes of the discriminator.

Let's see how it looks as a scheme, so that it perhaps gets clearer.

The first network is the generator.

It's the thing that takes parameters, perhaps the orientation of the object or

face features, and

some random noise to make sure that it generates different stuff each time,

and then tries to produce some kind of generated fake image.

And this model is basically your usual kind of

decoder that you tried to train with MSE,

except it won't be trained with mean squared error anymore.

The second part is called the discriminator. It's going to take an image,

either a real image or an image from the previous model,

and it's going to output just one sigmoid, well,

kind of a probability estimate in the final layer.

This is going to be the probability of this image being fake.

Or it could be the probability of it being real.

The idea here is that this model, the discriminator, can be trained on

a simple classification problem by using positive examples from the dataset,

that is, non-fake images,

and negative examples, fakes, from what the generator generates.

But then, here's the catch.

This discriminator is a neural network,

so it's usually differentiable.

It was built to be differentiable so that it can be trained via backpropagation.

Now you can take this probability of being fake, or probability of being

real, and compute the gradient of the probability of being real with respect to the pixels

of an image, that is, backpropagate through the entire discriminator.

But this image, if it's generated, is just an output of the generator,

and the whole generator is entirely differentiable too.

So you propagate the gradient through the discriminator, back to the generator,

through this image.

And what you get is gradients

that tell the generator how to fool the discriminator.

Now you tune the generator to follow those gradients, and then it produces better images.

But then the discriminator is no longer able to distinguish between real and fake images.

It was easy when the images were basically just white noise, but

once they are not, it becomes slightly harder.

So let's train the discriminator again.

This time, again, you feed it the fake examples from the new, better generator.

Again, this process makes the discriminator stronger.

So now let's train the generator.

Let's propagate the gradient through the new discriminator back into the generator and

through it again.

Now the discriminator is weaker again, so let's do another iteration, and so the loop goes

until, well, essentially the generator generates [INAUDIBLE] good images

and the discriminator is more or less a kind of image critic.

So, if you try to formalize this logic as a math formula,

the discriminator is solving a classification problem.

So it tries to optimize [INAUDIBLE]; what you have here is the sum

of the logarithm of the probability of real data being real

and the logarithm of the probability of fake data being fake,

which is what you usually optimize.
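
The discriminator objective just described can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `discriminator_objective`, its argument names, and the example probabilities are made up here, and the discriminator output is taken to be the probability of an image being real.

```python
import math

def discriminator_objective(p_real_on_real, p_real_on_fake):
    """Average log-likelihood that the discriminator tries to maximize.

    p_real_on_real: discriminator outputs on real images (probability of "real").
    p_real_on_fake: discriminator outputs on generated images (probability of "real").
    """
    eps = 1e-12  # guard against log(0)
    # log-probability of real data being classified as real
    real_term = sum(math.log(p + eps) for p in p_real_on_real) / len(p_real_on_real)
    # log-probability of fake data being classified as fake
    fake_term = sum(math.log(1.0 - p + eps) for p in p_real_on_fake) / len(p_real_on_fake)
    return real_term + fake_term

# Hypothetical outputs: a discriminator that separates the two sets well,
# and one that answers 0.5 for everything.
confident = discriminator_objective([0.9, 0.95], [0.1, 0.05])
confused = discriminator_objective([0.5, 0.5], [0.5, 0.5])
```

The objective is higher for the discriminator that separates real from fake, which is the direction a classifier is normally trained in.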

But for the generator, you do this particular thing.

You have the logarithm of the discriminator thinking that you are real,

and you compute the gradients with respect to the parameters of the generator, and

you tune the generator this way.
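
As a rough sketch of the generator objective described above, here is the logarithm of the discriminator's "real" probability on generated images, next to the discriminator's own term for the same images. The function names and the two example batches are assumptions made for illustration only.

```python
import math

def generator_objective(p_real_on_fake):
    """Average log of the discriminator's "real" probability on fakes.

    The generator is tuned to make this larger.
    """
    eps = 1e-12  # guard against log(0)
    return sum(math.log(p + eps) for p in p_real_on_fake) / len(p_real_on_fake)

def discriminator_fake_term(p_real_on_fake):
    """Average log-probability of fakes being called fake.

    The discriminator is tuned to make this larger.
    """
    eps = 1e-12
    return sum(math.log(1.0 - p + eps) for p in p_real_on_fake) / len(p_real_on_fake)

caught = [0.1, 0.2]  # hypothetical outputs: the discriminator sees through the fakes
fooled = [0.8, 0.9]  # hypothetical outputs: the discriminator believes the fakes

# How each side's objective changes when the fakes become more convincing.
gen_gain = generator_objective(fooled) - generator_objective(caught)
disc_gain = discriminator_fake_term(fooled) - discriminator_fake_term(caught)
```

When the fakes become more convincing, the generator's term goes up and the discriminator's term goes down, so the two objectives pull in opposite directions on the same images.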

And those loss functions, of course, can contradict one another.

You can see that the generator's objective essentially contradicts

the discriminator's goal, which is to classify,

to discriminate, fake images as fake.

This generator objective contradicts one of the discriminator's objectives.

You have this logarithm of the probability of being fake given an image from the generator.

The discriminator tries to maximize it, while the generator tries to get a larger

probability of being real.

This actually means that you'll have to train those two networks simultaneously,

and you'll have to make a few iterations on each of them in this

training loop,

to make sure that the discriminator is useful and

the generator actually gets better at fooling the discriminator from this [INAUDIBLE].

So the algorithm becomes the following.

You sample some noise, and maybe some kind of task description for

what kind of face or what kind of chair you want to generate.

You sample the actual chairs, the reference images.

Then you spend a few iterations training the discriminator.

You have actual images, and you want to maximize

the probability of being real for those images,

that is, minimize the probability of being fake,

while maximizing the probability of being fake for

the output of the generator.

Then comes the second stage, where you actually want to tune the generator so

that it fools the discriminator.

Again, for one or more epochs, you take this kind of task and maybe random noise.

Then you train your generator in a way that follows the gradients from

the discriminator and tries to fool it.

And so the process goes.
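
To make the two alternating stages concrete, here is a toy sketch in plain Python. It is not the setup from the lecture: the "images" are single numbers, the generator is a hypothetical linear map `a * z + b`, the discriminator is a single sigmoid unit, the "dataset" is a Gaussian centred at 3, and all gradients are written out by hand. The names `train_toy_gan`, `k`, `m`, `batch`, and `lr` are assumptions for this example.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=200, k=2, m=1, batch=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: x_fake = a * z + b
    w, c = 0.0, 0.0  # discriminator: probability of "real" = sigmoid(w * x + c)
    for _ in range(steps):
        # Stage 1: k iterations of gradient ascent on the discriminator objective.
        for _ in range(k):
            dw = dc = 0.0
            for _ in range(batch):
                x_real = rng.gauss(3.0, 0.5)          # sample from the toy "dataset"
                x_fake = a * rng.gauss(0.0, 1.0) + b  # sample from the generator
                d_real = sigmoid(w * x_real + c)
                d_fake = sigmoid(w * x_fake + c)
                # Gradient of log(d_real) + log(1 - d_fake) w.r.t. w and c.
                dw += (1.0 - d_real) * x_real - d_fake * x_fake
                dc += (1.0 - d_real) - d_fake
            w += lr * dw / batch
            c += lr * dc / batch
        # Stage 2: m iterations of gradient ascent on log(d_fake) for the generator.
        for _ in range(m):
            da = db = 0.0
            for _ in range(batch):
                z = rng.gauss(0.0, 1.0)
                d_fake = sigmoid(w * (a * z + b) + c)
                # Chain rule: through the discriminator, into the sample, into (a, b).
                dx = (1.0 - d_fake) * w
                da += dx * z
                db += dx
            a += lr * da / batch
            b += lr * db / batch
    return a, b, w, c

a, b, w, c = train_toy_gan()
```

The counts `k` and `m` are the knobs for how many iterations each network gets per loop. A toy like this should not be expected to settle down neatly; it only shows the order of the updates and where the gradients flow.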

Now, if you're experienced enough with training curricula and with

how you actually get deep learning models to train efficiently,

this particular fashion of training algorithm should

probably [INAUDIBLE], because there are all those K, M, all those [INAUDIBLE].

Now this is the usual problem with these generative models:

they are really unstable.

Because you generally have two networks that hate each other, and

you have to somehow extract profit from them hating each other.

And if one of them wins, basically this means that you have to start

the entire process all over again.

For example, if the discriminator wins, then its sigmoid probability

estimate of an image being fake or real

is already near 0 or near 1, which means that the gradients vanish.

Basically, the sigmoid has a very small derivative near an activation of 1 or 0.

If you now follow those gradients with the generator, this won't get you anywhere,

because the gradients are super small, negligible in fact.
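
A quick numeric check of the vanishing-gradient point, assuming the discriminator's last layer is a plain sigmoid. The helper names and the two example inputs are chosen here for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_slope(s):
    """Derivative of the sigmoid at s: sigmoid(s) * (1 - sigmoid(s))."""
    p = sigmoid(s)
    return p * (1.0 - p)

undecided = sigmoid_slope(0.0)   # output is 0.5, slope is 0.25
saturated = sigmoid_slope(10.0)  # output is close to 1, slope is around 4.5e-5
```

Any gradient of the output probability that passes back through a saturated sigmoid is multiplied by that tiny slope, so very little signal reaches the generator.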

If the generator wins, I mean if the generator is constantly able to train faster than

the discriminator, then you have an even worse situation.

It doesn't just stop training; it starts learning the wrong things.

Because right now the generator is fast enough to fool the discriminator,

the discriminator is not able to give it clear gradients on how to improve.

Therefore you can expect your generator to learn basically nonsensical stuff and diverge.

Basically, you have to find some kind of equilibrium here, where

the two models have more or less equal chances.

And ideally, mathematically, this whole process should terminate in a situation

where the generator wins, but after a large number of steps.

So ideally the generator should perfectly mimic the data distribution.

Its samples should be indistinguishable from real data.

That is, the discriminator should surrender.

But in real life, in most cases you won't see this happen.

You terminate earlier;

I mean, you get old faster than this happens. So as the training loop iterates,

your generator gets progressively better at, well, generating stuff.

And the discriminator gives it more and more accurate feedback, because it trains as well.

[INAUDIBLE] If you stop this process after some large number of iterations

to see the final product, you'll actually notice that the generated images here are

somewhat qualitatively different from the images that the [INAUDIBLE] decoder generates

if you try to feed the decoder with random noise.

Previously, the [INAUDIBLE] decoder had to generate kind of average, blurry

images, because this is how it could reduce the mean squared error.

Now, the generator won't be able to pull that trick.

The reason is that if you generate a blurry image,

it may be good for mean squared error, but

a discriminator will easily differentiate between blurry and non-blurry images.
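
Here is a small sketch of why mean squared error rewards blur. It uses two made-up one-dimensional "images" with a sharp edge in slightly different places; the helper names and pixel values are assumptions for illustration.

```python
def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def expected_mse(pred, samples):
    # Average error of one fixed prediction against every plausible sample.
    return sum(mse(pred, s) for s in samples) / len(samples)

# Two equally plausible sharp "images" (hypothetical rows of pixels).
samples = [[0.0, 0.0, 1.0, 1.0],
           [0.0, 1.0, 1.0, 1.0]]

# Pixel-wise average of the samples: a soft, blurred edge.
blurry = [sum(px) / len(samples) for px in zip(*samples)]

blurry_error = expected_mse(blurry, samples)     # error of the blurred guess
sharp_error = expected_mse(samples[0], samples)  # error of committing to one sharp image
```

The blurred average gets the lower expected error even though it matches neither sample, and that kind of output is what a discriminator can easily tell apart from real data.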

So the generated images are, of course, not perfect;

we can see some of them here having flaws, minor ones.

They are, of course, non-ideal, but they are non-ideal in different ways.

They are more or less plausible.

Each image is, in its own way, trying to resemble an actual sample,

not trying to average over them.

And this is the core advantage of generative adversarial networks.

They try to mimic a dataset,

not just to learn a probability distribution over it.

In fact, they do generate [INAUDIBLE] on the probability, but

instead of learning the distribution itself, the model learns to sample,

which is kind of simpler in the case of images.

[MUSIC]