
In this video, we'll see a couple of examples of how

Bayesian optimization can be applied to real world problems.

So the first one is hyperparameter tuning.

When you train neural networks, you usually have to retrain them many times to find the optimal number of layers, the layer sizes, whether to use dropout, whether to use batch normalization, and which nonlinearity to use: ReLU, SELU, and so on. You also have training parameters like the learning rate and momentum, or you may even try different optimizers, SGD for example.

So what you could do is use Bayesian optimization to find the best values of all of those parameters for you automatically. It usually finds better optima than tuning by hand, and it also allows for an honest comparison with other methods when you do research.
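To make this concrete, here is a minimal sketch of such a Bayesian optimization loop for a single hyperparameter. The "validation loss" below is a cheap toy stand-in for a real training run, and the choice of scikit-learn's `GaussianProcessRegressor` with an expected-improvement acquisition is just one reasonable setup, not a recipe from the lecture:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy "validation loss" as a function of log10(learning rate); a cheap
# stand-in for an expensive training run.
def val_loss(log_lr):
    return (log_lr + 3.0) ** 2 + 0.1 * np.sin(5 * log_lr)

bounds = (-6.0, 0.0)
rng = np.random.default_rng(0)
X = rng.uniform(*bounds, size=3).reshape(-1, 1)   # a few random initial trials
y = np.array([val_loss(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)

def expected_improvement(x_cand, y_best):
    mu, sigma = gp.predict(x_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma          # we minimize, so improvement is below y_best
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

for _ in range(15):
    gp.fit(X, y)
    grid = np.linspace(*bounds, 500).reshape(-1, 1)
    ei = expected_improvement(grid, y.min())
    x_next = grid[np.argmax(ei)]       # most promising point so far
    X = np.vstack([X, x_next])
    y = np.append(y, val_loss(x_next[0]))

print("best log10(lr) found:", X[np.argmin(y)][0])
```

Each iteration spends one (expensive) evaluation at the point the acquisition function deems most promising, which is exactly the trade-off that makes this cheaper than grid search.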

For example, suppose you came up with a brilliant method, spent a lot of time tuning its parameters, and in your paper you want to compare your model with some other models. It is really tempting not to spend much time tuning the parameters of those other models. What you could do instead is run automatic hyperparameter tuning to find the best values of those parameters for the models you are comparing against; in this case the comparison will be more honest.

The problem here is that we have a mixture of discrete and continuous variables. For example, the learning rate is continuous, while whether to use dropout is a binary decision. So how can we mix continuous and discrete variables in a Gaussian process?

Well, there is a simple trick. You treat the discrete variables as continuous when fitting the Gaussian process. For example, using dropout would be encoded as the value one, and not using it as the value zero. Then, when you maximize the acquisition function, you optimize it by brute-forcing all possible values of the discrete variables: you find the maximum of the acquisition function without dropout, find it with dropout, and then select whichever case is better.
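As a sketch of this trick, assuming scikit-learn for the Gaussian process (the observed trials and losses below are made up), the dropout flag is fitted as a 0/1 float, and the acquisition is maximized once per discrete value:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# Observed trials: columns are (learning_rate, dropout_flag); the discrete
# dropout decision is treated as a 0/1 float when fitting the GP.
X = np.array([[0.001, 0.0], [0.01, 1.0], [0.1, 0.0], [0.05, 1.0]])
y = np.array([0.9, 0.6, 1.2, 0.7])          # validation losses (invented)

gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

def expected_improvement(x_cand, y_best):
    mu, sigma = gp.predict(x_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Brute-force the discrete variable: maximize EI over the learning rate once
# per dropout setting, then keep whichever setting gives the larger EI.
lrs = np.linspace(1e-4, 0.2, 200)
candidates = []
for dropout in (0.0, 1.0):
    cand = np.column_stack([lrs, np.full_like(lrs, dropout)])
    ei = expected_improvement(cand, y.min())
    candidates.append((ei.max(), lrs[np.argmax(ei)], dropout))

best_ei, best_lr, best_dropout = max(candidates)
print(f"next trial: lr={best_lr:.4f}, dropout={int(best_dropout)}")
```

With only one binary variable this is two inner optimizations; with many discrete variables the enumeration grows combinatorially, which is why this works best when the discrete part is small.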

One special case is when all variables are discrete.

This setting is called the multi-armed bandit problem, and it is widely used in information retrieval tasks. For example, when you are building a search engine results page, you can select a lot of discrete hyperparameters, and for this case Bayesian optimization is really useful.
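For the all-discrete case, a classic bandit strategy such as UCB1 fits in a few lines. The reward probabilities below are invented purely for illustration (think of each arm as one candidate page layout):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hidden reward probability of each discrete option (e.g. a page layout);
# these numbers are invented for the illustration.
true_p = np.array([0.1, 0.7, 0.3])
counts = np.zeros(3)
rewards = np.zeros(3)

for t in range(1, 5001):
    if t <= 3:
        arm = t - 1                    # play every arm once first
    else:
        ucb = rewards / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))      # optimism in the face of uncertainty
    counts[arm] += 1
    rewards[arm] += rng.random() < true_p[arm]

print("pulls per option:", counts.astype(int))
```

The exploration bonus shrinks as an arm is pulled more often, so the play concentrates on the best option while still occasionally checking the others.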

Another application is drug discovery.

We have some molecules that could potentially become drugs for severe diseases. Here we have a molecule, and we can represent it as a string. This string is called a SMILES, and it can be constructed from the molecule quite simply.

What you can do then is build an autoencoder that takes the SMILES string as input and tries to reproduce it as output. You can use a variational autoencoder, which we talked about in week five, to make the latent space dense; that is, you can move along the space and, for each point, reconstruct some valid molecule.

And now here is the trick: you know that some molecules are useful for curing certain diseases and some are not. So here I have a plot of the latent space, and in this latent space you want to find the position of the maximum, that is, the molecule that will be best at curing the disease.

After you find the maximum in the latent space, you simply plug it into the decoder, reconstruct the molecule, and then run some trials, for example in vitro or in vivo. After this, you get the value at the new point.

You add it to the model, refit the Gaussian process, and find the new maximum of the acquisition function. Just by iterating this procedure, you can quickly find new drugs for different diseases.
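The whole latent-space loop might look like the following sketch. Here `decode` and `run_assay` are hypothetical stand-ins for the trained VAE decoder and the real wet-lab measurement, and the two-dimensional latent space and score function are a toy setup:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

# Hypothetical stand-ins: in practice `decode` is the VAE decoder mapping a
# latent vector to a SMILES string, and `run_assay` is a real experiment.
def decode(z):
    return f"molecule@{z.round(2)}"

def run_assay(z):
    return -np.sum((z - 0.5) ** 2) + 0.05 * rng.standard_normal()

dim = 2
Z = rng.uniform(-1, 1, size=(5, dim))            # initial latent points
scores = np.array([run_assay(z) for z in Z])

gp = GaussianProcessRegressor(normalize_y=True)

for _ in range(10):
    gp.fit(Z, scores)
    cand = rng.uniform(-1, 1, size=(2000, dim))  # random candidate latents
    mu, sigma = gp.predict(cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - scores.max()) / sigma              # maximization this time
    ei = (mu - scores.max()) * norm.cdf(z) + sigma * norm.pdf(z)
    z_next = cand[np.argmax(ei)]                 # acquisition maximum
    Z = np.vstack([Z, z_next])
    scores = np.append(scores, run_assay(z_next))  # the "trial"

best = Z[np.argmax(scores)]
print("best latent point:", best, "->", decode(best))
```

Each round is one trip through the loop from the lecture: refit the Gaussian process, maximize the acquisition function, decode and evaluate the new point, and add the result to the model.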