[MUSIC] In this video, we'll see a couple of examples of how Bayesian optimization can be applied to real-world problems.

The first one is hyperparameter tuning. When you train neural networks, you usually have to retrain them many times while searching for the optimal number of layers, the layer sizes, whether to use dropout, whether to use batch normalization, and which nonlinearity to use: ReLU, SELU, and so on. You also have training parameters like the learning rate and momentum, or you may want to try different optimizers, SGD for example. What you could do is use Bayesian optimization to find the best values of all of those parameters for you automatically. It usually finds better optima than tuning by hand, and it also allows for an honest comparison with other methods when you do research. For example, suppose you came up with a brilliant method and spent a lot of time tuning its parameters, and in your paper you want to compare your model with some other models. It is really tempting not to spend much time tuning the parameters of the other models. What you could do instead is run automatic hyperparameter tuning to find the best parameter values for the models you are comparing against, and in this case the comparison will be more honest.

The problem here is that we have a mixture of discrete and continuous variables. For example, the learning rate is continuous, while whether to use dropout or not is a binary decision. So how can we mix continuous and discrete variables in a Gaussian process? A simple trick is this: treat the discrete variables as continuous when fitting the process. For example, when you use dropout, the value is one, and when you don't use it, the value is zero.
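As a rough sketch of this trick, here is what it could look like with a scikit-learn Gaussian process: the dropout flag is encoded as 0.0/1.0 and treated as continuous during fitting, and the acquisition function is maximized separately for each discrete value. The objective here is a toy stand-in for validation error (in practice it would train a network), and the grid search over the learning rate is just one simple way to optimize the acquisition.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Toy stand-in for validation error as a function of
# (log learning rate, use_dropout); best at log_lr = -3, dropout on.
def objective(log_lr, use_dropout):
    return (log_lr + 3.0) ** 2 + (0.1 if use_dropout > 0.5 else 0.3) + rng.normal(0, 0.01)

# Initial observations; the discrete dropout flag is encoded as 0.0 / 1.0.
X = np.array([[-5.0, 0.0], [-1.0, 1.0], [-3.5, 1.0]])
y = np.array([objective(*x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4, normalize_y=True)

for _ in range(10):
    gp.fit(X, y)  # discrete variable treated as continuous during fitting

    # Expected-improvement acquisition (we minimize the objective).
    def neg_ei(x):
        mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (y.min() - mu) / sigma
        return -((y.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z))[0]

    # Brute-force the discrete variable: optimize the acquisition over
    # log_lr once with dropout = 0 and once with dropout = 1,
    # then keep whichever candidate is better.
    candidates = []
    for d in (0.0, 1.0):
        grid = np.linspace(-6.0, 0.0, 200)
        pts = np.column_stack([grid, np.full_like(grid, d)])
        candidates.append(pts[np.argmin([neg_ei(p) for p in pts])])
    x_next = min(candidates, key=neg_ei)

    X = np.vstack([X, x_next])
    y = np.append(y, objective(*x_next))

print("best hyperparameters:", X[np.argmin(y)], "error:", y.min())
```

The loop should settle near log_lr = -3 with dropout enabled, without ever asking the optimizer to handle a fractional dropout value.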
Then, when you maximize the acquisition function, you optimize it by brute-forcing over all possible values of the discrete variables. For example, you find the maximum of the acquisition function without dropout, then find it with dropout, and select whichever case is better.

One special case is when all variables are discrete. Those problems are called multi-armed bandits, and they are widely used in information-retrieval tasks. For example, when you're building a search-engine results page, you have to select a lot of hyperparameters that are discrete, and Bayesian optimization is really useful for this case.

Another application is drug discovery. We have some molecules that could be drugs for severe diseases. Here we have a molecule, and we can represent it as a string. This string is called SMILES, and it can be constructed from the molecule very simply. What you can do then is build an autoencoder that takes the SMILES string as input and tries to reproduce it as output. You can use a variational autoencoder, which we talked about in week five, to make the latent space dense; that is, you can move along the space, and for each point you will be able to reconstruct some valid molecule. And now here's the trick: you know that some molecules are useful for curing a disease and some are not. So here we have a plot of the latent space, and in this latent space you want to find the position of the maximum, that is, the molecule that will be best at curing the disease. After you find the maximum in the latent space, you simply plug it into the decoder and reconstruct the molecule, and then you can run some trials, for example in vitro or in vivo. After this, you get the new value at that point. You add it to the model, refit the Gaussian process, and find the new maximum of the acquisition function. Just by repeating this loop, you can quickly find new drugs for different diseases.
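The drug-discovery loop above can be sketched as Bayesian optimization over the latent space. The decoder and the lab assay below are hypothetical stand-ins (the decoder is a placeholder and the assay is faked with a smooth function peaked at one latent point); the acquisition used here is upper confidence bound (UCB), one common choice, maximized over random candidates.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Hypothetical stand-ins: a trained VAE decoder mapping a latent point to
# a SMILES string, and a lab assay scoring the decoded molecule.
def decode(z):
    return f"SMILES<{np.round(z, 2)}>"        # placeholder for vae.decode(z)

def assay_score(z):
    return float(np.exp(-np.sum((z - 0.7) ** 2)))  # fake score, peak at (0.7, 0.7)

# A few initial molecules scored "in the lab".
Z = rng.uniform(-2.0, 2.0, size=(5, 2))
scores = np.array([assay_score(z) for z in Z])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6,
                              normalize_y=True)

for _ in range(15):
    gp.fit(Z, scores)                          # refit the surrogate on all trials
    cand = rng.uniform(-2.0, 2.0, size=(2000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    z_next = cand[np.argmax(mu + 2.0 * sigma)]  # UCB acquisition
    Z = np.vstack([Z, z_next])
    scores = np.append(scores, assay_score(z_next))  # run the new "trial"

best = Z[np.argmax(scores)]
print("best latent point:", best, "decoded:", decode(best))
```

Each pass through the loop is one round of the process described above: pick the most promising latent point, decode it, score the molecule, and refit the Gaussian process with the new observation.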
[MUSIC]