Now in this video we're going to start optimizing our process, using the idea of a response surface method (RSM), or response surface optimization. I will say this as encouragement: more than any other single technique I use, the concept of optimization has always led to the most significant increase in profit and value in the companies I've worked with. Students who have used the concepts from this course in their work have been promoted many times, and I still get emails from them telling me about this. Now, full disclosure: I didn't invent any of these ideas, so I don't deserve the credit, but let me try to explain them to you. People have been asking about linear versus non-linear systems on the forums, and to address the issue of repeated experiments and noise, we will look at these topics in this example. That's the idea of just-in-time learning. Now some fair warning: this video is longer than most, but it carries a single case study through and introduces many critical concepts. So back to the popcorn example from the prior video. We had a single factor that affects the system, factor A, the cooking time. And we developed the model y = 90 + 15 x_A. We said that the interpretation of the 15 x_A term in the model is that if we increase the cooking time from -1 to 0, or from 0 to +1 in coded units (that's a one-unit increase), then the number of popped but unburnt popcorn increases on average by 15 units. This would be a good time to have a discussion about how we connect the coded values in the model to real-world values of cooking time. The simple rule for continuous factors is: (coded value) = [(real-world value) - (center point)] / (half the range). For the cooking time factor, the center point is the middle, between the low value and the high value; that corresponds to 135 seconds. The range refers to the high level minus the low level, so 150 minus 120 equals 30, and half the range is then 15 seconds.
Let's try using this formula for the real-world value of 135 seconds of cooking time. That's 135 - 135 = 0, divided by half the range of 15, which is still 0. So the coded value for 135 seconds is 0. The cooking time of 150 seconds should correspond to +1 in coded units, and the formula shows that that's correct. You can also use the formula in the backwards direction. What does a cooking time of +2 in coded units correspond to in real-world units? If we work in reverse, we multiply the coded value by half the range: that's +2 times 15, which equals 30. Then instead of subtracting the center point, we add it: 30 plus 135 = 165 seconds. So here's a summary of how to go from real-world units to coded units and back again. Keep this close by; we're going to use it many times in this section. Now let's visualize this. The actual experiments are shown here with black circles, and the prediction model is shown as a blue line. Notice that at the low level in coded units of -1, the prediction is 75, and it corresponds closely to the experimental values of 74 and 76. At the baseline, where the coded value is 0, we have a prediction from the model of 90 unburnt popcorns. But notice, we never actually ran an experiment at the baseline. The model simply tells us that if we did run one, that would be the number of unburnt popcorns to expect. And at a coded value of +1, or 150 seconds in real-world units, we make a prediction of 105 unburnt popcorns. Now on the forums, people have been hinting and asking about the idea of whether the linear model is valid. People have been suggesting that all practical systems have non-linear behaviour. Another way of stating that is to ask: what's going to happen as we extend our blue line prediction model further and further to the right? The blue line indicates that as we cook longer we should obtain more and more white popcorn. But clearly that is not true.
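The two conversion rules above can be captured in a pair of one-line helper functions. The course itself uses R; here is an illustrative sketch in plain Python, with the center point of 135 seconds and half-range of 15 seconds from our cooking-time factor:

```python
# Convert between real-world and coded units for a continuous factor.
# center = midpoint of the low and high levels; half_range = (high - low) / 2.

def to_coded(real, center=135.0, half_range=15.0):
    """Real-world value -> coded value."""
    return (real - center) / half_range

def to_real(coded, center=135.0, half_range=15.0):
    """Coded value -> real-world value (the reverse calculation)."""
    return coded * half_range + center

print(to_coded(135))  # 0.0, the baseline
print(to_coded(150))  # 1.0, the high level
print(to_real(2))     # 165.0 seconds, as computed above
```

The same two helpers work for any continuous factor; only the center and half-range change.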
If I cook the popcorn for longer times, I would expect that I'm going to start to burn it, and this outcome variable will quickly start to decrease. Definitely not increase. But back over here in this region on the left, the model is perfectly valid. This is a great illustration of the famous quote by George Box, who I've now mentioned several times in the course videos. He said, "all models are wrong, but some are useful". He also extended that by asking: "... the practical question is, how wrong do they have to be before they are not useful?" I paraphrased that last piece, and that is exactly what we're dealing with here in the video. Another way of looking at the issue is to ask: how do we know when the model is not useful anymore? And I'm going to show you the answer. The model is not useful when it is not achieving its objective. Well, what is the objective? That comes back to the issue that we saw at the start of this module's videos. Our objective is clearly to get good predictions using the model, so we can find this optimum. So perhaps let's rewrite the answer then: the model is not useful when it stops providing accurate predictions. And we need accurate predictions in order to optimize. Let's go try it out. What is the predicted value when cooking popcorn for 135 seconds? We showed earlier that the prediction was 90. Now let's go run an actual experiment to confirm, and if we do that we get a value of 92. Is that a reasonable prediction? One way to tell is by comparing our prediction error to the error that is built into the system. What do I mean by error that's built into the system? Another term for that is noise, and noise simply means that if we repeat the experiment exactly the same way as all the others, from beginning to end, we don't expect the same answer. We expect some variation, and that is the noise. In my prior experiments at the -1 point, with 120 seconds of cooking time, I achieved outcome values of 74 and 76.
Now imagine if I run that again, I might get 78, and a fourth time I might get 73. This gives me some idea of the reproducibility in my system; that is noise, and it's also called pure error. It combines both the error in my measurements and the error due to setting up and operating the experiments. One way to estimate the noise is by running replicate experiments. Even if I don't run those two extra experiments, I can get an idea of the reproducibility by seeing 74 and 76 here, and 106 and 104 over there. That gives me some measure of the variation in the system. Now we could look into various ways of quantifying noise, talking about standard deviations and normal distributions, but I don't want to lose focus of our goal here, which is to figure out the optimum, and whether our model is good enough. So back to the baseline. My prediction was 90. The actual value I achieved was 92. And compared to the noise in the system, a deviation of 2 popped corns isn't a big deal. It is comparable to the level of noise I saw earlier, about 2%. So, right now, this model is great. Let's go use it to optimize. If we want the maximum number of white, unburnt popcorn, we should be moving to the right, using longer and longer cooking times. But how big a step should we take? We will never really know until we try. I usually start by making a one-coded-unit change away from the baseline: so from x_A = 0 to x_A = +1. But we've already run that experiment. So let's jump another unit up, to x_A = +2. In real-world units, we showed a few minutes ago that this new point corresponds to 165 seconds. Now before you attempt to run your experiment, here's a great idea and good practice: make a prediction first. Use your model and show that the prediction is going to be 90 + 15*2 = 120. When we run the experiment, we get a value of 113 popped corns. So our model actually overestimated. It was wrong by about 6%.
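Since we have replicates at the -1 level (74 and 76) and at the +1 level (106 and 104), we can pool them into a single pure-error estimate. The course does this kind of calculation in R; here is a minimal Python sketch using the standard pooled-variance formula for groups of replicates:

```python
import math

# Replicate runs at identical conditions: the spread within each pair is pure error.
replicate_groups = [[74, 76], [106, 104]]

# Pooled variance: within-group squared deviations summed, divided by the
# total degrees of freedom (group size minus one, per group).
ss, dof = 0.0, 0
for group in replicate_groups:
    mean = sum(group) / len(group)
    ss += sum((y - mean) ** 2 for y in group)
    dof += len(group) - 1

pure_error_sd = math.sqrt(ss / dof)
print(pure_error_sd)  # about 1.4 popped corns
```

So the baseline deviation of 2 popped corns (92 observed versus 90 predicted) is on the same order as this pure-error estimate, which is the quantitative sense behind the "about 2%" remark above.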
That's a lot higher than the 2% error we have just from the pure noise in the system. So here's the first indication that our model is not quite as accurate as we had hoped. If we plot these points and superimpose the prediction on them, we see there's some sort of downward trend. Now, as long as the model is making useful predictions, it's worth using it. In fact, this model is still useful; at least it points us in the right direction. You could keep stepping in the direction of x_A = +3, and maybe a little further, and you would still be okay. I'm going to ask you to pause the video here. Using the experimental data points available to you so far, these four data points, where would you predict the optimum lies? At what value of x_A will you get the most popcorn? I'm going to show you how we can figure that out. We have evidence here that we should account for some non-linear behaviour in our system. We can see it visually in the plot, because of the deviation away from our linear model. Now, for those of you who don't have a background in system modelling, an easy way to account for a non-linear component is to add a quadratic term: the b_AA x_A^2 term over here. You'll likely recall seeing these in high school. Adding this non-linear component should account for the non-linear behaviour in our system and improve our predictions. And here is the R code to fit this new model with the quadratic term added. Please note that we use the capital letter "I" wrapped around the quadratic portion. This tells R that we want to treat the quadratic in the formula as is; "I" is known as the as-is operator. So our new model is y = 91.8 + 14.9 x_A - 2 x_A^2. How does this new model's prediction compare? Let's try using it at x_A = +1. We get a prediction of 91.8 + 14.9 - 2; that's 104.7. It agrees with our numbers. Try using the model at x_A = +2. You should have obtained the value of 113.6, which also agrees well with the experimental data.
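In R, that quadratic fit is the lm(y ~ x + I(x^2)) idiom described above. For readers who want to reproduce it elsewhere, here is an equivalent least-squares fit in Python. Be aware that the data below are my reconstruction of the runs mentioned so far (74 and 76 at -1; 92 at 0; 106 and 104 at +1; 113 at +2); since the transcript doesn't list the full dataset, the coefficients come out close to, but not exactly equal to, the video's y = 91.8 + 14.9 x_A - 2 x_A^2:

```python
import numpy as np

# Coded cooking times and observed counts of popped, unburnt corn
# (reconstructed from the runs described in the transcript).
x = np.array([-1, -1, 0, 1, 1, 2], dtype=float)
y = np.array([74, 76, 92, 106, 104, 113], dtype=float)

# Least-squares quadratic; polyfit returns highest power first.
b_aa, b_a, b0 = np.polyfit(x, y, deg=2)
print(round(b0, 1), round(b_a, 1), round(b_aa, 1))

def predict(xa):
    """Evaluate the fitted quadratic at a coded cooking time."""
    return b0 + b_a * xa + b_aa * xa ** 2

print(round(predict(1), 1), round(predict(2), 1))
```

With the video's own coefficients, predict(1) gives 104.7 and predict(2) gives 113.6, matching the checks above; this reconstructed fit lands within about one popped corn of those values.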
We are in better shape than before. Let's repeat the approach we've seen so far: make a prediction in the correct direction, then run the actual experiment, compare that to the prediction, and rebuild the model if necessary, based on the evidence of deviation. So where should our next step be? It's quite feasible that you step out to x_A = +3; using the rules from before, that implies a cooking time of 180 seconds, or three minutes. But we might be able to use our model in a better way. Let's visualize the predictions from our model; I've added those quadratic predictions using the red line. Here's some R code to create this plot yourself. Copy and paste it from the course website. We can either find the maximum of our revised model by eyeballing the plot, or we can find it mathematically. It doesn't matter too much, because of George Box's reminder that all models are wrong. Visually, we see the model predicting an optimum at about 3.6, and a value of 118 popcorns. What does a coded value of 3.6 represent in real-world units? You're going to be an expert at this soon. It's 3.6 times 15 for the half range, and add to it the value of 135 for the center. So it's a real-world value of 189 seconds of cooking time. Let's go run that experiment now. We predicted a value of 118, and the actual experiment gives us a value of 116. That's pretty impressive. In 7 experiments, we seem to have achieved very good predictions and a local optimum in the system. Let's recap our experiments. The first four experiments were used to rule out factor B, the oil type, and those experiments also gave us our initial model. Then we ran a center point at 135 seconds. The center point confirmed that our system was linear in that region, and we used it to check the model's predictions. Experiment six was our first step outside the box. We really tested the model's prediction capacity, and we used it as justification to rebuild the model as well. And then our seventh experiment seems to be near the optimum.
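Finding the maximum mathematically just means locating the vertex of the fitted parabola: setting the derivative of b0 + b_A x + b_AA x^2 to zero gives x* = -b_A / (2 b_AA). Here is a small sketch using the video's coefficients; the exact answer, about 3.7 coded units, sits close to the eyeballed 3.6:

```python
# The video's quadratic model: y = 91.8 + 14.9 x - 2 x^2
b0, b_a, b_aa = 91.8, 14.9, -2.0

x_opt = -b_a / (2 * b_aa)                    # vertex of the parabola, coded units
y_opt = b0 + b_a * x_opt + b_aa * x_opt ** 2 # predicted popcorn count there

# Convert the coded optimum back to seconds: center 135 s, half-range 15 s.
t_opt = 135 + 15 * x_opt

print(round(x_opt, 3), round(y_opt, 1), round(t_opt, 1))
```

So the calculus answer is a coded value of 3.725, or roughly 191 seconds, slightly beyond the eyeballed 3.6 (189 seconds); both are well within the model's accuracy, which is George Box's point.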
And I'll say more about that next. But first, some advice. You should always be able to justify the value achieved from each experiment, even if they're cheap. In industry and manufacturing environments, experiments can cost hundreds of dollars, and you may have to justify the need for each experiment to your manager. So always aim to justify the reason for running that experiment. Our seventh experiment is not necessarily the last one. We could have run one here at x_A = +3, if we weren't quite sure what our model's predictive ability was. Sometimes we'll make smaller and smaller, more conservative steps, especially if we're concerned about safety: the safety of your employees, or the safety of the equipment. We also make smaller steps as we approach the optimum, because we don't want to jump past it. We could, and should, if time and budget allow, run an eighth experiment, maybe at x_A = +5, to confirm that the response decreases after that optimum. Here's why. The very definition of a maximum is that everything else is lower. So we should, at the very least, have some evidence that things really are lower on the other side. With the information we have right now, it might be that we've just levelled off, or that we had an unlucky experiment and the optimum is still further along. So let's go verify we are in the region of an optimum by going past it. I'm going to try an experiment at x_A = +5, but remember: always predict first. With the current model, shown here in red, we should get a value of? That's right, you should have predicted a value of about 116 using that equation. When you actually run the experiment, you get a value of 109, shown with a diamond symbol here on the plot. Aha! This confirms that we've reached the optimum. Again, this was a great demonstration that the model was useful, but has some inaccuracy. We predicted 116, but our actual value was 109.
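The predict-first step at x_A = +5 is just the current quadratic evaluated one more time; a quick sketch with the video's coefficients:

```python
# Current model: y = 91.8 + 14.9 x - 2 x^2
def predict(xa):
    return 91.8 + 14.9 * xa - 2.0 * xa ** 2

prediction = predict(5)   # 91.8 + 74.5 - 50.0 = 116.3
actual = 109              # the observed eighth experiment
print(round(prediction, 1), round(prediction - actual, 1))
```

The overestimate of about 7 popped corns is well beyond the 1 to 2 corns of pure error, which is the quantitative sense in which we have truly gone past the optimum rather than just seen noise.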
Since we have this eighth data point, we might as well use it to improve our model. And here is the revised R code for that. So there we are. The updated model is y = 92 + 15 x_A - 2.3 x_A^2, and you will notice the location of the optimum has shifted just slightly. It hasn't shifted so much that we have to go relocate it. The revised model is shown in dark green. Some people will point out that they could have done this with fewer experiments, maybe by doing experiments at -1, +1, +3, and +5, using an equally spaced grid. The grid approach works if you have to search in one or two dimensions for an optimum, but it quickly becomes inefficient after that. If you look at the approach we've taken in this video, it's been very thoughtful and carefully planned. And we're going to extend it to two dimensions in one of the future videos. So here's a summary of the important points that you learned in this video. We learned how to convert between coded units and real-world units and back again. We showed several times how models can be wrong but are still very useful, despite their shortcomings. We saw a systematic approach for seeking the optimum, which consists of collecting experimental data, building a model, testing the model's predictions, and then rebuilding the model if it has shortcomings. Finally, we learned to always justify the use of every experiment. Running a new experiment can be very, very costly, so careful planning to make sure that that experiment is actually needed is critical. I do want to point out one final issue, and this will help to extend the knowledge of students who don't have a modelling background. Notice here, in this region of the model, that the linear model works perfectly well for the predictions and for describing the system. By that I mean we can clearly see that if we increase cooking time, we get more white unburnt popcorn. We're not going to make any serious errors over here by using this linear model.
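To rebuild the model with the eighth point included, we just append (x = +5, y = 109) to the dataset and refit. Here is a Python sketch of that R refit, again using a reconstruction of the runs described in the transcript (74 and 76 at -1; 92 at 0; 106 and 104 at +1; 113 at +2; now 109 at +5), since the raw values aren't all listed; the coefficients land almost exactly on the video's updated model of y = 92 + 15 x_A - 2.3 x_A^2:

```python
import numpy as np

# Reconstructed runs, now including the eighth experiment at x = +5.
x = np.array([-1, -1, 0, 1, 1, 2, 5], dtype=float)
y = np.array([74, 76, 92, 106, 104, 113, 109], dtype=float)

b_aa, b_a, b0 = np.polyfit(x, y, deg=2)
print(round(b0, 1), round(b_a, 1), round(b_aa, 1))  # approx. 92.3 15.0 -2.3

# The optimum shifts only slightly compared with the previous fit:
x_opt = -b_a / (2 * b_aa)
print(round(x_opt, 2))
```

Note how one extra point near the far side of the optimum tightens the curvature estimate (b_AA moves from about -2 to about -2.3), which is exactly why the confirmation run past the maximum was worth its cost.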
The idea is that over a small enough range, practical systems, even if they're non-linear, will behave as if they are linear. Finally, I want you to notice how the work we've covered in this video addresses objectives 1, 3 and 4 of the general data analysis objectives that I spoke about in the previous video, and how we seamlessly switched between these objectives as our analysis proceeded.