In this video, we demonstrate how the response surface strategy changes as we reach the optimum. Issues of curvature and non-linearity become important at the peak of the mountain. One advantage of response surface methods is that we learn about the region around us as we go. Remember that analogy of walking with a ski pole in your hand? Well, we never really know the region around us. So when we use that ski pole to figure out what the terrain looks like, we need a way to know when we've reached the top. Let's quickly contrast the response surface approach with the OFAT approach. The COST approach, or the OFAT approach, makes you think that you're at the optimum, but you can never really be sure. In the case we saw earlier with two factors, you would alternate between optimizing factor A, then factor B, then optimizing factor A again, then B again. You'll eventually get to an optimum, but will you be sure you're at the peak? How do you know you don't need to do another round of optimizing A and B again? Also, if I'd optimized B first and then A, I would have arrived at the optimum faster. This seems like a lottery! Sometimes you get to the peak quickly, and sometimes slowly. Not surprisingly, statisticians don't like this sort of thing. Furthermore, this approach doesn't scale well. If you had five factors, for example, A, B, C, D, and E, then this haphazard searching across the five factors leads to inefficient experimentation. By using the COST approach you will also not learn about the interactions in your system. Recall from an earlier video in this module that learning more about our systems was the first way we can use data to improve our processes. So let's resume and continue with the model built on points 11, 12, 13, and 14, and the baseline at point 10. We pointed out that the contour plots exhibit curvature: the lines are not parallel. 
These curved lines come from the interaction term, indicating that the interaction coefficient is important relative to the main effects. In prior models, the interaction term was small. Notice, though, that the steepest ascent method will still send us in the correct direction even if we ignore the interaction term. The interaction term, had we accounted for it, would send us at a slightly different angle, but in this example the discrepancy is not so bad. Had the interaction term sent us in a very different direction, we would definitely follow that direction instead of the steepest ascent determined only from the linear terms. But more on that to come with this topic of "curvature". Let's quickly go take a step in the direction of run number 15. And because you are good at this now, I am going to take a step of "Delta x_T" equal to two, and the corresponding "Delta x_P" is equal to minus two-thirds. You can do the rest of the calculations yourself and show that the predicted value of profit at this location is $742, corresponding to these real-world values and these coded values. When we run the actual experiment, we record a profit of $735. That's an overestimate of $7, which is comparable in size to a main effect. And we also have visual evidence now of curvature. This is starting to tell me that I should change my strategy. When we enter a region of curvature in response surface methods, the surface is becoming more nonlinear, and we are likely approaching an optimum. It is desirable to know when this is happening. One indication already is that our interaction terms are large; they cannot be ignored. And visually, we see that as these non-parallel lines in the contour plot. The second indication that an optimum is close by is that we are levelling out. 
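The step calculation above can be sketched in a few lines of Python. The coefficient values below are hypothetical placeholders (the video's fitted coefficients aren't reproduced here); the point is only that, in steepest ascent, each factor moves in proportion to its linear coefficient:

```python
# Steepest-ascent step: move each factor in proportion to its linear
# model coefficient (coded units). These coefficients are hypothetical,
# chosen so their ratio matches the -1/3 ratio implied by the video's step.
b_T, b_P = 15.0, -5.0

delta_xT = 2.0                       # chosen step size in factor T (coded)
delta_xP = (b_P / b_T) * delta_xT    # proportional step in factor P

print(delta_xP)   # reproduces the minus two-thirds step for this ratio
```

Only the ratio of coefficients matters for the direction; the size of the step in the leading factor is our own choice.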
Levelling out means that my outcome values, in the neighbourhood, are getting closer and closer together, even when I'm taking reasonable step changes. Let's see this. The spread in profit values in the first factorial was around a $300 difference. In the second factorial over here, that spread was around $150. And now in this third factorial, my spreads are only $15 to $20. We're not making the gains we had made earlier. And if we're not careful, we can be affected by noise. If we don't know the level of noise around us, we might be misled. How do I know whether that spread of $15 to $20 is any different from the noise in the system? Another way to ask that: if we repeated those corner experiments, would we get similar values or different values? So let's go calculate what the noise level is. Run at least three or four repeated experiments at the same conditions, typically the baseline. So here at the base of the factorial, I previously had an outcome of $732, and two more runs give me outcomes of $733 and $737. So there's a spread of about $5. That spread is very different from the spread over the corner points of the factorial, indicating I'm still seeing signal above the noise. The third indication of an optimum is whether our predictions are too high or too low. We saw here at point 15 that we had a prediction error of $7, just over our level of noise. This indicates the model can be improved. We often observe strong changes in the model's surface near the optimum. For example, if you're making a food product, you want to cook it long enough to bring out the beautiful colours and caramelization flavours. But go just a little bit too far and it becomes burnt. We also see this in engineering systems. Often, our optimal point of operation is right at the edge of a cliff, and if we go just a little bit further, we fall over the edge and see our outcome value drop rapidly. Another good reason to take small steps near the optimum. 
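Here is a minimal sketch of that signal-versus-noise check, using the three center-point values quoted in the transcript ($732, $733, $737) and the approximate $15 to $20 corner spread; the variable names are my own:

```python
# Repeated center-point runs give an estimate of the noise level.
center_runs = [732, 733, 737]
noise_spread = max(center_runs) - min(center_runs)   # about $5

# Approximate spread across the corner points of the third factorial.
corner_spread = 20

print(noise_spread)                   # 5
print(corner_spread > noise_spread)   # True: signal still exceeds noise
```

If the corner spread had been of the same size as the noise spread, repeating a corner run could plausibly give any of the corner values, and the "signal" would be meaningless.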
A fourth way to detect curvature is that our model does not fit the surface very well. A linear model cannot fit a curved surface well, and we use the terminology "lack of fit" to quantify that. Let me show you. In our first factorial, the center point was $407, but the predicted center point was $390. That's a difference of $17. Now that might seem large, but it really isn't when we compare it to the main effects of 55 and 134. Recall what the interpretation of that number 55 is. So a $17 difference really is small, indicating a small lack of fit. In the second factorial, the actual center was $657 while the predicted center was $645: a difference of $12. That again is small compared to the neighbourhood we're in. In this third factorial though, the actual center is the average of these three baseline values, $734. Compare that to the predicted center value of $724. That's a difference of $10, which, when compared to the largest effect of 7.5 and to the level of noise of about $5, indicates an important deviation between the model and the actual surface we're on, at least at the center. If we're getting large deviations at the center, we cannot hope to get good predictions outside the range of the model. And good predictions are essential to optimize in the correct direction. So there are four ways that we've shown to check for inadequacy in the model. Those of you with a statistical background can go calculate the confidence intervals on the model coefficients and observe that they're very wide: none of the terms in the model are statistically significant. Well, as we saw in the single-variable popcorn example, when faced with a poorly predicting model in a region that has curvature, we can add terms to account for the nonlinearity: "quadratic terms". So let's go add these now. There are two options: adding points on the face of the cube, or adding points a little bit further out, called "axial points" or "star points". 
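The lack-of-fit comparison above is simple arithmetic; this sketch just tabulates it, using the observed and predicted center values quoted from the three factorials (the dictionary labels are my own):

```python
# Observed vs predicted center-point values for the three factorials.
observed  = {"first": 407, "second": 657, "third": 734}
predicted = {"first": 390, "second": 645, "third": 724}

for name in observed:
    lack_of_fit = observed[name] - predicted[name]
    print(name, lack_of_fit)   # first 17, second 12, third 10
```

The raw deviations actually shrink from $17 to $10; what changes is the yardstick. A $10 deviation matters in the third factorial only because the largest effect (7.5) and the noise (about $5) are now of the same order.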
These points are at a distance denoted alpha from the center; alpha is a value greater than 1, so that they lie outside the cube. The design on the left works well if you run into a constraint, or cannot leave the factorial space. The design on the right comes from a class of designs called central composite designs, or CCDs, and it is preferred for a statistical property called rotatability. Just a quick aside: rotatability simply means that the prediction error is equal for any two points that are the same distance from the center, and it's a desirable statistical property. Now, there are various choices for the distance alpha and the number of center points to use, but that's a messy discussion that you can research quite easily. The general advice is this though: run the factorials first; then run the star points afterwards at a distance of alpha equal to (2^k)^0.25. So, if you have two factors, alpha = 1.41, and if you have three factors, alpha = 1.68. Also, add three to four center points to assess lack of fit, and run these center points at different times, not all after each other. Notice this though: from the individual perspective of factor T and of factor P, each factor has runs at five distinct levels, and that's what helps us accurately fit the quadratic model. Let's go do this! The first star point is run number 18, at a value of +alpha for factor T in coded units, and a value of zero for factor P. Let's add that to the table, and also calculate the real-world units for it in the usual way. So that's 343 parts per hour, and a sales price of $1.63. You can go practice reproducing the other three star points, and let's add one final center point experiment, number 22, so that we have a total of four center points. Now we go run these experiments, in random order of course, and report the values here in standard order. 
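The axial-distance rule is easy to check in code. This sketch evaluates alpha = (2^k)^0.25 for a CCD whose core is a full 2^k factorial:

```python
def ccd_alpha(k):
    """Rotatable axial (star-point) distance for a CCD with a full 2**k core."""
    return (2 ** k) ** 0.25

print(round(ccd_alpha(2), 2))   # 1.41 for two factors
print(round(ccd_alpha(3), 2))   # 1.68 for three factors
```

For two factors this is just the square root of 2: the star points sit on the circle through the four corners of the coded square, which is exactly what rotatability asks for.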
Notice firstly that center point 22 is similar to the prior values, indicating that the system is still stable and reproducible. Well, we've got quite the collection of data here. A central composite design (CCD) always has the factorial points, center points, and star points. I've arranged them in that order in the R code. When we run that code, we get the quadratic model from them all. I will leave it as a small challenge to you to go prove the following two things. Firstly, the model's prediction of the center point, when compared to the average of the four center points, has a very small deviation. So this model fits well, at least at the center. Secondly, this quadratic model's prediction of the other points, for example one of the corner points, or one of the star points, or even experiment 15 over here, is a very good prediction. There is little prediction error. So we have confidence in this model's predictions. Now let's go visualize those as a contour plot. And right away, we can see we are in fact near the optimum. Visually, the axial point is pretty close to the predicted optimum region from the model. That's good enough to stop here and use as our optimum. But let's say the quadratic model had looked like this one instead. Then you would go run your next experiment over here, at the model's predicted optimum. And then you would go verify the model's prediction ability at that point, to check that you've reached the optimum. Now we can be a bit more precise -- for those of you who don't like to trust visual judgement. We can take this quadratic equation, differentiate it with respect to the coded variables, set the derivatives equal to zero, and get a set of two linear equations in two unknowns, which you can then solve using your favourite linear algebra software, or by hand. When you do that, you get the predicted optimum at 343 parts per hour, and a selling price of $1.59. 
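The differentiate-and-solve step can be sketched as follows. The coefficient values below are made-up placeholders (the video's fitted quadratic isn't reproduced here); what matters is the structure of the two linear equations that come from setting both partial derivatives to zero:

```python
# Stationary point of a fitted quadratic in coded units:
#   y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1**2 + b22*x2**2
# Setting dy/dx1 = 0 and dy/dx2 = 0 gives two linear equations:
#   2*b11*x1 + b12*x2 = -b1
#   b12*x1 + 2*b22*x2 = -b2
# Hypothetical coefficients, for illustration only:
b1, b2, b12, b11, b22 = 5.0, 3.0, 4.0, -10.0, -8.0

# Solve the 2x2 system by Cramer's rule (or use any linear algebra library).
det = (2 * b11) * (2 * b22) - b12 * b12
x1 = (-b1 * (2 * b22) - b12 * (-b2)) / det
x2 = ((2 * b11) * (-b2) - (-b1) * b12) / det

# Verify: the gradient should vanish at the stationary point (x1, x2).
g1 = b1 + 2 * b11 * x1 + b12 * x2
g2 = b2 + b12 * x1 + 2 * b22 * x2
print(abs(g1) < 1e-9 and abs(g2) < 1e-9)   # True
```

The same stationary point in coded units would then be converted back to real-world units, which is how the video arrives at 343 parts per hour and $1.59.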
The quadratic model tells us to expect a profit of $740 at this point. Running that 23rd experiment gives an actual profit of $739; that's very close agreement. This is definitely the largest value we've observed over the entire approach followed. So this video has answered the last question we had from an earlier video in this module: "How do we know when to stop?" We can stop when our model matches the surface well and the model predicts an optimum. Using the model, we know that we've reached the peak of the mountain, even though we cannot see the actual mountain around us. So let's recap our entire approach. Start by building successive linear models, shown here in blue, green, and orange, respectively. I'm showing you the prediction contours in those colours for the local region around each model. Each of those local models had its baseline, or 0-0 value. These past videos have also shown that we should incorporate the baseline points, as well as other points in the neighbourhood, into our models to help improve their estimates. We use our models as long as we have confidence in their predictions. We rebuild a model once we demonstrate that its predictions are poor, judged by comparing the predictions to the actual values and taking noise into account. As we approach the optimum, issues regarding curvature, which we examined from four angles, become apparent. We have to change our strategy. If we pick up that we have curvature, based on these criteria, we have to start decreasing our step size and start fitting quadratic models. The principle of an optimum is that it is nonlinear: points around us must be lower. And so our last prediction model that we build, shown here in red, illustrates that quite nicely. To end off though, let me show you the true surface in a grey colour. This is obviously something you would never see in practice. But seeing it here gives you good confidence that we were doing the right thing all along. 
You can see how the models in blue, green, and orange approximated the non-linear surface very well in their local regions. Outside their local neighbourhoods, they start to deviate. The non-linear model fits the surface over a wider region. That isn't too surprising: the information to build that non-linear model required four plus four plus four, or 12 experiments. And we used that non-linear model to place our final experiment(s) very close to the true optimum. To end this video, I will add one point: the real optimum may move. Our system could deteriorate and change, so the optimum that you found won't stay there. There are experimental tools that continually keep searching and moving towards the optimum. We won't have time to cover them in this course, but the topic of Evolutionary Operation (EVOP) is what you should search for if that interests you. It is particularly applicable to manufacturing systems that are never stable. That mountain is moving, and you have to move as well in order to remain at the peak.