All right. So in this lecture, we're going to introduce some more concepts related to the mean squared error, which we introduced in the previous lecture. In particular, we talked about the MSE in the previous lecture and said it might be a quantity we'd like to minimize. Well, if we're trying to minimize that quantity, what we really want to know is: what value of the MSE is good enough? And that's going to lead us to a statistic we'll introduce called the R-squared statistic. Okay. So that's the basic question we're trying to answer here: how low does the mean squared error have to be before we consider our model accurate, or before we say it's low enough? Well, that's a very hard question to answer, and in particular, it depends; there's no one right answer. The MSE is going to be proportional to the variance of the data. So what does that really mean? Imagine you're trying to predict something like a star rating, and you're typically off by half a star or so. You might consider that to be an accurate prediction, and your MSE would be something like a half squared, or a quarter. Now, if you're trying to predict ratings out of 100 and your values were off by five points, again you might consider that to be a good prediction, but your mean squared error would be much larger. You're dealing with larger values of the labels, so you'll have correspondingly large values of the error. That's exactly what I mean when I say the mean squared error is proportional to the variance of the data, or the variance of the labels. Okay. So can we write down a few equations to demonstrate that point? We'll define a few things here: the mean, the variance, and the mean squared error as we defined it in the previous lecture. So just to remind ourselves what those are: the mean, or y bar, is just going to be the summation, over all of my data points from one to n, of my labels, divided by n. So when I say the mean, I'm talking about the average value of the label.
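To make the scale argument concrete, here is a minimal sketch with made-up numbers: the same "quality" of prediction (off by half a star on a 5-star scale, off by five points on a 100-point scale) gives very different raw MSE values.

```python
def mse(predictions, labels):
    # Mean squared error: average of squared differences.
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

# Star ratings (out of 5), hypothetical predictions off by half a star each:
stars = [4.0, 3.0, 5.0, 2.0]
star_preds = [3.5, 3.5, 4.5, 2.5]
print(mse(star_preds, stars))  # 0.25, i.e. (1/2)^2

# Ratings out of 100, hypothetical predictions off by five points each:
scores = [80.0, 60.0, 100.0, 40.0]
score_preds = [75.0, 65.0, 95.0, 45.0]
print(mse(score_preds, scores))  # 25.0, i.e. 5^2
```

Both models are "good" in the same relative sense, but the raw MSE differs by a factor of 100, which is exactly why we'll want a variance-normalized measure.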
So the variance, precisely speaking, is a measure of the average data point's variation from the mean. That's going to look like the following: the variance of the label is going to be equal to, again, an average over all of my data points of the squared difference between each data point and the mean. That's our definition of variance. The mean squared error, just to remind ourselves from the last lecture, is going to look like the following: it's the sum, over all of our data points from one to n, of the model's prediction, x_i dot theta, minus the label, squared, and we average that by dividing by n. So already you might start to see the relationship between the mean squared error and the variance. The second and third equations here look very similar to each other. The only difference is that in the second equation I have this expression y bar, the average value, and in the third equation I have the prediction of the model, x_i dot theta. So that's the relationship between the mean squared error and the variance of my labels. Okay. So here's the same thing, just written down a little more neatly. All right. So what I've basically shown is that the mean squared error is going to be proportional to the variance of the data. So a more robust error metric, one that is not proportional to the variance of the data, would be to normalize our mean squared error by taking the MSE and dividing it by the variance of our labels. This expression is known as the fraction of variance unexplained. The point of this is to give us a measurement between zero and one that measures the difference between a trivial predictor, one that predicts the mean all the time, and a perfect predictor, one that has no error, or an MSE of zero. So think about a trivial predictor: if I just predicted the mean everywhere, then my mean squared error would be equal to my variance.
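These three quantities can be sketched in a few lines of code; the labels below are made up for illustration. Predicting the mean everywhere makes the MSE equal the variance, so the fraction of variance unexplained (MSE divided by variance) comes out to exactly one.

```python
def mean(ys):
    # Average value of the labels (y bar).
    return sum(ys) / len(ys)

def variance(ys):
    # Average squared deviation of each label from the mean.
    y_bar = mean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / len(ys)

def mse(predictions, ys):
    # Average squared difference between predictions and labels.
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

def fvu(predictions, ys):
    # Fraction of variance unexplained: MSE normalized by label variance.
    return mse(predictions, ys) / variance(ys)

labels = [1.0, 2.0, 3.0, 4.0, 5.0]

# Trivial predictor: predict the mean everywhere -> MSE equals variance -> FVU = 1.
trivial = [mean(labels)] * len(labels)
print(fvu(trivial, labels))  # 1.0

# Perfect predictor: no error -> FVU = 0.
print(fvu(labels, labels))  # 0.0
```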
So the denominator would be equal to the numerator, and the fraction of variance unexplained would be equal to one. On the other hand, if the mean squared error were zero, then the numerator would be zero, and the fraction of variance unexplained would be zero. So this is a little more reliable: it's an error measure that gives us a value between zero and one, interpolating between a trivial and a perfect predictor. Now, a more commonly used error measure is what's called the R-squared statistic. It's very closely related to the fraction of variance unexplained; really, it's just one minus the fraction of variance unexplained. So it's just doing the opposite thing, going from zero for a trivial predictor to one for a perfect predictor. The reason we have these two different definitions is just due to the different ways one can arrive at these statistics. But in this case, we'd say we have an R-squared value of zero if we were to just predict the mean everywhere, and an R-squared value of one if we were to make perfect predictions. So, a quick follow-up question: would it be possible to have a negative value of this R-squared statistic? In other words, could we do worse than what we called a trivial predictor? Our definition of a trivial predictor was one that just predicted the mean all the time. If you'd like to estimate sentiment or a star rating, the most trivial way you might do that is to compute the average star rating and predict that everywhere. But could you actually do something worse than that? All right. So a trivial predictor has an R-squared statistic of zero, but we could indeed do worse than that. For example, if we predict the mean everywhere, we get an R-squared statistic of zero, but if we predict any other constant value everywhere, we'd actually get a negative R-squared statistic.
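The R-squared definition above can be sketched directly as one minus the ratio of the residual sum of squares to the total sum of squares (the labels here are hypothetical, and the helper name is my own):

```python
def r_squared(predictions, ys):
    # R^2 = 1 - FVU = 1 - MSE / variance. Since both MSE and variance
    # are averages over n points, the 1/n factors cancel, so we can use
    # plain sums of squares.
    y_bar = sum(ys) / len(ys)
    ss_res = sum((p - y) ** 2 for p, y in zip(predictions, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

labels = [1.0, 2.0, 3.0, 4.0, 5.0]
print(r_squared([3.0] * 5, labels))  # 0.0 (trivial: predict the mean everywhere)
print(r_squared(labels, labels))     # 1.0 (perfect predictions)
```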
So if we predicted all ratings zero, all ratings one, or all ratings any other constant besides the mean, that would be worse than our most naive, or most trivial, predictor, and it would actually have a negative R-squared statistic. But for any reasonable predictor you might invent, you should typically get an R-squared between zero and one. Okay. So that's the end of this lecture. Really, all we did in this lecture was to introduce these two new statistics, the fraction of variance unexplained and the R-squared statistic, which are basically equivalent, and we explained the relationship between the mean squared error, the mean, and the variance. The idea was to develop a statistic that is normalized by the variance of the data and takes a calibrated value between zero and one, corresponding to a trivial or a perfect predictor.
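The negative-R-squared point above can be checked numerically; this is a small self-contained sketch with hypothetical labels, repeating the R-squared helper so it runs on its own.

```python
def r_squared(predictions, ys):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    y_bar = sum(ys) / len(ys)
    ss_res = sum((p - y) ** 2 for p, y in zip(predictions, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

labels = [1.0, 2.0, 3.0, 4.0, 5.0]
y_bar = sum(labels) / len(labels)  # 3.0

# Any constant other than the mean does worse than the trivial predictor:
print(r_squared([0.0] * 5, labels))    # -4.5 (all-zeros predictor)
print(r_squared([y_bar] * 5, labels))  # 0.0 (trivial: the mean)
```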