Okay, so let's work on our PRESS residuals and show that we can get them without actually refitting the model. Just to remind ourselves, the ith PRESS residual is the ith data point minus the fitted value for the ith data point, where that fitted value was obtained with the ith data point deleted. To do this, there are really two clever tricks that you'd never think of on your own, or at least I wouldn't, that make it all work. One is how we write a certain matrix multiplication, and the other is the Sherman-Morrison-Woodbury theorem. The Sherman-Morrison-Woodbury theorem is one of those really clever little matrix inversion techniques that seems to get used over and over again in regression.

So, let me write X as z_1 transpose stacked on top of the rest, down to z_n transpose. This is different from how we normally partition X, where we usually partition X by its columns, say x_1 up to x_p. Here I'm partitioning X in terms of its rows; however, I'm writing each of its rows as a column vector. Then consider the matrix X transpose X. That works out to be the sum over i of z_i z_i transpose, given how we've written things. So the X transpose X matrix with the ith data point deleted is just the sum over i prime not equal to i of z_i prime z_i prime transpose, whereas before I was summing over all of the indices, i prime from 1 up to n. So I can write my X transpose X matrix with the ith data point deleted as X transpose X minus z_i z_i transpose, simply because X transpose X is the sum over all of the data points, and deleting the ith data point just removes that one term from the sum. Then we need to invert this matrix, because remember, if we're going to find the fitted values having fit with the ith data point deleted, we need that inverse.
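A quick numerical check of this rank-one decomposition, sketched in numpy on simulated data (the sizes n and p, the seed, and the deleted index are arbitrary choices for illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))

# X'X as a sum of rank-one terms z_i z_i', where z_i' is the ith row of X.
XtX = sum(np.outer(X[i], X[i]) for i in range(n))
assert np.allclose(XtX, X.T @ X)

# Deleting the ith row is then just a rank-one "downdate" of X'X.
i = 4
X_minus_i = np.delete(X, i, axis=0)
assert np.allclose(X_minus_i.T @ X_minus_i, X.T @ X - np.outer(X[i], X[i]))
```

The point of writing it this way is that the deleted-point cross-product matrix never has to be formed from scratch; it differs from the full one by a single outer product.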
Okay, so we need X minus i transpose X minus i, we need that whole thing inverted, and we've written it out like this precisely so we can use the Sherman-Morrison-Woodbury theorem. If you look back at the Sherman-Morrison-Woodbury theorem, this inverse works out to be X transpose X inverse, plus X transpose X inverse z_i z_i transpose X transpose X inverse, all divided by 1 minus z_i transpose X transpose X inverse z_i. That denominator is a scalar.

Now, another nifty little fact. Take the hat matrix, H, which is X times X transpose X inverse times X transpose. If I just wanted to grab the ith diagonal element of the hat matrix, one way would be to post-multiply it by a vector delta_i, which is n by 1 and is a bunch of 0s with a 1 in the ith position, and then pre-multiply it by delta_i transpose, the same vector. That grabs the ith diagonal element of any matrix, right? Pre-multiplying by a vector that grabs the ith row and post-multiplying by a vector that grabs the ith column just grabs the ith diagonal element. But then we can see that delta_i transpose times X is z_i transpose, and X transpose times delta_i is z_i, so we get z_i transpose X transpose X inverse z_i. So if we call the ith diagonal of the hat matrix h_ii, the denominator above is exactly 1 minus h_ii. Okay?

Now, we need to keep working along here. The second thing we're going to need is X minus i transpose times y minus i, the y vector with the ith point deleted. Using the same trick as before with the matrix multiplication, we can see that this is the ordinary X transpose y, with the full data and the ith data point included, minus z_i y_i. And you can see that using just the same logic as before.
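Here's a small numpy sketch of this step on simulated data, verifying both that the ith hat diagonal equals z_i transpose X transpose X inverse z_i and that the Sherman-Morrison-Woodbury expression matches inverting the downdated matrix directly (dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
i = 4
z = X[i]                      # ith row of X, as a vector

A_inv = np.linalg.inv(X.T @ X)
H = X @ A_inv @ X.T           # hat matrix
h_ii = H[i, i]                # ith hat diagonal
assert np.isclose(h_ii, z @ A_inv @ z)

# Sherman-Morrison-Woodbury for a rank-one downdate:
# (X'X - z z')^{-1} = A^{-1} + A^{-1} z z' A^{-1} / (1 - h_ii)
smw = A_inv + np.outer(A_inv @ z, z @ A_inv) / (1 - h_ii)
direct = np.linalg.inv(X.T @ X - np.outer(z, z))
assert np.allclose(smw, direct)
```

In practice the payoff is that one inverse of X transpose X serves for every choice of i; the downdated inverse is just a cheap rank-one correction.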
Now, we need to get y_i hat minus i, which is the fitted value for the ith data point where the ith data point has been removed from the fitting process. Okay, so what is that, just by definition? It is z_i transpose times the fitted regression coefficients: z_i transpose times X minus i transpose X minus i inverse, X minus i transpose, y minus i. That middle piece, X minus i transpose X minus i inverse times X minus i transpose y minus i, is the beta coefficient we obtain by removing the ith data point from both the response and the predictors; pre-multiplying by z_i transpose gives the actual fitted value. So that is what y_i hat minus i is.

So now, let's plug in our quantities and show that this simplifies quite a bit. Going back to our earlier work: there's my z_i transpose; my X minus i transpose X minus i inverse, which is X transpose X inverse plus X transpose X inverse z_i z_i transpose X transpose X inverse, all over 1 minus h_ii; and my X minus i transpose y minus i, which is X transpose y minus z_i y_i.

Now we can get some simplification. When I take z_i transpose, multiply it by X transpose X inverse, and multiply that by X transpose y, I get good old-fashioned y_i hat, the fitted value for the ith data point where the ith data point was actually included. Next, take z_i transpose times the second term of the inverse, times X transpose y. Let me just do some scratch work: that is z_i transpose X transpose X inverse z_i, times z_i transpose X transpose X inverse X transpose y, all over 1 minus h_ii. The first factor is just h_ii, and the second is just y_i hat. So what I wind up with is h_ii over 1 minus h_ii, times y_i hat. Okay, so that takes care of the X transpose y piece.
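We can sanity-check this plug-in route numerically. The sketch below (simulated data, arbitrary dimensions and seed) computes y_i hat minus i once by actually deleting the row and refitting, and once through the Sherman-Morrison-Woodbury inverse applied to X transpose y minus z_i y_i, and confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + rng.normal(size=n)
i = 4
z = X[i]

# Direct route: actually delete row i and refit.
X_d, y_d = np.delete(X, i, axis=0), np.delete(y, i)
beta_minus_i = np.linalg.solve(X_d.T @ X_d, X_d.T @ y_d)
yhat_minus_i = z @ beta_minus_i

# Plug-in route: SMW inverse times (X'y - z_i y_i), no deleted-data refit.
A_inv = np.linalg.inv(X.T @ X)
h_ii = z @ A_inv @ z
smw = A_inv + np.outer(A_inv @ z, z @ A_inv) / (1 - h_ii)
assert np.isclose(yhat_minus_i, z @ smw @ (X.T @ y - z * y[i]))
```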
Now let's take the minus z_i y_i term and multiply it through. First, z_i transpose times X transpose X inverse times minus z_i y_i gives minus h_ii y_i, since z_i transpose X transpose X inverse z_i is h_ii. Next, z_i transpose times the second term of the inverse, times minus z_i y_i, gives minus z_i transpose X transpose X inverse z_i, times z_i transpose X transpose X inverse z_i, times y_i, all over 1 minus h_ii. Both of those factors are h_ii, so that term is minus h_ii squared over 1 minus h_ii, times y_i.

So, what do we get? We get y_i hat, plus h_ii over 1 minus h_ii times y_i hat, minus h_ii times y_i, minus h_ii squared over 1 minus h_ii times y_i. You can work through the arithmetic yourself and get that this works out to be y_i hat over 1 minus h_ii, plus y_i, minus y_i over 1 minus h_ii. And just remember, what we're solving for is y_i hat minus i.

Now, if I take y_i minus y_i hat minus i, the standalone y_i term just goes away, and I'm left with y_i over 1 minus h_ii, minus y_i hat over 1 minus h_ii. So it works out to be y_i minus y_i hat, over 1 minus h_ii. Kind of a startling result. Or we could just say it's the ith ordinary residual divided by 1 minus the ith hat diagonal. If you take the ordinary residuals and divide them by 1 minus the relevant hat diagonal, you get the same residual that you would obtain by refitting the model with the ith data point removed and then predicting at that data point. In other words, if you're familiar with cross-validation, in linear models the leave-one-out cross-validated errors actually do not require you to refit the model. So kind of a startling little result. And it also goes to show that we know exactly what this term would be from the mean shift model that we covered in the previous lectures, the other way we motivated the PRESS residuals.
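Putting it all together, here is a small numpy sketch of the final identity on simulated data: the PRESS residuals computed from one full fit, e_i over 1 minus h_ii, match brute-force leave-one-out refitting at every data point (sizes and seed are, again, arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 4
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# One full fit: hat diagonals and ordinary residuals.
H = X @ np.linalg.solve(X.T @ X, X.T)
resid = y - H @ y
press = resid / (1 - np.diag(H))   # e_i / (1 - h_ii)

# Brute-force leave-one-out for comparison: refit n times.
for i in range(n):
    X_d, y_d = np.delete(X, i, axis=0), np.delete(y, i)
    beta = np.linalg.solve(X_d.T @ X_d, X_d.T @ y_d)
    assert np.isclose(press[i], y[i] - X[i] @ beta)
```

The vectorized line costs one fit; the loop costs n fits, which is exactly the refitting this result lets you skip.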
So I find this to be a tremendously useful result, especially in big data sets, and it's something that I happen to use all the time. And it turns out that it's just a consequence of a couple of clever tricks. You know, not tricks that would be easy to come up with on the spot, but after you see them they're not hard at all. Okay, great, so now let's do some computing experiments.