In the previous videos, we looked at some methods for imputing missing values. Now, the fact that you're making up data, that those are not real data, needs to be accounted for when you estimate variances, so that's what we'll cover in this section. The typical effect is that imputations add variance to estimates, compared to what you'd get if those were real data that were not generated by you. So we don't want to treat those imputations as real data, because then you'll end up with standard error estimates that are too small for most estimates. Now, how do we account for that added variance? There are different ways of doing it, and there are different theoretical arguments that lead you in different directions here. One approach leads to specialized formulas that depend on how the imputations were made. Those are inconvenient, at least in the sense that these specialized formulas are not available in most pieces of software that people use. Another approach, which is more readily available, is called multiple imputation; it is implemented in some pieces of software, and I'll give you an example of that. One thing about it is that it does require some randomness in how imputations are generated. But in the previous videos, we saw how you can add that randomness, so that's not a big hurdle here. Now, how do we do this multiple imputation? The idea is that you impute more than one value for each missing value. In the usual notation, you impute little m values; m equals five is a popular choice, but you could certainly generate more than five if you've got a lot of data. There must be, as I said, some random element in how the imputation is made to allow that. And then you use a special formula to account for the imputation variance. Just in words, what it looks like is this: the variance of your estimator that includes some imputed data is equal to the variance treating the imputations as real.
Plus the variance between the estimates you get using the different imputed values, times a small inflation factor. We'll see the exact formula in the next slide or so. So here's the idea: we compute an estimate, call it Q hat sub t, for each of the t = 1 through little m completed datasets. So if little m is 5, you've got 5 estimates. And what does a completed dataset mean? It means you take the real data that you observed, and you keep that every time. Then, for the missing data, you take the t'th imputed value and fill it in, and from that completed dataset you compute Q hat sub t. So for the next value of t, number 2 say, you take all the complete data, which stays the same, and you just move on to the next imputed value, the second one, for each item. So we generate little m of these Q hat t's, and then what's the estimate we report? It's just the average across these Q hat t's. Next we calculate a variance estimate. And how do we do that? The first thing we do is calculate a direct variance estimate, call it U sub t, from the t'th completed dataset. So U sub t can be many different things; it's any variance estimate that's appropriate for your sample design and estimator. It could come from an exact formula, from a linearization approximation, or from a replication variance estimator; all those are possibilities. So, for example, suppose you have a stratified simple random sample selected without replacement, and my Q hat sub t is the stratified sample mean, call it y bar t. Then I calculate U sub t using the usual formula for the variance of a stratified mean: I sum over the strata W sub h squared, where W sub h is the proportion of the population in stratum h, times 1 minus the sampling fraction in that stratum, times s squared sub ht over little n sub h. This s squared sub ht is the estimate of the unit, or population, variance of the y variable in stratum h based on the t'th completed dataset. And little n sub h, as usual, is the number of cases or units that we sampled in stratum h.
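That stratified calculation can be sketched in code. This is just an illustration, not part of the lecture materials: the function name `strat_mean_and_var`, the toy values, and the stratum population sizes are all made up. It computes Q hat sub t and U sub t for one completed dataset under stratified simple random sampling without replacement.

```python
import numpy as np

def strat_mean_and_var(strata):
    """Q_t and U_t for one completed dataset under stratified SRSWOR.

    `strata` is a list of (y_h, N_h) pairs: the completed (observed plus
    imputed) sample values and the population size for stratum h.
    """
    N = sum(N_h for _, N_h in strata)   # total population size
    Q_t = 0.0                           # stratified mean: sum of W_h * ybar_h
    U_t = 0.0                           # sum of W_h^2 (1 - n_h/N_h) s_ht^2 / n_h
    for y_h, N_h in strata:
        y_h = np.asarray(y_h, dtype=float)
        n_h = len(y_h)                  # stratum sample size
        W_h = N_h / N                   # population proportion in stratum h
        Q_t += W_h * y_h.mean()
        U_t += W_h**2 * (1 - n_h / N_h) * y_h.var(ddof=1) / n_h
    return Q_t, U_t

# One hypothetical completed dataset with two strata.
Q_t, U_t = strat_mean_and_var([([10.0, 12.0], 100),
                               ([20.0, 22.0, 24.0], 200)])
print(Q_t, U_t)
```

You would run this once per completed dataset, giving the little m pairs of Q hat sub t and U sub t that feed into the combining formula.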
So that includes cases that may have imputed data or missing data. Now, what do I do with those U sub t's and the Q hat sub t's to get a final variance estimate? I put them together this way. I want the variance of my Q bar estimate; remember, Q bar is how we combine the little m completed-dataset estimates. I take U bar, which is just the mean of the U's, the mean of my direct variance estimates. And then I add on 1 plus 1 over m, where m is the number of imputations, times a factor capital B. And what's B? B is the variance among the Q hat t's, the completed-data estimates, that's completed in the past tense, with a d on the end of it. So this includes the imputations that I've made for each dataset, and it's a variance, so it's going to be positive. If I had no imputations, this U bar would just be kind of a replicated estimate of the variance on its own. The B term is the increment that accounts for the imputation variance. And you can see that if I make a whole lot of imputations, 1 over m is going to be a small number, so that factor would be close to 1. Otherwise, for m = 5 say, it does amount to something, and it needs to be included to get an approximately unbiased variance estimate. So this formula is nice because of its generality. You could have mean imputation with a random error. You could have hot deck with random draws from complete units. You could have regression with a random error. You could have predictive mean matching. All those things involve some randomness, and all of them can be inputs to this multiple imputation variance formula. Now, the pros and cons of multiple imputation. There are a lot of advantages, which are listed here. It's got a simple variance formula, and the same variance formula applies to many types of estimates: means, totals, quantiles, regression parameter estimates, other things. In contrast to some other competing methods, you don't have to derive a new formula for every different type of estimator that you're looking at.
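To make the combining step concrete, here is a minimal end-to-end sketch. It is not the lecturer's code: the toy data, the hot-deck donor step, and the function name `mi_variance` are assumptions for illustration. It generates m = 5 hot-deck completed datasets, computes a simple mean and its variance estimate for each, and applies the formula: total variance equals U bar plus (1 + 1/m) times B.

```python
import numpy as np

def mi_variance(Q, U):
    """Rubin's combining rule: Q holds the m completed-data point
    estimates, U the m completed-data direct variance estimates."""
    Q = np.asarray(Q, dtype=float)
    U = np.asarray(U, dtype=float)
    m = len(Q)
    Q_bar = Q.mean()               # reported point estimate
    U_bar = U.mean()               # mean of the direct variance estimates
    B = Q.var(ddof=1)              # between-imputation variance
    T = U_bar + (1 + 1 / m) * B    # total MI variance
    return Q_bar, T

rng = np.random.default_rng(1)

# Toy sample with two missing values (np.nan marks nonresponse).
y = np.array([12.0, np.nan, 15.0, 11.0, np.nan, 14.0, 13.0, 16.0])
donors = y[~np.isnan(y)]
m = 5

Q, U = [], []
for _ in range(m):
    y_t = y.copy()
    # Random hot-deck: fill each hole with a random draw from the
    # observed values -- the random element multiple imputation requires.
    y_t[np.isnan(y_t)] = rng.choice(donors, size=int(np.isnan(y_t).sum()))
    Q.append(y_t.mean())                  # Q_t: completed-data mean
    U.append(y_t.var(ddof=1) / len(y_t))  # U_t: simple variance of the mean

Q_bar, T = mi_variance(Q, U)
print(Q_bar, T)
```

Swapping the hot-deck step for regression with a random error, or the mean for another estimator and its matching U sub t, leaves `mi_variance` unchanged, which is the generality the lecture points to.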
The point estimates, and the variance estimates of those point estimates, are approximately unbiased if the imputation model is correct. It uses all the available data; you don't lose any cases, so that's good. One disadvantage of the MI variance estimator is that it can be positively biased, that is, too big, in some kinds of cluster samples. Now, this may or may not be a serious problem depending on your particular application. But, generally, having a positively biased variance estimate is considered conservative and not a terrible error to be making. So that's a disadvantage that may carry less weight than these advantages do, because of the generality of multiple imputation.