So we've been talking about the blocks, and we can see how that works,

but let's go back to our classroom example now.

With the blocks we were sampling, I know the elements weren't people;

they were records of some kind, data for the housing units.

But maybe the data were also for

the people living there, aggregated into units called households.

But now let's return to another example of cluster sampling,

one in which our school district has 1,000 classrooms of

elementary school children.

Maybe they're in their first year or their second year, and

in this case we've sampled 20 of the classrooms.

So instead of sampling ten classrooms, as we did before, and taking all 24 children in each,

what we're going to do in this case is sample 20 classrooms and

take 12 children in each.

Now why would we do that?

Well, because we know that the design effect is driven,

in part, by how many elements we select per cluster.

If we do half as many kids per cluster, the design effect should go down.

And, indeed, if we were to do this and empirically examine the results,

we would see design effects decreasing when we did this.
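The link between subsample size and design effect can be sketched numerically. The snippet below assumes the common approximation deff = 1 + (b - 1) * roh for equal-size subsamples, which this passage does not state, and it uses a made-up intraclass correlation (`ROH_ASSUMED`) purely for illustration. It is not an estimate from the immunization data.

```python
def approx_design_effect(b: int, roh: float) -> float:
    """Approximate design effect for equal-size cluster subsamples.

    Uses deff = 1 + (b - 1) * roh, where b is the number of elements
    selected per cluster and roh is the intraclass correlation.
    """
    return 1.0 + (b - 1) * roh


# Hypothetical intraclass correlation, chosen only for illustration.
ROH_ASSUMED = 0.1

deff_all_24 = approx_design_effect(24, ROH_ASSUMED)  # 10 classrooms, all 24 children
deff_sub_12 = approx_design_effect(12, ROH_ASSUMED)  # 20 classrooms, 12 children each

print(f"b = 24: deff = {deff_all_24:.2f}")
print(f"b = 12: deff = {deff_sub_12:.2f}")
```

Under this approximation, any positive roh gives a smaller design effect when b is halved, though not half as large, since the leading 1 is unaffected.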

So from the population of capital N, we've again drawn a sample of 240.

But in this case, lowercase a is 20 and lowercase b is 12.

We're taking a sub-sample.

Here are the results now for the immunizations.

You can see what I'm telling you about.

We now have twice as many classrooms.

We have different rates, different fractions for these.

I put them on two lines.

That's the data we're working with.

And we have a couple of classrooms here where everybody's immunized.

And the smallest is about a third of them being immunized.

And the sum of the numerators there, just for this illustration, is 160,

so that I've got the same result that I had before.

It's just that I've got it spread across twice as many clusters.

Well, the overall proportion immunized is still 160 out of 240, or 0.67.

This design is unbiased; that is, the sampling process is unbiased for

the proportion or the mean.

Suppose I drew all possible cluster samples of 20 classrooms with 12 elements each.

It's a complicated design, and

counting the number of possible samples is a more complicated process.

But on average I'm going to get the right result across all of those possible

samples.

And the sampling variance, as we just noted,

would be calculated in the same way,

treating the clusters now as 12 students per classroom.

I'd get the immunization rate for each of the 20 classrooms,

and look at the variability of those around that 0.67, just as we did before,

calculating an s_a squared, and then (1 - f)/a times s_a squared.
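The calculation just described can be sketched in Python. The lecture's per-classroom values are not reproduced in this transcript, so the counts below are made up. They were chosen only to match the description (20 classrooms of 12, 160 immunized in total, two classrooms at 12 of 12, the smallest at 4 of 12), and the resulting variance should not be read as the lecture's result. The function name and the value f = 0.01 (20/1,000 times 12/24) are assumptions for this sketch.

```python
from math import sqrt


def cluster_proportion_variance(counts, b, f):
    """Return (p, variance, standard error) for equal-size cluster subsamples.

    counts: number immunized in each sampled classroom
    b:      children subsampled per classroom
    f:      overall sampling fraction
    """
    a = len(counts)
    rates = [c / b for c in counts]            # immunization rate per classroom
    p = sum(counts) / (a * b)                  # overall proportion
    # Variability of the classroom rates around the overall proportion.
    s_a_sq = sum((r - p) ** 2 for r in rates) / (a - 1)
    var_p = (1 - f) / a * s_a_sq               # (1 - f)/a times s_a squared
    return p, var_p, sqrt(var_p)


# Made-up counts, NOT the lecture's slide values.
HYPOTHETICAL_COUNTS = [12, 12, 4, 5, 6, 6, 7, 7, 7, 8,
                       8, 8, 8, 8, 9, 9, 9, 9, 9, 9]

p, var_p, se_p = cluster_proportion_variance(HYPOTHETICAL_COUNTS, b=12, f=0.01)
print(f"p = {p:.4f}, var(p) = {var_p:.6f}, se(p) = {se_p:.4f}")
```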

But now my sampling fraction is composed of two parts: a/A, 20 from 1,000, or 0.02,

and then 12 from 24, one half.

It's the same overall sampling rate as we had before because it's the same sample size:

lowercase a, 20, times lowercase b, 12, is still 240.

So I've still got the same sampling fraction, but

now I've divided it across these two stages in the sample selection.
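The two-stage arithmetic can be checked directly. In this short sketch the variable names, including `B` for the 24 children in a classroom, are assumptions for illustration, and the earlier design is taken to be 10 classrooms with all 24 children each.

```python
A, B = 1000, 24   # classrooms in the district, children per classroom
a, b = 20, 12     # classrooms sampled, children subsampled per classroom

f_stage1 = a / A              # first-stage rate: 20 from 1,000
f_stage2 = b / B              # second-stage rate: 12 from 24
f = f_stage1 * f_stage2       # overall sampling fraction
n = a * b                     # total sample size

# Earlier design: 10 classrooms, all 24 children in each.
f_before = (10 / A) * (24 / B)
n_before = 10 * 24

print(f"f = {f:.4f}, n = {n}; before: f = {f_before:.4f}, n = {n_before}")
```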

But I've got the same sampling variance calculation and

the same standard error calculation that we did before.

I won't go through and do the calculation here.

If you want to, you could try it out and

see what you get in the way of sampling variance and design effects.

We're going to get a design effect, though, which is the ratio of the actual variance to

the simple random sampling variance for a sample of the same size.
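That ratio can be written as a small function. This sketch assumes the simple random sampling variance of a proportion takes the form (1 - f) * p * (1 - p) / (n - 1). The "actual" cluster variance passed in is a placeholder rather than a value computed from the lecture's data, so the printed design effect is illustrative only.

```python
def srs_variance_of_proportion(p: float, n: int, f: float) -> float:
    """Assumed SRS variance of a proportion: (1 - f) * p * (1 - p) / (n - 1)."""
    return (1 - f) * p * (1 - p) / (n - 1)


def design_effect(var_actual: float, p: float, n: int, f: float) -> float:
    """Ratio of the actual variance to the SRS variance for the same n."""
    return var_actual / srs_variance_of_proportion(p, n, f)


# Placeholder cluster-sample variance; hypothetical, not from the lecture.
VAR_ACTUAL_PLACEHOLDER = 0.0015

p, n, f = 160 / 240, 240, 0.01
var_srs = srs_variance_of_proportion(p, n, f)
deff = design_effect(VAR_ACTUAL_PLACEHOLDER, p, n, f)

print(f"SRS variance = {var_srs:.6f}, design effect = {deff:.2f}")
```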