Now, totals are the easiest thing to talk about.
So let's think about those.
If you've got weights that are scaled in such a way that they take
the sample up to the population, that is, they project the small set that
you've got up to the big set of the population,
then you estimate totals in the following way.
You sum over the sample units.
And that's what "i an element of s" means.
i indexes the units, s is the set of sample units, so we sum over all of those, and
we take the weight for each unit times its data value.
And that'll be an estimated total for whatever this y variable is, say income.
If y is zero or one, it could be the number of people who have got diabetes or
the number of people whose water supply is somehow
contaminated. It can be all sorts of things.
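As a minimal sketch of the total estimator just described, here is the weight-times-value sum in plain Python. The function name `estimated_total` and the four-unit toy data are made up for illustration; they are not from the lecture or from any survey package.

```python
def estimated_total(weights, y):
    """Sum of weight * data value over the sample units (i in s)."""
    if len(weights) != len(y):
        raise ValueError("weights and y must have the same length")
    return sum(w * yi for w, yi in zip(weights, y))


# Hypothetical sample of four units; each weight is the number of
# population units that the sample unit stands for.
weights = [100.0, 250.0, 150.0, 500.0]
income = [30000.0, 45000.0, 52000.0, 28000.0]
has_diabetes = [1, 0, 0, 1]  # a zero/one y variable

total_income = estimated_total(weights, income)          # estimated total income
diabetes_count = estimated_total(weights, has_diabetes)  # estimated number of people
```

With a zero/one variable the same function returns an estimated count, because only the weights of units with y equal to one contribute to the sum.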
Now, for the mean, all we do is we take that estimated total
here and we divide by the sum of the weights.
Now, again, if the weights are scaled to estimate population totals, the sum
of the weights is going to be an estimate of the number of units in the population.
Also, if we were to sum the weights over just a subset, like the males in your sample
if you're sampling people,
that will be an estimate of the number of males in the population.
It will be an estimate of the count of units in whatever subgroup you sum over.
So that's a very handy thing about the standard
way of constructing complex survey weights.
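To make the mean and the subgroup count concrete, here is a small sketch under the same assumption that the weights are scaled to population totals. The names `weighted_mean` and `estimated_count`, and the toy data, are hypothetical.

```python
def weighted_mean(weights, y):
    """Estimated total divided by the sum of the weights."""
    total = sum(w * yi for w, yi in zip(weights, y))
    return total / sum(weights)


def estimated_count(weights, in_group):
    """Sum of the weights over the sample units that fall in a subgroup."""
    return sum(w for w, flag in zip(weights, in_group) if flag)


# Hypothetical four-unit sample.
weights = [100.0, 250.0, 150.0, 500.0]
income = [30000.0, 45000.0, 52000.0, 28000.0]
is_male = [True, False, True, False]

population_size = sum(weights)                  # estimated number of units
mean_income = weighted_mean(weights, income)    # estimated population mean
male_count = estimated_count(weights, is_male)  # estimated number of males
```

The subgroup count is just the total estimator applied to a zero/one indicator for membership in the subgroup.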
Now, model parameter estimates typically depend on estimated totals.
So if you can figure out how to estimate totals you can typically
figure out how to estimate model parameters.
And there are routines in the software that will do that for you.
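As one illustration of a model parameter that depends on estimated totals, here is a sketch of a weighted least squares line with one predictor, written entirely in terms of weighted sums. This is an assumed example, not something the lecture spells out; the function name `weighted_line_fit` and the data are hypothetical, and it gives point estimates only, with no standard errors.

```python
def weighted_line_fit(weights, x, y):
    """Weighted least squares intercept and slope built from estimated totals."""
    sw = sum(weights)                                           # estimated N
    swx = sum(w * xi for w, xi in zip(weights, x))              # total of x
    swy = sum(w * yi for w, yi in zip(weights, y))              # total of y
    swxx = sum(w * xi * xi for w, xi in zip(weights, x))        # total of x squared
    swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y)) # total of x times y
    denom = sw * swxx - swx ** 2
    if denom == 0:
        raise ValueError("x has no weighted variation; slope is undefined")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x, so the fit recovers that line.
weights = [100.0, 250.0, 150.0, 500.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = weighted_line_fit(weights, x, y)
```

Every ingredient is an estimated total of some variable (x, y, x squared, x times y, or the constant one), which is the sense in which knowing how to estimate totals gets you the parameter estimates.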
Quantiles are a little bit different and
the software choices are more limited, but here's how the algorithm goes.
What we do is we first identify the variable we want a quantile on.
So this would be a quantitative variable like income
or years of education or something like that.
So we sort the file from low to high based on that y variable.
And then associated with each unit we've got a weight.
So we cumulate the weights until we reach a certain point.
So in the case of estimating the median we want to cumulate
the weights until we get to the 50% point,
that is, until 50% of the sum of all weights is reached.
And then what you do is you look for the y value for
the first unit that's got a cumulative weight of 50% or more of the total weight.
And that'll be your median value.
And sometimes that requires some rounding off because of the discreteness of
the sample.
But that sort of thing is built into the software also.
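The quantile steps above can be sketched as follows. This is a bare-bones version that returns the y value of the first unit whose cumulative weight reaches the target; it does no interpolation or rounding, which real survey software may handle differently. The function name `weighted_quantile` and the data are hypothetical.

```python
def weighted_quantile(weights, y, p=0.5):
    """y value of the first unit whose cumulative weight reaches p of the total."""
    if not weights or len(weights) != len(y):
        raise ValueError("weights and y must be non-empty and of equal length")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Sort the file from low to high on the y variable, keeping each weight
    # attached to its unit.
    pairs = sorted(zip(y, weights))
    target = p * sum(weights)
    cumulative = 0.0
    for value, w in pairs:
        cumulative += w
        if cumulative >= target:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall at p = 1


# Hypothetical four-unit sample.
weights = [100.0, 250.0, 150.0, 500.0]
income = [30000.0, 45000.0, 52000.0, 28000.0]

median_income = weighted_quantile(weights, income, 0.5)
upper_quartile = weighted_quantile(weights, income, 0.75)
```

In this toy sample the lowest-income unit carries half of the total weight on its own, so the cumulative weight hits the 50% point at the very first unit in sorted order.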