A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

207 ratings

From this lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two populations comparisons to "omnibus" tests for comparing means, proportions or incidence rates between more than two populations with one test

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we'll continue the party

of comparing multiple populations with one test and we'll

do it for proportions, and the test we'll use

is one we've already looked at, the chi-square test.

So in this lecture section you will learn to interpret a

p-value for a hypothesis test comparing

proportions between more than two populations.

And, you'll get some sense of the idea behind the test.

So, the method for getting the p-value

is an extension of the chi square test shown in

lecture ten, it is also called a chi square test.

So, chi square is the family of tests that

can compare proportions between two or more than two populations.

So let's look at our first example.

This is an article from the American Journal of Public Health.

Looking at healthcare indicators by immigrant status.

And what the authors did was use data from the 1999 National

Survey of America's Families, to create

three subgroups of immigrant children.

U.S. born children with non-citizen parents.

Foreign born children who were naturalized U.S.

citizens.

And foreign born children with non citizen parents.

And additionally, a fourth group was considered in the comparison.

Which was U.S. born children born to U.S. citizen parents.

So, the authors go on to say chi-squared, and then

we'll put these in parenthesis because this will be coming

up in the second term, but chi-squared was used

to examine relationships between immigrant status and health access variables.

So let's look at an example of what they're talking about here.

So here's a table that gives, and this was a

large table and this was only a part of it.

And then I'll blow it up in the next

slide so you can see it a little bit better.

But this gives health care status and health care access and

health care uses information by the four categories of immigrant status.

We have the U.S. born children born to

citizen parents, the U.S. children born to non-citizen parents,

the foreign born children who are naturalized citizens

and the foreign born children who are not citizens.

And you can see they have a bunch of indicators here so.

The proportion for example who reported their current status at

the time of the survey as being fair or poor.

I also want and so forth.

So let's look at some of the indicators of health care access.

Let's focus in on lack of medical insurance

at any time in the past 12 months.

Well what they do here is they report the proportion

of children in each of the four citizenship

groups whose families reported that they had lacked

medical insurance at any time in the past 12 months.

For the U.S. born children born to U.S. citizen parents, that was 15.34%.

For the U.S. born children born to non-citizen parents, it was 34.4%.

For the naturalized citizen children it was almost 13%.

And for the foreign born non-citizen children it was 52.3%.

So you can see there's a lot of variation in these four summary measures, and it

does appear that there is some association between

non-citizen status of the child or their parents.

And increased risk of lacking medical

insurance in the 12 months prior to the survey.

But of course we would want to actually account for the sampling

variability before making the final conclusion.

So what we could do is a chi squared test and the chi square

approach for this set, would actually extend the null as we had seen before.

It would compare the proportions lacking

health insurance between the four citizenship groups.

And the null is that the proportion

of children whose families were lacking health

insurance in the year prior

to the survey was equal in the

underlying populations from which the samples were taken.

And the alternative,

is that at least two of the four groups, two of the

four populations,

could be more than two, but at least two have

different

proportions. So just an FYI,

in case you're interested: the way this test works is really an

extension of the way it worked

when we were comparing proportions between two populations.

We could actually take these data for this outcome and for all outcomes

if we wanted to, and show it in the same kind of table format we used for two-by-two tables.

So our outcome of interest here is lack of health insurance.

So we could do lack of health insurance for those

who lacked it for at least some of the 12

months prior to the survey versus no lack, and we could

do this as a four-by-two table, where we could report the numbers

in each of our samples who lacked and who didn't among the

four immigrant groups.

And that would report the data that they observed in the study, in the survey.

And then what the researchers could do, or what the computer would really do is

behind the scenes create an expected table, what the expected distribution

of these counts would be across the four immigrant groups if the null were true.

And then, just like we saw before, the chi

squared test creates a measure of discrepancy

between the observed and expected counts,

and then compares that to the distribution expected of

such discrepancies when the null is true: the distribution reflecting

the sampling variability in this discrepancy across samples that could have

come from four populations with the same underlying proportion.

And we turn that into a p-value to figure out how likely our sample

is amongst those that could have happened under the null.

So just for your information, the p-value for this would be

obtained from a chi squared test with three degrees of freedom.

So instead of one degree of freedom which we had with

the two-by-two table comparisons, here where we have a four-by-two table

the degrees of freedom is three for the

distribution of the discrepancies

between the observed and expected values under

the null hypothesis.
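
To make the observed-versus-expected idea a bit more concrete, here is a minimal sketch in Python of what the computer might do behind the scenes for a four-by-two table. The counts are hypothetical, invented purely for illustration (they are not the survey's data), and the function names are my own. The p-value uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom.

```python
import math

# HYPOTHETICAL counts for illustration only; these are NOT the survey's data.
# Each row is one immigrant-status group: [lacked insurance, did not lack].
observed = [
    [30, 170],   # U.S. born, citizen parents
    [70, 130],   # U.S. born, non-citizen parents
    [26, 174],   # foreign born, naturalized citizen
    [104, 96],   # foreign born, non-citizen
]

def expected_counts(table):
    """Counts expected in each cell if every group shared one common proportion."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Discrepancy measure: sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

statistic = chi_square_statistic(observed)
p_value = chi2_sf_df3(statistic)
print(f"chi-square = {statistic:.2f}, p-value = {p_value:.3g}")
```

With real data you would normally let a statistics package do this; the point of the sketch is only to show the expected table and the discrepancy measure being built.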

Generally speaking, when performing a chi squared test where you

have a variable number of rows and columns, the degrees of freedom for

the resulting distribution under the null hypothesis is the

number of rows minus 1 times the number of columns minus 1.

So in our case we had 4 rows, 4 minus 1 is 3,

2 columns 2 minus 1 is 1, so 3 times 1 is 3.
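
The degrees-of-freedom rule just described can be written as a one-line function. This is only a sketch; the function name `chi_square_df` is my own, not something from the lecture or from a library.

```python
def chi_square_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-square test on an n_rows-by-n_cols table."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)

print(chi_square_df(4, 2))  # four groups, binary outcome
print(chi_square_df(2, 2))  # the two-by-two case from before
```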

This is just for your information because

sometimes you will see the degrees of freedom

referred to in a paper and it will not always be 1 like we had seen before.

So, here we go back to lack of medical

insurance at any time in the past 12 months.

There's other things in this table, like no usual

source of care, other than the emergency room, etcetera.

And I've cut and pasted certain

parts of this table, just so we could see it.

But what they show in the footnotes is

that all chi squared p's were less than 0.05.

So what they mean is for every comparison they did of these access and utilization

measures across the four immigrant

populations, all results were statistically significant.

Meaning that all of these were statistically

associated with the immigration status of the children.

Now of course this doesn't tell us where the differences are, what groups

have the biggest differences or which groups

are statistically significantly different from each other.

But I think it's pretty clear from looking at a lot

of these measures that the groups

where either the parents are non-citizens

or the children are not citizens have worse outcomes, and it seems

like that's pretty much where the divide is: between the two

groups that are non-citizens versus the two groups that are citizens.

But if we wanted to, we have

the proportions, and they give us the standard errors here.

Of course if we knew how many people were in these

groups, which appears elsewhere in the data, we could estimate those ourselves.

We could go ahead and put confidence intervals

on the differences in proportions or the relative risks

for each pairwise comparison of immigrant groups, if we wanted

to quantify that and look at the group to group significance.
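
As a rough sketch of that follow-up, here is how a 95% confidence interval for a difference in proportions could be built from two reported proportions and their standard errors. The two proportions are the ones quoted above for lack of insurance; the standard errors are hypothetical placeholders, since the table's actual values are not read out here, and the calculation assumes the two groups are independent samples.

```python
import math

def diff_in_proportions_ci(p1, se1, p2, se2, z=1.96):
    """Approximate 95% CI for p1 - p2, assuming independent groups."""
    diff = p1 - p2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # standard errors combine in quadrature
    return diff, diff - z * se_diff, diff + z * se_diff

# Proportions quoted in the lecture for lacking insurance in the past 12 months.
p_foreign_born_noncitizen = 0.523
p_us_born_noncitizen_parents = 0.344

# HYPOTHETICAL standard errors, for illustration only (not the article's values).
se_foreign_born_noncitizen = 0.03
se_us_born_noncitizen_parents = 0.02

diff, lower, upper = diff_in_proportions_ci(
    p_foreign_born_noncitizen, se_foreign_born_noncitizen,
    p_us_born_noncitizen_parents, se_us_born_noncitizen_parents,
)
print(f"difference = {diff:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

If the interval excludes zero, that particular pair of groups differs at roughly the 5% level, which is the group-to-group information the overall chi-square p-value does not give.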

But, on the whole, this is telling us there is an association

between healthcare utilization factors and immigration status.

And, this gives us the summary data to help us understand what that looks like.

Let's look at another example.

Out of pocket spending, medication adherence

amongst dialysis patients in 12 countries.

And so what the authors did is

they used representative samples of dialysis patients

from 12 countries, and they are ostensibly interested in looking at

out of pocket medication spending and cost-related non-adherence.

But in doing this, they also provided a description of their sample, which

is important to understand who is

in these different countries' dialysis patient pools.

So in the paper, and I'm only showing a part of this

for space reasons, but they showed

descriptive measures of the cross-sectional patient

sample for dialysis patients in the 12 countries.

And here they have the first six.

Australia, New Zealand, Belgium where there's

468 patients, Canada where there's 503, etcetera.

And then in this table, they give us characteristics.

So, for example, they look at the proportion of patients in

each sample who report being a minority in their own country.

So in Australia and New Zealand,

it's 21.5%, in Belgium it's 5.3%. 18.7% for Canada, etcetera.

So, the authors didn't choose to do hypothesis testing for these

descriptive measures, but if they wanted to assess whether there was

an association between minority status and the population of dialysis patients

from each country, they could do a chi square test to test

the null that the proportion of minorities was

the same in the underlying 12 country populations,

versus the alternative that at least some countries were different.

I won't write this out fully, but.

And, they could use a chi-square test to do that.

Another utility of the chi-square is it gives us a

quick snapshot for comparing outcomes that are more than two categories.

And we won't do much of that in this class but I just want to point this out to you.

So, for example, something like income level,

and this has been standardized to the U.S.

dollar, so there's three levels not two.

And they actually show the distribution in

each of the country's sample of dialysis patients.

If they wanted to actually test whether the income

distribution differences were statistically significant,

in other words, whether they differed between

at least some of the countries at the population level,

they could do a chi squared test for this as well.

So you can kind of think

of it as comparing multiple proportions between more

than two groups. So it would be something like this:

the null, I'm just going to say, is that P1, which

is the proportion of individuals in the less than

$20,000 income category, for example, is the same

between the 12 countries. And additionally, P2,

which is the proportion in the $20,000 to $39,000

income group, is also the same

between the 12 countries. And then what's implied, since there's

a third group, and we could write that out, is that if the first two proportions

are the same across the 12 countries, then the third would have to be as well.

So this chi-square test can be extended to test

for different distributions of

multi-categorical outcomes across groups as well.

So sometimes you'll see p-values reported for that

type of comparison in this type of table.
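
Here is a small sketch of that extension for an outcome with three categories. The counts for three countries are hypothetical (the article's income counts are not reproduced here), and `chi2_sf` and `chi_square_test` are helper names I made up. The same code would handle 12 countries by 3 income levels, where the degrees of freedom would be 11 times 2, or 22.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2                   # exact tail for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1        # exact tail for df = 1
    while k < df:
        # Step from k to k + 2 degrees of freedom.
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def chi_square_test(table):
    """Chi-square statistic, degrees of freedom, and p-value for an r-by-c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, r in zip(table, row_totals):
        for observed, c in zip(row, col_totals):
            expected = r * c / grand_total          # count expected under the null
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df)

# HYPOTHETICAL counts: rows are countries, columns are three income levels.
income_by_country = [
    [50, 30, 20],
    [40, 40, 20],
    [30, 30, 40],
]
statistic, df, p_value = chi_square_test(income_by_country)
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```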

Also, if we were actually doing hypothesis testing,

and putting p values to these data, we could.

You can see I'm

just letting this creep in from the previous

lecture, but they also report continuous measures like age.

The average age of the subjects in each of the samples from each of the countries.

We could do another bunch of tests whether the average age of

dialysis patients was different between at least some of the 12 countries.

Finally, let's look at our academic physician salaries study once again.

Remember this study is the one that compares

the salaries of male and female academic physicians.

In the last section with ANOVA we looked at a table where they showed us different

factors and the mean salaries in the different groups by factor, and they used

ANOVA to test, for example, whether the mean salaries were different by the four

tiers of NIH funding or the four

regions of the country from which the facilities were sampled,

etcetera.

And they wanted to show what factors

besides sex potentially were associated with salary.

Since they're also interested in looking at the relationship of salaries by sex,

they also want to show what the relationship is between sex of physician

and some of these other characteristics, so they presented another table where they

do this, and they look at the race distribution between men and women,

the age, whether or not the

researcher has children, the marital status, etcetera.

Let me just hone in on this and give you an example.

So here's something they give a p-value for.

So what they're doing is comparing the racial makeup of the males and females.

And seeing if they're statistically different.

So you could think about the null for this test.

One way to think of this is comparing the proportion

of females across the five different racial groups,

and the null could be phrased that the proportion

of females is the same in the White, Asian Pacific

Islander, African American, other, and then

they included unknown for people who didn't report their race.

So they have five racial groups.

You can think of the null as being

that the proportion of females is the same versus

the alternative that at least some racial groups

differ in their proportion of females and hence males.

The p-value from this is 0.7, which is higher

than the threshold of 0.05, so the researchers would fail

to reject the null

and, given the large sample size, likely conclude that there is no association

between race and sex in the population from which the sample was taken.

Other factors, however, do show an association.

If you look at the distribution of marital status, comparing the males and females,

the p-value from a chi-square is 0.02, indicating statistically

different proportions of males and females in at least

some of the marital status groupings at the population level.

So in summary, this looks pretty much the same conceptually as what we

did with ANOVA, but the chi squared allows us the flexibility to compare

binary outcomes across two populations, as we saw in lecture

ten, or more than two, as we've just seen.

And it works on the exact same principles as all our

hypothesis testing: it starts by assuming the null is true,

computes a distance measure,

or measure of discrepancy, between what was observed in

the study and what would be expected under the null,

then looks at the variability of that discrepancy measure

due to sampling variability when the null is the underlying truth,

And figures out whether the study results are

consistent or inconsistent with the null, filters it through

a p value and allows us to make a decision.
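
One way to see these steps in action is to simulate them directly rather than rely on the chi-square distribution. The sketch below is a simulation-based stand-in, not the chi-square test itself: it assumes the null is true, repeatedly draws samples from four populations sharing one proportion, and counts how often the simulated discrepancy is at least as large as the observed one. The counts, the seed, and the function names are all hypothetical choices for illustration.

```python
import random

def discrepancy(events, sizes):
    """Chi-square style discrepancy between observed and expected counts."""
    pooled = sum(events) / sum(sizes)            # common proportion under the null
    if pooled in (0.0, 1.0):
        return 0.0                               # no variation, so no discrepancy
    total = 0.0
    for x, n in zip(events, sizes):
        exp_yes, exp_no = n * pooled, n * (1 - pooled)
        total += (x - exp_yes) ** 2 / exp_yes + ((n - x) - exp_no) ** 2 / exp_no
    return total

def simulated_p_value(events, sizes, n_sims=2000, seed=0):
    """Share of null-simulated samples at least as discrepant as the observed one."""
    rng = random.Random(seed)
    pooled = sum(events) / sum(sizes)
    observed_stat = discrepancy(events, sizes)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Draw each group's count from a population with the shared proportion.
        simulated = [sum(rng.random() < pooled for _ in range(n)) for n in sizes]
        if discrepancy(simulated, sizes) >= observed_stat:
            at_least_as_extreme += 1
    return observed_stat, at_least_as_extreme / n_sims

# HYPOTHETICAL data: number with the outcome in each of four groups of 50.
events = [8, 15, 6, 22]
sizes = [50, 50, 50, 50]
observed_stat, p_sim = simulated_p_value(events, sizes)
print(f"observed discrepancy = {observed_stat:.2f}, simulated p-value = {p_sim:.4f}")
```

The chi-square distribution with the appropriate degrees of freedom is the mathematical shortcut for the pile of simulated discrepancies produced here.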