Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From this lesson

Discrete Data Settings

In this module, we'll discuss testing in discrete data settings. This includes the famous Fisher's exact test, as well as the many forms of tests for contingency table data. You'll learn the famous observed minus expected squared over the expected formula, that is broadly applicable.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Hi, my name is Brian Caffo, and this is

Mathematical Biostatistics Boot Camp 2, lecture eight, on chi-squared tests.

Okay, so specifically when we talk about chi-squared testing,

there's a lot of different variations of chi-squared testing.

Here we're going to be talking

about chi-squared testing for contingency tables.

The most classic contingency table test is a test of independence.

We'll relate that to testing equality of several proportions.

We'll go through the natural

generalizations to higher order contingency tables.

Then we'll talk about Monte Carlo variations

to get exact tests of independence.

We'll finish it off with a discussion of

goodness of fit testing, which is a special kind

of contingency table test that's useful for testing

whether or not data arise from a particular distribution.

Okay, so we've talked about testing equality of two proportions in a 2 by 2 table,

say binomial proportions.

An alternative approach to doing this is so-called chi-squared testing.

And the form for the chi-squared statistic is the same.

It's the summation of the

observed cell counts minus the expected cell counts

(I'll talk about what that means later),

squared, divided by the expected.

And we will talk about where the motivation for this formula comes from as well.

So here the observed are the observed counts.

The expected are the expected counts under the null hypothesis, and this sum

is over all of the cells of the contingency table, successes and failures.

In the 2 by 2 table case, for

testing equality of proportions, this statistic,

under the null hypothesis, has a chi-squared distribution

with one degree of freedom.
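Written out, the statistic just described can be sketched as follows, where O is the observed count in a cell and E is the count expected in that cell under the null hypothesis (the symbols O and E are shorthand chosen here, not taken from the slides):

```latex
\[
X^2 \;=\; \sum_{\text{cells}} \frac{(O - E)^2}{E}
\]
```

In the 2 by 2 case the sum has four terms, and the result is compared to a chi-squared distribution with one degree of freedom.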

And it turns out (maybe I'll assign this as a homework) that

the chi-squared statistic is exactly the square

of the difference-in-proportions score statistic,

where you have the common proportion in the denominator for the standard error.
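The equivalence can be checked numerically. The following is a minimal Python sketch, not the lecture's own code; the function names and the counts (30 of 80 versus 45 of 90) are hypothetical and chosen only for illustration.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Score statistic for H0: p1 == p2, with the pooled proportion in the standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

def chi_squared_2x2(x1, n1, x2, n2):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    p_pool = (x1 + x2) / (n1 + n2)
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * p_pool, n1 * (1 - p_pool), n2 * p_pool, n2 * (1 - p_pool)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, for illustration only.
z = pooled_z(30, 80, 45, 90)
x2 = chi_squared_2x2(30, 80, 45, 90)
print(z ** 2, x2)  # the two values agree up to floating-point error
```

A numerical check like this is not a proof, but it is a quick way to confirm the algebra before working through it by hand.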

And notice that

the statistic is basically a distance.

It's the

distance between the observed cell counts and what

you would expect to get under the model.

But notice it's squared, so there's no directionality.

It's just whether things are different from what you would expect

to see under the model, that is, under the null hypothesis.

So it doesn't test directionality;

the alternative is effectively a not-equal-to hypothesis.

That is always the case.

So, let's go back to our familiar example where we had a treatment.

Here the treatments are X and Y.

Let's say X is the treatment and Y is the placebo.

Or, probably more accurately,

let's suppose we're comparing two equally effective treatments,

and we want to see if one has more

side effects than the other.

So X and Y are two different treatments, and we want to determine whether or

not one has more side effects.

Here, we assigned 100 people to X and 120

people to Y, and we're going to treat the counts

of the number of people with side effects out of the

total as if they were binomial; that will be our model.

Okay, and here the rates of side effects are two proportions,

p1 for X and p2 for Y.

And we're interested in whether p1 equals p2.

Okay, let's see if we can logic our way through implementing this formula.

So the X-squared statistic, I've said, is the summation of the

observed counts minus the expected, squared, over the expected.

Well, part of it's easy, right?

The observed counts are o-11, o-21, o-12, and o-22, if we label the cells

by the rows and columns of the two by two table that way, and we have those:

44, 77, 56, and 43.

Now what about the expected counts? Well, let's think this through.

Okay.

If the rate of side effects was the same for the two

treatments, then our estimate of the common proportion,

just like in the denominator of the score statistic, would have to be 121 over 220.

Right?

Now, we don't know the true proportion, so let's use that

as our best estimate of the overall proportion of side effects,

regardless of treatment, because under the

null hypothesis the

true proportions are assumed equal. So that's our estimate of that.

And then how many people would we expect to see with side effects?

Well, if that's the estimate of the common proportion, then

100 times that is what we would expect to see in the X group: 55.

And so on.

You go through it: out of the

120 receiving the second treatment, we would expect to see 120 times this

proportion, or 66. I'm rounding here.

The expected counts, by the way, don't have to be integers.

In general they won't be integers. You should carry out the calculations

to, you know, many decimal places.

I don't know why I rounded them; I think for

didactic reasons I rounded them to integers here.

But in general, the expected counts don't have to be

integers in the same way the observed counts must be integers.

So at any rate, just a reminder: you want to carry

these calculations out, not just round them to 55.

Okay, and so on. And then

1 minus this probability is 99 over 220,

and that

times 100, or 45, is how many non-side effects we would expect

in the X treatment, and 99 over 220 times 120, or

54, is what we would expect in the Y treatment.

So these counts here on the right hand side

represent what we would expect to see under the null hypothesis using

our best estimate for the common proportion.

So the margins, the 100 and 120, are fixed

by the design, and the best estimate of our common

proportion is that one, so this is

our best guess at what we would expect to see.

And if these counts are very different from the observed counts, then

that would shed some light that maybe the null hypothesis is not true.

Okay, so then our test statistic is 44 minus 55, squared, over 55, and so on.

Add all those up, and it turns out to be 8.96.
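The arithmetic above can be reproduced in a few lines. This is a minimal Python sketch using the counts from the example; the variable names and the table layout (rows are treatments X and Y, columns are side effects and no side effects) are assumptions made here for illustration.

```python
# Observed counts: rows are treatments X and Y,
# columns are (side effects, no side effects).
observed = [[44, 56], [77, 43]]

row_totals = [sum(row) for row in observed]        # 100 and 120, fixed by design
n = sum(row_totals)                                # 220
p_common = (observed[0][0] + observed[1][0]) / n   # pooled side-effect rate, 121/220

# Expected counts under H0: each group shares the common proportion.
expected = [[total * p_common, total * (1 - p_common)] for total in row_totals]

# Sum of (observed - expected)^2 / expected over all four cells.
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)
print(round(x2, 2))  # about 8.96
```

For these particular margins the expected counts come out as 55, 45, 66 and 54 up to floating-point error, so nothing is lost by writing them as whole numbers in this example.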

Compare that to a chi-squared with one degree of freedom.

And again, we're rejecting for large values, right?

Because this is the distance between the observed and the expected

counts, we favor the alternative the further away from the expected

counts we are. So a bigger test statistic is going

to favor the alternative, and we're going to reject for large values.

So let's do pchisq

of 8.96.

It's one degree of freedom when you have a 2 by 2 table.

I am going to give you a general rule for when you don't have a 2 by 2 table.

And we say lower.tail equals FALSE, because we want

the upper probability, not the lower probability. The result is 0.002.
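The lecture uses R's pchisq for this upper-tail probability. As a rough stand-in, here is a minimal Python sketch that works only for one degree of freedom; it relies on the fact that a chi-squared with one degree of freedom is the square of a standard normal, so the upper tail equals a two-sided normal tail. The function name is hypothetical.

```python
import math

def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-squared variable with 1 degree of freedom.

    Since X = Z**2 for a standard normal Z, this equals
    P(|Z| > sqrt(x)) = erfc(sqrt(x) / sqrt(2)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))

p_value = chi2_upper_tail_1df(8.96)
print(p_value)  # between 0.002 and 0.003
```

For tables larger than 2 by 2 the degrees of freedom change and this shortcut no longer applies; a general chi-squared distribution function is needed there.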

The other way you can think about this, of course, is that a chi-squared

with one degree of freedom is actually the square of a standard normal.

So it's unlikely for a standard normal to be above two or

below minus two, right?

There's only about a 5% chance of that happening.

So a chi-squared is unlikely to be above four, right?

That's the square of two and the square of minus two.

So that's going to have about 5% probability, so a chi-squared of about

four is about the same benchmark as a normal of about two.

A chi-squared of about nine is

about the same as a standard normal of about three.

Again, remember that you're testing both bigger than two and less than minus

two, or bigger than three and less than minus three, because, remember, the chi-squared

always does a two-sided test.

So in this case, the result is 0.002. There is some evidence

to suggest that there is a difference in the rate of side

effects between the two treatments, though of course the

result of the chi-squared test doesn't tell you which direction it goes.

[INAUDIBLE]

Okay. So that's how we do it.

And here's some simple R code for executing it in R, so you don't have to

do the calculations with a calculator. We just create a data matrix,

using this matrix command here, and then call chisq.test(dat).

And then you'll notice, if you do this, you

don't get exactly the same test statistic that we got.

The reason is that the chi-squared approximation

is an asymptotic approximation,

but the counts are discrete, and you can improve the chi-squared approximation by

fudging a little bit, in a way that's like this:

if you're doing numerical integration with

boxes, you can maybe do a little better with trapezoids, or something like that.

It's along that line of thinking.

And that boils down to shifting each observed-minus-expected difference toward zero by a half.

That's called a continuity correction, basically accounting for

the fact that the cell counts are discrete counts,

and it can improve on the asymptotic approximation.

So if you actually put in correct equals FALSE, it won't do that continuity

correction, and you'll see then that you'll

get the exact same answer that we did.

So for didactic purposes, we're

not presenting the continuity correction, but when you actually do the test,

you want to leave correct equals TRUE on there, because it

does yield a better approximation; that's why it's the default in R.
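To see the size of the effect, here is a minimal Python sketch that computes the statistic with and without a half-unit continuity correction on the example table. It assumes the usual Yates form, in which one half is subtracted from each absolute observed-minus-expected difference (never more than the difference itself); the function name and the `correct` flag are chosen here to echo the R argument and are not the lecture's code.

```python
def chi_squared(table, correct):
    """Pearson statistic for a table of counts, optionally with a Yates-style half correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell: row total * column total / grand total.
    expected = [r * c / n for r in row_totals for c in col_totals]
    observed = [o for row in table for o in row]
    diffs = [abs(o - e) for o, e in zip(observed, expected)]
    # The shift is capped so it never exceeds the smallest |O - E|.
    shift = min(0.5, min(diffs)) if correct else 0.0
    return sum((d - shift) ** 2 / e for d, e in zip(diffs, expected))

table = [[44, 56], [77, 43]]
uncorrected = chi_squared(table, correct=False)
corrected = chi_squared(table, correct=True)
print(round(uncorrected, 2), round(corrected, 2))
```

The uncorrected value matches the hand calculation of about 8.96, while the corrected value is somewhat smaller, which is why the default output differs from the number worked out above.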

Okay, so let's recap.

We're going to reject if the chi-squared statistic is large.

The alternative is always two-sided.

You know, you're always comparing whether the proportions are different.

You do not divide your alpha by two, even though it

is a two-sided test. Remember, the reason we're dividing

alpha by two in the standard Gaussian case is that you're checking

bigger than and you're checking less than. Because we've squared the statistic and

the chi-squared is only a positive statistic, we don't need to do that

alpha divided by 2 for the quantile.

A small chi-squared statistic implies

little difference between the observed values

and those expected under H-nought, so it supports H-nought.

You can think of the chi-squared statistic as a distance.

And then what we'll talk about, and it's really kind

of a fun subject I think, is how the chi-squared statistic and

approach generalize to other kinds of tests and larger contingency tables.

It's also one of these phenomena that often occur in statistics, where

the same procedure arises out of several different settings and data structures.

And so the interpretation changes but

the actual procedure stays the same, and we'll

go through another one next, where

we think about this problem in a different

way and wind up with an identical procedure.

And that happens a lot in statistics, where even though you think

about the problem in a different way, you get the same procedure.

The mean, in general, is frequently a good estimator. You know, even if the data

are IID exponential or IID normal, it's

going to estimate the mean of the data well.

And so the mean

pops up all over the place.

Well, you know, the same sort of

thing happens in chi-squared testing, where you get

equivalences despite very, very different

sampling strategies and assumptions underlying the data.

And there's a neat computational form.

In the two-by-two case, it looks like this;

I'll have the notation on the next slide.

But here the n

i,j's are the cell counts.

And you put a little plus in place of an index if you're summing over that index.

So these are the margins.

And here briefly is the notation that I'm

going to be using, where i,j indexes the cell counts.

I'll write n plus one, meaning summing over the first index.

So that's this margin. N plus two is this margin.

N one plus is this margin. And n two plus

is this margin.

And if I need to refer to n1 and n2, let's say those are

the row margins, and then n will be this corner cell right here,

the sum, the total number of observations.

So, you know, an interesting fact about the

chi-squared statistic: if we transpose the table,

the statistic doesn't actually change its value, which

is kind of interesting, right? It means it doesn't care about which

margin is fixed by the study design.
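The slide formula itself is not captured in the transcript. The standard shortcut for a 2 by 2 table, which is presumably what is being shown, is n times (n11 n22 minus n12 n21) squared, divided by the product of the four margins. The following minimal Python sketch applies it to the example and to the transposed table; the function name is hypothetical.

```python
def chi_squared_2x2_shortcut(n11, n12, n21, n22):
    """X^2 = n * (n11*n22 - n12*n21)^2 / (n1+ * n2+ * n+1 * n+2)."""
    n1_plus, n2_plus = n11 + n12, n21 + n22   # row margins
    n_plus1, n_plus2 = n11 + n21, n12 + n22   # column margins
    n = n1_plus + n2_plus                     # total number of observations
    return n * (n11 * n22 - n12 * n21) ** 2 / (n1_plus * n2_plus * n_plus1 * n_plus2)

# Example table: rows are treatments X and Y, columns are side effects / none.
original = chi_squared_2x2_shortcut(44, 56, 77, 43)
# Transposed table: rows and columns swapped, so n12 and n21 trade places.
transposed = chi_squared_2x2_shortcut(44, 77, 56, 43)

print(round(original, 2), round(transposed, 2))  # both about 8.96
```

Swapping n12 and n21 leaves the cross-product difference unchanged and merely exchanges the row and column margins in the denominator, which is why the value is the same either way.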

And if you errantly were

thinking of side effects as being the outcome, you get the same thing as

if you were thinking about which treatment they received as the outcome.

You are going to get the same test, even though only

one of them matches the way in which the experiment was conducted.

So that's interesting, and its utility ties into a lot of

instances where you really want to play around with the interpretation

of what variable is the outcome and what variable is

the predictor, and we'll talk about that when we talk about case-control studies.

It's interesting that in case-control studies, some of the fundamental

work was done right here at Hopkins, by a person named Cornfield.

At any rate,

the chi-squared statistic can arise in several ways.

You can state a model for which the chi-squared

statistic is the obvious thing to do.

If the rows are fixed, so you have binomials, you can do it.

If the columns are fixed, which is just

a different kind of binomial, you can do it.

Again, here we're assuming

binomial-ness of the data in either of these cases.

But let's imagine that you didn't assume that the data were binomial,

that the rows or columns were fixed.

Let's say that you assumed that only the total sample size is fixed.

So imagine, as an example of that, that instead of randomly assigning

100 people to receive the treatment, and randomly assigning 120

other people to receive the other treatment, you

just happened to go out and collect a bunch of people, and asked

them what treatment they had, and asked them whether or not they had side effects.

Now, granted, it's a different experiment; the way in which you

would interpret the result of a chi-squared test would be different.

But there you

might think, oh, I sampled 220 people, you know?

That was really the part that was fixed by the design.

And, you

know, however people fell out in terms of side effects and

treatment, well, that's part of the randomness, and so I'm

going to model that whole thing as if

it were multinomial with four possible outcomes.

They could be taking treatment X with no side effects, treatment X with

side effects, treatment Y with no side effects, or treatment Y with side effects.

I think I hit all four, but you see what I mean.

There are four possible cells in a two by two table.

So in that case, neither margin was fixed.

And I'll go through it right now.

But you wind up with the same exact test statistic

if you assume multinomial-ness

and a specific null hypothesis, namely, a test of independence.
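One way to see why the statistic is unchanged: under independence each cell probability is the product of its row and column probabilities, so the estimated expected count is the row total times the column total divided by n, which reproduces the same expected counts as the two-binomial calculation. A minimal Python sketch, with hypothetical function names:

```python
def expected_under_independence(table):
    """Expected counts when only the grand total is fixed and rows are independent of columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Estimated P(row i, column j) = (r_i / n) * (c_j / n); multiply by n for a count.
    return [[n * (r / n) * (c / n) for c in col_totals] for r in row_totals]

def pearson_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_under_independence(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Same 220 people, now viewed as one multinomial sample over four cells.
table = [[44, 56], [77, 43]]
expected = expected_under_independence(table)
statistic = pearson_statistic(table)
print(expected)
print(round(statistic, 2))  # about 8.96, as in the two-binomial setup
```

Because the function only uses row and column totals, the same sketch applies unchanged to larger tables, although the reference distribution then has more than one degree of freedom.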

So I find that cool.

And that's why people are often a little bit loosey-goosey about

their assumptions in multinomial settings, because

they kind of apply in different ways.

And I do think there's a real problem with that,

because yes, the number that comes out from the test statistic is the same,

but the interpretation is very different.

Different experiments lead to very different interpretations,

very different interpretations of the assumptions.

In the example I just gave you,

my fictitious case that I'm making

up, in one case we randomized the treatment,

and in the other case we just went out and sampled people observationally.

Those are very different interpretations.

I think most people would agree that the

randomization would kind of average over confounding effects of

unobserved variables, whereas in the observational

case, that wouldn't necessarily be the case.