Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From this lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So there's a great example of this. Imagine just going back

to our death penalty example, where we have two 2 by 2 tables.

We have the defendant's race and whether or not they got the death penalty.

And then we stratified that by a third variable, victim's race.

So let n_ijk be the ij

entry of table k.

So in this case, k goes from 1 to 2: k equal 1

is the first victim's race, k equal 2

is the second victim's race, and i and j index

defendant's race and whether or not the person got the death penalty.

So in our first example, we had two 2 by 2 tables stacked one on top of the other.

k equal 1 indexes the first one, k equal 2 indexes the second one,

and then i and j index the individual elements of that 2 by 2 table.

So then the kth odds ratio, the kth sample odds ratio.

So remember, the odds ratio was the cross-product

ratio: the product of the main diagonal divided by the product of the off diagonal.

So n11 times n22, divided by n12 times n21.

So in this case, everything is indexed by k,

referencing the kth table.

So the kth sample odds ratio is theta hat sub k.
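
Written out, the kth sample odds ratio just described is the following. The notation is assumed from the spoken description, since the slide itself is not reproduced in the transcript.

```latex
\hat{\theta}_k = \frac{n_{11k}\, n_{22k}}{n_{12k}\, n_{21k}}, \qquad k = 1, \ldots, K
```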

Then the Mantel-Haenszel estimator is exactly

our weighted average of these stratum-specific estimates.

It's the sum of the weights r_k times theta hat sub k, divided by the sum of the r_k's.

And by the way, when we had two, it was r1

times x1 plus r2 times x2, divided by r1 plus r2.

But if we had three, it would be r1 times x1 plus r2

times x2 plus r3 times x3, divided by the sum of the r's.

If we had four, same thing.

Anyway, this is the Mantel-Haenszel estimate: this

is the sum of the weights times the

stratum-specific odds ratios, theta hat sub k, divided by the sum of the weights.

So we just get a weighted average, or simply a

convex combination, of the stratum-specific odds ratios.
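
As a formula, the weighted average described here can be sketched as follows, with r_k standing for the weight of stratum k and K for the number of strata (symbols assumed from the spoken description):

```latex
\hat{\theta}_{MH} = \frac{\sum_{k=1}^{K} r_k\, \hat{\theta}_k}{\sum_{k=1}^{K} r_k}
```

Because the weights are nonnegative, the normalized weights r_k divided by the sum of the r's add up to 1, which is what makes this a convex combination.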

And then what are the weights?

Well, okay.

So the weights in this case are this little formula

right here, and I'll describe where they come from.

The motivation for the

weights is that they're inverse variances.

That's 100% the motivation for the weights:

they're inverse variances from a hypergeometric distribution.

So you can think of it as doing exactly the same thing

we did with the scales, only now in terms of the odds ratio.

At any rate, this simplifies to the so-called Mantel-Haenszel estimator.

Here's the formula right here.
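
The slide is not reproduced in the transcript. In the standard textbook form of the Mantel-Haenszel estimator, which is presumably what is being pointed at, the weights and the simplified estimator look like this, where n_{++k} denotes the total count in table k:

```latex
r_k = \frac{n_{12k}\, n_{21k}}{n_{++k}},
\qquad
\hat{\theta}_{MH}
  = \frac{\sum_{k} r_k\, \hat{\theta}_k}{\sum_{k} r_k}
  = \frac{\sum_{k} n_{11k}\, n_{22k} / n_{++k}}{\sum_{k} n_{12k}\, n_{21k} / n_{++k}}
```

The simplification comes from multiplying r_k by the kth sample odds ratio: the factor n_{12k} n_{21k} cancels, leaving n_{11k} n_{22k} / n_{++k} in the numerator.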

I would suggest that you look in Agresti's book, page 235

in the version I was looking at (the version has probably changed), or

in Rosner's book, which is very comprehensive, page 656.

They give the standard error. It's a long formula; I'm

sure you can find it on the internet as well.

This is the so-called Mantel-Haenszel estimator, named

after the two great epidemiologists Mantel and Haenszel.

Okay, so here's a great example of this Mantel-Haenszel estimator.

So here we had an active drug, T, and C being a placebo (control, I guess), and

then here we had success versus failure, and I'm

going to abstract away the specifics of the experiment.

And then what people were concerned about was whatever policies and practices

existed at the various centers at which the data were collected.

So they stratified by center.

One, two, three, four, five, six, seven, eight centers.

So they got eight odds ratios. Right?

And they were worried that the center was a confounder. There are

several reasons, actually, you might want to use

the Mantel-Haenszel estimator in this case,

but let's talk about it in terms of confounding first.

So imagine you thought that the center was specifically associated

with the treatment application (some centers tended

to apply the treatment more than others),

and that the center was associated with the success of the treatment because of

different policies associated with the center and how

the treatment was delivered, or something like that.

Then you would want to adjust for the center as a potential confounder.

And then here

we're going to adjust for this confounder by

stratifying by center, getting an odds ratio

specific to every center, and then averaging

over the odds ratios, but factoring in the inverse variance of each odds ratio.

Some centers have more patients, right?

This one had 73, this one had 14, so we want to weight the one with,

say, 73 more than the one with 14,

because it has a better, more precise odds ratio.

And so that's what the Mantel-Haenszel estimate does.

And so you get an odds ratio of 2.13.

The log odds ratio is 0.758, and the standard error

(the standard error calculation uses a delta method type argument),

the standard error of the log odds ratio, works out to be 0.303.

You would take 0.758, add and subtract, say, two standard errors, and

exponentiate the end points to get the confidence interval for the odds ratio.
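
The calculation just described can be sketched in a few lines of Python. The counts below are an assumption, since the table itself is not in the transcript; they were picked to be consistent with the lecture (eight centers, 73 patients in the first, 14 in the seventh), and they reproduce the quoted estimate of about 2.13. The standard error 0.303 is taken as quoted rather than recomputed, and the function names are hypothetical.

```python
import math

# ASSUMED data (not shown in the transcript): one 2x2 table per center, as
# (n11, n12, n21, n22) = (drug success, drug failure,
#                         control success, control failure).
TABLES = [
    (11, 25, 10, 27),
    (16, 4, 22, 10),
    (14, 5, 7, 12),
    (2, 14, 1, 16),
    (6, 11, 0, 12),
    (1, 10, 0, 10),
    (1, 4, 1, 8),
    (4, 2, 6, 1),
]


def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio in its simplified form."""
    numerator = 0.0
    denominator = 0.0
    for n11, n12, n21, n22 in tables:
        total = n11 + n12 + n21 + n22
        numerator += n11 * n22 / total
        denominator += n12 * n21 / total
    return numerator / denominator


def odds_ratio_interval(log_or, se_log_or, z=2.0):
    """Interval on the log scale, then exponentiate the end points."""
    return (math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


or_mh = mantel_haenszel_or(TABLES)
log_or_mh = math.log(or_mh)

SE_LOG_OR = 0.303  # standard error of the log odds ratio, as quoted
lower, upper = odds_ratio_interval(log_or_mh, SE_LOG_OR)

print(f"MH odds ratio: {or_mh:.2f}, log odds ratio: {log_or_mh:.3f}")
print(f"Approximate interval for the odds ratio: ({lower:.2f}, {upper:.2f})")
```

Using two standard errors follows the lecture's wording; replacing `z=2.0` with 1.96 would give the conventional 95% interval.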

Let me talk a little bit about

another rationale for why we might want

to do some sort of stratified estimate.

It's also often the case that we think

that center is, at some level, a random-effect-type

modifier of the treatment efficacy.

And we're willing to think of these centers as

sort of a random draw from the population of centers.

Then in that case, it still actually makes a lot of sense to combine them,

because we're not so much interested in center-specific effects.

We don't care if the treatment works at center number 1.

What we care about is whether it works overall.

And it turns out that even if the

center isn't just a confounder, but really modifies the

effect of the treatment (some places

are just better at doing the treatment than others),

if you're willing to

make the assumption that the centers are a sort

of random draw from a population of centers, then this CMH

estimate makes a lot of sense, saying: okay, averaging across centers,

here's the effect of the treatment.

And that's another instance where

you would consider doing something like CMH.

Or, I'm sorry, a Mantel-Haenszel common odds ratio estimate.

So this is what is often

called the common odds ratio, the common odds ratio across centers.

So then there's a famous test for testing whether or not

the common odds ratio is equal to 1.

So the test is usually stated with the null hypothesis

that all of the stratum-specific odds ratios are equal

and they happen to be equal to 1, versus the alternative.

And there's some amount of dispute over the

alternative, but I'm going to teach the alternative this way.

The alternative is that they're all equal, right, but they're not equal to 1.

Okay?

So notice this is different from the

alternative that they're not necessarily even equal.

So here we're assuming that we have a common odds ratio under the null

and the alternative, and we just want to test

whether that common across-strata odds ratio

is 1 versus not.
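
In symbols, with theta_k denoting the odds ratio in stratum k (notation assumed from the spoken description), the two hypotheses as taught here are:

```latex
H_0 : \theta_1 = \theta_2 = \cdots = \theta_K = 1
\qquad \text{versus} \qquad
H_a : \theta_1 = \theta_2 = \cdots = \theta_K \neq 1
```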

And it turns out that the CMH test applies to other

alternatives, but it's more powerful for the particular alternative given above.

And I'll also mention that this test is exactly the same

as testing for conditional independence of response

and exposure given the stratifying variable.

And so this Cochran-Mantel-Haenszel test, the way it's executed

is that it conditions on the row and column totals for each of the contingency tables,

exactly like Fisher's exact

test, resulting in hypergeometric distributions,

and then leaving only the upper left-hand cell of each

table free, just exactly like we did in Fisher's exact

test, only this time doing it for each table.

So let me go through the mechanics now.

Okay.

So under this conditioning and under the null hypothesis, under both

those circumstances, the expected value of the upper left-hand cell of each

table is this value, and the variance of the upper left-hand cell is this value.

So the Cochran-Mantel-Haenszel test statistic works out

to be this guy right here: the sum

of the deviations of the upper left-hand cells from their

expected values, summed up and then squared, and divided by the sum of the variances, unlike the

chi-squared test, where they're squared and then summed up.

So, summed up and then squared.

And then, regardless of how many tables you have,

under the null hypothesis this is a chi-squared with one degree of freedom.
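
The values being pointed at are not in the transcript. In the standard form of the test, which is presumably what the slide shows, they are as follows, where n_{1+k} and n_{2+k} are the row totals, n_{+1k} and n_{+2k} the column totals, and n_{++k} the grand total of table k:

```latex
E[n_{11k}] = \frac{n_{1+k}\, n_{+1k}}{n_{++k}},
\qquad
\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\, n_{2+k}\, n_{+1k}\, n_{+2k}}{n_{++k}^{2}\,(n_{++k} - 1)}

\mathrm{CMH} = \frac{\left[\sum_{k} \left(n_{11k} - E[n_{11k}]\right)\right]^{2}}{\sum_{k} \mathrm{Var}(n_{11k})}
```

The expectation and variance are those of a hypergeometric distribution, and the statistic is written without a continuity correction.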

So remember, this is a different test than the

chi-squared test we talked about in the previous lecture.

So that's why it's a different test statistic.

And the idea of testing conditional

independence, or testing for this confounder relationship, is a different

idea, and that's why you get a different test statistic.

I think it's a little bit beyond the scope

of this class to derive the CMH test statistic.

But the idea is that here you want to test whether or

not the odds ratio is 1, given that it's common across strata,

versus the odds ratio, given that it's common across strata, is not 1.

So here I'm going to implement it. It's a bit of a pain to

actually execute the CMH test by hand, so here I'm putting the data into an array,

in this case eight 2 by 2 tables, and then you

can do this mantelhaen.test with correct = FALSE. I put

correct = FALSE here so that if you do the calculations

by hand you'll get a result agreeing with the R output.

You generally want to leave this as correct = TRUE.

The result is that the test statistic is 6.38.

Compare that to a chi-squared 1 value.

Again, you only reject for larger values of the test statistic,

it being a two-sided test, because it's a squared statistic.

And so in this case the p-value is 0.012, so the test presents evidence to suggest

that treatment and response are not conditionally independent given center.
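
For readers who want to check the hand calculation outside R, here is a minimal Python sketch of the uncorrected statistic (the analogue of correct = FALSE). The counts are an assumption, since the array is not in the transcript; they were picked to be consistent with the lecture's description and they give a statistic close to the quoted 6.38. The function names are hypothetical.

```python
import math

# ASSUMED data (not shown in the transcript): one 2x2 table per center, as
# (n11, n12, n21, n22) = (drug success, drug failure,
#                         control success, control failure).
TABLES = [
    (11, 25, 10, 27),
    (16, 4, 22, 10),
    (14, 5, 7, 12),
    (2, 14, 1, 16),
    (6, 11, 0, 12),
    (1, 10, 0, 10),
    (1, 4, 1, 8),
    (4, 2, 6, 1),
]


def cmh_statistic(tables):
    """CMH statistic without continuity correction.

    Deviations of the upper left-hand cells from their hypergeometric
    means are summed first, then squared, then divided by the summed
    hypergeometric variances.
    """
    deviation = 0.0
    variance = 0.0
    for n11, n12, n21, n22 in tables:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        total = row1 + row2
        deviation += n11 - row1 * col1 / total
        variance += row1 * row2 * col1 * col2 / (total ** 2 * (total - 1))
    return deviation ** 2 / variance


def chi2_1_upper_tail(x):
    """P(chi-squared with 1 df > x), via P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2.0))


stat = cmh_statistic(TABLES)
p_value = chi2_1_upper_tail(stat)
print(f"CMH statistic: {stat:.2f}, p-value: {p_value:.3f}")
```

Only large values of the statistic count against the null, which is why a single upper tail is used even though the test is two-sided in terms of the odds ratio.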

Some final notes on this testing.

It's possible to perform an analogous

test using what I consider to be a little more modern approach,

using a random effects logit model.

But again, we haven't covered

regression, so we can't cover mixed models, and

then we can't cover generalized linear mixed models

and all the machinery that you would need to cover this.

But it's

possible anyway, and I think you should take the time, if

you're going to work as a statistician or use a lot of statistics in your life,

to build your way up to

where you're studying mixed effect models and generalized linear mixed effect models.

And there you can do exactly the kinds of things the CMH test is doing,

only you can do it in a very general way that

allows for other variables to be specified in the model, and so on.

The reason for presenting it this way in this class

is just to give you a sense of the idea of confounding,

to give you a sense of what you can do

in the particular case of 2 by 2 tables.

And then later on, I'm hoping that you'll take some more

statistics classes and learn about mixed models

in general, and also linear mixed models.

Also, here we assumed under the null that all the odds ratios are

equal to 1, versus the alternative that they were all equal, but not equal to 1.

It's also possible to test whether or not all the odds ratios are equal.

There's a test for that, and it's called Woolf's

test, and that's a very good test.

I don't have time to cover it.

The final thing I would mention is that we have these K

hypergeometric distributions that we used in the CMH test statistic.

So you could probably guess that you can

do some sort of exact test in this case.

And you can: in R, you can just

set exact = TRUE as an argument in mantelhaen.test.

But you can probably envision

how it's done. Imagine that within each center

you were to do this permutation

process that we talked about for Fisher's exact test.

Imagine if you were to do that now within each

stratum and recreate the chi-squared statistic each time, and

do that over and over again.

That simulation would yield a Monte Carlo approximation to the exact p-value.

So I think you could actually probably come pretty close to doing exactly what

the exact = TRUE argument is doing in R by that permutation process.

Of course, in this case, they can do the calculations exactly, without

Monte Carlo.

So it's faster, but conceptually, that's exactly what they're doing.
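
The permutation idea can be sketched as a small Monte Carlo simulation in Python. This is an illustration of the concept, not a reimplementation of R's exact = TRUE, whose definition of the two-sided p-value may differ from the statistic-based one used here. The counts are an assumption (the table is not in the transcript), and the function names, seed, and number of repetitions are arbitrary choices.

```python
import random

# ASSUMED data (not shown in the transcript): one 2x2 table per center, as
# (n11, n12, n21, n22) = (drug success, drug failure,
#                         control success, control failure).
TABLES = [
    (11, 25, 10, 27),
    (16, 4, 22, 10),
    (14, 5, 7, 12),
    (2, 14, 1, 16),
    (6, 11, 0, 12),
    (1, 10, 0, 10),
    (1, 4, 1, 8),
    (4, 2, 6, 1),
]


def cmh_statistic(tables):
    """CMH statistic without continuity correction."""
    deviation = 0.0
    variance = 0.0
    for n11, n12, n21, n22 in tables:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        total = row1 + row2
        deviation += n11 - row1 * col1 / total
        variance += row1 * row2 * col1 * col2 / (total ** 2 * (total - 1))
    return deviation ** 2 / variance


def permute_within_strata(tables, rng):
    """Redraw each table with its row and column totals held fixed.

    Picking row1 outcomes at random from the pooled successes and
    failures of a stratum is the same as shuffling treatment labels
    within that stratum.
    """
    permuted = []
    for n11, n12, n21, n22 in tables:
        row1 = n11 + n12
        col1, col2 = n11 + n21, n12 + n22
        outcomes = [1] * col1 + [0] * col2
        new_n11 = sum(rng.sample(outcomes, row1))
        new_n12 = row1 - new_n11
        permuted.append((new_n11, new_n12, col1 - new_n11, col2 - new_n12))
    return permuted


def monte_carlo_p_value(tables, repetitions, rng):
    """Share of permuted data sets with a statistic at least as large."""
    observed = cmh_statistic(tables)
    hits = 0
    for _ in range(repetitions):
        simulated = cmh_statistic(permute_within_strata(tables, rng))
        if simulated >= observed - 1e-9:  # tolerance for float round-off
            hits += 1
    return hits / repetitions


rng = random.Random(0)  # fixed seed so the run is repeatable
p_mc = monte_carlo_p_value(TABLES, 2000, rng)
print(f"Monte Carlo p-value: {p_mc:.3f}")
```

With a couple of thousand repetitions the result is only accurate to a few thousandths, so it should land in the same neighborhood as the chi-squared p-value rather than match it digit for digit.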

Okay, well that's the end of the lecture.

And so this was a teaser on the idea of confounding.

And confounding is probably the biggest obstacle to

generating knowledge from observational data, data where you don't have

a heavy amount of control over the design of the experiment,

which is most data, right?

The easy data to collect is observational data.

So this gives you a teaser

in how that data

is analyzed, and there's a tremendous art to analyzing that data that I

hope, as you learn more statistics, you'll get more refined at.

And just like any other art, you can spend

a lifetime perfecting your craft, and you'll never really

hit a limit.

It's a very hard topic.

And then again, I highly recommend considering

learning something about causal inference, which is

the most modern attempt at

addressing this problem in a mathematically formal way.