Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From this lesson

Discrete Data Settings

In this module, we'll discuss testing in discrete data settings. This includes the famous Fisher's exact test, as well as the many forms of tests for contingency table data. You'll learn the famous "observed minus expected, squared, over the expected" formula, which is broadly applicable.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's discuss it a little bit. In this case we have more tumours among the treated mice than among the controls, so we at least have an indication, but we'd like some sort of inference associated with that. So let's see if we can get a p-value. We're going to want to calculate an exact p-value, and our exact p-value is going to use this conditional distribution.

The conditional distribution fixes both the row and the column totals. We talked about when Fisher originally developed this test: he wanted to fix the margins because he assumed that the lady who was guessing would have known that he had randomized equal numbers to tea first and milk first, so he assumed that she would have fixed those margins as well. That went into why he thought that way.

We also talked about how this yields the same test regardless of whether the rows or the columns are fixed, and that the hypergeometric distribution, which we derived as a conditional distribution, is identical to the permutation distribution that we discussed: randomly permuting the treatment and control labels if we were to string the data out as the full data set, not just as the two-by-two table.

So all one-sided versions of Fisher's exact test yield the same inference. For two-sided tests, we have the null distribution, but we need the test statistic, and it turns out that not all test statistics are equal in two-sided tests. I'm going to give you the easiest one, but it doesn't exactly match, for example, what R does, or exactly what was in Fisher's original version, so I just want you to be aware of that.

Okay, so let's consider our example from before, where now we're testing that the tumour probability for the treated mice is higher than the tumour probability for the control mice. The p-value would just require the tables as or more extreme under the alternative than the one observed. Recall that we're fixing the margin totals, and the observed table was 4, 1, 2, 3.

A more extreme table would be if not just four mice from the treated group got tumours, but five did. Right? So if we plug in five, then we know this cell has to be zero, this cell has to be one, and this cell has to be four, because we're fixing the margins. And we can't find another more extreme table, because if we were to, say, put a 6 in there, then that cell would have to be negative 1 instead of 0, which can't happen. So there's only one table that's more extreme that honors the margins.

Okay, so plugging into the hypergeometric distribution, we get 0.238 for the observed table and 0.024 for the other table. So the p-value, the conditional probability of obtaining a table as or more extreme in favour of the alternative than was observed, is 0.238 plus 0.024, which works out to be 0.262.
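The arithmetic above can be checked with a few lines of code. This is a minimal Python sketch (the lecture itself works in R); the function and variable names are illustrative and not from the lecture, and it assumes the table 4, 1, 2, 3 means 4 of 5 treated mice and 2 of 5 control mice had tumours.

```python
from math import comb

def hypergeom_pmf(x, n_treated, n_control, n_tumours):
    """P(X = x): x tumours among the treated mice, with all margins fixed."""
    return (comb(n_treated, x) * comb(n_control, n_tumours - x)
            / comb(n_treated + n_control, n_tumours))

# Margins of the observed table (4, 1, 2, 3): 5 treated, 5 control, 6 tumours.
n_treated, n_control, n_tumours = 5, 5, 6
x_obs = 4

# The treated-tumour count can only take values that keep every cell >= 0.
x_min = max(0, n_tumours - n_control)
x_max = min(n_treated, n_tumours)

p_obs = hypergeom_pmf(x_obs, n_treated, n_control, n_tumours)

# One-sided p-value: the observed table plus every more extreme table.
p_one_sided = sum(hypergeom_pmf(x, n_treated, n_control, n_tumours)
                  for x in range(x_obs, x_max + 1))

print(round(p_obs, 3))        # 0.238
print(round(p_one_sided, 4))  # 0.2619
```

Here `x_max` comes out as 5, which is the code's version of the remark that putting a 6 in the treated-tumour cell would force a negative count elsewhere.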

So as you can see, and you probably guessed, in this data set the only way we could have gotten a 5% significant test is with the most extreme table. That's part of the consequence of exact testing. It can be shown that these tests guarantee the error rate, but they only guarantee that the error rate is at most 5%, if, say, you compare the p-value to 5%. They do not guarantee that the error rate is exactly 5%. You can't do that because the data are discrete and there are only so many probabilities available to the p-value.

There was an effort at one point to introduce supplemental randomization, to try to get truly exact p-values that not only honoured the 5% level but gave you exactly a 5% error rate. But very few people are willing to accept that as a solution, where things not in your data have important consequences for the inference.

So I think what you wind up having to do is this: if your analysis demands an exact small-sample test, then you just have to deal with the fact that these small-sample tests tend to be a little bit conservative.

I would say there's another strategy that comes up too, that's pretty good. What some people will do is calculate the so-called mid-p-value: they only attach half weight to the observed table. That p-value falls somewhere in between, of course, leaving out the observed table and the strictly conservative p-value, and there are other strategies. But of course it's no longer exact. So I figure, if you want an exact test, do an exact test, and in this data set, for example, the only way you could reject is if you got the most extreme table.
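As a rough illustration of the mid-p idea, the Python sketch below reuses the lecture's margins (5 treated, 5 control, 6 tumours) and gives the observed table half weight. The names `pmf`, `exact_p` and `mid_p` are made up for this example.

```python
from math import comb

def pmf(x, n1=5, n2=5, k=6):
    # Hypergeometric probability of x tumours among the treated mice,
    # for the lecture's margins (5 treated, 5 control, 6 tumours).
    return comb(n1, x) * comb(n2, k - x) / comb(n1 + n2, k)

x_obs, x_max = 4, 5

# Probability of the tables strictly more extreme than the observed one.
more_extreme = sum(pmf(x) for x in range(x_obs + 1, x_max + 1))

exact_p = pmf(x_obs) + more_extreme      # full weight on the observed table
mid_p = 0.5 * pmf(x_obs) + more_extreme  # half weight on the observed table

print(round(exact_p, 4), round(mid_p, 4))  # 0.2619 0.1429
```

Even with half weight on the observed table, this data set stays well above 5%.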

Okay, so let me just show you briefly how to do this in R. We'll just create the matrix, matrix(c(4, 1, 2, 3), 2), type it in, and you'll see. Then type fisher.test: you just do fisher.test on the little two-by-two matrix, and in this case we say alternative = "greater". We get 0.2619, which is what we calculated when we did it directly. And it gives you all the information: it gives you an odds ratio estimate and a confidence interval. In a subsequent lecture we'll show you exactly how they calculate this confidence interval, or at least we'll give you the intuition behind it. It uses the so-called non-central hypergeometric distribution. Okay, so that's pretty easy to do.

Okay, so the two-sided p-value is a little bit harder. There are other ways to do it, and R uses another way, which I'll describe, but right now the easy thing is to double the one-sided p-value, the smaller of the two one-sided p-values. [INAUDIBLE] It's easy to remember which of the two one-sided p-values you double, because if you double the larger of the two, you wind up with a p-value bigger than one, usually, or always, I think, so you know you've done something wrong. So, to get a two-sided p-value, you double the smaller of the one-sided p-values. That mirrors exactly what we do in normal tests, t tests, and that sort of thing.
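A small Python sketch of the doubling rule for the lecture's table follows. The names are illustrative, and the two one-sided p-values are taken to be the "greater" and "less" tails of the hypergeometric distribution, each including the observed table.

```python
from math import comb

n1, n2, k = 5, 5, 6  # treated, control, total tumours (fixed margins)

def pmf(x):
    # Hypergeometric probability of x tumours among the treated mice.
    return comb(n1, x) * comb(n2, k - x) / comb(n1 + n2, k)

x_obs = 4
support = range(max(0, k - n2), min(n1, k) + 1)

p_greater = sum(pmf(x) for x in support if x >= x_obs)  # upper tail
p_less = sum(pmf(x) for x in support if x <= x_obs)     # lower tail

# Two-sided p-value: double the smaller of the two one-sided p-values.
p_two_sided = 2 * min(p_greater, p_less)

print(round(p_greater, 4), round(p_less, 4), round(p_two_sided, 4))
# 0.2619 0.9762 0.5238

# Doubling the larger tail overshoots 1, which is the warning sign.
print(2 * max(p_greater, p_less) > 1)  # True
```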

The other strategy for creating a two-sided test is that you need a two-sided test statistic, a test statistic that measures whether a table is as or more extreme than the observed table. As one example, you could use the chi-squared statistic. You could calculate the chi-squared statistic of your observed data, then calculate the hypergeometric probability for every two-by-two table satisfying the margins, and then add up the probabilities associated with those tables whose chi-squared statistic is as big or bigger, more in favour of the alternative, than that of the observed table. That's one example. And that's fine; given the information I've given you, you should be able to do that. You should be able to do it with any statistic that measures the direction of the alternative, of course.
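The recipe just described can be sketched in Python as follows, using the lecture's margins. The helper names are hypothetical, and the statistic is the usual (observed minus expected) squared over expected sum across the four cells. A small numerical tolerance is an added assumption so that tables tying with the observed statistic are counted.

```python
from math import comb

n1, n2, k = 5, 5, 6  # treated, control, total tumours (fixed margins)
n = n1 + n2

def pmf(x):
    # Hypergeometric probability of x tumours among the treated mice.
    return comb(n1, x) * comb(n2, k - x) / comb(n, k)

def chi_squared(x):
    # Cells: treated/tumour, treated/none, control/tumour, control/none.
    observed = [x, n1 - x, k - x, n2 - (k - x)]
    expected = [n1 * k / n, n1 * (n - k) / n, n2 * k / n, n2 * (n - k) / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

x_obs = 4
support = range(max(0, k - n2), min(n1, k) + 1)
stat_obs = chi_squared(x_obs)

# Add up the null probabilities of every table whose statistic is at least
# as large as the observed one (tolerance guards against rounding on ties).
p_two_sided = sum(pmf(x) for x in support if chi_squared(x) >= stat_obs - 1e-9)

print(round(p_two_sided, 4))  # 0.5238
```

Swapping `chi_squared` for any other function of the table gives a different two-sided test under the same null distribution.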

So you might ask, how do you pick a statistic in this case, since you can use any one? Because you can calculate the associated hypergeometric probabilities for every table and the associated statistics, summing up the probabilities for those tables with statistics more extreme than the observed in favour of the alternative is easy. What's conceptually hard is picking the statistic. The problem is that there is no uniformly optimal statistic in this setting; there is no so-called uniformly most powerful statistic. So there's a trade-off in power depending on which statistic you choose and its properties, which is too bad, but it is what it is. You can work out basically every statistic for a given data set and convince yourself that there is no uniformly most powerful solution. There are merely trade-offs.

I should talk a little bit about what Fisher did in this case. Fisher's statistic was really interesting: it was the hypergeometric probabilities themselves. So when he did the test, the hypergeometric probabilities were both the statistics and the probabilities. He would take all tables that satisfied the margins and calculate every hypergeometric probability. Then every table that had a hypergeometric probability smaller than the observed hypergeometric probability, he added all those up, plus the hypergeometric probability of the observed table, and that was his p-value.
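Fisher's ordering by null probability can be sketched in a few lines of Python for the lecture's table. The names are illustrative. Tables whose probability ties with the observed one are counted here, with a small relative tolerance; that tie handling is an assumption of this sketch rather than something stated in the lecture.

```python
from math import comb

n1, n2, k = 5, 5, 6  # treated, control, total tumours (fixed margins)

def pmf(x):
    # Hypergeometric probability of x tumours among the treated mice.
    return comb(n1, x) * comb(n2, k - x) / comb(n1 + n2, k)

x_obs = 4
support = range(max(0, k - n2), min(n1, k) + 1)
p_obs = pmf(x_obs)

# The test statistic is the null probability itself: sum the probabilities
# of all tables that are no more probable than the observed table.
p_fisher = sum(pmf(x) for x in support if pmf(x) <= p_obs * (1 + 1e-9))

print(round(p_fisher, 4))  # 0.5238
```

For this table the result happens to equal the doubled one-sided p-value, because the margins make the null distribution symmetric; in general the two approaches can differ.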

So what he's doing is using the hypergeometric probability as the test statistic, because, as we just talked about, you could use anything as a test statistic; any function of the table is a test statistic, and certainly the hypergeometric probabilities are a function of the table. His logic went something like this: if a table arose not out of the null distribution but out of the alternative distribution, then it would have a low probability under the null distribution. So he used that logic to say, okay, the probability under the null distribution is a reasonable test statistic. That was his logic. Fisher was a smart guy.

Okay, so "p-values are usually large for small n" is the second point. What do I mean by that? I wrote that. Oh, yeah, that's a poor way to put it, but what I mean is that the discreteness of the problem usually dictates that you wind up with a large p-value. For example, here we have the second most extreme table we could possibly obtain, and our p-value is 26%. That's a consequence of (a) the discreteness and (b) demanding an exact test. It is what it is.

Fisher's exact test doesn't distinguish between the rows and the columns: transpose the table and you get the same p-value out. The common p under the null hypothesis is called a nuisance parameter, so the procedure that we're doing to get rid of it is called eliminating nuisance parameters. Conditioning on the total number of successes eliminates the nuisance parameter, and we're going to go through a couple more examples of this throughout the class, of conditioning on a sum or something like that to get rid of a nuisance parameter. And then, Fisher's exact test guarantees the type I error rate, but it doesn't guarantee that it's attained exactly; it only guarantees that it's attained as a bound.

And here there was another great fight that occurred in statistics. You might not think that statistics nerd fights are fun, but I do. There was an extremely hard-fought series of papers between Fisher and several other people about this procedure. There was another procedure called an exact unconditional test, and its inventor, Barnard, who later, I think, came around to Fisher's way of thinking about things, said: well, why don't we calculate, for example, the probability of getting a proportion of tumours in the treated bigger than the proportion of tumours in the control? It does depend on the common proportion, but we can calculate it under the null hypothesis if we knew this common proportion, right? Then it's just a bunch of binomials. Sure, maybe it's tedious, but we could do it. And he says, okay, we have this probability for any given p; that's our p-value. Why not, then, just take the worst-case version of p, the largest version of the p-value, and call it a day? Super simple to describe. And that procedure, in some cases, has better power than Fisher's exact test.
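The worst-case idea can be sketched in Python under some loudly stated assumptions: the test statistic is taken to be the difference in tumour counts (equivalent to the difference in proportions here because both groups have 5 mice), "as or more extreme" means a difference at least as large as the observed one, and the maximum over the common p is approximated on a grid. None of these choices is spelled out in the lecture, and all names are hypothetical.

```python
from math import comb

def binom_pmf(x, n, p):
    # Binomial probability of x successes in n trials.
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n_treated, n_control = 5, 5
x_obs, y_obs = 4, 2  # tumours among treated and control mice

def tail_probability(p):
    """P(X - Y >= x_obs - y_obs) when both groups share tumour probability p."""
    return sum(binom_pmf(x, n_treated, p) * binom_pmf(y, n_control, p)
               for x in range(n_treated + 1)
               for y in range(n_control + 1)
               if x - y >= x_obs - y_obs)

# Worst case over a grid of values for the unknown common proportion p.
grid = [i / 1000 for i in range(1001)]
p_unconditional = max(tail_probability(p) for p in grid)

print(p_unconditional)
```

A finer grid or a numerical optimizer would tighten the approximation to the maximum; the grid is used only to keep the sketch short.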

[INAUDIBLE] You know? I think the maths step. I think, conceptually, people tend to like Fisher's exact test. But I like the unconditional test, too. It had a lot of logic to it, I think, in terms of [INAUDIBLE]. It's very simple, right? It's a lot easier to describe, and yet conditioning's very hard. I would say, however, that the randomization idea and the permutation idea of Fisher's exact test are very compelling. At any rate, [UNKNOWN] test is interesting. You can read up about it. You can read about how they fought over it for decades. It's a very fun literature: if you go back to the biggest names in statistics from before, say, the 50s or 60s, they all weighed in on this problem, and if you get far enough into the world of statistics, it's an incredibly fun literature to read.

Okay, last slide. I want to just expand on this idea, especially because we can generalize it to not just 2 by 2 tables. In the observed table, x equals 4. The observed treatment labels were T, T, T, T, T, five Ts and then five Cs, and the observed tumours were like that; you could do this little collection of 2 by 2 tables. And, for example, in R, if you have the data coded in this way, you could do table and it would just give you the observed data.

Okay, so one thing you could do is permute the treatment labels, and that gives exactly the hypergeometric distribution. So you could do a Monte Carlo version of calculating this hypergeometric p-value. Of course we don't need to, but I'm introducing it because later on in the class we'll do it in cases where it's hard to calculate the p-value analytically. In this case, we simulated a table and got x equals 3, and we just do this over and over and over again, and calculate the proportion of simulated tables that have an x bigger than or equal to 4, that is, evidence as or more extreme in favour of the alternative. This is just a Monte Carlo estimate of Fisher's exact p-value. It is exactly a simulation process to give us this hypergeometric probability that we could calculate by hand. In the future, we're going to have harder problems where we'll need to do Monte Carlo, because there's no way to do it by hand.
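The permutation procedure can be sketched in Python as below. The particular 0/1 coding of the tumour outcomes, the number of simulations and the fixed seed are assumptions made for the example; only the counts (4 of 5 treated and 2 of 5 control mice with tumours) come from the lecture.

```python
import random

random.seed(0)  # fixed seed only so the sketch is reproducible

treatment = ["T"] * 5 + ["C"] * 5
# 1 = tumour, 0 = no tumour; ordered so that 4 treated and 2 control
# mice have tumours, matching the observed table.
tumour = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

# Observed number of tumours among the treated mice.
x_obs = sum(t for lab, t in zip(treatment, tumour) if lab == "T")

n_sims = 100_000
count = 0
labels = treatment[:]
for _ in range(n_sims):
    random.shuffle(labels)  # randomly permute the treatment labels
    x_sim = sum(t for lab, t in zip(labels, tumour) if lab == "T")
    if x_sim >= x_obs:      # as or more extreme than the observed table
        count += 1

p_monte_carlo = count / n_sims
print(x_obs, p_monte_carlo)  # estimate should land near the exact 0.2619
```

The estimate fluctuates from run to run without the seed, with the error shrinking as `n_sims` grows.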

And that's the end of today's lecture. Next time we'll keep working on contingency tables.