Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From this lesson

Hypothesis Testing

In this module, you'll get an introduction to hypothesis testing, a core concept in statistics. We'll cover hypothesis testing for basic one and two group settings as well as power. After you've watched the videos and tried the homework, take a stab at the quiz.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so we went through some extra steps there. In general, we don't convert the constant we're interested in back to the original scale; we just take the standardized mean and compare it to a standard normal quantile. So in this example, remember, 32 was our empirical mean, 30 is the mean under the null hypothesis, and the standard error of the mean was 10 divided by the square root of 100, which is 1. So the mean, expressed in standard error units from the hypothesized mean, works out to be 32 minus 30 divided by the standard error of 1, which is 2. So our mean is two standard error units away from the hypothesized mean. Then we would compare that to the 95th percentile of the standard normal distribution, which is 1.645, so we would directly compare 2 and 1.645. That's just a quicker way to do it.
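As a quick check of the arithmetic above, here is a minimal Python sketch of the standardized comparison. It assumes Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative and not from the course materials.

```python
import math
from statistics import NormalDist

# Values from the worked example: sample mean 32, null mean 30,
# standard deviation 10, sample size 100.
xbar, mu0, sd, n = 32.0, 30.0, 10.0, 100

se = sd / math.sqrt(n)               # standard error of the mean: 10 / 10 = 1
z = (xbar - mu0) / se                # sample mean in standard error units
cutoff = NormalDist().inv_cdf(0.95)  # 95th percentile of the standard normal

print(f"z = {z:.3f}, cutoff = {cutoff:.3f}, reject = {z > cutoff}")
# z = 2.000, cutoff = 1.645, reject = True
```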

So we can [UNKNOWN] these rules for a normal test for a single mean into a couple of simple rules. Let's just call this the Z test. Because we're testing a single mean, we're either willing to assume that our data are Gaussian, or that the central limit theorem is a good enough approximation to apply.

Suppose we want to test the null hypothesis H naught: mu equals mu naught, versus one of three alternatives: H1, mu is less than mu naught; H2, mu is not equal to mu naught; or H3, mu is greater than mu naught. That middle one, we'll have to talk about a little bit.

In all three cases, we reject when the sample mean is far enough from mu naught in the direction of the alternative. For H1, we reject if the sample mean is small enough, that is, enough below mu naught. For H3, we reject if the sample mean is enough above mu naught. And for H2, we reject if the sample mean is enough different from mu naught, either too large or too small. And again, just like before, the logical way to do this is to express the mean in standard error units, that is, in standardized units.

So our test statistic is x bar minus the hypothesized mean under the null, mu naught, divided by the standard error, s over the square root of n. That is a z-score: the sample mean expressed in standard error units.
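The test statistic can be wrapped in a small helper. This is a sketch only: `z_statistic` is a hypothetical name, and it assumes s > 0 and n > 0.

```python
import math

def z_statistic(xbar: float, mu0: float, s: float, n: int) -> float:
    """Sample mean in standard error units: (xbar - mu0) / (s / sqrt(n))."""
    if n <= 0 or s <= 0:
        raise ValueError("need n > 0 and s > 0")
    return (xbar - mu0) / (s / math.sqrt(n))

# A statistic of 3 means the sample mean is 3 standard errors above mu0.
print(z_statistic(33.0, 30.0, 10.0, 100))  # 3.0
```

A positive value points toward H3 and a negative value toward H1; how large it must be to reject depends on alpha.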

So if we get, for example, 3, we know that the sample mean is three standard errors above the hypothesized mean, right? Observing a sample mean three standard errors above the hypothesized mean would be unlikely if the null were true, so that casts some doubt on the hypothesized mean. If we observe a sample mean that is, say, four standard errors below the hypothesized mean, that would be evidence in favor of H1. If it were four standard errors above the hypothesized mean, that would be evidence in favor of H3.

So we can force alpha-level error rates if we want. In this example alpha was 5%; alpha, remember, is the probability of a type I error, falsely rejecting the null hypothesis when in fact it is true. Then under H1, we would reject if our test statistic were less than negative Z_{1-α}. So if alpha is 0.05, we would look up the 95th percentile and take its negative. We could equivalently look up the fifth percentile, because the fifth percentile is the negative of the 95th percentile, but I wanted to use the same quantile every time. So in that case, the cutoff would be negative 1.645.
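A short Python sketch of the H1 (lower-tailed) rule, assuming `statistics.NormalDist` for quantiles; `reject_lower` is a hypothetical helper. It also checks that negating the 95th percentile gives the same cutoff as looking up the 5th percentile.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

upper = std_normal.inv_cdf(1 - alpha)  # Z_{1-alpha}, about 1.645
lower_cutoff = -upper                  # H1 cutoff: reject if Z < -Z_{1-alpha}
fifth_pct = std_normal.inv_cdf(alpha)  # 5th percentile, about -1.645

print(round(lower_cutoff, 3), round(fifth_pct, 3))  # -1.645 -1.645

def reject_lower(z: float, alpha: float = 0.05) -> bool:
    """Lower-tailed test of H0: mu = mu0 versus H1: mu < mu0."""
    return z < -NormalDist().inv_cdf(1 - alpha)
```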

For H3, we do exactly what we did in our example: we compare the test statistic to the positive upper quantile, Z_{1-α}, which is the 95th percentile if the error rate we want to control is 5%.

And for the second case, H2, we reject if our test statistic is either too large, bigger than Z_{1-α/2}, or too small, smaller than negative Z_{1-α/2}.
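For reference, the three rules can be collected in one display, writing Z_{1-α} for the 100(1-α)th percentile of the standard normal:

```latex
Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}, \qquad
\text{reject } H_0 \text{ in favor of }
\begin{cases}
H_1: \mu < \mu_0 & \text{if } Z < -Z_{1-\alpha},\\
H_2: \mu \neq \mu_0 & \text{if } |Z| > Z_{1-\alpha/2},\\
H_3: \mu > \mu_0 & \text{if } Z > Z_{1-\alpha}.
\end{cases}
```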

The reason we divide alpha by 2 is that, if the null hypothesis is true, we want only an alpha chance (a 100 times alpha percent chance) of rejecting it falsely. So we divide that probability in half: half of it for accidentally rejecting because the statistic is too large, and the other half for accidentally rejecting because it is too small. That seems like a pretty sensible rule.

Executing that rule just means taking the test statistic, throwing out the sign if it's negative and leaving it alone if it's positive (in other words, taking its absolute value), and comparing it to the upper quantile of the normal distribution. But instead of putting a 5% error rate in the upper tail, I put 2.5% there. If I wanted a 10% type I error rate, I would put 5% in each tail and use the 95th percentile, and so on.

Okay? So let me briefly describe this two-tailed test idea a bit more. If alpha is 0.05, then 1 minus 0.05 divided by 2 works out to be 0.975, and the 0.975 quantile of the normal distribution is about 1.96. Let me get some x and y values and plot the standard normal density again, which you can see over there. What we have is a test statistic that we calculate, and we're going to reject if it's too large or too small. So we take 1.96, which puts 2.5% in the upper tail, and negative 1.96, which puts 2.5% in the lower tail. Of course, that leaves 95% in the middle.
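A sketch of the two-sided cutoff and decision rule in Python. The helper names `two_sided_cutoff` and `reject_two_sided` are hypothetical; quantiles come from `statistics.NormalDist`. The last line illustrates the 10% case: splitting 10% across two tails puts 5% in each, so the cutoff is the 95th percentile.

```python
from statistics import NormalDist

def two_sided_cutoff(alpha: float) -> float:
    """Z_{1 - alpha/2}: reject H0 in favor of H2 when |Z| exceeds this."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_two_sided(z: float, alpha: float = 0.05) -> bool:
    # Dropping the sign of z (absolute value) and comparing to the upper
    # alpha/2 quantile is the same as checking both tails.
    return abs(z) > two_sided_cutoff(alpha)

print(round(two_sided_cutoff(0.05), 2))  # 1.96  (0.975 quantile)
print(round(two_sided_cutoff(0.10), 3))  # 1.645 (0.95 quantile)
```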

So we reject if our test statistic is above positive 1.96, which has a 2.5% chance under the null hypothesis, or below negative 1.96, which also has a 2.5% chance under the null hypothesis. The union of those two events has a 5% chance under the null hypothesis, and rejecting whenever the statistic is bigger than positive 1.96 or smaller than negative 1.96 is the same thing as rejecting if its absolute value is bigger than 1.96.
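To see that the two tails together carry about a 5% chance under the null, one can compute the tail areas exactly and, as a rough check, simulate standard normal statistics. The seed and number of draws below are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()
c = std_normal.inv_cdf(0.975)  # about 1.96

# Exact tail probabilities under H0, where Z ~ N(0, 1).
upper_tail = 1 - std_normal.cdf(c)  # P(Z > 1.96), about 0.025
lower_tail = std_normal.cdf(-c)     # P(Z < -1.96), about 0.025
print(round(upper_tail + lower_tail, 3))  # 0.05

# Monte Carlo check: how often |Z| > c when the null is true.
random.seed(1)
draws = 200_000
rejections = sum(abs(random.gauss(0.0, 1.0)) > c for _ in range(draws))
print(rejections / draws)  # close to 0.05
```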

So in the way we execute hypothesis testing, we've forced the type I error rate to be small, usually 5% or so. So if we reject the null hypothesis, either our model is wrong, or one of several other things may have gone wrong, or we've made an error, which has only a low probability.

On the other hand, we have not fixed anything to do with the type II error rate, which is usually called beta. Therefore we tend to say "fail to reject H naught" rather than "accept H naught."

To give you a classic example of this, imagine you have a very small sample size, you want to test some scientific hypothesis, and you fail to reject the null hypothesis. Regardless of your sample size, you've controlled the type I error rate, so there's only a 5% chance that you would have rejected the null incorrectly. But if the alternative is true, it's possible that your small sample size [INAUDIBLE] leads to variability in the mean [INAUDIBLE], right? Because, remember, the standard error of the mean is sigma divided by the square root of n, so if n is small, the variability of the mean is going to be larger. So if your sample size is small and you fail to reject H naught, it's not fair to conclude H naught. If you only have, say, a sample of size three, then maybe you didn't have a good chance of rejecting the null hypothesis anyway, because you didn't collect enough data to really evaluate it.
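The small-sample point can be made concrete with the power of the Z test, a topic the course returns to later. This sketch assumes a known sigma and an upper-tailed test, and reuses the earlier numbers (mu naught = 30, sigma = 10) with a hypothetical true mean of 32; `power_upper_tailed` is a hypothetical helper name. Under the alternative, Z is normal with mean (mu_alt - mu0) * sqrt(n) / sigma, so power is 1 - Phi(Z_{1-α} minus that shift).

```python
import math
from statistics import NormalDist

def power_upper_tailed(mu0: float, mu_alt: float, sigma: float, n: int,
                       alpha: float = 0.05) -> float:
    """P(reject H0: mu = mu0 in favor of mu > mu0) when the true mean is mu_alt.

    Assumes a Z test with known sigma: reject when
    (xbar - mu0) / (sigma / sqrt(n)) > Z_{1-alpha}.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))  # mean of Z under mu_alt
    return 1 - std_normal.cdf(z_crit - shift)

for n in (3, 100):
    print(n, f"{power_upper_tailed(30, 32, 10, n):.2f}")
# n = 3 gives about 0.10; n = 100 gives about 0.64
```

With n = 3, a true mean of 32 would be detected only about one time in ten, which is why failing to reject there says little about H naught.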

So at any rate, that's a bit of nomenclature. That's why there's a tendency in basically every statistics textbook that teaches hypothesis testing to say "fail to reject H naught" rather than "accept H naught." If you say you accept H naught, there's an implicit notion that the type II error rate is small, and in most problems less is known about the type II error rate. This goes to the subject of power, which we'll talk about later on in study design.

One of the ways to combat issues like this is, prior to conducting the study, to design it in such a way that you will have a high probability of rejecting the null hypothesis if, in fact, the alternative is true. And one of the things you have under your control for doing that is having a large sample size. So that's one point. At any rate, the tendency is to say "fail to reject H naught" rather than "accept H naught." A classic phrase people use to put this another way is "absence of evidence is not evidence of absence."

The other point I'm making on this slide is that, in the context of hypothesis testing, statistical significance is not the same thing as scientific significance.

The most common argument in favor of this point is that most of our hypothesis testing procedures have sharply specified nulls: H naught is that mu is exactly 5, or something like that, depending on the problem. So if you have an enormous sample size, it's possible to get a sample mean of 5.01 with an incredibly small standard error, because the sample is so large, and still reject the null hypothesis, even though 5.01 isn't in any scientific sense different from 5.
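A sketch of the large-sample point, with hypothetical numbers not taken from the lecture: a sample standard deviation of 1 and sample sizes of 100 and one million, holding the observed difference fixed at 0.01. `z_for` is a hypothetical helper.

```python
import math
from statistics import NormalDist

# Hypothetical numbers: H0 says mu = 5, the observed mean is 5.01, and the
# sample standard deviation is 1.
mu0, xbar, s = 5.0, 5.01, 1.0
cutoff = NormalDist().inv_cdf(0.975)  # two-sided cutoff for alpha = 0.05, about 1.96

def z_for(n: int) -> float:
    """Z statistic for the fixed difference xbar - mu0 at sample size n."""
    return (xbar - mu0) / (s / math.sqrt(n))

for n in (100, 1_000_000):
    z = z_for(n)
    print(f"n = {n:,}: z = {z:.1f}, reject H0: {abs(z) > cutoff}")
# n = 100: z = 0.1, reject H0: False
# n = 1,000,000: z = 10.0, reject H0: True
```

The difference is the same 0.01 in both rows; only the standard error shrinks, which is what drives the rejection.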

So that's the point that's often made: just because you reject the null hypothesis when executing a statistical test doesn't mean that the difference you've detected is in fact meaningful.

Now, I've read an argument by one person saying that in some instances, for example when you have randomized comparative trials, [UNKNOWN] hypotheses are still meaningful and even small deviations are important. But these are kind of subtle issues.

I think the main issue that is at least generally understood is that statistical significance and scientific significance are not always the same thing.

So I would like you to at least be aware of that and you can read more about it.

Before, on the previous slide, we said we'll reject if our test statistic is below a particular value in the case where we're testing H1, if its absolute value is above a particular value in the case of H2, and if it's above a particular value in the case of H3. Those were the rules we came up with on the previous slide.

So for H1, the negative normal quantile and below is called the rejection region. For H2, it's the upper quantile and above or the negative quantile and below; in other words, the absolute value being above a large value is the rejection region in that case. And for H3, the upper normal quantile and above is the rejection region. So the collection of values of the test statistic for which you reject the null hypothesis is called the rejection region. Just a bit of nomenclature there.
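The three rejection regions can be expressed as a single membership check. This is a minimal sketch: `in_rejection_region` and the labels "less", "two-sided", and "greater" are hypothetical choices standing for H1, H2, and H3, and quantiles come from `statistics.NormalDist`.

```python
from statistics import NormalDist

def in_rejection_region(z: float, alternative: str, alpha: float = 0.05) -> bool:
    """Return True if the Z statistic falls in the rejection region.

    alternative: "less" (H1: mu < mu0), "two-sided" (H2: mu != mu0),
    or "greater" (H3: mu > mu0). These labels are illustrative.
    """
    q = NormalDist().inv_cdf
    if alternative == "less":
        return z < -q(1 - alpha)          # lower tail: below -Z_{1-alpha}
    if alternative == "two-sided":
        return abs(z) > q(1 - alpha / 2)  # both tails: |Z| above Z_{1-alpha/2}
    if alternative == "greater":
        return z > q(1 - alpha)           # upper tail: above Z_{1-alpha}
    raise ValueError(f"unknown alternative: {alternative!r}")

# The worked example's statistic, Z = 2, under each alternative:
for alt in ("less", "two-sided", "greater"):
    print(alt, in_rejection_region(2.0, alt))
# less False
# two-sided True
# greater True
```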