Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2



From this lesson

Hypothesis Testing

In this module, you'll get an introduction to hypothesis testing, a core concept in statistics. We'll cover hypothesis testing for basic one and two group settings as well as power. After you've watched the videos and tried the homework, take a stab at the quiz.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So the Z test that we're talking about requires the assumptions of the CLT, and it requires n to be large enough for the CLT to be applicable.

If n is small, then you could just do Gosset's Student's t test. It works the same way; you're just replacing the normal quantiles with the Student's t quantiles.
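To see how much the switch matters, here is a small Python sketch comparing the one-sided 5% critical value of the normal (about 1.645) with t quantiles for a few degrees of freedom. Python's standard library has no t distribution, so the helpers `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical and hand-rolled here (Simpson's rule plus bisection) purely for illustration; in practice you would likely use a statistics library such as SciPy's `scipy.stats.t` instead.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    total += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Invert t_cdf by bisection on [-50, 50] (assumes 0 < p < 1)."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# One-sided 5% critical values: normal vs. t with several degrees of freedom
z_crit = NormalDist().inv_cdf(0.95)
print(f"normal        : {z_crit:.3f}")
t_crits = {df: t_quantile(0.95, df) for df in (5, 15, 30, 100)}
for df, q in t_crits.items():
    print(f"t, df = {df:>3}: {q:.3f}")
```

The t quantiles are larger than the normal one for small degrees of freedom and shrink toward it as the degrees of freedom grow, which is why the Z test is only reasonable when n is large.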

The probability of rejecting the null hypothesis when it's false is called power.

Remember, we set the type one error rate, which is the probability of rejecting the null hypothesis when it's true; that is, we force the type one error rate to be small.

The probability of failing to reject the null hypothesis when, in fact, the null hypothesis is false is called the type two error rate.

Power is 1 minus that: it's the probability of rejecting the null hypothesis when it is false.

And so, power is a good thing.

You want to reject the null hypothesis when it's false.
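A quick Monte Carlo sketch can make these definitions concrete: simulate many studies, run the one-sided Z test on each, and count how often it rejects. When the true mean equals the null value, the rejection rate estimates the type one error rate; when it doesn't, the rejection rate estimates power, and 1 minus that estimates the type two error rate. The settings (null mean 30, standard deviation 10, n = 100, and a true mean of 32 for the alternative) are assumptions chosen to echo this lecture's example, and the function name `rejection_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(mu_true, mu0=30.0, sigma=10.0, n=100, alpha=0.05, reps=4000, seed=1):
    """Fraction of simulated studies in which the one-sided Z test rejects H0: mu = mu0.

    All default settings are illustrative assumptions, not values from a real study.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu_true, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        if (xbar - mu0) / (s / math.sqrt(n)) > z_crit:
            rejections += 1
    return rejections / reps


type_one = rejection_rate(mu_true=30.0)  # H0 true: estimates the type one error rate
power = rejection_rate(mu_true=32.0)     # H0 false: estimates power
print(f"type one error rate ~ {type_one:.3f}")
print(f"power ~ {power:.3f}, so type two error rate ~ {1 - power:.3f}")
```

Under the null, the rejection rate should land near the chosen alpha of 0.05; under the assumed alternative, it is the power, which here is well short of 1.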

And unfortunately, power is not typically under

our control after the experiment has been conducted.

So the way that people combat this is that, prior to conducting the study, they do a power calculation: they vary the sample size or, if the setting is simple enough, directly calculate the sample size needed to obtain a certain level of power, using guesses for what they think the standard deviation and the hypothesized effect size would be.

And that's what we'll talk about next lecture.
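As a preview, here is a minimal sketch of such a calculation for the one-sided Z test, using the normal approximation. The test rejects when (X bar minus mu_0) over (sigma over square root n) exceeds z(1 − alpha), so when the true mean is mu_a, power = 1 − Φ(z(1 − alpha) − (mu_a − mu_0)·√n/sigma); solving for n at a target power 1 − beta gives n = ((z(1 − alpha) + z(1 − beta))·sigma/(mu_a − mu_0))², rounded up. The guessed values (sigma = 10, a true mean of 32 against a null of 30) and the function names are hypothetical, and the formula treats sigma as known, so it ignores the extra uncertainty the t distribution accounts for at small n.

```python
import math
from statistics import NormalDist


def z_power(mu_a, mu0, sigma, n, alpha=0.05):
    """Power of the one-sided Z test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_a."""
    z = NormalDist()
    shift = (mu_a - mu0) / (sigma / math.sqrt(n))
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - shift)


def z_sample_size(mu_a, mu0, sigma, power=0.8, alpha=0.05):
    """Smallest n whose power is at least the target (normal approximation)."""
    z = NormalDist()
    n = ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) * sigma / (mu_a - mu0)) ** 2
    return math.ceil(n)


# Vary the sample size, as in a power calculation (guessed sigma = 10, effect = 2)
for n in (16, 50, 100, 200):
    print(n, round(z_power(32, 30, 10, n), 3))
print("n needed for 80% power:", z_sample_size(32, 30, 10))
```

Varying n shows how power grows with sample size; the closed-form `z_sample_size` is the "simple enough" case where n can be solved for directly.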

Okay.

So let's actually go through the t calculation for this example.

Suppose that n is 16 rather than 100, as we were considering before, so we're going to use a t test.

Then look at this equation right here. We want 5% to be the probability that X bar minus 30 (the value under the null hypothesis), divided by the estimated standard error, now s over square root 16, is larger than the t quantile instead of the z quantile: again the 1 minus alpha quantile, now with 15 degrees of freedom.

So our test statistic is now this standardized observed mean: 32, our observed mean, minus the hypothesized value 30, divided by the standard error 10 over square root 16. The square root 16 then moves up into the numerator, so the statistic is (32 minus 30) times 4 over 10, which works out to be 0.8, and the t critical value is 1.75.

And so now we fail to reject, and it shouldn't be surprising: we're changing what used to be multiplication by square root 100 to multiplication by square root 16. So the test statistic went down substantially while the quantile that we're comparing it to went up, because, remember, the t is a heavier-tailed distribution than the normal. So it shouldn't be surprising that we now fail to reject.
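The arithmetic can be checked with a short Python sketch that puts the two settings side by side. It assumes the earlier n = 100 example used the same observed mean of 32 and s = 10, which the lecture implies but does not restate here. The t-distribution helpers are hypothetical and hand-rolled for illustration, since Python's standard library has no t distribution.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    total += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Invert t_cdf by bisection on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


xbar, mu0, s, alpha = 32.0, 30.0, 10.0, 0.05

# Earlier large-sample setting (assumed: same xbar and s, n = 100): Z test
ts_100 = (xbar - mu0) / (s / math.sqrt(100))
z_crit = NormalDist().inv_cdf(1 - alpha)

# Small-sample setting from the lecture: n = 16, t test with 16 - 1 = 15 df
ts_16 = (xbar - mu0) / (s / math.sqrt(16))
t_crit = t_quantile(1 - alpha, 16 - 1)

print(f"n = 100: statistic {ts_100:.2f} vs z critical {z_crit:.3f}, reject: {ts_100 > z_crit}")
print(f"n = 16:  statistic {ts_16:.2f} vs t critical {t_crit:.3f}, reject: {ts_16 > t_crit}")
```

Under these assumptions the statistic drops from 2.0 to 0.8 while the critical value rises from about 1.645 to about 1.753, so the decision flips from reject to fail to reject.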

Okay. So in the previous slide, we did the one-sided test.

Let's now do the two-sided test, and we're going to move through these things quickly because I'm hoping that at this point in the class you're getting used to these kinds of calculations.

So now we want to test whether mu is different from 30 as the alternative.

And maybe you could say that doesn't make a lot of sense in this case, because the way I framed the problem was that we're looking at a population particularly susceptible to having a high RDI, so why don't we just test mu greater than 30?

Well, for the sake of argument, and just to show you the calculations, let's test mu different from 30.

But I would also say that many journals and avenues of scientific inquiry demand two-sided tests even when the one-sided test is the natural direction to consider.

So let's do a two-sided test.

So what we want is to test whether or not our observed mean X bar is significantly different from our null hypothesized value of 30 for the population mean, that is, whether it's significantly larger than 30 or significantly smaller than 30.

So we could look at the absolute value of X bar minus 30, which captures whether X bar is too far below 30 or too far above 30.

And then, of course, we want to standardize our statistic, so we're going to divide by the standard error of the mean, s over square root 16.

And we know that if the data are iid Gaussian, then under the null hypothesis X bar minus 30 over s over square root 16 follows a t distribution with 15 degrees of freedom.

And so, we want to specify alpha, the type one error rate, so that the probability that this test statistic is too large or too small is exactly alpha. To do that, we pick the t quantile t sub 1 minus alpha over 2, with 15 degrees of freedom.

The probability that this random t statistic is larger than the t 1 minus alpha over 2 quantile is alpha over 2.

The probability that the test statistic, on the negative end, is less than the t alpha over 2 quantile with 15 degrees of freedom, which is a negative value, is also alpha over 2.

So we put alpha over 2 in the lower tail and alpha over 2 in the upper tail, and that yields a total probability of alpha.

And in the next slide, I'll describe that a little bit.

And this calculation is, of course, all done

under the null hypothesis that mu equals 30.

So our test statistic, X bar minus 30 over s over square root 16, is 0.8.

When we take the absolute value, it remains 0.8.

And we're going to reject if it's either too large or too small.

But again, remember, the critical value is now calculated using alpha over 2 rather than alpha, because we want alpha over 2 probability of rejecting when the test statistic is too large, and alpha over 2 probability of rejecting when it is too small (too far negative).

So, in this case, the critical value is 2.13. Notice, of course, that's a larger value than the 1.75 we got when we just used alpha, because we're going further out into the tail, so it's harder to reject for the two-sided test than for the one-sided test.

So since we failed to reject for the one-sided test, we're, of course, also going to fail to reject for the two-sided test.
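Here is a sketch of the two-sided decision for the same data, placed next to the one-sided critical value so the effect of splitting alpha into two tails of alpha over 2 is visible. As before, the t helpers are hypothetical, hand-rolled numerical approximations, not a library API.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    total += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Invert t_cdf by bisection on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


xbar, mu0, s, n, alpha = 32.0, 30.0, 10.0, 16, 0.05
ts = (xbar - mu0) / (s / math.sqrt(n))  # 0.8
df = n - 1

one_sided = t_quantile(1 - alpha, df)      # all of alpha in the upper tail
two_sided = t_quantile(1 - alpha / 2, df)  # alpha / 2 in each tail
reject_two_sided = abs(ts) > two_sided

print(f"|statistic| = {abs(ts):.2f}")
print(f"one-sided critical value: {one_sided:.3f}")
print(f"two-sided critical value: {two_sided:.3f}, reject: {reject_two_sided}")
```

The two-sided critical value (about 2.131) exceeds the one-sided one (about 1.753), and 0.8 falls below both, so neither test rejects.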

Okay.

Let's briefly go through the two-sided calculation again and show where the alpha over 2 comes from.

So here, I'm setting up a sequence of x values from minus 4 to plus 4, evaluating the t density with 15 degrees of freedom at those points, and then plotting it. And there's my t distribution.

Okay, now I'm going to shade in the upper-tail area right there. That's 2.5%.

Let's say the type one error rate alpha that I want is 5%; then that value right there is 2.13.

So for the t distribution with 15 degrees of freedom, the 97.5th quantile is 2.13.

Then let's do the same thing for the lower quantile. There we go. That's 2.5% right there, and that's negative 2.13, and the area in between is 95%.

So what we're saying is that we calculate our normalized test statistic, X bar minus 30 over s over square root 16, and under the null hypothesis the probability that the absolute value of that statistic is bigger than 2.13 is 5%.

In other words, the probability that the statistic is too large (above 2.13) is 2.5%, and the probability that it is too small (below negative 2.13) is 2.5%.

So the probability that its absolute value is bigger than 2.13 is 5%, made up of 2.5% in the upper tail and 2.5% in the lower tail.

That is, the probability that the test statistic lies in the rejection region is 5%.
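The tail areas can also be checked numerically. This sketch builds the same grid of x values from minus 4 to 4 described for the plot (you could pass `xs` and `density` to any plotting library to redraw the curve) and then integrates the t density with 15 degrees of freedom to get the area beyond plus and minus 2.13. The helpers are hypothetical and hand-rolled as before; because 2.13 is a rounded quantile, each tail comes out at about 0.025 rather than exactly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    total += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Grid from -4 to 4 with the t(15) density evaluated on it (the plotted curve)
xs = [-4 + 8 * i / 400 for i in range(401)]
density = [t_pdf(x, 15) for x in xs]

crit = 2.13                    # rounded 97.5th quantile quoted in the lecture
upper = 1 - t_cdf(crit, 15)    # P(T > 2.13)
lower = t_cdf(-crit, 15)       # P(T < -2.13)
middle = 1 - upper - lower     # area between -2.13 and 2.13
print(f"lower tail ~ {lower:.4f}, upper tail ~ {upper:.4f}, middle ~ {middle:.4f}")
```

By symmetry the two tails are equal, and together they give the roughly 5% rejection region.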