Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

A course from Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From this lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Zero.

Okay, so that's the nonparametric equivalent to the paired test.

Let's talk about the nonparametric equivalent to the unpaired test.

And here we're comparing two measurement

techniques, again from this wonderful book by Rice,

Mathematical Statistics and Data Analysis.

At any rate,

they were comparing two measuring techniques,

and the units are in degrees

Celsius per gram.

And here we have a group measured with

method A and a group measured with method B.

And we want to test: are the measurements the same?

We'll be a little more formal about the hypothesis in a minute.

But let's talk about how we can do that.

And so, basically, the technique we're going to use,

not to be confused with method A and method B,

the technique we're going to use for testing

whether the two methods are the same, is to

take the A and B labels and shuffle them across the measurements.

But, to be nonparametric, we're going to shuffle them across the ranks.

Then we'll talk later on about

shuffling them across the observed values themselves.

That's the so-called permutation test.

Okay, so what we're going to do is test whether or

not the two treatments have the same location.

And what I mean is that

the distributions are centered at the same place.

We're going to assume that the measurements

have independent and identically distributed

errors that are not necessarily normal.

So, there's a difference between the errors being

normally distributed versus the measurements being normally distributed.

And that's one way to

write out the assumptions for this test.

Another way is to view this as a test of a distributional shift, where the

distribution for method B is uniformly

shifted relative to the distribution for method A.

And that's called a stochastic shift between two arbitrary distributions.

So, you can either specify the hypothesis tightly,

that they're centered at the same location with IID

errors, in which case the test

has a certain power for that particular collection of hypotheses,

or specify a very general one about a stochastic shift, in which case the same test statistic

has a different kind of power for that set of hypotheses.

So all we're going to do is

disregard the method A, method B labels.

We're going to rank the observations, and then we're going to use the

sum of the ranks within each treatment label.

And this

is called the Wilcoxon rank sum test.

It's equivalent to the so-called Mann-Whitney test as well.

So you might call it, I don't know, the Wilcoxon-Mann-Whitney test.

In R, it's wilcox.test.

And I should say that there are

some slight differences between the two:

the tests work out to be the same,

but they characterize the test statistic in slightly different ways.

But it's still, I think, correct to attribute

the test to Wilcoxon and to Mann and Whitney, Mann and Whitney being two researchers.

So the procedure is to discard the treatment labels,

method A and method B in this case.

Rank the observations, without concern over which treatment they came from.

Calculate the sum of the ranks in the first treatment, where "first" is arbitrary.

You could pick either the first or the second treatment and you get

the same test, but you have to pick one of the two.

And then you either compare your statistic with its asymptotic normal

distribution, or you calculate the exact

distribution under the null hypothesis.

So here I show the ranks for method A and the ranks for method B. In

case two observations are tied, we give

them the average rank and then move on.

The sum of the ranks for method A was 180,

and the sum of the ranks for method B was 51.

By the way, the two sums have to add up to 231.
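The ranking step can be sketched in a few lines of Python. This is only an illustration: the helper names `average_ranks` and `rank_sums` are made up here, and the measurements are toy values, not the data from the lecture. It pools the two groups, assigns average ranks to ties, sums the ranks per group, and checks that the two sums add up to n(n + 1)/2.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = ((start + 1) + (end + 1)) / 2  # mean of positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_sums(group_a, group_b):
    """Pool both groups, rank ignoring the labels, then sum ranks per group."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a = len(group_a)
    return sum(ranks[:n_a]), sum(ranks[n_a:])


# Made-up illustrative measurements, NOT the data from the lecture.
toy_a = [80.02, 80.04, 80.03, 80.04, 80.00]
toy_b = [79.98, 80.02, 79.97, 79.95]
sum_a, sum_b = rank_sums(toy_a, toy_b)
n = len(toy_a) + len(toy_b)
print(sum_a, sum_b, n * (n + 1) / 2)  # 33.5 11.5 45.0
```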

And just because that's a fun result,

let's show why this is the case.

So, Gauss supposedly did this as a child.

There's some suspicion that this story's apocryphal, but whatever.

For our purposes, let's assume he did it when he was a kid.

So, the story goes that his teacher asked

him to add up the numbers between 1 and 100.

And he went and sat down at his desk, and just came back with the answer.

And the teacher said, that's not possible, how did you do that?

And then he went and really did it and got the same answer.

At any rate,

I think the story's probably apocryphal, but

it's a really neat way to show it.

We could write x as the sum of the integers from

1 to n: 1 plus 2 plus 3, and so on in that way.

Or we could write it as n, plus n minus 1, plus n minus 2,

the same exact thing, all the way down to 1.

So, if you add the two together, you get 2x, and in this case, notice

1 plus n is n plus 1, 2 plus n minus 1 is n plus 1,

3 plus n minus 2 is n plus 1, and so on.

And so 2x is

the number n plus 1 added up n times, so it's n times n plus 1, so then x has

to be n times n plus 1 over 2.
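Written out, the pairing argument looks like this, using the same x and n as in the spoken derivation:

```latex
\begin{align*}
x  &= 1 + 2 + \cdots + (n-1) + n \\
x  &= n + (n-1) + \cdots + 2 + 1 \\
2x &= \underbrace{(n+1) + (n+1) + \cdots + (n+1)}_{n \text{ terms}} = n(n+1) \\
x  &= \frac{n(n+1)}{2}
\end{align*}
```

For the example, a pooled sample of n = 21 observations (the count consistent with the total quoted earlier) gives 21 times 22 over 2, which is 231.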

Okay, so, let W be the sum of the ranks for the first treatment.

And then, you know,

if a treatment has more numbers in it,

then under the null hypothesis it's going to have

a higher sum just by virtue of having more numbers, so we need to know nA and nB,

the number in each sample. And it turns out that the expected value of the sum

of the ranks from the first group under the null hypothesis

works out to be this guy:

nA times nA plus nB plus 1, divided by 2,

with a standard error given by this guy.

And then we can create a test statistic, which is our

W, the sum of the ranks in our first

group, minus its expected value, divided by its standard error.

That turns out to be approximately normal 0, 1. Of course, you

can also calculate the exact distribution, as we described before.
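The slide formulas are not reproduced in this transcript. As a reference sketch, the standard results for the rank sum W of the group with n_A observations, assuming no ties, are written below; the symbols E[W], SD(W), and Z are notation chosen here rather than taken from the slide.

```latex
\begin{align*}
E[W] &= \frac{n_A\,(n_A + n_B + 1)}{2}, \\
\mathrm{SD}(W) &= \sqrt{\frac{n_A\, n_B\,(n_A + n_B + 1)}{12}}, \\
Z &= \frac{W - E[W]}{\mathrm{SD}(W)} \;\overset{\text{approx.}}{\sim}\; N(0, 1).
\end{align*}
```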

Okay, so let's go through our example. In this case,

if we use method B, our sum of ranks was 51.

Here's the expected value and standard

deviation of that statistic, 88 and about 13.8.

Our test statistic works out to be negative 2.68,

with a P value of 0.007 for the two-sided test.
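The arithmetic for this example can be checked with a short script. One assumption to flag: the transcript never states the group sizes, so n_B = 8 and n_A = 13 below are inferred from the quoted total rank sum of 231 and the expected value of 88. The normal approximation here ignores any tie correction.

```python
import math

# Group sizes are NOT stated in the transcript; they are inferred from the
# quoted total rank sum (231 = 21 * 22 / 2) and expected value (88 = 8 * 22 / 2).
n_a, n_b = 13, 8
w_b = 51  # observed rank sum for method B, as quoted in the lecture

n_total = n_a + n_b
expected = n_b * (n_total + 1) / 2               # 88.0
sd = math.sqrt(n_a * n_b * (n_total + 1) / 12)   # about 13.8

z = (w_b - expected) / sd
# Two-sided p-value under the standard normal:
# 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(round(expected, 1), round(sd, 2), round(z, 2), round(p_two_sided, 3))
```

The rounded values agree with the test statistic of negative 2.68 and the two-sided P value of 0.007 quoted in the lecture.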

And then you can also use the function wilcox.test, and

it'll perform the test for you,

both for the one- and two-sample versions.

You have to read over the documentation for wilcox.test.

If you give it one vector, it's going to do the signed rank

test. If you give it two vectors, it's going to do the rank sum test.

So, some final notes about nonparametric tests: they

tend to be more robust to outliers than their parametric counterparts.

They do not require normality assumptions.

They often have exact small sample versions.

And their big trick is to

focus on the ranks rather than the raw data.

There is some loss in power relative to their parametric counterparts,

assuming the parametric assumptions

are met, but the loss in power is often not so bad.

And then I just want to

emphasize, nonparametric tests are not assumption free.

They're often distribution free.

For example, for the signed rank test, you really

kind of have to assume that the distribution is symmetric.

But either way, in all of the tests that we considered,

you have to have a sampling model, that the data are IID, right?

That's an assumption, a big assumption.

The biggest assumption.

So, just to emphasize

that nonparametric tests are not assumption free:

they're often distribution

free, but not assumption free.

And then I just wanted to remind

people about permutation tests, because we've already

talked about them a little bit with regard to Fisher's exact test.

But here we can also talk about them in general.

So permutation tests are similar to these rank sum tests,

though they use the actual data rather than the ranks.

So, under the null hypothesis for the rank sum test, we had the collection of ranks,

and our null distribution was just obtained by

permuting the treatment labels. You know, we had nA

treatment A labels and nB treatment B labels, and

we just permute those with respect to the ranks.

That would retain nA A labels and nB B labels,

but they would be randomly allocated among the ranks.
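For small samples, this label-permuting idea can be carried out exhaustively. The sketch below is an illustration rather than the lecture's own code: the function names are hypothetical, it assumes no ties (so the pooled ranks are simply 1 through nA + nB), and it treats every allocation of the nA labels to the ranks as equally likely under the null.

```python
from collections import Counter
from itertools import combinations


def exact_rank_sum_null(n_a, n_b):
    """Exact null distribution of the rank sum for the group of size n_a.

    Assumes no ties, so the pooled ranks are 1..n_a+n_b, and that every
    way of assigning the n_a labels to ranks is equally likely.
    """
    ranks = range(1, n_a + n_b + 1)
    counts = Counter(sum(subset) for subset in combinations(ranks, n_a))
    total = sum(counts.values())
    return {w: c / total for w, c in sorted(counts.items())}


def two_sided_p(null_dist, w_obs, n_a, n_b):
    """P(|W - E[W]| >= |w_obs - E[W]|) under the exact null distribution."""
    center = n_a * (n_a + n_b + 1) / 2
    dist_obs = abs(w_obs - center)
    return sum(p for w, p in null_dist.items() if abs(w - center) >= dist_obs)


# Tiny hypothetical example: 3 labels of one kind and 4 of the other.
null = exact_rank_sum_null(3, 4)
p_exact = two_sided_p(null, 6, 3, 4)  # 6 = 1 + 2 + 3, the smallest possible sum
print(p_exact)  # 2 of the 35 allocations are this extreme
```

Enumeration grows quickly with the sample sizes, which is one reason the normal approximation or random sampling of permutations is used for larger problems.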

A permutation test is exactly the same thing.

You're just doing it to the raw data rather than the

ranks, and you have to come up with a statistic.

And I go through the procedure here.

You know, you could permute the ranks and then create a rank statistic.

I also want to

distinguish: there are two ways to think about this.

One is, imagine your treatment was actually randomized.

Then you can think of the permutation

test as actually kind of redoing the randomization.

And in that case, the permutation test is called

a randomization test, if you interpret it that way.

But you can also perform the

permutation test even if the treatment wasn't randomized,

because you're thinking along the lines of, well,

my null hypothesis is that my labels A and B are exchangeable between the groups.

So either way it kind of makes sense,

but it changes your interpretation

of the test a little bit.

But at any rate, Fisher's exact test,

which works on collections of binary data,

the rank sum test, which works on the ranks of

the observations,

and the permutation test all have the same basic principle:

under the null hypothesis, however we're interpreting it,

the nA and nB treatment labels are exchangeable,

and our null distribution is obtained by

permuting those labels across the values. In

Fisher's exact test and the permutation test,

we're permuting them across the actual observed values.

And in the rank sum test, we're converting the

data to ranks first, and then permuting across the ranks.

So just to reiterate, this is an easy way

to produce a null distribution for a test of equal distributions.

It has kind of a similar flavor to the bootstrap, maybe not exactly.

This produces an exact test.

It's less robust but more powerful than

rank sum tests, because you're not throwing away the data.

With rank sum tests, you throw away the actual units

and go with ranks, so you gain robustness

at the expense of power.

With this, you get a little bit more power under certain assumptions,

but you lose some of that robustness.

It's very popular in large-scale, big

data applications, like genomics, for example, and neuroimaging.

So this final picture is just what you

would aspire to get from a permutation test.

You would permute the method A, method B labels.

You would, say, calculate the T statistic as

if the permuted labels were the observed labels.

And you'd do that over and over again.

And you get a null distribution of T statistics, right?

And this vertical line is where

our actual T statistic occurred.

Or

whatever statistic; it doesn't have

to be a T statistic, but that's a reasonable statistic to use.

And then the percentage of the simulated statistics that are

more extreme than our observed statistic is our exact P value.

So that's a permutation test.
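The procedure just described can be sketched as a small simulation. A few assumptions to flag: the function name `permutation_test` and the data are made up for illustration, the statistic is swapped to the absolute difference in means rather than a T statistic (any reasonable statistic works), and "extreme" is counted as "at least as extreme as observed". Because it samples random relabelings instead of enumerating all of them, the result is a Monte Carlo approximation to the exact permutation P value.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Monte Carlo permutation p-value for a two-sided difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))

    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to permuting the labels:
        # the first n_a values play the role of "method A".
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed - 1e-12:  # at least as extreme, with float slack
            hits += 1
    return hits / n_perm


# Made-up, clearly separated toy groups, NOT the data from the lecture.
toy_a = [1, 2, 3, 4, 5, 6]
toy_b = [11, 12, 13, 14, 15, 16]
p_value = permutation_test(toy_a, toy_b)
print(p_value)
```

Replacing the raw values with their pooled ranks before calling the function would turn this into a simulation of the rank sum test's null distribution.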

If you were to do it with the ranks,

then this would just be simulating the exact

small sample distribution of the rank sum statistic.

If you do it with the raw data, then it's a so-called permutation test, and so on.