This course covers the design, acquisition, and analysis of Functional Magnetic Resonance Imaging (fMRI) data. A book related to the class can be found here: https://leanpub.com/principlesoffmri

A course from Johns Hopkins University

Principles of fMRI 1

From this lesson

Week 4

- Martin Lindquist, PhD, MSc, Professor, Biostatistics, Bloomberg School of Public Health | Johns Hopkins University

- Tor Wager, PhD, Department of Psychology and Neuroscience, The Institute of Cognitive Science | University of Colorado at Boulder

Hi, in this module we're going to be talking about the multiple comparison

problem in fMRI.

So, to recap what we talked about a few modules ago, when we want

to fit the GLM in order to localize areas that are active in response to a task,

we begin by constructing a model for each voxel of the brain.

And this is typically done using the mass univariate approach,

where every voxel has a separate model.

And we usually use a GLM type approach.

And here is just a kind of cartoon showing how we can create a Design Matrix for

two different conditions, A and B.

And then we put this into the GLM model as follows.

Now, once we do this and we estimate the parameters of this model,

we can perform a statistical test to determine whether or

not there's task related activation present in the voxel.

So typically we test some hypothesis, c transpose beta is equal to 0, so

this is some linear combination of the beta parameters.

So for example, we might test whether condition A minus condition B is equal to 0.

And in that case we test this against the alternative that the difference is not

equal to 0.

And so we do this at every voxel of the brain, and then we can summarize

the results, that is, the t-statistics that we obtain by performing

this hypothesis test, in a statistical image such as the one shown here.

And so here each voxel now has a value

corresponding to the t-statistic of the statistical test at that voxel.
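
The per-voxel test described above can be sketched in a few lines of Python. This is only a toy illustration, not the course's own code: the block design, the effect sizes, the noise level, and the function name `contrast_t_stat` are all assumptions made for the example, the regressors are plain boxcars with no hemodynamic response convolution, and the noise is assumed to be independent Gaussian.

```python
import numpy as np

def contrast_t_stat(y, X, c):
    """t-statistic for H0: c' beta = 0 in the model y = X beta + noise."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y              # ordinary least squares estimate
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)      # residual variance estimate
    se = np.sqrt(sigma2_hat * (c @ XtX_inv @ c))
    return (c @ beta_hat) / se

# Hypothetical design: repeating blocks of rest, A, B (10 scans each).
n_scans, block = 120, 10
scan = np.arange(n_scans)
cond_a = ((scan // block) % 3 == 1).astype(float)
cond_b = ((scan // block) % 3 == 2).astype(float)
X = np.column_stack([cond_a, cond_b, np.ones(n_scans)])  # A, B, intercept

# Simulated time course for one voxel that responds more to A than to B.
rng = np.random.default_rng(0)
y = 2.0 * cond_a + 0.5 * cond_b + 10.0 + rng.normal(0.0, 1.0, n_scans)

c = np.array([1.0, -1.0, 0.0])  # contrast: condition A minus condition B
t_stat = contrast_t_stat(y, X, c)
print(f"t-statistic for A - B at this voxel: {t_stat:.2f}")
```

Repeating this fit for every voxel's time course and storing each resulting `t_stat` gives the kind of statistical image discussed here.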

Now, the next stage is, that's a nice map and all,

but we want to sort of determine which voxels are active or not.

And so in that case, we need to find a way to threshold this t-map in order to find

significant voxels and get a statistical parametric map, such as the one seen here.

Here each significant voxel is color coded according to the size of its p-value.

So the question here is, how do we determine this threshold?

So, before we start talking about this and the multiple comparison problem that this

entails, let's go over some basic nomenclature for hypothesis testing.

So, the null hypothesis H nought is a statement of no effect.

So, typically, we want to test the hypothesis that beta 1 - beta 2 = 0.

And then we try to see if we can reject this null hypothesis and

say that well, they're indeed different from each other.

The way we do this is through a test statistic T.

And so, the test statistic measures the compatibility between the null hypothesis

and the data.

So the way we see whether or not they're compatible

is we calculate something called the p-value.

And the p-value is the probability that the test statistic would take a value as

extreme as, or more extreme than, the one actually observed if H nought is true.

So, mathematically, we can write this as the probability that big T is at least as

large as little t, the observed value of the test statistic, given the null hypothesis.

So basically this distribution is showing us what the feasible

values of the test statistic are if the null hypothesis were indeed true.

And if the p-value is small,

that says that our test statistic is lying far out on the tails of plausible values.

So the smaller the p-value, the less compatible our data are with

the assumption that the null hypothesis holds, and

in that case we might choose to reject the null hypothesis.
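
Written as a formula, the p-value described above looks as follows. This is a sketch in the notation of the spoken description, with T the test statistic as a random variable and t its observed value; it is the one-sided form, and for a continuous test statistic it makes no difference whether the inequality is strict.

```latex
p = P(T \ge t \mid H_0)
```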

Typically, we decide on a fixed threshold in advance:

we choose a threshold u of alpha

which controls the false positive rate at some level alpha, called the significance level.

So we basically want to find some threshold u of alpha such that

the probability that the test statistic lies above that value

is equal to some value alpha, where say 0.05 is often used.

So, we want to control

the probability of making a false positive at, say, 5% in that case.
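
As a rough illustration of how such a threshold can be computed, the sketch below finds u of alpha for a one-sided test. It assumes the test statistic is approximately standard normal under the null hypothesis, which is a large-sample simplification of the t distribution; the function name `one_sided_threshold` is made up for the example.

```python
from statistics import NormalDist

def one_sided_threshold(alpha):
    """Threshold u with P(T > u | H0) = alpha, assuming T ~ N(0, 1) under H0."""
    return NormalDist().inv_cdf(1.0 - alpha)

for alpha in (0.05, 0.01, 0.001):
    u = one_sided_threshold(alpha)
    print(f"alpha = {alpha}: u_alpha = {u:.3f}")
```

Under this approximation, alpha = 0.05 gives a threshold of about 1.645, and a smaller alpha pushes the threshold further out into the tail.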

So, whenever we're doing hypothesis testing,

we're ultimately making a binary decision.

Should we reject a null hypothesis, yes or no?

So, when we're making decisions like this,

there's two types of errors that we can make.

One is called a Type I error.

That happens if the null hypothesis is true, but we mistakenly reject it.

This is also called a false positive.

So indeed, the null hypothesis is true, but

we decide that we should reject the null hypothesis.

And this we can control by the significance level alpha.

So if we want to guard against the false positives,

we can make the alpha level very, very small.

That means we need a lot of evidence to reject the null hypothesis.

The other is a Type II error, which is when

the null hypothesis is false, but we fail to reject it.

This is a false negative.

So in this case, we really should be rejecting the null hypothesis but

we don't do that because we don't have enough evidence to do so.

In that case we get a false negative.

And which is more serious, a Type I or a

Type II error, will depend on the situation.

The probability that a hypothesis test correctly rejects a false null hypothesis,

this is a good thing, this is called the power of the test.

So we want a test that's very powerful because if the null hypothesis is false,

we want to be able to reject it.
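
The trade-off between the two error types can be made concrete with a small calculation. The sketch below assumes a one-sided test whose statistic is normal with unit variance, centered at 0 under the null and at a hypothetical effect `delta` under the alternative; both the function name and the value `delta = 3.0` are illustrative assumptions.

```python
from statistics import NormalDist

def power_one_sided(delta, alpha):
    """Power of a one-sided z-test when the statistic is N(delta, 1) under the alternative."""
    z = NormalDist()
    u_alpha = z.inv_cdf(1.0 - alpha)      # rejection threshold under H0
    return 1.0 - z.cdf(u_alpha - delta)   # P(T > u_alpha) when the mean is delta

delta = 3.0  # hypothetical true standardized effect size
for alpha in (0.05, 0.01, 0.001):
    power = power_one_sided(delta, alpha)
    print(f"alpha = {alpha}: power = {power:.3f}, Type II error rate = {1.0 - power:.3f}")
```

Making alpha smaller guards against false positives but lowers the power, so more true effects of this size would be missed.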

So, these are sort of terms that are often used when talking about

hypothesis testing and whatnot.

So choosing an appropriate threshold is complicated in the situation

that we're in with fMRI by the fact that we're dealing with a family of tests.

So if more than one hypothesis test is performed at any given time

the risk of making at least one Type I error is going to be inflated.

It's going to be greater than the alpha level of a single test.

So for example, if we control the Type I error rate at,

let's say, 0.05, that's the rate for a single test.

But if we perform hundreds of tests,

there's a 5% chance of making a mistake on each test where the null is true, and

eventually we're going to wind up making a mistake.

So the more tests one performs,

the greater the likelihood of getting at least one false positive.

And so when we're actually performing, say, 100,000 tests, it's very likely that

we'll make false positives if we don't control for this appropriately.
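
To see how quickly this risk grows, here is a small sketch. It assumes all m tests are independent and every null hypothesis is true, in which case the probability of at least one false positive is 1 - (1 - alpha)^m. Neighboring voxels in real data are correlated, so this is only a rough guide, and the function name is made up for the example.

```python
def fwer_independent(alpha, m):
    """P(at least one false positive) over m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 10, 100, 100_000):
    print(f"m = {m:>7}: P(at least one false positive) = {fwer_independent(0.05, m):.4f}")
```

With alpha = 0.05 the probability is already about 0.40 for 10 tests and about 0.99 for 100 tests.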

So again, which of these 100,000 voxels are significant in this statistical map?

Well again, now we've performed 100,000 different hypothesis tests.

And if none of the voxels were truly active and we

controlled each test at the 0.05 level, then we'd expect about 5,000 false positive voxels,

because once out of every 20 times we would make a mistake.

So in this case, we would have around 5,000 false positive voxels, and so

this could be entire regions of the brain that are deemed active even though they

shouldn't have been, and this can be a very serious problem.
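
The figure of 5,000 can be checked with a quick simulation. The sketch below assumes that no voxel is truly active and, for simplicity, that the tests are independent; under a true null with a continuous test statistic the p-value is uniform on [0, 1], so each p-value is drawn from a uniform distribution. The function name and the seed are arbitrary choices for the example.

```python
import random

def count_false_positives(n_tests, alpha, seed):
    """Count p-values below alpha when every null hypothesis is true."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))

n_voxels, alpha = 100_000, 0.05
n_false = count_false_positives(n_voxels, alpha, seed=0)
print(f"expected false positives: {n_voxels * alpha:.0f}")
print(f"simulated false positives: {n_false}")
```

The simulated count varies from run to run but stays close to 100,000 times 0.05.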

So choosing a threshold is ultimately a balance between sensitivity, which is

the true positive rate, and specificity which is the true negative rate.

So, again we looked at this little example in an earlier module, but

I think it's worth looking at again.

So, for example this statistical map, we could threshold at any given level here.

So, here I show five examples with thresholds at 1, 2, 3, 4, and 5.

And so you see, if you choose a low threshold,

then you get a lot of active voxels.

So in this case, you're probably finding all the active voxels of the brain.

However, you're probably getting a lot of things that shouldn't have been active and

declaring them active, so that's no good.

On the other hand, if you choose a very stringent threshold, say 5,

in this case you're pretty sure that the regions that are active are truly active.

But you can't shake the feeling that you've missed a couple of activations.
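
The same trade-off shows up in a simulated map. The sketch below is a toy setup, not the data shown in the lecture: it assumes 9,000 inactive voxels with standard normal statistics and 1,000 truly active voxels whose statistics are centered at a hypothetical value of 4, and then counts what survives each of the five thresholds.

```python
import random

def sweep_thresholds(null_stats, active_stats, thresholds):
    """Return (threshold, true positives, false positives) for each threshold."""
    rows = []
    for u in thresholds:
        true_pos = sum(t > u for t in active_stats)
        false_pos = sum(t > u for t in null_stats)
        rows.append((u, true_pos, false_pos))
    return rows

rng = random.Random(1)
null_stats = [rng.gauss(0.0, 1.0) for _ in range(9000)]    # inactive voxels
active_stats = [rng.gauss(4.0, 1.0) for _ in range(1000)]  # hypothetical active voxels

rows = sweep_thresholds(null_stats, active_stats, (1, 2, 3, 4, 5))
for u, true_pos, false_pos in rows:
    print(f"threshold {u}: {true_pos} true positives, {false_pos} false positives")
```

A low threshold keeps nearly every active voxel but also lets through many inactive ones; a high threshold removes the false positives at the cost of missing most of the real activations.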

So we have to find some middle ground and we want to do this in a principled way.

So how do we choose the threshold to determine which voxels are active and

not active in a principled way that we can sort of defend and believe in?

So that's what the next couple of modules are about.

And so there exist several different ways of quantifying the likelihood of obtaining

false positives.

One way is to control what's called the family-wise error rate.

The family-wise error rate is the probability of making at least one false positive among all the tests.

This provides a very strict control over multiple comparisons.

So, we want to guard against making any false positives at all.

A little bit more lenient approach which is becoming increasingly popular is what's

called the False Discovery Rate or the FDR.

And so, FDR procedures control the expected proportion of false positives among

all rejected tests.
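
The two error measures can be written compactly as follows. The symbols are an assumption of this sketch rather than notation from the lecture: V is the number of false positives and R is the total number of rejected tests, with the ratio taken to be 0 when nothing is rejected.

```latex
\mathrm{FWER} = P(V \ge 1), \qquad
\mathrm{FDR} = E\!\left[\frac{V}{\max(R, 1)}\right]
```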

And so in the coming modules we'll talk about the family-wise error rate and

the false discovery rate in turn.

So, that's the end of this module.

This was just a brief introduction to the problem at hand,

with multiple comparisons.

In the next couple of modules, we'll go into detail and talk about methods for

controlling the family-wise error rate and the false discovery rate.

See you then, bye.