This course covers the analysis of Functional Magnetic Resonance Imaging (fMRI) data. It is a continuation of the course “Principles of fMRI, Part 1”

A course from Johns Hopkins University

Principles of fMRI 2

From this lesson

Week 1

This week we will discuss psychological and behavioral inference, as well as advanced experimental design.

- Martin Lindquist, PhD, MSc, Professor, Biostatistics, Bloomberg School of Public Health | Johns Hopkins University

- Tor Wager, PhD, Department of Psychology and Neuroscience, The Institute of Cognitive Science | University of Colorado at Boulder

This is also related to something that's been called the decline effect.

And this is a subject that was popularized in a New Yorker article a few years ago,

which is a really wonderful article.

And the idea is that effects that appeared very large in initial papers

subsequently decrease in effect size when they are tested again.

And the lead example here is of drugs that seem to work initially, and

as they're tested over the years, the effects get smaller and smaller.

The effect sizes get smaller and smaller.

And there are many reasons this could happen with drugs, of course, including

changing standards for how diseases are diagnosed and drugs are applied.

But a big part of this decline effect is likely just simply regression to the mean.

So anytime you have a false positive finding that happens to get published,

and then people start trying to replicate it,

if it's not a true finding, or if those findings have been picked out of a larger

family of experiments as the ones that happened to work the best, then when it's replicated

the effects will get smaller and smaller, and it will appear to decline.

So not so mysterious, but important.
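
The regression-to-the-mean account can be illustrated with a small simulation. This is a minimal sketch, not anything shown in the lecture: the true effect size, the study size, and the rule that only "significant" original studies get published are all assumptions chosen for illustration, and `simulate_decline` is a hypothetical helper name.

```python
import random
import statistics

def simulate_decline(n_studies=2000, n_per_study=20, true_effect=0.2, seed=0):
    """Run original studies, 'publish' only the significant ones, then replicate them."""
    rng = random.Random(seed)
    se = 1 / n_per_study ** 0.5  # standard error of a mean with unit-variance noise

    published, replications = [], []
    for _ in range(n_studies):
        original = rng.gauss(true_effect, se)        # effect observed in the first study
        if original / se > 1.96:                     # assumed filter: only significant results appear
            published.append(original)
            replications.append(rng.gauss(true_effect, se))  # fresh, independent replication
    return statistics.mean(published), statistics.mean(replications)

original_mean, replication_mean = simulate_decline()
print(f"published originals: {original_mean:.2f}, replications: {replication_mean:.2f}")
```

Nothing about the underlying effect changes between the original and the replication; the published originals are larger only because they were selected for being large, so the replications fall back toward the true value.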

And then we'll talk about circularity.

And this is a way of thinking about brain analyses,

especially the kinds of analyses that lead to biased effects, and

how we avoid them.

So circularity is also called double-dipping, a term that was popularized

by an article a few years ago by Nikolaus Kriegeskorte and colleagues.

And the idea of circularity is that you can select voxels to look at

based on one effect or test, and then test those voxels on something

that's not independent of that selection criterion.

So this is sort of pernicious, and so we have

to really think through our results and analysis and be careful to avoid it.

But here's an example, a worked-through example.

So what you see here is a panel of voxels and this is a study with four conditions,

A, B, C and D.

So now we've got the blue area here, which shows some true effects and

the true effect is A & B activate, C & D don't.

So there's the truth. Now

we're going to select data based on this contrast, A versus D.

So we're selecting on A versus D, and

we're picking up voxels that show that A versus D effect.

Now what's going to happen is,

any voxels I test later are going to tend to show noise that favors A versus D.

So then, if I tested on independent data, I would get about the right answer.

A and B activate, the other ones don't.

But, if I use that same data where the noise is favoring the A versus D

hypothesis, then I'm going to get a biased effect.

A is going to be greater than B, on average.

Just by chance.

So, how can I select on A versus D and get an A versus B effect?

Well I'm conditioning on noise values that tend to be high for

A, and that's not true for B, so I'm creating a bias.

And that's essentially the circularity problem in a nutshell, and

that's one of the big dangers in terms of selecting ROIs and then testing them.

We have to make sure that the data are independent.
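
The A, B, C, D example can be reproduced numerically. The following is a rough sketch under assumed values: every voxel has the same true pattern (A and B activate, C and D don't), noise is independent across conditions with unit variance, and the selection threshold of 2.5 is arbitrary. The names `noisy_dataset` and `mean_a_minus_b` are hypothetical helpers.

```python
import random
import statistics

TRUE_MEANS = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0}  # A and B activate, C and D don't

def noisy_dataset(rng, n_voxels, noise_sd=1.0):
    """One noisy estimate per condition for every voxel."""
    return [{c: m + rng.gauss(0, noise_sd) for c, m in TRUE_MEANS.items()}
            for _ in range(n_voxels)]

def mean_a_minus_b(data, voxels):
    """Average A-minus-B difference over the given voxel indices."""
    return statistics.mean(data[v]["A"] - data[v]["B"] for v in voxels)

rng = random.Random(1)
n_voxels = 5000
selection_data = noisy_dataset(rng, n_voxels)
independent_data = noisy_dataset(rng, n_voxels)   # same truth, fresh noise

# Select voxels showing a strong A-versus-D effect in the selection data.
selected = [v for v in range(n_voxels)
            if selection_data[v]["A"] - selection_data[v]["D"] > 2.5]

circular = mean_a_minus_b(selection_data, selected)     # tested on the selection data
unbiased = mean_a_minus_b(independent_data, selected)   # tested on independent data
print(f"A - B, same data: {circular:.2f}; independent data: {unbiased:.2f}")
```

The true A minus B difference is zero everywhere, so any A greater than B effect in the same-data test comes from conditioning on noise that was high for A.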

And one popular way of selecting regions of interest is through contrasts that

are orthogonal to the tests of interest.

And so nominally you might think that that avoids the circularity problem.

So I might select on a main effect of A and B versus C and D, and then test on A minus B.

And that seems, on the surface, pretty okay.

It's safer than truly non-independent tests, but

there can still be bias if you test on the same data.

Why?

Because the design matrix, the regressors for A,

B, C, and D can be correlated and that can produce effects, and

also the noise might be autocorrelated, so

the noise characteristics might also create a selection bias.

So we do have to be careful even when we're applying orthogonal contrasts.

It's better to test on independent data.
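
One way to see why orthogonal contrasts are not automatically safe is to write down the covariance between the two contrast estimates. This is a sketch under standard general linear model assumptions, with $y = X\beta + \varepsilon$ and $\operatorname{Cov}(\varepsilon) = \sigma^2 V$. The symbols $X$ (design matrix), $\hat\beta$ (ordinary least squares estimates), $c_1$ and $c_2$ (selection and test contrasts), $\sigma^2$ (noise variance) and $V$ (noise correlation matrix) are notation introduced here, not taken from the lecture.

```latex
\begin{align*}
\hat\beta &= (X^\top X)^{-1} X^\top y, \\
\operatorname{Cov}\!\left(c_1^\top \hat\beta,\; c_2^\top \hat\beta\right)
  &= \sigma^2\, c_1^\top (X^\top X)^{-1} c_2
  && \text{(independent noise, } V = I\text{)}, \\
\operatorname{Cov}\!\left(c_1^\top \hat\beta,\; c_2^\top \hat\beta\right)
  &= \sigma^2\, c_1^\top (X^\top X)^{-1} X^\top V X (X^\top X)^{-1} c_2
  && \text{(autocorrelated noise)}.
\end{align*}
```

Having $c_1^\top c_2 = 0$ makes the first expression vanish when $X^\top X$ is proportional to the identity, but not in general. Correlated regressors or autocorrelated noise can leave a nonzero covariance, and that is how selection on one contrast can leak into a test on the other.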

This basic circularity phenomenon is one of the other issues that was raised in

the voodoo correlations debate.

And the idea is that non-independent tests inflate these brain-behavior

correlation estimates.

So this is a histogram of the correlation values for

reported correlations between brain and behavior from the literature.

And as you can see, it ranges from about .2 to correlations close to 1,

which seem like fairly large effect sizes.

And what the authors here did is they broke it down into those in which they

thought that the tests were independent of the voxel selection criteria and

those that were not independent.

And so you see independent in green and non-independent in red.

And that's one estimate of what the inflation of the apparent

effect size due to the circularity or non-independent testing might be.
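
The inflation can be demonstrated even when there is no brain-behavior relationship at all. The sketch below uses pure noise, with an assumed sample of 20 subjects, 2000 voxels, and an arbitrary selection threshold of r > 0.5; it is an illustration of the mechanism, not a reanalysis of the histogram. `pearson` and `null_sample` are hypothetical helpers.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def null_sample(rng, n_subjects, n_voxels):
    """Behavioral scores and voxel values with no true relationship between them."""
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    voxels = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]
    return behavior, voxels

rng = random.Random(2)
n_subjects, n_voxels = 20, 2000
behavior_1, voxels_1 = null_sample(rng, n_subjects, n_voxels)   # used for voxel selection
behavior_2, voxels_2 = null_sample(rng, n_subjects, n_voxels)   # independent sample

# Select voxels whose correlation with behavior passes the threshold.
r_selection = [pearson(v, behavior_1) for v in voxels_1]
selected = [i for i, r in enumerate(r_selection) if r > 0.5]

# Report the correlation in the selected voxels on the same data and on new data.
non_independent_r = statistics.mean(r_selection[i] for i in selected)
independent_r = statistics.mean(pearson(voxels_2[i], behavior_2) for i in selected)
print(f"non-independent: {non_independent_r:.2f}, independent: {independent_r:.2f}")
```

The non-independent estimate is above 0.5 by construction, while the same voxels tested in the independent sample average near zero, which is the true value here.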

So here are some solutions.

One solution that we're going to advocate

quite a bit later in the course is data splitting.

Hold out independent test data if you actually want to estimate effect sizes.

And we should want to estimate effect sizes.

So this means perhaps holding out a subset of participants for

a later exact test of the findings that you report.

And also maybe holding out runs

if you're interested in making inferences within an individual person.

So, with a single hypothesis test of the model:

if you develop a model or

a pattern across regions that you can integrate into a single test,

then if you're just doing one test on new, independent data,

you can get an unbiased estimate of how big that effect is.

We'll look at an example of this later.

And this principle goes beyond voxel selection to encompass

all kinds of model-building.

Whether you're designing a fancy connectivity or dynamic causal model or

a predictive model, which includes multimodal data, or anything else really,

the same principles hold.

One really effective strategy we'll talk about later is called cross-validation.

It's an efficient data splitting strategy

in which model development is done on one set of data, training data.

And then testing is done on another subset of the data, systematically.

And we'll talk about that

when we talk about machine learning later in the course.
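
As a preview, here is a minimal sketch of k-fold cross-validation written with only the Python standard library. The "model" is deliberately simple and is an assumption made for illustration: pick the single voxel most correlated with the outcome in the training subjects and fit a straight line to it. The data are pure noise, so an honest estimate of predictive accuracy should be near zero. All function names are hypothetical.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_best_voxel(voxels, outcome, train_idx):
    """Model development: choose the best voxel on training subjects and fit a line."""
    y = [outcome[i] for i in train_idx]
    def train_values(v):
        return [voxels[v][i] for i in train_idx]
    best = max(range(len(voxels)), key=lambda v: abs(pearson(train_values(v), y)))
    x = train_values(best)
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return best, slope, my - slope * mx

def predict(model, voxels, idx):
    """Apply a fitted (voxel, slope, intercept) model to the given subjects."""
    best, slope, intercept = model
    return [intercept + slope * voxels[best][i] for i in idx]

def cross_validated_r(voxels, outcome, k=5):
    """Fit on k-1 folds, predict the held-out fold, then score all held-out predictions."""
    n = len(outcome)
    predictions = [0.0] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit_best_voxel(voxels, outcome, train_idx)
        for i, p in zip(test_idx, predict(model, voxels, test_idx)):
            predictions[i] = p
    return pearson(predictions, outcome)

rng = random.Random(3)
n_subjects, n_voxels = 40, 300
outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]            # no true signal
voxels = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_voxels)]

everyone = list(range(n_subjects))
in_sample_r = pearson(predict(fit_best_voxel(voxels, outcome, everyone), voxels, everyone), outcome)
cv_r = cross_validated_r(voxels, outcome)
print(f"in-sample r: {in_sample_r:.2f}, cross-validated r: {cv_r:.2f}")
```

The in-sample correlation looks respectable because the voxel was chosen on the same subjects it is scored on. The cross-validated value is computed only from subjects the model never saw, so it is not inflated by that selection.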

So let's talk a little bit more about selection bias in a more concrete way.

And let's look at how some forms of bias can combine with voxel

selection bias to multiplicatively increase false positives.

So for example, if I test two contrast maps and

I've got voxel selection bias in each contrast map,

I get twice the false positives and a corresponding increase in effect size.

If I do two experiments, I get twice the false positives.

This doesn't mean that we should correct for

multiple comparisons across every test that we've done.

But what it means is we need to be mindful of this when we interpret the effects.
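
The arithmetic behind "twice the false positives" can be written out directly. This is a back-of-the-envelope sketch that assumes every test is independent and that no true effects are present; the threshold and voxel count are made-up illustrative values, and both function names are hypothetical.

```python
def expected_false_positives(alpha, n_voxels, n_contrasts=1, n_experiments=1):
    """Expected count of false positives when every test is null and independent."""
    return alpha * n_voxels * n_contrasts * n_experiments

def prob_at_least_one(alpha, n_tests):
    """Chance that at least one of n independent null tests passes the threshold."""
    return 1 - (1 - alpha) ** n_tests

one_map = expected_false_positives(0.001, 50_000)
two_maps_two_experiments = expected_false_positives(0.001, 50_000, n_contrasts=2, n_experiments=2)
print(one_map, two_maps_two_experiments)
print(round(prob_at_least_one(0.05, 14), 3))
```

Each additional contrast map or experiment multiplies the expected count, which is why these sources of flexibility compound rather than simply add.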

So let's look at our four levels of bias, publication, experiment,

model selection, and voxels and tests, and how they play out in neuroimaging.

So this is an illustration of the file drawer problem.

And it's not from neuroimaging but I think it's quite illustrative.

And what we're looking at is studies of antidepressants that

have been submitted to the British Medical Council.

And, across the five drugs, we're looking at

the studies that have been published in the blue lines.

And the y-axis shows the effect size of the antidepressant.

And so those are pretty high when we look at only the published studies.

But the nice thing about these drug studies is that there's a national

registry where they have to have all the data submitted.

So they can go back to the unpublished data as well and

look at all the studies that have been submitted to the registry.

And we see those in red.

And what you see here is, across the drugs, the effect sizes in

all of the studies are substantially lower than they are in the published studies.

And that's an example of the problem in action.
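
The gap between published and registered studies can be mimicked with a toy publication filter. This sketch is not based on the antidepressant data; the true effect, the study size, and the assumption that significant studies are always published while others are published 20% of the time are all invented for illustration. `registry_vs_published` is a hypothetical helper.

```python
import random
import statistics

def registry_vs_published(true_effect, n_studies=500, n_per_arm=50,
                          p_publish_nonsignificant=0.2, seed=0):
    """Return (mean effect over all registered studies, mean over published ones)."""
    rng = random.Random(seed)
    se = (2 / n_per_arm) ** 0.5   # approximate SE of a standardized mean difference
    all_effects, published = [], []
    for _ in range(n_studies):
        effect = rng.gauss(true_effect, se)
        all_effects.append(effect)                 # everything is in the registry
        significant = effect / se > 1.96
        if significant or rng.random() < p_publish_nonsignificant:
            published.append(effect)               # assumed publication filter
    return statistics.mean(all_effects), statistics.mean(published)

all_mean, published_mean = registry_vs_published(true_effect=0.3)
print(f"all registered: {all_mean:.2f}, published only: {published_mean:.2f}")
```

The registry mean sits close to the true effect, while the published-only mean is higher, simply because of which studies made it out of the file drawer.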

So here's the next section, in which we'll look at

flexibility in experiments and in the model.

And this flexibility in choosing which experiments, which models, and

which outcomes you want to look at after you've observed some effects,

and trying to optimize the chances of getting nice-looking effects for

publication, has been referred to as p-hacking in the literature.

And there are tests for p-hacking now, that people are interested in doing.

So this is one influential paper where they discuss this.

And I think this is the paper where they coined the term.

So Simmons, Nelson, and Simonsohn.

And they point out that researchers have millions of decisions to make.

Whether to collect more data, which outliers to exclude,

which measures to analyze, which covariates to use.

And in neuroimaging, which types of preprocessing, modeling, and correction.

And the idea of p-hacking is

that you make the analysis decisions as the data are being analyzed.

And if you want to create findings to publish,

that creates a bias just like voxel selection bias.

Some of the red flags for p-hacking are the use of median splits in the data,

high and low responders when it's actually a continuous distribution.

Why not use the continuous scores?

Exclusion of many data points without really good

principled reasons for doing so.

Unconventional analysis choices or internally inconsistent analysis choices.

So maybe the researchers looked through lots of different possibilities and

they just picked the thing that worked the best.

P-values close to the threshold of 0.05

is one of the red flags that they pointed out.

Not everybody's P-value can be just under 0.05.

That's a flag for some effect that really wasn't significant and

then you're trying to get it to be more significant.

And then finally, unusual numbers of subjects without explanation.

And so journals have started to really flag those issues.
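
One of the decisions mentioned above, whether to collect more data, is enough on its own to inflate false positives if it is made after looking at the results. The sketch below simulates that optional stopping under assumed settings: no true effect, known unit variance (so a simple z-test applies), and a look at the data every 10 subjects from 20 up to 60. `false_positive_rate` is a hypothetical helper.

```python
import random

def false_positive_rate(peek, n_sims=2000, n_start=20, n_max=60, step=10, seed=0):
    """Fraction of null experiments declared significant (two-sided z-test, alpha = 0.05)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0, 1) for _ in range(n_max)]       # no true effect
        # Either test once at the planned sample size, or test at every checkpoint.
        checkpoints = range(n_start, n_max + 1, step) if peek else [n_max]
        for n in checkpoints:
            z = sum(data[:n]) / n ** 0.5                     # z-statistic with known unit variance
            if abs(z) > 1.96:
                hits += 1
                break                                        # stop collecting once "significant"
    return hits / n_sims

fixed_n_rate = false_positive_rate(peek=False)
peeking_rate = false_positive_rate(peek=True)
print(f"fixed sample size: {fixed_n_rate:.3f}, stop when significant: {peeking_rate:.3f}")
```

With the sample size fixed in advance, the false positive rate stays near the nominal 5%. Stopping as soon as a test passes pushes it well above that, and it also tends to leave p-values just under the threshold and unusual sample sizes, two of the red flags listed above.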

This is a view of the preprocessing pipeline.

And this is a paper by Josh Carp from a few years ago.

And what he did is he analysed the same data set many different ways.

In fact, he analysed it 34,000 different ways,

just by picking different analysis steps and several strategies for

each analysis step.

So he got almost 7,000 unique analysis pipelines and

five multiple comparisons correction strategies, leading to 34,000 maps.

And so this is the mean activation across all of the analysis maps,

and also the range.

So the point is that there's a lot of variability

according to these analysis pipeline choices.
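
It takes only a handful of choices per step for the number of pipelines to grow this large. The sketch below enumerates a small, entirely hypothetical grid of options; the step names, the options, and the correction labels are illustrative assumptions and are not the grid used in the paper.

```python
import itertools

# Hypothetical options for each step; illustrative only, not the paper's actual grid.
pipeline_options = {
    "slice_timing": ["off", "on"],
    "motion_regressors": ["none", "6", "24"],
    "smoothing_fwhm_mm": [4, 8, 12],
    "high_pass_filter": ["off", "on"],
    "hrf_basis": ["canonical", "with_derivatives", "fir"],
    "autocorrelation_model": ["off", "on"],
}
corrections = ["uncorrected", "fdr", "fwe_voxel", "fwe_cluster", "permutation"]

# Every combination of one option per step is a distinct pipeline.
pipelines = list(itertools.product(*pipeline_options.values()))
n_maps = len(pipelines) * len(corrections)
print(len(pipelines), "pipelines,", n_maps, "thresholded maps")
```

Six steps with two or three options each already give hundreds of pipelines, and every extra step multiplies the total again.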

Some of them are better than others.

But the solution here for us is to really be principled and

consistent in your analysis choices.

It's not bad to have a good pipeline or

to change things about your pipeline to make it better.

But we should really make those choices in advance of looking at the results

as much as possible.

So here finally are some do's and don'ts in terms of

what we should do to avoid selection bias problems.

So one don't is thoughtless analysis.

I don't want to give you the idea that you should just choose everything in

advance, run one analysis, and then be done and not look at your data.

It's really important to explore the data,

to examine the data, examine the assumptions.

The point is to get the right answer.

Not just to get the answer that we wanted in the first place.

And getting the right answer really requires looking at the data and

making some smart choices.

And sometimes we do have to change what we do based on what the data

actually look like.

But we should do this in a principled way.

Don't do uncorrected exploratory analysis with strong conclusions.

So we can do those analyses but don't go to town and sell the story and

yourself on a finding that comes from an uncorrected exploratory analysis.

Do really work in advance to choose principled a priori hypotheses.

We'll talk about using meta-analysis to do that, and that's in the next module.

And try to conduct adequately powered studies, which often requires a lot of

investment and resources.

Don't do circular analysis, and

do test on independent data, or use data splitting, to estimate effect sizes.

Don't do p-hacking, and do make principled choices and

use standardized, reusable pipelines with sensible a priori choices.

And finally, don't make heuristic reverse inferences.

We see the insula activated, that must be disgust, we see the hippocampus activated,

that must be memory.

There are a lot of erroneous inferences that can come from that.

But do learn the techniques to make quantitative reverse inferences and

really use those to understand what your brain maps are telling you.

So that's the end of this module, thanks for listening.