This course covers the analysis of Functional Magnetic Resonance Imaging (fMRI) data. It is a continuation of the course “Principles of fMRI, Part 1”

A course from Johns Hopkins University

Principles of fMRI 2

82 ratings

From this lesson

Week 2

This week we will continue with advanced experimental design, and also discuss advanced GLM modeling.

- Martin Lindquist, PhD, MSc, Professor, Biostatistics, Bloomberg School of Public Health | Johns Hopkins University

- Tor Wager, PhD, Department of Psychology and Neuroscience, The Institute of Cognitive Science | University of Colorado at Boulder

So one important thing to know is that inferences on the model parameters or

the beta slopes depend on getting the model for other parameters right.

And in some cases, this really includes things that

you should have included in the model but didn't.

So the interpretation of the parameters, the slopes, is always model-dependent.

So here's a tricky example.

I hope you'll agree that there's a positive relationship

here between predictor and outcome.

Looks good, right?

However, what I didn't tell you is that this data has subgroups.

And now when I look within subgroups, green and

red, there's a negative relationship within each subgroup.

This is something that really does happen in practice.

So, which one is right?

Well, they're both right in a sense:

across the whole population, there's a positive relationship between X1 and Y.

However, that doesn't mean that if I manipulated X1,

changed X1 somehow, I could change Y in that direction.

In fact that might have the opposite effect.

And that's because the increase overall in Y with X1

is due to a group, green, which is high in both.

So here, X1, my predictor, is collinear with the group.

Green versus red.

And that's what's causing this problem.

And so it might be more correct to infer that within a group

the relationship is negative.
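
The subgroup example can be reproduced with simulated data. This is a minimal sketch, not the data shown in the lecture: the group means, slopes, noise levels, and the helper name `fit_slopes` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # observations per subgroup (arbitrary)

# Two hypothetical subgroups: "green" is high on both x1 and y, "red" is low on both.
# Within each subgroup, y decreases with x1 (true within-group slope = -0.5).
group = np.repeat([0, 1], n)  # 0 = red, 1 = green
x1 = rng.normal(loc=4.0 * group, scale=1.0)
y = -0.5 * (x1 - 4.0 * group) + 4.0 * group + rng.normal(scale=0.5, size=2 * n)

def fit_slopes(columns, y):
    """Ordinary least squares with an intercept; returns the slopes only."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

pooled_slope = fit_slopes([x1], y)[0]           # subgroup left out of the model
adjusted_slope = fit_slopes([x1, group], y)[0]  # subgroup included as a regressor

print(f"slope of y on x1, group ignored: {pooled_slope:+.2f}")
print(f"slope of y on x1, group modeled: {adjusted_slope:+.2f}")
```

With these settings the pooled slope comes out positive and the slope with the group modeled comes out negative, which is the same sign reversal described above.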

So one way to look at collinearity is by looking at variance inflation factors.

And variance inflation factors among other metrics have

emerged as my favourite really simple way of looking at a design matrix and

understanding what some of the problems might be.

So the idea is you can calculate Variance Inflation Factors for

each regressor in your design matrix, and

this is the increase in the variance of that regressor's estimate that's due to design multicollinearity.

So for example,

if you have a VIF of 2, that means that the variance of the estimate will be doubled.

With a VIF of 5, the variance is five times higher than it would be,

which obviously you don't want.
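
A small Monte Carlo check of the "VIF of 2 doubles the variance" statement, as a sketch under stated assumptions: two predictors, pure-noise data with unit variance, and arbitrary choices for the number of time points and simulations. For two predictors with correlation r, the VIF is 1 / (1 - r^2), so r is chosen to give a VIF of 2. The helper names `unit` and `variance_of_beta1` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims = 200, 5000  # time points and simulated datasets (arbitrary choices)

def unit(v):
    """Mean-center a vector and scale it to unit length."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

x1 = unit(rng.normal(size=n))
z = unit(rng.normal(size=n))
z = unit(z - (z @ x1) * x1)  # z is now exactly uncorrelated with x1

def variance_of_beta1(r):
    """Sampling variance of the x1 slope when corr(x1, x2) = r."""
    x2 = r * x1 + np.sqrt(1 - r**2) * z
    X = np.column_stack([np.ones(n), x1, x2])
    noise = rng.normal(size=(n, n_sims))  # unit-variance noise, true slopes are 0
    betas = np.linalg.pinv(X) @ noise     # one column of estimates per dataset
    return betas[1].var()

r = np.sqrt(0.5)  # chosen so that VIF = 1 / (1 - r**2) = 2
ratio = variance_of_beta1(r) / variance_of_beta1(0.0)
print(f"variance of the slope estimate is inflated by a factor of {ratio:.2f}")
```

The printed ratio should land close to 2, up to simulation noise.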

So how do we calculate it?

Well, for a design matrix X with columns i equals 1 to capital I there,

the variance inflation factor for each column is one over

one minus the variance explained by the other predictors, that is, VIF_i = 1 / (1 - R_i^2).

So this consists of a series of regressions, where each column of X is the outcome and

then the predictors are the other regressors.

And that's going to get me the VIF.
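
The series of regressions just described can be sketched as follows. The function name `variance_inflation_factors` and the toy regressors are assumptions for illustration. The sketch assumes no column is constant and adds an intercept to each auxiliary regression.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X, from regressing it on all the other columns.

    X is an (n_timepoints, n_regressors) array without an intercept column;
    an intercept is added to each auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for i in range(k):
        target = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r_squared = 1.0 - residual.var() / target.var()
        # Treat a (numerically) perfect fit as an infinite VIF
        vifs[i] = np.inf if np.isclose(r_squared, 1.0) else 1.0 / (1.0 - r_squared)
    return vifs

# Toy example: c is built from a, b is unrelated to both
rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.5 * rng.normal(size=500)
toy_vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(toy_vifs, 2))
```

In this toy example the VIFs for `a` and `c` should be well above 1, while the VIF for `b` should stay close to 1.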

So let's look at an example.

Here's an event-related design with four event types.

We can see the four model regressors there.

And we can look at this in a matrix form.

This is the heat map of the four regressors.

Now time is going down the rows,

and it's a plot of the same design matrix.

So let's look at the Variance Inflation Factors here in this random

event-related design.

And you can see them here.

They're the orange dots.

And they're all pretty close to 1.

And what I've done here is marked off a level of 2 in blue, 4 in green, 8 in red.

And they're pretty close to 1 which means that

the regressors are essentially orthogonal, or very, very close to orthogonal.

And that's optimal in this sense.

So here are some properties of Variance Inflation Factors.

They're estimated for each column in the design matrix, so some columns may

have high Variance Inflation Factors, others low Variance Inflation Factors.

So I can have a multicollinearity problem only within a subspace

of the design.

Adding nuisance regressors, for example, head movement parameters and

other physiological noise parameters, to an fMRI design matrix might increase

the VIFs for some regressors more than others.

And we'd like to know what's the increase in Variance Inflation Factor when I

include those nuisance regressors in the model.

That can give me clues about task-correlated head movement and

physiological artifacts.
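
One way to look at that increase is to compute the VIFs of the task regressors with and without the nuisance regressors. This is a sketch on simulated data: the random event indicators, the two "motion" traces, and the helper name `vifs` are hypothetical stand-ins, not a real fMRI design. It uses the identity that the VIFs are the diagonal of the inverse of the regressors' correlation matrix.

```python
import numpy as np

def vifs(X):
    """VIFs as the diagonal of the inverse correlation matrix of the columns."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

rng = np.random.default_rng(3)
n = 300  # number of time points (arbitrary)

# Two hypothetical task regressors (random event indicators)
task_a = (rng.random(n) < 0.3).astype(float)
task_b = (rng.random(n) < 0.3).astype(float)

# Hypothetical nuisance regressors: one motion trace that follows task A,
# and one that is unrelated to the task
motion_task_locked = 0.8 * task_a + rng.normal(scale=0.3, size=n)
motion_unrelated = rng.normal(size=n)

task_only = np.column_stack([task_a, task_b])
full = np.column_stack([task_a, task_b, motion_task_locked, motion_unrelated])

before = vifs(task_only)
after = vifs(full)[:2]  # VIFs of the task regressors in the full model
print("VIF increase for task A, task B:", np.round(after / before, 2))
```

Here only task A's VIF should rise noticeably, which points to the nuisance regressor that tracks task A.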

An important point is that pairwise correlations between the predictors

are not enough to assess multicollinearity.

I'll show you an example of that, and that's because the multicollinearity

problem doesn't depend only on whether a regressor

is correlated with any single other regressor, but

on whether it's correlated with any combination of those regressors.

So it may not be obvious.

So here's a design matrix.

I've taken the exact same design as before, with the four regressors, and

I've added a new regressor.

Now if you look at that it looks reasonable, I think, right?

And here are the pairwise correlations between them.

So the fifth column shows some correlations with the other columns,

but it still looks estimable, so

you might think you have a reasonable shot at estimating this design matrix.

However, you would be wrong.

And the reason that you'd be wrong is that the new regressor that I created

is a perfect linear combination of two of the original regressors.

It's just the first one minus the second one.

So for this model, there's no unique solution for those betas at all.

So now, think about this.

Which Variance Inflation Factors will be affected by this?

And which of these model parameters are not uniquely estimable?

[LAUGH] Well here's the answer.

Here's a plot of the Variance Inflation Factors again in the full design.

And I've used the red bars where

the Variance Inflation Factors are effectively infinite; they go up to infinity.

So these are not estimable at all, but look at model parameters three and four.

They're just as they were before.

There's no effect on those.

Why is that?

Because I've taken predictors one and two and combined them to create predictor five.

So one, two, and five are all collinear.

I can't estimate any of them uniquely, but three and four are just fine.
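
The same situation can be reproduced on a toy design. In this sketch, random event indicators stand in for the lecture's four regressors (an assumption), the fifth column is the first minus the second, and the helper name `vif` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200  # number of time points (arbitrary)

# Four hypothetical regressors: random event indicators
X4 = (rng.random((n, 4)) < 0.25).astype(float)
x5 = X4[:, 0] - X4[:, 1]  # fifth regressor = first minus second
X5 = np.column_stack([X4, x5])

def vif(X, i):
    """VIF of column i: regress it on the other columns plus an intercept."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, i, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
    resid = X[:, i] - others @ coef
    r2 = 1.0 - resid.var() / X[:, i].var()
    # Treat a (numerically) perfect fit as an infinite VIF
    return np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)

vifs4 = np.array([vif(X4, i) for i in range(4)])
vifs5 = np.array([vif(X5, i) for i in range(5)])

# Pairwise correlations of the new column with every column (itself last)
print("correlations of column 5:", np.round(np.corrcoef(X5, rowvar=False)[4], 2))
print("rank of the 5-column design:", np.linalg.matrix_rank(X5))
print("VIFs with 4 regressors:", np.round(vifs4, 2))
print("VIFs with 5 regressors:", np.round(vifs5, 2))
```

None of the pairwise correlations with the new column reaches 1 in magnitude, yet the design has lost a rank: columns one, two, and five get infinite VIFs while three and four keep the values they had in the four-regressor design.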

So how do Variance Inflation Factors

relate to correlations in the design matrix, and to power?

So this is a plot of a simulation that shows you some of these relationships.

As you can see on the left here, for sample sizes of 50, 100, and 500, the curves are all the same.

As the predictor correlation goes up above 0.8 or

0.9, the Variance Inflation Factor hits 5 and above.

So a correlation of 0.9 corresponds to a

Variance Inflation Factor of close to 10.

And a correlation between predictors of 0.8

corresponds to a Variance Inflation Factor of about 5.
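
How a given correlation maps onto a VIF depends on how many predictors share that correlation, and the setup of the lecture's simulation is not given in this transcript. As a hedged sketch, the following assumes every pair of predictors has the same correlation r and reads the VIF off the inverse correlation matrix; the function name `vif_equicorrelated` is hypothetical.

```python
import numpy as np

def vif_equicorrelated(r, n_predictors):
    """VIF when every pair of predictors has the same correlation r."""
    R = np.full((n_predictors, n_predictors), r)
    np.fill_diagonal(R, 1.0)
    # The VIFs are the diagonal of the inverse correlation matrix;
    # by symmetry they are all equal here, so return the first one.
    return np.linalg.inv(R)[0, 0]

for r in (0.5, 0.8, 0.9):
    row = [round(float(vif_equicorrelated(r, p)), 2) for p in (2, 5, 20)]
    print(f"r = {r}: VIF with 2, 5, 20 predictors = {row}")
```

With only two predictors this reduces to 1 / (1 - r^2). For a fixed r, the VIF grows as more predictors share the same correlation, approaching 1 / (1 - r).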

And now on the right, we'll look at power.

This is our ability to detect a true effect if it exists

at 0.05 uncorrected for the different sample sizes.

So obviously here power depends on the sample size as well.

And for 50 subjects, it's relatively low and

it drops to 0 as the Variance Inflation Factor increases.

As we move up to 500 subjects, power starts out very high, but even then,

as you can see, with 500 subjects, and

this is an effect size of Cohen's d = 1, a strong effect size,

power drops to quite low levels as the Variance Inflation Factor goes up.

So these are smooth curves.

There's no hard-and-fast rule for how high is too high.

How high is too high depends on your study design,

your sample size and also the goals.

In some cases, the whole goal of the study is to disentangle

some correlated predictors, and so some degree of correlation is inevitable.

But if the correlations are very strong, then you're not going to be able to get

the right answer no matter how large the sample.

And here again, the P-values can be misleading, because with multicollinearity

and high Variance Inflation Factors, the effects can flip-flop from

significant positive to significant negative.

So here are some take-homes on regression.

First, on multicollinearity.

It's important to check for multicollinearity and it's easy to do.

Look at your design matrix visually.

Look at the pairwise correlations.

But also look at the Variance Inflation Factors because they're giving you some

unique information.

And some take-homes on interpreting P-values and making inferences.

So the P-values and their corresponding effect sizes,

T-values, and Z-values are only valid if the GLM assumptions hold.

And we went through a number of those.

And secondly, a predictor with a significant fit doesn't mean that

the predictor is the right model.

Just because the predictor explains some of the variance, it doesn't mean

it explains more variance than all the other possible models out there,

which is a really important point.

So just because it fits, it doesn't mean it's the right model,

it just means it explains some of the variance.

And third, variables that you haven't modeled may actually be causing effects in

your data and confounding effects that you're observing.

So this is something to keep in mind and

to think through whenever you think through the specifics of your study.

That's the end of this module, thanks for tuning in.
