In this module we'll talk about some of the basic assumptions we need to make for statistical inference using the GLM. And we'll talk about issues of multicollinearity, which is a really important issue with these models.
Let's review our basic design framework. Here we have four event types. Each of the event types has onsets for each event. They're convolved with a canonical hemodynamic response function, that's the HRF, which yields the design matrix in its basic form. We talked about some of the assumptions earlier, like the assumption that the neural activity function is correct, that the HRF is correct, and that there's a linear time-invariant (LTI) system. And we talked a little bit about how to relax some of those assumptions. Now we're going to look back and talk about some basic assumptions required for valid p-values in any linear model.
So one assumption is that the data are IID, independent and identically distributed. What identically distributed means is that the observations come from the same underlying distribution. If this is violated, there might be subgroups in the data, and that can mask or reverse the directions of effects. We'll look at an example of that a little bit later. We can also have outliers or extreme values that can influence the results in unexpected ways, even in large samples.
The second assumption is the assumption of independence. We assume that the errors are independent, conditional on the model parameters. What that means is that every subject in our design, or here every time point, is independent of every other one. Let me say that again: every subject, or every time point, is independent of every other one. If this is violated, p-values at the single-subject level will be too liberal, with increased false positives. The slopes are still good estimates, they're unbiased estimates, but they're more variable than we think they are.
One way to think about this is that the nominal degrees of freedom in the model are an overestimate of the actual degrees of freedom.
A third assumption is linearity, or more generally, that the model form is specified correctly. With linear regression we're assuming that a straight-line relationship is adequate. If this is violated there's a loss in power, and in some cases incorrect inference, especially with multiple regression. And again, we'll look at examples of that in a few moments.
The fourth assumption is normality. We're assuming that the errors are normally distributed. If this is violated, p-values are wrong, but there's no simple rule for which way. Fortunately, if we're making inferences on the mean signal, then due to the central limit theorem the distribution of the mean values is often approximately normal, even if the individual data values are not. So this is often not a huge problem unless you have small samples.
And finally, we make the assumption of equal variance, or homoscedasticity. That refers to the idea that the errors have the same variance across values of x, the predictor. So if you see a plot of predictor versus outcome that looks like a funnel, for example, then that can be a signal of a violation of equality of variance. If this is violated, in general p-values are somewhat too liberal. They tend to increase false positives, and again, the nominal degrees of freedom are an overestimate, because essentially the points coming from the high-variance parts of the distribution have more pull than the other points, so there's an unequal contribution of the points.
So, what do we do about these problems? There are several kinds of fixes. The first basic one is to check the assumptions and look at the data.
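As one rough way to "look at the data" for the funnel pattern just described, here is a minimal sketch in plain Python. It is an illustration and not a method from the lecture: the simulated data, the noise levels, and the helper names (`fit_line`, `pearson`, `funnel_score`) are all assumptions. The idea is to fit a straight line and then ask whether the size of the residuals grows with the predictor.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den

def funnel_score(x, y):
    """Correlation between the predictor and the absolute residuals.
    Near zero suggests roughly equal variance; clearly positive suggests
    a funnel that widens as x increases. A crude, illustrative check only."""
    b0, b1 = fit_line(x, y)
    abs_resid = [abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]
    return pearson(x, abs_resid)

# Hypothetical simulated data: same true line, two different noise structures.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(500)]
y_equal = [2 + 0.5 * xi + rng.gauss(0, 1) for xi in x]          # constant noise SD
y_funnel = [2 + 0.5 * xi + rng.gauss(0, 0.5 * xi) for xi in x]  # noise SD grows with x

print("equal variance :", funnel_score(x, y_equal))
print("funnel-shaped  :", funnel_score(x, y_funnel))
```

A score like this is no substitute for plotting predictor against outcome, but it can flag a predictor whose residual spread changes systematically.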
The most important things to look for, and we'll see this again when we talk about robust regression, are outliers and skewed variables that can have strong influences on the results. In neuroimaging, pay particular attention to behavioral predictors, or clinical and other outcomes, in brain-behavior correlations or group analyses. That's because you might not be able to look at the data for every single brain voxel one by one, but if you have behavioral predictors, those are regressed against brain data everywhere in the brain. So if there's something funny about the distribution of those predictors, then that's going to influence results all over the brain, and that's something you can look at and think about carefully.
There are a number of kinds of fixes that one can do. One family is variable transformation. For example, reaction time is a variable that's typically highly positively skewed, because it's bounded at zero and so it has a long positive tail. It's very typical to take the log of reaction time to reduce that tail and make it look more normally distributed.
Other kinds of fixes can be really simple to implement and very powerful. They include nonparametric and robust alternatives to the GLM. One approach, for example, is a nonparametric test in imaging called statistical nonparametric mapping. It uses a permutation test approach to avoid making strong assumptions about the distribution of the data, and it provides results that are often closer to the ground truth than models that make parametric assumptions.
Another class is rank statistics, which are the original form of nonparametric statistics. The idea of Spearman's rho, for example, which is a robust or nonparametric correlation, is to rank all of the data in behavior, then rank the data in each voxel in the brain images, and then correlate the ranks.
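The ranking procedure just described can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the ten behavioral scores and the single-voxel values are made up, and the helper names (`ranks`, `pearson`, `spearman`) are hypothetical. Tied values get their average rank, which is the usual convention.

```python
import statistics

def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den

def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical data: ten subjects, the last with an extreme behavioral score.
behavior = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
voxel = [0.9, 1.1, 1.4, 1.6, 2.0, 2.1, 2.5, 2.6, 3.0, 3.2]

print("ranks of behavior   :", ranks(behavior))
print("Pearson r, raw data :", pearson(behavior, voxel))
print("Spearman rho, ranks :", spearman(behavior, voxel))
```

In this made-up example the score of 40 simply becomes rank 10, one step above its neighbor, so it no longer dominates the correlation. In a whole-brain analysis the behavioral ranks would be computed once and the voxel ranks separately at each voxel.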
And what happens then is that if there's a very extreme data point, maybe several standard deviations out from the group, ranking brings it back in line with the rest of the data. So you're losing magnitude information about the distances between points, but often you still have most of the information in the data set, and it's much better behaved. And finally, another category that we'll talk about later is robust statistics, like reweighted least squares and other kinds of alternative methods.
So here's a note on multicollinearity, because it's so important to understand this. Correlated predictors in any design matrix, including our fMRI designs, increase the variance, or the uncertainty, in the parameter estimates. What that means is that if two regressors are highly correlated, there's a fundamental uncertainty in which of the predictors should be assigned the credit for explaining the variance in the data. Which one should have the positive slope and which one the negative slope?
So let's look at some example data, time series fMRI data. And this is really terrific data. You can see the rise and fall, which corresponds to a block design. There's really very little noise. So I'm going to do something tricky here. I'm going to take two predictors. I'm going to say here's a block predictor, that's the red one, and I'm going to correlate that against the data. And let's come up with a second predictor: because I think there might be a delay in the response, I'm going to shift this predictor over by one time point and add that as well. So now together those two regressors can capture the response with some time delay built in.
Is that a good idea? No, it's a terrible, terrible idea. Why? Well, I put in two predictors that are highly correlated. Each one by itself will correlate strongly with the data. But which one should get a positive slope and which one shouldn't? They both explain the same thing.
They both make the same predictions, and so we can't tell. We can look at it this way: we can look at where the predictor values are different, and only the data points at those particular values determine what the slopes are. That's called the basis for support. Here, we only have eight data points where the predictors make different predictions. And so only at those eight data points do the data actually matter for determining which predictor has a positive slope.
So let's look at a little simulation where we can repeat this, sampling with the same noise characteristics again and again and again, and let's look at what happens with those slopes, the betas. What you see here is that they trade off. Sometimes the red is high, sometimes the green is high. And even though each one by itself would be a strong positive predictor of the data, sometimes they're negative. So it's very common to flip from significant positive to significant negative results in a multicollinearity situation like this one. You can see that the betas are really fundamentally unstable.
So now let's look at regression with two predictors. What we're going to do is look at this two-predictor space. In the ground truth, there's a strong positive effect of both predictor one and predictor two. What we're doing now is projecting the data down onto the space of the predictors, those are the black dots, and fitting a regression plane. There's a strong effect of predictor one and an effect of predictor two. Things look good. Now we get to repeat the simulation a bunch of times and look at the stability of that regression plane. So there you see it, very stable.
Now we'll repeat that simulation. The only thing that's different now is that there is multicollinearity: there's correlation between the predictor one and predictor two values. So here we go, look at the data in the three-dimensional space of predictors and outcome.
And now we're going to project those data points down onto the predictor space, and you can see there's a ridge there, right? You're not high on one and low on two, or vice versa. Now, here there's still a strong effect of predictor one and of predictor two. Great, right? We got lucky. Now let's see what happens when we repeat the test. Now look, the slopes are all over the place, sometimes positive, sometimes negative. And you can see that what's happening is they're actually tipping back and forth, like a seesaw or a scale, on that ridge of data. So they're inherently unstable. That's the problem of multicollinearity in a nutshell.
So p-values can be misleading with multicollinearity like this. It's easy to flip-flop from significant positive to significant negative because of this fundamental instability. This is one of the ways in which multiple regression is tricky business, and there are others as well.
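To make the shifted-regressor example concrete, here is a minimal sketch of that kind of repeated simulation in plain Python. It is not the simulation shown in the lecture. The block design (90 time points, blocks of 10), the noise level, the equal true effects of 0.5, the number of repetitions, and all function names are assumptions chosen for illustration. The sketch fits the block regressor and its one-point-shifted copy together, again and again, and compares the spread of the slopes with what you get when the block regressor is fit alone.

```python
import random
import statistics

def centered(v):
    """Subtract the mean, which absorbs the intercept term."""
    m = statistics.fmean(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def fit_two(x1, x2, y):
    """OLS slopes for y ~ intercept + x1 + x2, solving the 2x2 normal
    equations on mean-centered variables."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def fit_one(x, y):
    """OLS slope for y ~ intercept + x."""
    c, cy = centered(x), centered(y)
    return dot(c, cy) / dot(c, c)

def correlation(a, b):
    ca, cb = centered(a), centered(b)
    return dot(ca, cb) / (dot(ca, ca) * dot(cb, cb)) ** 0.5

# Hypothetical block design and the same regressor shifted by one time point.
n = 90
block = [1.0 if (t // 10) % 2 == 1 else 0.0 for t in range(n)]
shifted = [0.0] + block[:-1]
n_differ = sum(a != b for a, b in zip(block, shifted))

rng = random.Random(1)
joint_b1, joint_b2, alone_b1 = [], [], []
for _ in range(500):
    # Assumed ground truth: both regressors contribute 0.5, plus Gaussian noise.
    y = [0.5 * a + 0.5 * b + rng.gauss(0, 1.5) for a, b in zip(block, shifted)]
    b1, b2 = fit_two(block, shifted, y)
    joint_b1.append(b1)
    joint_b2.append(b2)
    alone_b1.append(fit_one(block, y))

print("time points where the regressors differ:", n_differ)
print("correlation between the regressors     :", correlation(block, shifted))
print("SD of slope 1, fit alone               :", statistics.stdev(alone_b1))
print("SD of slope 1, fit jointly             :", statistics.stdev(joint_b1))
print("correlation between the two slopes     :", correlation(joint_b1, joint_b2))
print("fraction of runs with negative slope 1 :", sum(b < 0 for b in joint_b1) / len(joint_b1))
```

With this design the two regressors differ at only eight time points. Across repetitions the joint slopes should come out negatively correlated with each other, which is the seesaw behavior described above, and more variable than the slope from the single-regressor fit.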