Welcome back. In this module we're going to talk about weighted least squares and robust regression. So, let's talk about second-level analysis with robust regression. Robust regression algorithms are a class of algorithms designed to be less influenced by extreme values, so this is a potential way to deal with outliers and extreme observations in your design. Observations that are far from their respective means or regression lines tend to dominate the results, especially if those observed values are extreme. With any kind of least squares estimation, we're minimizing the sum of squared errors, and that means that outliers, points that are far from the regression line, have more pull. Also, if those values are extreme on the predictors, that is, they have high values on the x variables, the predictor variables, they have high leverage, and they can be very dangerous, because they're going to exert a tremendous pull on the regression line. So, this creates a lot of problems for the standard analysis framework. When you can't check the assumptions for each voxel, from the normality of the distribution to the presence of outliers, then automatic procedures for weighting the observations for potential outlier status are advantageous. That's what robust regression is all about. So, let's look at an example. Here's some null-hypothesis data on the left with 50 subjects. As you can see, there's no effect in this sample. Now we have the same exact data, but we've included one outlier point, which is far out on the x axis. So it's got high leverage, and because it's high variance, it's going to tend to have a large value on the y axis in one direction or the other. And as you can see, this one data point now produces a significant result in our regression. That's a problem. So, the structural model for the General Linear Model, to review, is y equals X times beta plus error, in matrix form.
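The high-leverage outlier example just described can be sketched numerically. This is a minimal illustration with simulated data (a hypothetical random draw, not the actual data on the slide): 50 null-hypothesis observations, then the same data with one point that is extreme on both x and y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Null-hypothesis data: no true relationship between x and y.
x = rng.normal(0.0, 1.0, 50)
y = rng.normal(0.0, 1.0, 50)

def ols_slope(x, y):
    """Ordinary least squares slope for a simple regression with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[1]

slope_null = ols_slope(x, y)  # should be near zero

# Add one high-leverage outlier: extreme on x AND extreme on y.
x_out = np.append(x, 8.0)
y_out = np.append(y, 8.0)
slope_out = ols_slope(x_out, y_out)

# The single extreme point drags the slope well away from zero.
print(slope_null, slope_out)
```

Because least squares minimizes squared residuals, the one point at (8, 8) contributes enormously to the fit and pulls the slope toward itself, which is exactly the leverage problem the lecture describes.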
And now, let's consider that we want to weight some observations more than others. So, let's come up with a vector w, which has a weight for every observation. Let's then define a matrix Q, a diagonal matrix that applies those weights when you multiply it by a data vector. So, there's Q with the weights on the diagonal, and now we're just going to multiply each part of that structural model equation by Q: Q times y equals Q times X times beta plus Q times error. Now, let's also define a matrix W that's Q transpose times Q. In that case, the solution for beta, the estimate beta hat, is X transpose W X, inverse, times X transpose W y. That weighting matrix W applies the weights, and now I'm estimating the betas in a way that includes the weights on the observations. So that's the basic principle behind weighted least squares, and beta hat is the weighted least squares solution. So, weights can be used, and this framework can be used, in a number of different ways; this is very fundamental. First, weights can be used to adjust for inequality of variances, heteroscedasticity. For example, the weights can be made inversely proportional to the variance of each group. Second, weights can be used to de-correlate or whiten autocorrelated data: if you can estimate what the form of the autocorrelation is, you can then use weights that are proportional to its inverse. Third, weighting can correct for non-independence in repeated measures designs. So, if you're sampling from the same observations at time one, time two, time three, there's some correlation structure, and that can be estimated and controlled for in this way. And finally, the subject of today's lecture: weights can be used to down-weight extreme values or potential outliers in robust regression. And by the way, in all these cases the betas, the variance estimates, and the degrees of freedom have to be adjusted as well for the fact that you're weighting the observations.
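The weighted least squares solution just derived, beta hat equals (X'WX) inverse times X'Wy, is straightforward to implement directly. Here is a minimal sketch with hypothetical toy data; the sanity check at the end uses the fact that with all weights equal to one, WLS reduces to ordinary least squares.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Solve beta_hat = (X' W X)^{-1} X' W y, with W = diag(w).

    w holds one non-negative weight per observation; W = Q'Q where
    Q = diag(sqrt(w)) is the diagonal matrix that rescales each row
    of the structural model y = X beta + error.
    """
    W = np.diag(w)
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Tiny check on hypothetical data: equal weights recover the OLS solution.
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = np.array([0.1, 1.2, 1.9, 3.1])
beta_equal = weighted_least_squares(X, y, np.ones(4))
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_equal, beta_ols))  # equal weights reduce WLS to OLS
```

Unequal weights simply make some rows of the model count more than others in the fit, which is the single mechanism behind variance adjustment, whitening, repeated measures correction, and outlier down-weighting.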
So, here is the robust regression algorithm with iteratively reweighted least squares, and we'll just walk through how the algorithm works in its basic form. First, we often weight based on the inverse of the leverages, because we don't want extreme values to dominate even our initial fit. We fit the weighted least squares model. Then we scale the residuals by a robust estimator of their outlier status, how far they are from the regression line, and we apply a weighting function. In panel C, down at the bottom, you can see a couple of popular robust weighting functions, the bisquare function and the Huber function. Once we do that, we refit the model with weighted least squares, with the extreme values down-weighted, and we iterate these steps until the algorithm converges on a solution. And then finally, once we've converged on a solution, we adjust the variances and degrees of freedom so that we can get valid and accurate p-values. So, now let's look at the null-hypothesis data case. There's the original data; there's the data with an outlier, with a significant positive slope. When we fit robust regression, this is the IRLS solution, and what you can see here is that the darkness of each point is proportional to its weight. Here, the extreme value is a clear white circle, which means that its weight in the final model is basically zero. Most of the other points have solid black circles, indicating that they have high and equal weights. So, effectively, what we've done here is down-weight that one extreme value, and now we get something that's again closer to the right solution. So, now let's look at a case study from some actual fMRI data. We'll look in the right motor cortex, at a series of conditions where people made button press responses with the left hand or the right hand. So what we see here is t-scores from a group analysis in eight conditions.
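The IRLS steps just described can be sketched as follows. This is a minimal illustration on hypothetical data, not the lecture's actual implementation: it starts from equal weights rather than inverse-leverage weights, uses the bisquare weighting function with a MAD-based robust scale, and omits the variance and degrees-of-freedom adjustments needed for valid p-values (a full implementation is available as `RLM` in statsmodels).

```python
import numpy as np

def bisquare_weights(r, c=4.685):
    """Tukey bisquare weights: (1 - (r/c)^2)^2 for |r| < c, else 0."""
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (1.0 - u**2) ** 2

def irls(X, y, n_iter=50, tol=1e-8):
    """Iteratively reweighted least squares with bisquare weights (sketch)."""
    w = np.ones(len(y))          # simplified: equal starting weights
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # 1) Fit weighted least squares with the current weights.
        W = np.diag(w)
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        # 2) Scale residuals by a robust estimate of spread
        #    (median absolute deviation, rescaled for normal data).
        resid = y - X @ beta_new
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        # 3) Recompute weights from the scaled residuals.
        w = bisquare_weights(resid / max(scale, 1e-12))
        # 4) Iterate until the estimates stop changing.
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, w

# Hypothetical demo: flat data with one extreme, high-leverage value.
X = np.column_stack([np.ones(11), np.arange(11.0)])
y = 2.0 + 0.0 * np.arange(11.0)   # true intercept 2, slope 0
y[-1] = 50.0                      # one extreme value at the largest x
beta, w = irls(X, y)
print(beta, w[-1])                # outlier's final weight is near zero
```

As in the slide's scatterplot, the extreme point ends up with a weight near zero, and the fit returns to approximately the correct intercept and slope.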
The first four conditions are button presses with the right hand, so we shouldn't see a strong response, and the last four conditions are button presses with the left hand, so we should see a substantial right motor cortex response. What you see in the black lines here is the ordinary least squares solution, and indeed the t-scores are substantially higher for the left button presses than the right, so that's good. Now, let's look at the robust solutions, which are in grey: the t-values are quite a bit stronger for some of those conditions, especially conditions five and six there. And these actually happen to be the conditions where there are extreme values in one subject that go in the opposite direction. Finally, the univariate case refers to the idea of trimming the data, just removing outliers that are extreme values, without the robust weighting procedure. And as you can see here, there's really no benefit to the outlier removal. This has been a pretty consistent finding: trimming is not a terrific solution, and robust regression is. So here the conclusion is increased power with robust regression, and there's really little or no apparent cost. Let's look at another case study, from another study. This is a study where there are both visual responses and painful events. So, what we expect to see is activity in the visual cortex for visual responses, and activity in the somatosensory cortex, especially S2 and the posterior insula among other regions, for pain. In the first column here you see the OLS solution for both, and you can see visual activity; it's unilateral, left side only, for OLS and vision. The middle column is the IRLS solution, and as you can see here, there's bilateral visual activity, and there are also responses in the pain areas that we expect. And now, the final column on the right shows you the map of where there are significant differences between them.
So, what you can see here is stronger statistical values in the visual cortex for visual responses, and in the posterior insula and S2 for pain, which is exactly where we should expect to see real results. So here there are higher t-scores with robust regression in areas where we expect to see effects. That's good news for robust regression as well. Now, we'll look at some simulations. What you see here is a series of simulations where there are no outliers in the data, so all the assumptions actually hold. So, we shouldn't see a benefit from robust regression here, but we should be able to evaluate the costs. The top panel has no true signal, so we can look at the false positive rate; the x axis is the sample size, ranging from 5 up to 40 subjects. And the bottom panel shows you power when there is a true effect, which increases with the sample size. So, let's look at the power curve first. The IRLS solution has a very small cost in power; the line is below the other lines, but the cost is very small. What we're comparing it to here is, in blue, the ordinary least squares solution; in light blue, outlier dropping; and in green, multivariate outlier removal with Mahalanobis distance. The red and black lines are both robust regression, IRLS algorithms with different weighting functions, the bisquare and Huber weighting functions. So, what we see is a very small cost in power, but not really much of a problem. And now when we look at false positive rates, we can see that IRLS appropriately controls the false positive rate. In fact, the only algorithm that doesn't control the false positive rate, where the actual false positive rate is higher than the nominal one, is the Mahalanobis distance-based outlier removal, and this is a cautionary tale that removing the outliers can sometimes have unexpected effects, unless we really know how the procedure is behaving in the data.
So, we should do those things with caution. So, there's the increase in false positive rate. Now we'll look at a case, and these are brain-behavior correlations, where we have multivariate outliers, values that have higher variance on both the predictor and the outcome. And what do we see here? With multivariate outliers, all of the methods show increases in false positive rates above the nominal false positive rate of .05, which is the solid horizontal line. But the worst of those is the ordinary least squares solution, and the IRLS solutions are actually better than ordinary least squares in terms of false positives. Now, here the Mahalanobis distance, the multivariate outlier removal, is performing better, but that's really because the procedure that was used to remove outliers was exactly the procedure that was used to generate the outliers. So, in that case, it can be a win. But, on average, multivariate outlier removal isn't a good idea unless you really know that it's performing well and is valid for your data. So, IRLS here is better. And then, finally, in the bottom, we're seeing the case where there really are true effects, but there are also outliers. And IRLS is the most powerful, here with the bisquare weighting function, when true effects are present, and ordinary least squares is the worst. So, ordinary least squares is not very robust to outliers in terms of power. And finally, univariate outlier removal here is again sub-optimal; it's performing really not much better at all than ordinary least squares. So robust regression can help, and it can help even with very large data sets. This is some recent work by Virgio, from a European imaging genetics study with 1,500 people. So you think: huge sample size, right? What do we need to worry about a few outliers for? A few bad apples. Well, what they're showing here in this map, from an anger recognition task, is greater reproducibility of the maps, and in other ways I'm not showing you here.
And more putative true effects detected during anger recognition. Here, there are stronger effects in the striatum and some additional things that show up with the robust linear model, that's the bottom series, versus OLS on the top. And interestingly enough, they also have cases where they believe they found false positives that go away with robust regression. So, what you see here on the left is a blob in the thalamus that's a significant result under ordinary least squares, and not under the robust linear model. And when they look at what's happening in the thalamus, what you see there is a correlation between impulsivity and brain scores. They plot that correlation, and you can see that there's extreme heteroscedasticity, inequality of variance. And there are a few points, there's a skew in the impulsivity score, so that there's a heavy tail on the right end, and that means those points have a lot of leverage. And as you can see, there's one point there, on the bottom right, at an impulsivity score a little bit higher than eight, that out of 1,500 subjects has this very extreme value, and a very extreme brain value as well. And that subject was enough, almost by themselves, to drive a significant group result. So, the IRLS procedure, the robust linear model, down-weighted this extreme value; it's a clear circle, not filled, so its weight is very low, and the significant result goes away. So, this is an example of how, even in large samples, we can end up with fewer false positives in some cases with robust regression. So, let's summarize. Ordinary least squares models are influenced by outliers, like all models. T-tests are fairly robust against false positives, but they're really not robust against loss of power, and this is even more true with regression and correlation. So, regressions and correlations are not robust against loss of power or false positives, so we can end up with lots of problems.
Robust regression is a good way to minimize the influence of those outliers, especially when you can't check the assumptions and data at every test performed. So, there's a small cost in power when the assumptions hold, but potentially larger benefits when there are some problems with the data. Now, here's some further reading on some of the older [INAUDIBLE] and papers on robust regression in neuroimaging. And finally, thank you for your attention. >> [SOUND]