Hi. My name is Brian Caffo. This is Mathematical Biostatistics Boot Camp Two, Lecture Nine, on Simpson's Paradox and Confounding. In this lecture, we're going to talk about a phenomenon called Simpson's paradox. I don't find it to be a paradox, but it's called Simpson's paradox. We'll talk about some examples of Simpson's paradox, like the Berkeley data. Then we'll talk about how this relates to the treatment of confounding, and then I want to cover a particular way to handle confounding through weighted estimators and talk a little bit about the Cochran-Mantel-Haenszel estimator.
Okay, so consider this data right here, which is taken from Agresti's Categorical Data Analysis book. I think I've mentioned this book in the past. In this instance there was a cross-classification of defendants from criminal trials. These are all murder trials. They cross-classify by the race of the victim, the race of the defendant, and whether or not the person got the death penalty. Here I present all of the possible cells plus all the possible marginals. So, for example, here you see the eight cells: victim's race by defendant's race by death penalty, yes or no. We're only considering two race categories, white and black, so there are eight cells. Then here I have the margin for the defendant, white versus black, summed over victim's race. And here I have the margin for the victim, white versus black, summed over defendant's race.
Okay. So let's actually investigate this. We're looking at the percentage of people that got the death penalty. If you look, white defendants received the death penalty a smaller percentage of the time for both white victims and black victims: 11% versus 22% when the victim was white, and 0% of the white defendants versus 2.8% of the black defendants when the victim was black. Okay? But then something kind of paradoxical occurs.
If you disregard the race of the victim, it actually comes out that white defendants received the death penalty a greater percentage of the time, 11% to 8%, okay? And if you look at the race of the victim, disregarding the race of the defendant, in the instance where the victim was white the defendant received the death penalty a higher percentage of the time, 12% to 2.5%. But let's forget this last marginal, the race of the victim, and just compare the table itself with the defendant marginal. In the table, in both cases, the white defendants got the death penalty a smaller percentage of the time than the black defendants, regardless of the race of the victim. In the marginal, here, it's 11% to 8%, where the white defendants got the death penalty a greater percentage of the time than black defendants.
So what's happening? This was related to a court case about whether or not the death penalty was being equally applied, so what would you conclude? If you condition on the race of the victim, you get the opposite answer than if you look at the race of the defendant disregarding the race of the victim. So what is the right answer?
Let's just discuss a little bit about what's going on. Marginally, white defendants received the death penalty a greater percentage of the time than black defendants. Within both white and black victims, black defendants received the death penalty a greater percentage of the time than white defendants. Simpson's paradox refers to the fact that marginal and conditional associations can be opposing. In this case, if you take the margin across victim's race, you get a different answer than if you condition on victim's race. Here, the death penalty was enacted more often for the murder of a white victim than of a black victim. And whites tend to kill whites, just demographically. So white defendants are concentrated in the cases where the death penalty was applied more often, hence the larger marginal association.
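To make the reversal concrete, here is a minimal Python sketch that recomputes the conditional and marginal percentages. The cell counts are an assumption on my part: they were not read out in the lecture, and are taken to be the counts in Agresti's table, so check them against the book. They do reproduce the percentages quoted above to within rounding. The helper names (`rate`, `marginal_rate`, `share_white_victim`) are made up for this illustration.

```python
# Counts ASSUMED to match Agresti's death penalty table; verify against the book.
# Key: (victim_race, defendant_race) -> (death_penalty_yes, death_penalty_no)
counts = {
    ("white", "white"): (53, 414),
    ("white", "black"): (11, 37),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 139),
}


def rate(yes, no):
    """Proportion of defendants who received the death penalty."""
    return yes / (yes + no)


# Conditional rates: death penalty rate within each victim-race stratum.
conditional = {key: rate(*cell) for key, cell in counts.items()}


def marginal_rate(defendant):
    """Death penalty rate for one defendant race, summed over victim's race."""
    yes = sum(y for (_, d), (y, _) in counts.items() if d == defendant)
    no = sum(n for (_, d), (_, n) in counts.items() if d == defendant)
    return rate(yes, no)


def share_white_victim(defendant):
    """Fraction of a defendant group's cases in which the victim was white."""
    white_victim_cases = sum(counts[("white", defendant)])
    all_cases = sum(sum(cell) for (_, d), cell in counts.items() if d == defendant)
    return white_victim_cases / all_cases


for victim in ("white", "black"):
    for defendant in ("white", "black"):
        pct = conditional[(victim, defendant)]
        print(f"victim {victim}, defendant {defendant}: {pct:.1%}")

for defendant in ("white", "black"):
    print(
        f"defendant {defendant}: marginal rate {marginal_rate(defendant):.1%}, "
        f"share of cases with a white victim {share_white_victim(defendant):.1%}"
    )
```

Under these assumed counts, the last two printed lines show the mechanism described above: nearly all of the white defendants' cases involve white victims, the stratum with the higher death penalty rate, and that pulls their marginal rate above the black defendants' marginal rate even though it is lower within each stratum.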
But I want to do a little bit of commentary before I go through more examples, and I'm going to cover several examples. First of all, when you state Simpson's paradox in the following way, it doesn't seem that paradoxical at all: the apparent relationship between two variables can change when factoring in a third variable. That seems obvious. Of course that's true. It just seems difficult when you start to get mired in the specifics. And later on I'll go through the math to show that there's nothing paradoxical about the mathematics of Simpson's paradox.
Larry Wasserman, on his blog, Normal Deviate, has the most wonderful discussion of why Simpson's paradox is difficult and what mistake people are making. The mistake people are making is that they're equating the causal statements with the probabilistic statements. The probabilistic statements can be misleading and paradoxical, and you're trying to draw difficult conclusions in the light of noisy evidence. In addition, even if you knew the exact probabilities, the probabilities themselves can exhibit the paradox. However, the causal statements cannot exhibit the paradox. Okay?
So his point, I think, is that the real confusion comes from equating the two. In this case, the causal statement would be, for example, that juries causally tend to convict black defendants more often than they convict white defendants. If you were to make that causal statement, then it is impossible for the marginal and conditional causal statements to disagree. For the real details of this, I put up the link to his blog post, which is wonderful. But the real details of investigating the causal statements are beyond the scope of this class.
We're not going to cover causal inference in this class, but I think it was a great discussion on his part. It basically demonstrates that the apparent paradox is this conflation of cause with descriptions of probabilities and associations. Mathematically there is no paradox, and I think when we go through it you will see.