Okay, so just expanding on these points: matched binary data can arise in several circumstances, for example when measuring responses at two occasions, when matching on case status in a retrospective study, or when matching on exposure status in a prospective or cross-sectional study. In all of these cases, matching induces a dependency, and that has to be accounted for in the analysis. The pairs of binary observations are dependent; in other words, your response at time 1 is correlated with your response at time 2, so our existing methods don't apply. However, we do assume that person one, who responded at time 1 and time 2, is independent of person two, who responded at time 1 and time 2. So we're assuming independence across pairs, and dependence within pairs.

Okay, so let's look at some notation. Here we're going to use our standard contingency table notation, where we have n11, n12, n21, n22 for the four cells, and then the margins n+1, n+2, n1+, n2+. So here's our data, the n's, and we're going to assume that the four cell counts n11, n12, n21, n22 are multinomial with n trials, where n is their sum, and with associated probabilities conveniently labeled pi11, pi12, pi21, pi22. In other words, we're going to assume that every pair of measurements, every time 1 and time 2 collection, puts a one in exactly one of these four locations: the person will have said yes at both occasions, a yes and then a no, a no and then a yes, or a no at both occasions. And the probability of landing in cell (i, j) is pi_ij. So the multinomial is just the sum of all of these multivariate Bernoulli observations. Okay.
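As a small sketch of this setup (the data and variable names here are my own, not from the lecture), here is how paired yes/no responses get tallied into the four cells of the table:

```python
# Hypothetical paired yes/no data: each tuple is (time 1, time 2), 1 = yes, 0 = no.
pairs = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0), (1, 0)]

# Each pair lands in exactly one of the four cells n11, n12, n21, n22.
counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
for pair in pairs:
    counts[pair] += 1

n = sum(counts.values())  # total number of pairs (the multinomial trial count)
n11, n12 = counts[(1, 1)], counts[(1, 0)]
n21, n22 = counts[(0, 1)], counts[(0, 0)]
```

Each pair contributes a one to exactly one cell, which is the multinomial structure described above.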
And then we denote the margins with a plus: n1+ for the row margin, pi1+ for the row margin of the probabilities, and so on. So pi1+ and pi+1 are the marginal probabilities of a yes response at the two occasions, disregarding the other occasion: pi1+ is the probability of saying yes at time 1 regardless of whether or not you said yes at time 2, and pi+1 is the probability of saying yes at time 2 regardless of whether or not you said yes at time 1. Okay?

So marginal homogeneity is the hypothesis that these two marginal probabilities are the same, pi1+ = pi+1; that's how it gets its name. And of course, because there are only two categories, if pi1+ = pi+1 then pi2+ = pi+2. So the marginal probabilities are the same, and we call it marginal homogeneity. You can do a very quick calculation: pi1+ = pi11 + pi12, and pi+1 = pi11 + pi21, and pi11 is common to both. If you subtract it out, this hypothesis is identical to pi12 = pi21. That hypothesis is referred to as symmetry, because it involves the off-diagonal elements of the table: it says that the true probability two-by-two table would be identical to its transpose. Hence the marginal homogeneity hypothesis is equivalent to symmetry, but only in the case of a two-by-two table; in more general tables the equivalence does not hold.

We clearly have estimates for all of the pis: the estimate of pi12 is n12 divided by n, the estimate of pi21 is n21 divided by n, and so on, simply the proportions. The estimates of the true probabilities of landing in each cell are the proportions of people who landed in each cell. So the obvious estimate of the difference between pi12 and pi21 is the corresponding difference of proportions.
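The cancellation of pi11 can be checked numerically. This is a minimal sketch with made-up cell probabilities (my own numbers, chosen only so the four cells sum to one):

```python
# Hypothetical cell probabilities pi11, pi12, pi21, pi22, summing to 1.
pi = {(1, 1): 0.40, (1, 2): 0.15, (2, 1): 0.25, (2, 2): 0.20}

# Marginal probabilities of a "yes" at each occasion.
pi1_plus = pi[(1, 1)] + pi[(1, 2)]   # yes at time 1, regardless of time 2
pi_plus1 = pi[(1, 1)] + pi[(2, 1)]   # yes at time 2, regardless of time 1

# The shared pi11 cancels, so the difference of the margins equals the
# difference of the off-diagonal (discordant) probabilities.
assert abs((pi1_plus - pi_plus1) - (pi[(1, 2)] - pi[(2, 1)])) < 1e-12
```

So testing pi1+ = pi+1 and testing pi12 = pi21 are the same thing in a two-by-two table, exactly as the subtraction argument shows.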
This tells you how far away from symmetry, or in other words how far away from marginal homogeneity, you are: n12/n minus n21/n. And it turns out, and this is maybe a little bit involved for us to go through, that under H0 a consistent estimate of the variance of this difference is (n12 + n21) / n^2. So if you take that difference, n12/n minus n21/n, as our statistic, and divide it by the standard error, the square root of (n12 + n21) / n^2, which is sqrt(n12 + n21) / n, you get a so-called z statistic. The preference in this case is typically to square that statistic; I think that matches the traditional development. And the square of that statistic works out to have this convenient form: (n12 minus n21) squared, over (n12 plus n21). This follows a chi-squared distribution, because of course a z statistic squared follows a chi-squared distribution with one degree of freedom. This is the famous McNemar's test statistic, and you reject marginal homogeneity if it is large. So this test is called McNemar's test.

And notice what's interesting about McNemar's test: only n12 and n21 are used. They are the only cells that carry the relevant information about pi1+ and pi+1 being different. Now, n11 and n22, the concordant cells, contribute to the magnitude of the estimated difference through n, but in testing whether the margins differ, only the discordant cells, n12 and n21, where people disagreed from time 1 to time 2, enter the statistic. That's an interesting fact about this test. It's McNemar's test, and it's a very famous statistic.

Okay, so let's compute the test statistic for the approval rating example. We have 86 and 150 as the off-diagonal cells, so that's (86 minus 150) squared over (86 plus 150), which works out to be 17.36. The p-value is then extremely small, right?
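The arithmetic for the approval rating example can be sketched in a couple of lines (a by-hand check, using only the two discordant counts from the lecture):

```python
# Off-diagonal (discordant) counts from the approval rating example.
n12, n21 = 86, 150

# McNemar's statistic: (n12 - n21)^2 / (n12 + n21),
# compared against a chi-squared distribution with 1 degree of freedom.
stat = (n12 - n21) ** 2 / (n12 + n21)
print(round(stat, 2))  # 17.36
```

Note that n11 and n22 never appear; the statistic is built entirely from the discordant cells.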
Because, right, a chi-squared statistic with one degree of freedom is extremely unlikely to be above 9, which is 3 squared, with 3 being way out in the tail of the standard normal. Hence we reject the null hypothesis and conclude that there appears to be some sort of change in opinion between the polls. At any rate, in R you can just do mcnemar.test; you have to give it a matrix. And again, this is one of those instances where, if you want to get exactly the statistic you worked out by hand, you have to set correct = FALSE, because by default it applies a continuity correction. In general you want to leave the continuity correction in; I'm setting it to FALSE here just so the output matches your by-hand calculations exactly.
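The lecture does this in R with mcnemar.test; as a rough cross-check of the p-value, here is a Python sketch (the chi2_sf_1df helper is my own, using the identity that the chi-squared upper tail with 1 degree of freedom equals erfc(sqrt(x/2))):

```python
import math

def chi2_sf_1df(x):
    # Upper-tail probability of a chi-squared(1) variable at x, via
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for Z ~ N(0, 1).
    return math.erfc(math.sqrt(x / 2))

stat = (86 - 150) ** 2 / (86 + 150)  # McNemar statistic, about 17.36
p_value = chi2_sf_1df(stat)          # on the order of 3e-5, far below any usual alpha
```

This matches the conclusion above: the p-value is tiny, so we reject marginal homogeneity.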