So how does this shuffle test actually work? The p-value, which we didn't talk about directly, is the probability that the observed outcome would happen by chance under repeated experiments, and we simulated those experiments directly. Compared with the classical method, this approach is more robust and, arguably, much easier to interpret, because the distribution you generated by running the experiments is right in front of you. That matches exactly the interpretation you're supposed to use when reasoning about things like p-values and confidence intervals, which we'll get to in a moment. And it makes no assumptions about the underlying distribution, which is crucial. You don't have to imagine the characteristics of the underlying population, or the parameters that define its distribution. You just resample from the data, and the sampling distribution gets painted out for you.

For example, it handles skewed distributions naturally. Consider salary measurements where the mean is $50,000 a year and the standard deviation is $15,000. If you model this with a normal distribution, you have a non-zero chance of drawing a negative salary, which can't actually happen. But when you work from the sample you've already collected, you'll never see a negative salary, because every resampled value is one you actually observed. So you don't have to stand on your head to constrain the model, say by truncating the distribution at zero. You just work with the data you have, and you're naturally robust to the constraints of the real world.

Okay, so put another way: classical theory is built on the idea that averaging tends to produce normal distributions.
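To make that salary point concrete, here is a minimal sketch (the salary numbers are made up purely for illustration): resampling from the observed data can never produce an impossible negative salary, while draws from a normal model fitted to the same data can.

```python
import random

random.seed(0)

# Hypothetical right-skewed salary sample (illustrative numbers only).
salaries = [32_000, 41_000, 45_000, 48_000, 50_000, 52_000,
            58_000, 74_000, 95_000, 150_000]

# Resample with replacement from the observed data: every draw is an
# actual observed salary, so a negative value is impossible by construction.
resample = [random.choice(salaries) for _ in range(len(salaries))]
assert min(resample) > 0  # guaranteed: we only ever draw observed values

# By contrast, a normal model fitted to the same data can produce
# impossible negative draws, because its tails extend below zero.
mean = sum(salaries) / len(salaries)
sd = (sum((x - mean) ** 2 for x in salaries) / len(salaries)) ** 0.5
normal_draws = [random.gauss(mean, sd) for _ in range(100_000)]
print(min(normal_draws))  # the left tail dips below zero
```

The resampling loop never has to be told that salaries are non-negative; the constraint comes along for free because it only replays the data.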
That's the central limit theorem, and since most statistics are essentially an average of one kind or another, they are asymptotically normal, they're Gaussian. So we can start from a Gaussian and reason about the sampling distribution of all these different statistics. Pretty much all of this has been worked out: for whatever statistic you want to compute, the sampling distribution has probably been derived, and you can look it up in a textbook. But as an alternative, you can do this kind of Monte Carlo simulation, where we pretend we know the population, simulate samples from it, and repeat over and over to construct the sampling distribution of the estimator you're interested in, more or less directly. I may be belaboring this, but I really want to bring home the idea that not only is this arguably clearer to understand, it's actually closer to the goal: closer to the actual experiments you're trying to reason about theoretically. So, in general, deriving the sampling distribution is difficult, but simulating it is easy.

So far we've shown that shuffling is useful for significance testing: it tells you whether a value is significant or not. But it doesn't give you the effect size; it doesn't tell you how important that value is, or what range of values the estimator can take. For that you need a confidence interval, which we'll talk about now using a different resampling method.
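Before moving on to confidence intervals, here is a minimal sketch of the shuffle test itself (the two groups are hypothetical numbers, just for illustration): pool the data, reshuffle the group labels many times, and take the p-value as the fraction of shuffles whose difference in means is at least as extreme as the observed one.

```python
import random

random.seed(1)

# Hypothetical measurements for two groups (illustrative numbers only).
group_a = [12.1, 9.8, 11.3, 10.5, 13.0, 12.4, 11.8]
group_b = [9.1, 8.7, 10.2, 9.5, 8.9, 10.0, 9.3]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(group_a) - mean(group_b)

# Under the null hypothesis the group labels are arbitrary, so we pool
# the data, reshuffle, and re-split many times to paint out the sampling
# distribution of the difference-in-means statistic.
pooled = group_a + group_b
n_a = len(group_a)
trials = 10_000
count_extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed):  # two-sided: at least as extreme
        count_extreme += 1

# The p-value is simply the fraction of simulated experiments that were
# as extreme as what we observed.
p_value = count_extreme / trials
print(p_value)
```

No distributional assumptions appear anywhere: the p-value is read directly off the simulated sampling distribution, exactly the interpretation described above.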