When we first introduced hypothesis testing in unit one, we likened it to a court case. And just like court cases, hypothesis tests are not flawless. In the court system, innocent people are sometimes wrongly convicted, and sometimes the guilty walk free. Similarly, we can make wrong decisions in statistical hypothesis tests as well. The difference, though, is that in statistics we have the tools necessary to quantify how often we make such errors. So in this video we're going to first introduce type one and type two errors, and then discuss how we can balance these error rates. Just to give you a sneak peek: the likelihood of making a type one error and the likelihood of making a type two error are inversely related, so it's actually not that easy to keep both of those error rates down at the same time. Sometimes we have to choose, and we're going to talk about how we decide which one we're okay with being a little higher and which one we really want to minimize as much as possible.

Here's a two by two table that lays out the true state of the hypotheses against the decision we make. Remember, we usually don't know whether the null hypothesis or the alternative hypothesis is true, but we make a decision regardless, based on the evidence that we collect and the statistical significance of that evidence. If the null hypothesis is indeed true and you fail to reject it, you've done the right thing; there's absolutely no reason to worry. Similarly, if the alternative hypothesis is true and you reject the null hypothesis in favor of the alternative, once again you've done the right thing. But how about the other two cells? A type one error is rejecting the null hypothesis when the null hypothesis is actually true. In other words, it's rejecting the null hypothesis when you should not have. A type two error is failing to reject the null hypothesis when the alternative is true. In other words, it's failing to reject the null hypothesis when you should have. We almost never know if the null or the alternative is true, but we need to consider all possibilities.

So if we again think of a hypothesis test as a criminal trial, it makes sense to frame the verdict in terms of the null and the alternative hypotheses. The null hypothesis says that the defendant is innocent. Remember, it's usually innocent until proven guilty, at least in the US, so it makes sense that the status quo, the null hypothesis, says that the defendant is innocent. The alternative says that the defendant is guilty. So let's take a look at these two questions: which type of error is being committed in each of the following circumstances? First, declaring the defendant innocent when they are actually guilty. In this case, we are saying that the defendant is innocent even though they are actually guilty. This means that we are failing to reject the null hypothesis when the alternative is actually true, and that is indeed a type two error. The converse is declaring the defendant guilty when they are actually innocent. In this case, we have rejected the null hypothesis in favor of the alternative when the null was true, and that is the definition of a type one error. One question we might then ask is which error is worse to make: a type two error, where we declare the defendant innocent when they are actually guilty, or a type one error, where we declare the defendant guilty when they are actually innocent? One might consider this a subjective question.
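Before we weigh in on that, here's a minimal sketch in Python that makes the truth versus decision table concrete; the function name and the outcome labels are my own illustrative choices, not something from the video. It simply maps each truth and decision pair to its cell of the two by two table.

```python
def classify_outcome(null_is_true: bool, reject_null: bool) -> str:
    """Map a (truth, decision) pair to its cell of the 2x2 truth-versus-decision table."""
    if null_is_true and not reject_null:
        return "correct decision (failed to reject a true null)"
    if null_is_true and reject_null:
        return "type one error (rejected a true null)"
    if not null_is_true and not reject_null:
        return "type two error (failed to reject a false null)"
    return "correct decision (rejected a false null; this is power)"

# Walk through all four cells of the table:
for null_is_true in (True, False):
    for reject_null in (False, True):
        outcome = classify_outcome(null_is_true, reject_null)
        print(f"null true={null_is_true}, reject={reject_null} -> {outcome}")
```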
As for which error is worse, usually I think there's a consensus around a quote from William Blackstone: better that ten guilty persons escape than one innocent suffer. So if you agree with this, and you're setting up your hypothesis testing scheme, then you want to think about which error rate to minimize so that you don't accidentally make one innocent person suffer, even if it means that a few of the guilty might walk free. In that case, you want to minimize the rate of the type one error, where we would be declaring a defendant guilty when they're actually innocent. If you do not agree with this statement and you think that a type two error is the worse offense, then you would want to minimize the type two error rate instead.

Let's talk a little bit about the type one error rate. As a general rule, we reject the null hypothesis when the p-value is less than 0.05. In other words, we set our significance level, our alpha, to 0.05. This means that, for those cases where the null hypothesis is actually true, we do not want to incorrectly reject it more than 5% of the time. In other words, when using a 5% significance level, there is about a 5% chance of making a type one error if the null hypothesis is true. We can state this compactly: the probability of a type one error, given that the null hypothesis is true, is simply equal to the significance level that you set. This is why we prefer small values of alpha. Increasing alpha might sound appealing, since it would allow us to reject our null hypothesis more often, which is usually what we are trying to do. However, we would also be inflating our type one error rate. So increasing alpha increases the type one error rate, and that's why we like to keep it as small as possible.

So how do we choose our alpha? That's actually a balancing act. If a type one error is dangerous or especially costly, we might want to choose a small significance level, something even smaller than the 5% that we're used to working with. The goal here is to be very cautious about rejecting the null hypothesis, so we demand very strong evidence favoring the alternative before we do so. If, on the other hand, a type two error is relatively more dangerous or much more costly, then we want to choose a higher significance level, because increasing our significance level will in turn have the effect of decreasing our type two error rate. The goal in this case is to be cautious about failing to reject the null hypothesis when the null is actually false.

Let's take a look back at our truth versus decision table. A type one error, we said, is rejecting the null hypothesis when you shouldn't have, and the probability of doing so is alpha, our significance level. A type two error is failing to reject the null hypothesis when you should have, and the probability of doing so is labeled beta. Beta is not something we get to set ahead of time, and the calculation of beta is actually not trivial. The power of a test, by contrast, gets a positive label rather than an error label, because it is a good outcome: it is the probability of correctly rejecting the null hypothesis, and that probability is one minus beta.
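To see these probabilities in action, here's a hedged simulation sketch in Python; the sample size, the true means, and the number of replications are arbitrary choices of mine, not from the video. It repeatedly runs a t-test on data where the null is true to estimate the type one error rate, and then on data where the alternative is true to estimate beta and power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, reps = 0.05, 30, 10_000

# Case 1: the null is true (the true mean really is 0).
# The fraction of rejections estimates the type one error rate,
# which should land close to alpha, about 0.05.
false_rejections = 0
for _ in range(reps):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue < alpha:
        false_rejections += 1
print("estimated type one error rate:", false_rejections / reps)

# Case 2: the alternative is true (true mean is 0.5, but the null still says 0).
# The fraction of non-rejections estimates beta; its complement estimates power.
missed = 0
for _ in range(reps):
    sample = rng.normal(loc=0.5, scale=1.0, size=n)
    if stats.ttest_1samp(sample, popmean=0.0).pvalue >= alpha:
        missed += 1
print("estimated beta:", missed / reps)
print("estimated power:", 1 - missed / reps)
```

If you rerun this with a larger alpha, the first estimate rises with it while the estimated beta falls, which is exactly the trade-off being described.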
If the alternative hypothesis is true, one of two things can happen. You might commit a type two error, and the probability of doing so is beta, or you might correctly reject the null hypothesis in favor of the alternative, and the probability of that is the complement of beta, one minus beta. That is what we call our power. Our goal in hypothesis testing is generally to keep both alpha and beta low at the same time. But we know that as we push one down, the other is going to shoot up, so we usually have to strike a delicate balance between these two probabilities.

Let's wrap up our discussion by talking about the type two error rate. If the alternative hypothesis is actually true, what is the chance that we make a type two error? In other words, what is the chance that we fail to reject the null hypothesis even though we should reject it? The answer to this is not obvious. If the true population average is very close to the null value, it will be very difficult to detect a difference and to reject the null hypothesis. Conversely, if the true population average is very different from the null value, it will be much easier to detect a difference. Clearly, then, beta, the probability of making a type two error, depends on the effect size, which is defined as the difference between the point estimate and the null value.
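To make the link between effect size and beta concrete, here's a minimal sketch assuming a one-sided, one-sample z-test with a known standard deviation; that is a simplification of the t-test setting, and the sample size and alpha below are arbitrary choices of mine. It computes power, and hence beta, for a few effect sizes, showing that larger effects mean higher power and lower beta.

```python
from scipy.stats import norm

def power_one_sided_z(effect, sigma=1.0, n=30, alpha=0.05):
    """Power of a one-sided one-sample z-test:
    P(reject H0 | true mean = null value + effect)."""
    se = sigma / n ** 0.5             # standard error of the sample mean
    z_crit = norm.ppf(1 - alpha)      # rejection cutoff on the z scale
    # Under the alternative, the test statistic is shifted by effect / se.
    return 1 - norm.cdf(z_crit - effect / se)

for effect in (0.1, 0.3, 0.5, 1.0):
    p = power_one_sided_z(effect)
    print(f"effect size {effect:.1f}: power {p:.3f}, beta {1 - p:.3f}")
```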