In this video, we're going to define a confidence interval and talk about the conditions required to calculate one with the formulas that we provide. I'm going to give you guys a hint: it's actually based on the central limit theorem, so the conditions are going to be very similar. And lastly, we're going to generally discuss how to find confidence intervals and how to interpret the results.

A plausible range of values for the population parameter is called a confidence interval. Using only a sample statistic to estimate a parameter is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net. We can throw a spear where we saw a fish, but we probably will miss. If we toss a net in that area, though, we have a good chance of catching the fish. In other words, if we report a point estimate, we probably won't hit the exact population parameter. On the other hand, if we report a range of plausible values, we have a good shot at capturing the parameter.

So, based on this one sample's mean, how can we figure out what this range of plausible values is going to be? Well, this one sample mean, our x bar, is indeed our best guess for the unknown population mean. Therefore, any interval we construct should be centered on that x bar. Also, from the central limit theorem, we know that x bars are distributed nearly normally, and the center of that distribution is at the unknown population mean. One more item that we want to think about is the 68, 95, 99.7% rule, which tells us that roughly 95% of random samples will have sample means that are within two standard errors of the population mean. Clearly then, for 95% of random samples, the unknown true population mean is going to be within two standard errors of that sample's mean. Note that we're being very careful about the language here. The 95% here only applies to random samples in the abstract.
Once we actually have a sample, the mean of that sample will be either within two standard errors of the population mean or it won't be. So the 95% confidence interval can be constructed approximately as our sample mean, plus or minus two standard errors. In this formula, what comes after the plus or minus, the two standard errors, is actually called the margin of error. So usually we construct a confidence interval as a point estimate, plus or minus some margin of error. In this case we are dealing with means, so our point estimate is the sample mean. The margin of error for a 95% confidence interval is roughly two times the standard error.

Let's take a look at a practice problem to put to use some of the concepts that we have recently learned. One of the earliest examples of behavioral asymmetry is a preference in humans for turning the head to the right rather than to the left during the final weeks of gestation and for the first six months after birth. This is thought to influence subsequent development of perceptual and motor preferences. A study of 124 couples found that 64.5% turned their heads to the right when kissing. The standard error associated with this estimate is roughly 4%. Which of the below is false?

A says a higher sample size would yield a lower standard error. We know that this is always true. We've seen this with the central limit theorem as well. Conceptually this is because the larger your sample size, the less variable your point estimates from those samples are going to be. Mathematically speaking, the standard error of a sample mean is sigma over square root of n, so the standard error is inversely proportional to the square root of n. In other words, if n goes up, the standard error is going to go down, so this is correct.

B says the margin of error for a 95% confidence interval for the percentage of kissers who turned their heads to the right is roughly 8%.
We just learned that the margin of error for a 95% confidence interval is going to be approximately two times the standard error. In this case, the standard error is given to be 4%, and therefore this option is also correct.

C says the 95% confidence interval for the percentage of kissers who turn their heads to the right is roughly 64.5% plus or minus 4%. Remember, the confidence interval is always of the form point estimate plus or minus a margin of error. In this case, what we have is our point estimate, the sample proportion, plus or minus a standard error, as opposed to the margin of error. And while those things sound similar, they're not exactly the same thing. Therefore, this option is wrong.

Let's take a look at the last one real quick as well. D says the 99.7% confidence interval for the percentage of kissers who turned their heads to the right is roughly 64.5% plus or minus 12%. We haven't really talked yet in depth about using different confidence levels for confidence intervals, but hopefully it's obvious that we can do that. How did we come up with this 12% number? Remember, according to the 68, 95, 99.7% rule, roughly 99.7% of the distribution will be within three standard deviations of the mean, or in this case, three standard errors, since we're looking at the variability of a point estimate. So, 3 times 4 does indeed give us 12%. So this one is also right, and the option that's false is C. How could we make this option correct? We could actually add and subtract the margin of error, which is given in option B, so the approximate 95% confidence interval should be 64.5% plus or minus 8%.

More formally, the confidence interval for a population mean can be computed as the sample mean, plus or minus a margin of error. The margin of error is the critical value corresponding to the middle XX percent of the normal distribution (I just have XX here as a placeholder for whatever confidence level you like) times the standard error of the sampling distribution.
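To make the arithmetic in the practice problem concrete, here is a minimal Python sketch of "point estimate plus or minus critical value times standard error". The function name `confidence_interval` is made up for illustration, and the critical values 2 and 3 are the rule-of-thumb approximations used above, not exact values.

```python
def confidence_interval(point_estimate, standard_error, critical_value):
    """Return (lower, upper, margin_of_error) for
    point estimate +/- critical value * standard error."""
    margin_of_error = critical_value * standard_error
    lower = point_estimate - margin_of_error
    upper = point_estimate + margin_of_error
    return lower, upper, margin_of_error


# Numbers from the kissing study, in percentage points.
p_hat = 64.5  # sample percentage who turned their heads to the right
se = 4.0      # standard error, given as roughly 4%

# Approximate 95% interval: critical value of about 2 (rule of thumb).
lower_95, upper_95, me_95 = confidence_interval(p_hat, se, critical_value=2)

# Approximate 99.7% interval: critical value of about 3 (rule of thumb).
lower_997, upper_997, me_997 = confidence_interval(p_hat, se, critical_value=3)

print(f"95%:   {p_hat} +/- {me_95} -> ({lower_95}, {upper_95})")
print(f"99.7%: {p_hat} +/- {me_997} -> ({lower_997}, {upper_997})")
```

The 95% line reproduces the corrected version of option C (64.5% plus or minus 8%), and the 99.7% line reproduces option D (64.5% plus or minus 12%).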
As with the central limit theorem, there are some conditions that need to be satisfied to use this formula to construct confidence intervals. In fact, since this method is based on the central limit theorem, these are actually the same conditions.

The first condition is independence. Sampled observations must be independent. And we talked about this being difficult to confirm. However, usually we either want a random sample, if we have an observational study, or random assignment, if we have an experiment, and if we're sampling without replacement we want our sample size to be less than 10% of our population.

The second condition is about the sample size and skew. We need n to be greater than or equal to 30, or larger if the population distribution is very skewed. And this second condition is actually a little stricter than what we saw with the central limit theorem, because it sets a minimum required sample size. That's the n greater than or equal to 30. And we're going to discuss what we do if the sample size is smaller than 30 in the next unit. So for now, let's focus on what we call large samples, and these are samples with sample sizes of at least 30, or even larger if the population distribution is very skewed. So when we're checking our conditions, we're definitely going to want to see a visualization of the distribution from the sample, which we're going to use as an indicator for what the population looks like. Or we're going to need to be told that we can assume some normality and proceed.

Earlier we conceptually developed the formula for the confidence interval for the mean as x bar plus or minus z star times the standard error. And we said that the z star for a 95% confidence interval should be approximately 2, as per the 68, 95, 99.7% rule. But this rule is simply a rule of thumb, and it's actually not exact.
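The point that the 68, 95, 99.7% rule is only approximate can be checked numerically. Here is a small sketch using `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard errors: {area_within(k):.4f}")
# within 1 standard errors: 0.6827
# within 2 standard errors: 0.9545
# within 3 standard errors: 0.9973
```

So two standard errors actually capture a bit more than the middle 95%, which is why 2 is only an approximate critical value.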
So how do we find the exact critical value for a 95% confidence interval? Remember that the confidence level refers to the middle of the distribution. So the 95% confidence interval will span the middle 95% of the normal distribution. So, let's mark that on the normal curve, and we're basically looking for the cutoff values that mark the middle 95%.

We can use the table to find these, but first remember that the tables always give us areas under the curve below a given z score. So the area under the curve below the lower bound of the middle 95% is simply 1 minus 0.95, all divided by 2, since the total area under the curve is one and the curve is symmetric, leaving equally sized tails on each side. This comes out to be 0.025, or 2.5%, on each side.

Next, we can take a look at a table. What we want to do is locate 0.025, the percentile, within the table, and this time we actually hit exactly 0.025. Then we want to look at the edges of the table to grab the associated z score, which here comes out to be negative 1.96, and the upper bound will then be positive 1.96, since, once again, the curve is symmetric. Note that the critical values in a confidence interval formula are always defined to be positive. Therefore, the exact critical value for a 95% confidence interval is actually 1.96, as opposed to the 2 that we were using as an approximate placeholder.

Also remember that you can get this critical value using R. When we want to find cutoff values using R, we use the qnorm function, which takes in the percentile as an input. So qnorm of 0.025 should also give us negative 1.96, and what we need to do is just remind ourselves that if we are looking for a critical value, we are always going to need the positive version of this number.
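If you don't have R handy, the same lookup can be sketched in Python. In this sketch, `NormalDist().inv_cdf` from the standard `statistics` module plays the role of qnorm for the standard normal distribution, and the helper name `critical_value` is made up for illustration.

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Positive z cutoff that marks the middle `confidence_level`
    of the standard normal distribution."""
    # Area in each tail: (1 - confidence level) / 2, e.g. 0.025 for 95%.
    tail_area = (1 - confidence_level) / 2
    # inv_cdf takes a percentile (as a proportion) and returns the z score.
    lower_cutoff = NormalDist().inv_cdf(tail_area)
    # Critical values are defined to be positive.
    return abs(lower_cutoff)


print(round(NormalDist().inv_cdf(0.025), 2))  # -1.96
print(round(critical_value(0.95), 2))         # 1.96
```

Passing a different confidence level, such as 0.90 or 0.99, gives the critical value for that level in the same way.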