So far in this unit, we discussed inference on a single mean as well as inference for comparing two means to each other. Next, we move on to comparing many means simultaneously. Our motivating data come from the General Social Survey. The two variables of interest are vocabulary scores and self-identified social class. Vocabulary score is calculated based on a 10-question vocabulary test, where a higher score means better vocabulary, and self-identified social class has four levels: lower, working, middle, and upper class. The vocabulary test works as follows. Respondents are given the following list of words and are asked to choose a word from the list that comes closest to the meaning of the first word provided in capital letters. For example, which of these words is the word edible closest in meaning to? Or if you were the respondent on this survey, would you mark don't know? The answer should be fit to eat. If you're curious about the vocabulary test, feel free to pause the video and work through the rest of the words as well. But for the purposes of this example, we're not going to be focusing on what the words mean; instead, we'll take a look at how people who took the survey did on the vocabulary test and whether their scores are associated with their social class. The distribution of the vocabulary scores is shown in this histogram, and the distribution of social class is shown in this bar plot. These visualizations tell us about the variables individually, but they don't really tell us much about their relationship. Side-by-side box plots are useful for visualizing the relationship between a numerical and a categorical variable, and summary statistics are also helpful. We can see some differences between the groups, but we don't yet have the tools for determining whether these differences are statistically significant. Let's take a quick look at this question. 
Which of the following plots shows groups with means that are most and least likely to be significantly different from each other? The groups that are clearly separated from each other are most likely to have means that are significantly different from each other. Plots one and three show groups with the same centers, but the data in plot one are much less variable than the data in plot three. Hence it would be much easier to detect the differences in means for the data in plot one, as the groups are much more obviously separated. On the other hand, plot two shows groups with centers that are very close, and therefore this plot is the one with groups that are least likely to be significantly different from each other. Our goal is to find out if there's a difference between the average vocabulary scores of Americans from the different classes. We know that we can compare the means of two groups using t statistics, and comparing three or more groups is going to require a new test called analysis of variance (ANOVA) and a new statistic, the F statistic. The null hypothesis in ANOVA, just like any other null hypothesis, says there's nothing going on, or in other words, the mean outcome is the same across all categories. We can denote this as mu one equals mu two equals mu three, all the way to mu k, where each mu indicates a group mean and k is the number of groups, in other words, the levels of the explanatory categorical variable. This value was four for the data set we introduced earlier, where people self-identified as either lower, working, middle, or upper class. The alternative hypothesis says there is something going on, but it's not very specific. It says that at least one pair of means is different, but it doesn't specify which means are different. This is an important point we're going to come back to later. 
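To make the intuition from the plots question concrete, here's a minimal simulation sketch in Python. The data are made up, not the plots from the video: three groups share the same centers in both scenarios, and only the within-group spread differs, mirroring the contrast between plots one and three.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Three groups with the same centers in both scenarios;
# only the within-group spread differs (hypothetical data).
centers = [0.0, 2.0, 4.0]

# Like plot one: small within-group variability
tight = [rng.normal(c, 0.5, size=30) for c in centers]
# Like plot three: large within-group variability
spread = [rng.normal(c, 4.0, size=30) for c in centers]

f_tight, p_tight = stats.f_oneway(*tight)
f_spread, p_spread = stats.f_oneway(*spread)

print(f_tight > f_spread)  # tighter groups give a larger F statistic
print(p_tight < p_spread)  # and a smaller p-value
```

With identical centers, the tightly clustered groups produce a much larger test statistic, which is exactly why their means are easier to distinguish.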
But for now, think about it as: if we do reject the null hypothesis, we find out that there is something interesting going on in the data, and we might need to dig deeper to find out which group means are actually different from each other. In a t-test, we compare means from two groups to see whether they're so far apart that the observed difference cannot reasonably be attributed to sampling variability. In ANOVA, we compare many means, from more than two groups, to see whether they're so far apart that the observed differences cannot all reasonably be attributed to sampling variability. This summary illustrates the parallels between what we've seen so far and ANOVA, so let's take the comparison a bit further. In a t-test, the test statistic is calculated as the ratio of the effect size to the standard error. In ANOVA, the test statistic is also a ratio, but since there isn't a single population parameter or point estimate that we can identify (remember, we're comparing many means), the test statistic is calculated a little differently: as the ratio of the variability between groups to the variability within groups. Remember that large test statistics lead to small p-values: as the test statistic moves further into the tail of the distribution, the tail area only gets smaller and smaller. And also remember that if the p-value is small enough, we can reject the null hypothesis and conclude that the data provide convincing evidence for a difference in population means. We mentioned an F statistic, so let's also introduce the F distribution. It's right-skewed and always positive, since it's a ratio of two measures of variability, which can never be negative. We know that in order to be able to reject the null hypothesis, we need a small p-value, which requires a large F statistic. And obtaining a large F statistic requires that the variability between groups be much larger than the variability within groups. 
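The ratio just described can be computed by hand. Below is a sketch using hypothetical vocabulary scores for three groups (made-up numbers, not the GSS data): it computes the between-group and within-group mean squares, takes their ratio as the F statistic, and finds the p-value as the right-tail area of the F distribution, then checks the result against scipy's built-in one-way ANOVA.

```python
import numpy as np
from scipy import stats

# Hypothetical vocabulary scores (0-10) for three groups;
# these are made-up numbers, not the GSS data.
groups = [
    np.array([4, 5, 5, 6, 4, 5]),
    np.array([6, 6, 7, 5, 6, 7]),
    np.array([7, 8, 6, 7, 8, 7]),
]

k = len(groups)                    # number of groups
n = sum(len(g) for g in groups)    # total sample size
grand_mean = np.concatenate(groups).mean()

# Between-group variability: sum of squares between groups,
# divided by its degrees of freedom (k - 1)
ssg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
msg = ssg / (k - 1)

# Within-group variability: sum of squared deviations within
# each group, divided by its degrees of freedom (n - k)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
mse = sse / (n - k)

f_stat = msg / mse
# p-value: right-tail area of the F distribution (df1 = k-1, df2 = n-k)
p_value = stats.f.sf(f_stat, k - 1, n - k)

# Should match scipy's one-way ANOVA
f_check, p_check = stats.f_oneway(*groups)
print(np.isclose(f_stat, f_check), np.isclose(p_value, p_check))
```

Note how everything hinges on the ratio: the larger the between-group mean square is relative to the within-group mean square, the larger F gets, and the smaller the right-tail p-value becomes.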
In the next few videos we'll get into more details about how to actually calculate this F statistic.