M&M candy comes in different colors. The company that produces M&Ms used to publish the percentages of the various colors on its website, but in 2008 it stopped publishing those data. Here are the last data that were published, in 2008. They show the percentages for each of the six colors: blue, orange, green, yellow, red, and brown. So the question is whether that is still the distribution of colors today. A group of students opened a bag of milk chocolate M&Ms and counted the colors, and here is the table of the counts they got. We would like to answer the question of whether these counts are consistent with the last published percentages, or whether there is sufficient evidence to claim that the color distribution is now different.
This type of problem is called a goodness-of-fit test. The question here is whether the observed data fit the published distribution. As always when testing, we have to think about what the null hypothesis is. Remember, the null hypothesis means that nothing special is going on. In this case, that would mean that the color distribution is still the same as it was in 2008. The alternative hypothesis would be that the color distribution is different.
The idea of a goodness-of-fit test is to compare the observed counts to the numbers one would expect if the null hypothesis were true. In order to do that, we need to figure out what the expected counts under the null hypothesis would be. That is a very simple calculation. We have 410 M&Ms in our bag, so under the null hypothesis we expect 24% of them to be blue, 24% being the percentage given in the table for blue. That is 0.24 times 410, which is 98.4 blue M&Ms. We do the same for the other five colors, and then we have a table of expected counts. And remember, we also had a table of the observed counts. So next we have to compare these two rows of observed and expected counts and see whether those numbers are compatible with each other.
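
The expected-count step described above can be sketched in a few lines of Python. This is only an illustration and not part of the lecture: the function name `expected_counts` is made up here, and the only numbers used are the two quoted in the example (24% blue and a bag of 410). The other five percentages would come from the published table and are deliberately left out.

```python
def expected_counts(proportions, n):
    """Expected count per category under the null: proportion times sample size."""
    return {color: p * n for color, p in proportions.items()}


# Only blue is filled in; the remaining five colors would use the
# percentages from the published 2008 table in exactly the same way.
null_proportions = {"blue": 0.24}
bag_size = 410

expected = expected_counts(null_proportions, bag_size)
print(round(expected["blue"], 1))  # 98.4
```

Note that expected counts do not have to be whole numbers; 98.4 is an average over many hypothetical bags, not a count anyone could observe.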
The way we compare observed and expected counts is this: we look at the difference between the observed and the expected count, then we square it to make it non-negative, and then we standardize it. It turns out that, in order to standardize here, we have to divide by the expected count. We do that for each category and we sum up these terms over all categories. So, for example, for the blue M&Ms we observed 85 and we expected 98.4, so we take the difference of these two, square it, and divide by the expected count: (85 - 98.4)^2 / 98.4, which is about 1.82. We do this for all six colors, and the sum, which is called the chi-square statistic, is then 8.57. If the value of the chi-square statistic is large, that means there is a big discrepancy between observed and expected counts, and that is evidence against the null hypothesis. On the other hand, if the chi-square statistic is small, that means that the observed and expected counts are close to each other.
Next, we compute a P-value. That means we have to assess how large that 8.57 is when compared to an appropriate null distribution. In this case, it turns out the null distribution is the chi-square distribution, which comes with a number of degrees of freedom: the number of categories minus one. Here we have six colors, so there are five degrees of freedom. On the right-hand side, you see the curve for the chi-square distribution with five degrees of freedom. Our value of the chi-square statistic is 8.57, so the P-value is the area to the right of that, which turns out to be 12.7%. That is not small, so the conclusion here is that there is not sufficient evidence to reject the null hypothesis.
The assumption of the chi-square test is that the observed data are counts that were obtained by drawing independently from a population. It is plausible that filling a bag of M&Ms roughly corresponds to this process. Here is an interesting fact: blue M&Ms were not introduced until 1995.
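
To make the chi-square calculation described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the lecture's own code: the function names are invented here, only the blue counts (85 observed, 98.4 expected) and the reported statistic 8.57 are taken from the example, and the observed counts for the other five colors are not reproduced. The P-value helper uses only the standard library, via a power series for the incomplete gamma function; in practice one would more likely call a statistics library.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(statistic, df, terms=200):
    """Area to the right of `statistic` under the chi-square curve with `df` degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function, which is adequate for moderate values like the ones here.
    """
    a, x = df / 2.0, statistic / 2.0
    if x <= 0:
        return 1.0
    term = 1.0
    total = 1.0
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    lower = math.exp(a * math.log(x) - x - math.lgamma(a + 1)) * total
    return 1.0 - lower


# Contribution of the blue category alone (the only counts quoted in the text).
blue_term = chi_square_statistic([85], [98.4])
print(round(blue_term, 2))  # 1.82

# P-value for the reported statistic: six colors, so 6 - 1 = 5 degrees of freedom.
p_value = chi_square_p_value(8.57, df=6 - 1)
print(f"{p_value:.3f}")
```

With the full rows of six observed and six expected counts, `chi_square_statistic` would return the 8.57 from the example, and the printed P-value comes out close to the 12.7% quoted above.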
In the beginning, some people suspected that there were fewer blue M&Ms in the bag than advertised. Of course, we can test that as well. This is a case we looked at before: if we are interested in just one category, then we can simply use a z-test. So the chi-square test is, in a sense, a generalization of the z-test for when you want to examine several categories at once.
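
As a rough sketch of the one-category z-test mentioned above, the following Python snippet tests the blue category on its own. It assumes we reuse the blue count from the same bag (85 out of 410) and the published 24% as the null proportion; the function name `one_sample_z` is made up for this illustration, and the one-sided alternative ("fewer blue than advertised") is read off from the suspicion described in the text.

```python
import math
from statistics import NormalDist


def one_sample_z(count, n, p0):
    """z statistic for an observed count against the proportion p0 claimed under the null."""
    expected = n * p0
    standard_error = math.sqrt(n * p0 * (1 - p0))
    return (count - expected) / standard_error


z = one_sample_z(count=85, n=410, p0=0.24)

# One-sided P-value: the alternative is "fewer blue than advertised",
# so we take the area to the left of z under the standard normal curve.
p_value = NormalDist().cdf(z)

print(round(z, 2), round(p_value, 3))
```

The numerator here is the same observed-minus-expected difference (85 - 98.4) that appears in the blue term of the chi-square statistic; only the standardization differs, which is one way to see the connection between the two tests.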