But what if my categorical explanatory variable has more than two groups? In this example, I'm going to examine the association between ethnicity and smoking quantity. The explanatory variable, ethnicity, actually has five levels or groups. One is White. Two is Black. Three is American Indian or Alaska Native. Four is Asian, Native Hawaiian, or Pacific Islander. And five is Hispanic or Latino. So by running an analysis of variance, we're asking whether the number of cigarettes smoked differs for different ethnic groups.

Since we're only changing the categorical explanatory variable, that's the only part of the syntax that we'll need to change. So rather than including MAJORDEPLIFE in the CLASS statement, we're going to include ETHRACE2A. We're going to include it here again in the MODEL statement, and then here again in the MEANS statement.

Here are the PROC ANOVA results for an explanatory variable with more than two levels. This time the F statistic is 24.4 with an associated p-value of less than .0001. This tells me that I can safely reject the null hypothesis and say that there is an association between ethnicity and smoking quantity. But all I know at this point is that not all the means are equal. I could eyeball each mean in the means table and make a guess as to which pairs are significantly different from one another. For example, the ethnic group with the lowest mean number of cigarettes smoked per month is ethnic group five, Hispanic or Latino. And the group with the highest mean number of cigarettes smoked per month is ethnic group one, White.

Because my categorical explanatory variable has multiple levels, the F test and the p-value do not provide insight into why the null hypothesis can be rejected. They do not tell us in what way the population means are not statistically equal.

>> Note that there are many ways for population means not to be all equal. Having each of them not equal to the other is just one of them.
Another way could be that only two of the populations are not equal to one another.
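
To make that note concrete, here is a minimal sketch in Python (not the SAS shown in the lecture) that computes a one-way ANOVA F statistic by hand. The helper name `one_way_f` and the two small samples are made up for illustration and are not the lecture's data. In the first sample only one group mean differs from the other two; in the second, all three group means differ. Both give an F statistic far above 1, so the F statistic alone does not tell us which pattern produced it.

```python
from statistics import mean


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples.

    Hypothetical helper for illustration; `groups` is a list of lists of numbers.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up toy samples (assumptions, not data from the lecture).
only_one_differs = [[10, 12, 14], [10, 12, 14], [20, 22, 24]]  # means 12, 12, 22
all_differ = [[10, 12, 14], [15, 17, 19], [20, 22, 24]]        # means 12, 17, 22

f_one = one_way_f(only_one_differs)
f_all = one_way_f(all_differ)

print(f"Only one group differs: F = {f_one:.2f}")
print(f"All groups differ:      F = {f_all:.2f}")
```

Under these assumptions, a large F only says that the means are not all equal; working out which pairs of means differ needs a further comparison of the individual groups.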