We've covered a lot of information, so here's a quick summary of analysis of variance. First, an analysis of variance is used when we have a categorical explanatory variable and a quantitative response variable, and we want to examine differences in the mean of the response variable across the categories of our explanatory variable. The null hypothesis is that there's no relationship between the explanatory and response variables. In other words, that all means are equal. The alternate hypothesis is that not all means are equal.
The F statistic is calculated by comparing the variation among sample means to the variation within groups. If the variation among sample means is large enough relative to the variation within groups, the p-value will be less than or equal to 0.05, and we have a significant finding. This would allow us to reject the null hypothesis and say that the explanatory and response variables are associated.
The model Python syntax for conducting an analysis of variance with OLS works as follows. I name my model, then include the equals sign and the ols function from the statsmodels.formula.api library, which I've imported as smf. Within parentheses I then write my formula, including the name of my quantitative response variable followed by a tilde and then the name of my categorical explanatory variable. I will also need to indicate to Python that this is a categorical variable by adding a capital C and putting the variable name within parentheses.
If your explanatory variable has more than two levels or groups, you'll also need to conduct a post hoc test to see which groups differ from one another. To conduct a Tukey's Honestly Significant Difference test, you add the tukeyhsd syntax to your program. Remember that to get accurate calculations, you need to include in your data frame only those observations with valid data for both your explanatory and response variables. To do this, be sure to include the dropna function when setting your data frame prior to running the OLS model and the tukeyhsd post hoc comparisons.
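The steps summarized above can be sketched in a few lines. This is a minimal illustration, not the course's own program: the data frame, the column names group and score, and the simulated values are all hypothetical stand-ins for your own data set. It assumes the Tukey test is run through MultiComparison in statsmodels.stats.multicomp.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi

# Hypothetical data: a categorical explanatory variable with three levels
# and a quantitative response, with a few missing values mixed in.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 30),
    "score": np.concatenate([
        rng.normal(10, 2, 30),
        rng.normal(12, 2, 30),
        rng.normal(15, 2, 30),
    ]),
})
data.loc[[0, 35, 70], "score"] = np.nan

# Keep only observations with valid data on both variables.
sub = data[["group", "score"]].dropna()

# Formula: quantitative response ~ C(categorical explanatory).
model = smf.ols(formula="score ~ C(group)", data=sub).fit()
print(model.summary())

# F statistic and its p-value for the overall test.
f_value = model.fvalue
p_value = model.f_pvalue

# Group means, to see the direction of any differences.
print(sub.groupby("group")["score"].mean())

# Post hoc comparisons, needed because there are more than two groups.
mc = multi.MultiComparison(sub["score"], sub["group"])
tukey = mc.tukeyhsd()
print(tukey.summary())
```

In the Tukey summary table, a value of True in the reject column marks a pair of groups whose means differ significantly.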
>> Now you're ready to test a categorical to quantitative relationship. If your own research question does not include these types of variables, you might want to test the procedure with variables from your data set that do require an ANOVA. For example, you could look at mean age differences according to any categorical variable in NESARC. Or, treating the grade-level variable in AddHealth as quantitative, you could look at mean differences in grade level, again, by any categorical variable that you choose. For both the Gapminder and the Mars crater data, there are many quantitative variables, so you might choose to categorize one of them for inclusion in an ANOVA. Whatever types of variables you have, you'll be able to test the association with the right tool.
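For the suggestion of categorizing a quantitative variable, one possible approach is to split it into quantile groups with the pandas qcut function. The sketch below is only an illustration: the column name income, the simulated values, and the choice of four groups are hypothetical, not taken from Gapminder or the Mars crater data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a quantitative variable from your data set.
rng = np.random.default_rng(1)
df = pd.DataFrame({"income": rng.uniform(100, 50000, 200)})

# Split the quantitative variable into four groups of roughly equal size.
df["income_group"] = pd.qcut(df["income"], 4, labels=["q1", "q2", "q3", "q4"])

# Check how many observations fall in each new category.
print(df["income_group"].value_counts(sort=False))
```

The new income_group column can then serve as the categorical explanatory variable, wrapped in C() in the model formula.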