Just like any other statistical inference method we've encountered so far, there are conditions that need to be met for ANOVA as well. There are three main conditions for ANOVA. The first one is independence. Within groups, the sampled observations must be independent of each other, and between groups, we need the groups to be independent of each other, so non-paired. We also need approximate normality, so the distribution of the response variable should be nearly normal within each group. And lastly, we need equal variance. That is, the variability of the response variable should be roughly equal across the groups. We're going to discuss each of these conditions in more detail in the next few slides.

Let's start with the independence condition. Within groups, we want the sampled observations to be independent, which we can assume to be the case if we have random sampling or random assignment, depending on whether we have an observational study or an experiment, and, if we have sampled without replacement, if each sample size is less than 10% of its respective population. This condition is always important, but sometimes it can be difficult to check if we don't have sufficient information on how the study was designed and how the data were collected.

Between groups, we want the groups to be independent of each other. Checking this requires some careful consideration of whether there is a paired structure between the groups. If the answer is yes, this is not the end of the world, but it requires a different and slightly more advanced version of ANOVA called repeated measures ANOVA. Imagine, for example, that you were taking multiple measurements on the same set of people; those groups would not be independent of each other. So the ANOVA we learned in this course will only work in circumstances where the groups are independent.

We also need the distribution of the response variable within each group to be approximately normal.
This condition is especially important when the sample sizes are small; sadly, it's also difficult to check when the sample sizes are small. We can visually check this condition using normal probability plots. We can see that the lower and upper class plots are a little difficult to read because of the small sample sizes, and in the middle class group we have quite a bit of divergence from normality in the lower tail, so this condition may not necessarily be met.

Lastly, we need constant variance across groups. In other words, variability should be consistent across each of our groups. A commonly used term for this is homoscedastic groups. This condition is especially important when the sample sizes differ between the groups. Side-by-side box plots and summary statistics are useful for checking this condition, and here it seems like the variability is consistent across the lower, working, and middle classes, but it's much higher for the upper class group.
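
To make the last two checks a bit more concrete, here is a minimal Python sketch of the numbers behind a normal probability plot and a summary-statistics comparison. Everything in it is an assumption for illustration only: the group names and values are made up and are not the data from the slides, the function names are hypothetical, and the plotting positions (i - 0.5) / n are just one common convention. The correlation between the observed values and the theoretical normal quantiles is used here as a rough stand-in for how straight the normal probability plot looks; it doesn't replace looking at the plot.

```python
from statistics import NormalDist, mean, stdev


def normal_plot_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal probability plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Assumed plotting positions: (i - 0.5) / n for the i-th smallest value.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


def pearson_r(pairs):
    """Pearson correlation of the (x, y) pairs; values near 1 suggest a straight line."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def condition_summary(groups):
    """Per-group sample size, standard deviation, and normal-plot correlation."""
    return {
        name: {
            "n": len(values),
            "sd": stdev(values),
            "qq_r": pearson_r(normal_plot_points(values)),
        }
        for name, values in groups.items()
    }


def sd_ratio(summary):
    """Largest group standard deviation divided by the smallest."""
    sds = [row["sd"] for row in summary.values()]
    return max(sds) / min(sds)


# Made-up illustration data, NOT the data shown in the lecture.
groups = {
    "group_a": [4, 5, 6, 5, 7, 6, 5, 4],
    "group_b": [6, 7, 5, 8, 6, 7, 6, 5, 7],
    "group_c": [2, 9, 5, 11, 1, 8, 6, 10],
}

summary = condition_summary(groups)
for name, row in summary.items():
    print(f"{name}: n={row['n']}, sd={row['sd']:.2f}, qq_r={row['qq_r']:.3f}")
print(f"largest sd / smallest sd = {sd_ratio(summary):.2f}")
```

A standard deviation ratio well above 1 is the numeric counterpart of one box plot looking much more spread out than the others. How large a ratio is tolerable is a judgment call, and it matters more when the group sizes differ.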