Let's visualize this idea of variability partitioning.

Suppose the circle represents the total variability in vocabulary scores.

We partition this variability into two parts:

variability that can be attributed to differences in social class and

variability attributed to all other factors.

Variability attributed to social class is called the between group variability

since social class is the grouping variable in our analysis, and

the other portion of the variability is what we're not interested in.

In fact, it's somewhat of a nuisance factor for us,

since if everyone within a certain social class scored the same,

then we would have no variability attributed to other factors.

This portion of the variability is called our within group variability.

Here's a look at the ANOVA output.

The first row is about between group variability, and

the second row is the within group variability.

We often refer to the first row as the group row, and

the second row as the error row. The third row displays the totals.

Next, we're going to go through all of the values in this table:

how they're calculated and what they mean.

Let's start with the column of sum of squares.

The last value in this column is sum of squares total,

commonly referred to as SST.

This value measures the total variability in the response variable.

In this case, that would be the variability of vocabulary scores.

This value is calculated very similarly to variance

except that it is not scaled by the sample size.

More specifically, this is calculated as the sum of squared

deviations from the mean of the response variable.

We have 795 observations in our dataset,

and the mean vocabulary score is 6.14.

So to calculate SST, we take each individual score and subtract 6.14 from

it, square the difference, and finally add up all the values.

For example, the first score is 6, so that's (6 - 6.14) squared.

The next one is 9, so that's (9 - 6.14) squared.

The third one is also 6, and so on and so forth, and

we add up all of the values to get to the total sum of squares of 3,106.36.

This value represents the total variability in the response variable.
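The SST recipe just described can be sketched in a few lines of Python. This is only an illustration: the three scores are the first three mentioned above, not the full set of 795 observations, so the result will not match 3,106.36. The function name `total_sum_of_squares` is made up for this sketch.

```python
def total_sum_of_squares(scores):
    """Sum of squared deviations of each score from the overall mean."""
    grand_mean = sum(scores) / len(scores)
    return sum((y - grand_mean) ** 2 for y in scores)

# Only the first three scores mentioned in the lecture (assumption: used
# here as a tiny stand-in for the full data set of 795 observations).
sample_scores = [6, 9, 6]
sst_sample = total_sum_of_squares(sample_scores)
print(sst_sample)  # 6.0, since the mean of these three scores is 7

# A single term of the real calculation, using the lecture's mean of 6.14.
first_term = (6 - 6.14) ** 2
print(round(first_term, 4))  # 0.0196
```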

But what we're really interested in is how this variability is partitioned into

between and within group variabilities.

As an aside, we can see that this is an awfully tedious calculation to do by hand,

and hence, for ANOVA, we usually rely on software to do the calculations for us.

So the calculations we're going to present in this video are for

illustrative purposes and for introducing the concepts.

But you'll likely never have to calculate these by hand.

You still need to understand what they mean so

that you can interpret your analysis though.

Next, let's talk about the sum of squares group, SSG.

This value measures the variability between groups and

can be thought of as the variability in the response variable

explained by the explanatory variable in the analysis.

It's calculated as the squared deviation of the group means from the overall mean,

weighted by their sample sizes.

More specifically, for each group we calculate its mean,

y bar j, subtract the grand mean, y bar, from it,

square this value, and multiply it by the sample size for that group.

We do this for each of the groups, and sum them up.
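As a rough sketch of that recipe, here is one way to compute SSG in Python. The group labels and scores are invented toy data, not the vocabulary data, and the function name `sum_of_squares_group` is hypothetical.

```python
def sum_of_squares_group(groups):
    """Sum over groups of n_j * (group mean - grand mean)^2."""
    all_scores = [y for scores in groups.values() for y in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ssg = 0.0
    for scores in groups.values():
        n_j = len(scores)                  # sample size of group j
        group_mean = sum(scores) / n_j     # y bar j
        ssg += n_j * (group_mean - grand_mean) ** 2
    return ssg

# Toy data (assumption): three groups of three observations each.
toy_groups = {"A": [4, 5, 6], "B": [7, 8, 9], "C": [1, 2, 3]}
ssg_toy = sum_of_squares_group(toy_groups)
print(ssg_toy)  # 54.0: group means are 5, 8, 2 and the grand mean is 5
```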

Here's a summary table that's going to help us.

The lower class group has a mean of 5.07. We subtract from that the grand

mean of 6.14, square that value, and multiply it by the sample size for the group, 41.
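The lower class term on its own is a one-line calculation. The sketch below uses only the three numbers quoted above; the variable names are made up for illustration.

```python
# Values quoted in the lecture for the lower class group.
group_mean = 5.07
grand_mean = 6.14
group_size = 41

# Contribution of this one group to the sum of squares group (SSG).
lower_class_term = group_size * (group_mean - grand_mean) ** 2
print(round(lower_class_term, 2))  # 46.94
```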

We do the same thing for all of our groups and arrive at the sum of squares

group of 236.56, which on its own is not a meaningful number, but it's

interesting how it compares to the total sum of squares we calculated earlier.

For example, this value is roughly 7.6% of SST.

Meaning that 7.6% of the variability in vocabulary scores

is explained by social class and the remainder is not

explained by the explanatory variable we're considering in this analysis.

This is a low percentage, which I think makes sense, because we would expect

vocabulary scores to be associated more with education or

how much people read.

The last value here is the sum of squares error, SSE, and

it measures the variability within groups.

In other words, this is the unexplained variability and

it's the variability due to all the other variables.

The simplest way of calculating this is as

the difference between SST and SSG.
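To see why the subtraction shortcut works, the sketch below computes SSE two ways on invented toy data: once as SST minus SSG, and once directly as the squared deviations of each score from its own group mean. The data and helper names are assumptions made for this illustration.

```python
def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a given center."""
    return sum((y - center) ** 2 for y in values)

# Toy data (assumption): three groups of three observations each.
toy_groups = {"A": [4, 5, 6], "B": [7, 8, 9], "C": [1, 2, 3]}
all_scores = [y for scores in toy_groups.values() for y in scores]
grand_mean = mean(all_scores)

sst = sum_sq_dev(all_scores, grand_mean)
ssg = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in toy_groups.values())

# Shortcut: whatever is not between-group variability is within-group.
sse_by_subtraction = sst - ssg
# Direct route: deviations of each score from its own group mean.
sse_direct = sum(sum_sq_dev(s, mean(s)) for s in toy_groups.values())

print(sst, ssg, sse_by_subtraction, sse_direct)  # 60.0 54.0 6.0 6.0
```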

Now we need a way to get from the sum of squares measures to the mean square

values.

To do so we need to scale the sum of square values by values that incorporate

sample size as well as the number of groups, namely the degrees of freedom.

So next, let's focus on the degrees of freedom column.

Total degrees of freedom is calculated as sample size minus 1: 795 - 1 = 794.

Group degrees of freedom is calculated as number of groups minus 1: 4 - 1 = 3.

And the error degrees of freedom is simply the difference between these two: 794 - 3 = 791.
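The three degrees of freedom can be reproduced with a small helper. The sample size of 795 comes from the lecture; the count of four groups is inferred from the group degrees of freedom of 3. The function name is hypothetical.

```python
def anova_degrees_of_freedom(n_observations, n_groups):
    """Return (total, group, error) degrees of freedom for one-way ANOVA."""
    df_total = n_observations - 1
    df_group = n_groups - 1
    df_error = df_total - df_group
    return df_total, df_group, df_error

df_total, df_group, df_error = anova_degrees_of_freedom(795, 4)
print(df_total, df_group, df_error)  # 794 3 791
```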

Next stop is the mean squares column, which measures the average variability

between and within groups and is calculated as the sum of squares for

that component divided by degrees of freedom.

So we can calculate those by doing the divisions, and

we're going to next use these values for calculating our F statistic,

because, as you'll remember, the F statistic is the ratio of the average between and

within group variabilities.

In other words, it's MSG divided by MSE.
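Putting the last two columns together, a minimal sketch of the mean squares and the F statistic might look like this. The inputs are toy values (three groups of three observations with SSG = 54 and SSE = 6), not the vocabulary results, and the function name is made up.

```python
def f_statistic(ssg, sse, df_group, df_error):
    """Return (MSG, MSE, F) from sums of squares and degrees of freedom."""
    msg = ssg / df_group   # mean square group: average between-group variability
    mse = sse / df_error   # mean square error: average within-group variability
    return msg, mse, msg / mse

# Toy inputs (assumption): 9 observations in 3 groups, so df are 2 and 6.
msg, mse, f_value = f_statistic(ssg=54.0, sse=6.0, df_group=2, df_error=6)
print(msg, mse, f_value)  # 27.0 1.0 27.0
```

A large F value means the between-group variability is large relative to the within-group variability.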