0:25

One of them might be whether or not the student completed

all components of the course leading up to the final exam, such as the videos,

the quizzes, the midterm, the lab, so on and so forth.

However, there will certainly be other factors as well.

Familiarity with the material beforehand,

number of hours per week put into the course, so on and so forth.

Suppose we're interested in studying how strongly completing

all components leading up to the final exam is associated with exam scores.

To study this, we would partition the total variability in exam scores

into variability due to this variable, and variability due to all other factors.

We're going to build on this idea of variability partitioning and

the statistics we introduced earlier to work our way through the analysis

of variance output.

Let's quickly remind ourselves of the data we're working with.

From the General Social Survey, we have the vocabulary scores,

a numerical variable, and social class, a categorical variable.

We have our summary statistics at the group level

as well as at the overall level.

Our null hypothesis is that the average vocabulary score is the same across all

social classes.

And the alternative hypothesis says that the average scores differ for

at least one pair of social classes.

1:43

Let's visualize this idea of variability partitioning.

Suppose the circle represents the total variability in vocabulary scores.

We partition the variability into two pieces:

variability that can be attributed to differences in social class and

variability attributed to all other factors.

Variability attributed to social class is called the between group variability

since social class is the grouping variable in our analysis, and

the other portion of the variability is what we're not interested in.

And in fact it's somewhat of a nuisance factor for us.

Since, if everyone within a certain social class scored the same,

then we would have no variability attributed to other factors.

This portion of the variability is called our within group variability.

Here's a look at the ANOVA output.

The first row is about between group variability, and

the second row is the within group variability.

We often refer to the first row as the group row, and

the second row as the error row, and the third row displays the totals.

Next, we're going to go through all of the values in this table:

how they're calculated, and what they mean.

Let's start with the column of sum of squares.

The last value in this column is sum of squares total,

commonly referred to as SST.

This value measures the total variability in the response variable.

In this case, that would be the variability of vocabulary scores.

This value is calculated very similarly to variance,

except that it is not scaled by the sample size.

More specifically, it is calculated as the sum of squared

deviations from the mean of the response variable.

We have 795 observations in our dataset,

and the mean vocabulary score is 6.14.

So to calculate SST, we take each individual score and subtract 6.14 from

it, square the difference, and finally add up all the values.

For example, the first score is 6, so that's (6 - 6.14) squared.

The next one is 9, that's (9 - 6.14) squared.

The third one is also 6, and so on and so forth, and

we add up all of the values to get the total sum of squares of 3,106.36.
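To make the SST formula concrete, here's a small Python sketch (the course itself works in R, and the five scores below are hypothetical stand-ins, since the full list of 795 GSS scores isn't shown; only the first three, 6, 9, and 6, are mentioned above):

```python
# Sum of squares total: sum of squared deviations from the overall mean.
# The scores list is a hypothetical toy sample, not the real 795-observation
# dataset; only the first three values (6, 9, 6) come from the lecture.
scores = [6, 9, 6, 5, 4]
grand_mean = sum(scores) / len(scores)            # 6.0 for this toy sample
sst = sum((y - grand_mean) ** 2 for y in scores)  # (0)^2 + (3)^2 + (0)^2 + (-1)^2 + (-2)^2
print(sst)                                        # 14.0
```

The same recipe, applied to all 795 real scores with a mean of 6.14, yields the 3,106.36 reported in the table.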

This value represents the total variability in the response variable.

But what we're really interested in is how this variability is partitioned into

between and within group variabilities.

As an aside, we can see that this is an awfully tedious calculation to do by hand.

And hence, for ANOVA, we usually rely on software to do the calculations for us.

So the calculations we're going to present in this video are for

illustrative purposes and for introducing the concepts.

But you'll likely never have to calculate these by hand.

You still need to understand what they mean so

that you can interpret your analysis though.

Next, let's talk about the sum of squares group, SSG.

This value measures the variability between groups and

can be thought of as the variability in the response variable

explained by the explanatory variable in the analysis.

It's calculated as the sum of squared deviations of the group means from the overall mean,

weighted by their sample sizes.

So more specifically, for each group we calculate its mean,

y bar j, subtract the grand mean, y bar, from it,

square this value, and multiply it by the sample size for that group.

We do this for each of the groups, and sum them up.

Here's a summary table that's going to help us.

The lower class group has a mean of 5.07. We subtract from that the grand

mean of 6.14, square that value, and multiply it by the sample size for the group, 41.

We do the same thing for all of our groups and arrive at the sum of squares

group of 236.56, which on its own is not a meaningful number, but it's

interesting how it compares to the total sum of squares we calculated earlier.

For example, this value is roughly 7.6% of SST,

meaning that 7.6% of the variability in vocabulary scores

is explained by social class and the remainder is not

explained by the explanatory variable we're considering in this analysis.
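As a Python sketch of this calculation: the lower class group mean of 5.07, its sample size of 41, and the grand mean of 6.14 come from the lecture; the SSG total of 236.56 is taken as the value consistent with the reported 7.6% share of SST.

```python
# One term of SSG: the lower class group's weighted squared deviation,
# n_j * (ybar_j - ybar)^2, as worked through in the lecture.
grand_mean = 6.14
lower_term = 41 * (5.07 - grand_mean) ** 2
print(round(lower_term, 2))        # 46.94

# Share of the total variability explained by social class,
# using the table totals (SSG assumed consistent with the 7.6% figure).
sst, ssg = 3106.36, 236.56
print(round(100 * ssg / sst, 1))   # 7.6
```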

This is a low percentage, which makes sense, because we would expect

vocabulary scores to be associated more with education or

how much people read.

The last value here is the sum of squares error, SSE, and

it measures the variability within groups.

In other words, this is the unexplained variability and

it's the variability due to all the other variables.

The simplest way of calculating this is as

the difference between SST and SSG.
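In Python, with SST = 3,106.36 from the table and SSG = 236.56 (the value implied by the reported 7.6% figure), that difference is:

```python
sst = 3106.36         # total sum of squares
ssg = 236.56          # between-group (social class) sum of squares
sse = sst - ssg       # within-group (error) sum of squares
print(round(sse, 2))  # 2869.8
```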

Now we need a way to get from the sum of squares measures to the mean square

values.

To do so we need to scale the sum of square values by values that incorporate

sample size as well as the number of groups, namely the degrees of freedom.

So next, let's focus on that column.

Total degrees of freedom is calculated as the sample size minus 1, which is 794.

Group degrees of freedom is calculated as the number of groups minus 1, which is 3.

And the error degrees of freedom is simply the difference between these two, 791.
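These three values can be checked with a couple of lines of Python, using the 795 observations and 4 social classes from the lecture:

```python
n, k = 795, 4                    # observations and number of social classes
df_total = n - 1                 # 794
df_group = k - 1                 # 3
df_error = df_total - df_group   # 791
print(df_total, df_group, df_error)
```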

Next stop is the mean squares column, which measures the average variability

between and within groups and is calculated as the sum of squares for

that component divided by degrees of freedom.

So we can calculate that by doing the divisions, and

we're going to next use these values for calculating our F score,

because, remember, our F statistic is the ratio of the average between and

within group variabilities.

In other words, it's MSG divided by MSE.
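Putting the pieces together in a Python sketch (sums of squares as above, with SSG = 236.56 taken as the value consistent with the reported 7.6% of SST):

```python
sst, ssg = 3106.36, 236.56
sse = sst - ssg               # within-group sum of squares, 2869.80
df_group, df_error = 3, 791   # degrees of freedom from the ANOVA table
msg = ssg / df_group          # mean square group
mse = sse / df_error          # mean square error
f_stat = msg / mse            # F statistic: ratio of the two mean squares
print(round(msg, 2), round(mse, 2), round(f_stat, 2))  # 78.85 3.63 21.73
```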

7:33

Once you have your F score you're finally ready to find your p-value and

conclude the hypothesis test.

The p-value in this context is the probability of at least as large a ratio

between the between and

within group variabilities, if in fact the means of all groups are equal.

This is just another way of saying p-value is the probability of observed or

more extreme outcome given the null hypothesis is true.

And it can be calculated as the area under the F distribution.

And the F distribution has two degrees of freedom: the degrees of freedom group and

the degrees of freedom error.

So the p-value shown on the ANOVA table is the tail area under the F

distribution with 3 and 791 degrees of freedom, which is tiny.

Note that even though we're looking for differences,

we only consider the upward tail of the F distribution.

This is because the F statistic can never be negative.

Think about it, it's the ratio of two measures of variability

that can't ever be negative either.

Since the F statistic is always positive,

a more extreme statistic will always be more extreme in the positive direction.

Even though the ANOVA table always reports the p-value, if you wanted to do so,

you could directly calculate it in R using the pf function.

This function takes the observed F score as one of its arguments as

well as the degrees of freedom.

We also need to note that we don't want the lower tail, and

we get this tiny p-value, which is indeed less than 0.0001.

Now it's finally time to make a conclusion.

If the p-value is small, we reject the null hypothesis and

say that we have sufficient evidence for the alternative.

If the p-value is large, we fail to reject the null hypothesis and

conclude that the data do not provide convincing evidence that at least one pair

of population means are different from each other, the observed differences in

sample means are then attributable to sampling variability or chance.

In this case, we had a pretty tiny p-value.

So what's going to be our conclusion?