0:05
Let's take a look at our first experiment where we
can measure a result with an analysis of variance, and
we'll start with a common experiment that you may have even done yourself.
A website A/B test.
In an A/B test, visitors come to a website, and
some are exposed to one version of the site and
others are exposed to another version, hence the A and B terms.
0:43
So here's the scenario we'll work with.
First we'll talk about the design considerations of this experiment,
then about some of the considerations when we're running the experiment,
and then we'll move, as we've done before, to the R code and
show how we would analyze this experiment statistically and report the result.
1:04
Let's say that on a given day,
500 visitors to a website are treated as part of the experiment,
perhaps the first 500 who visit the website on that designated day, and
let's say half of them are exposed
to website A and half of them are exposed to a variation of it, website B.
1:28
Now, that may not be the optimal way to run an A/B test. Perhaps it shouldn't just be on
one day, for example, and perhaps it should be more than 500 people, or
perhaps it should be a certain number of people on a given day.
All of those are good variations to consider, but for
now we're going to keep it simple and just keep it to the scenario I described.
1:58
So, maybe we think that a redesign of a website, say version B of this site,
will have people stay on the site longer and view more pages.
So distinct pages viewed will be our measure, and
you could imagine that in a real-world A/B test we might also count time on site,
perhaps page loads or total page views, and other measures like that,
maybe even clicks and things.
So we're interested in the number of distinct pages that they view.
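To make the measure concrete, here is a minimal R sketch of how distinct pages differ from total page views. The log below is made up for illustration, and the names `page_log`, `visitor`, and `page` are hypothetical, not taken from the course data.

```r
# Hypothetical page-view log: one row per page load.
page_log <- data.frame(
  visitor = c("v1", "v1", "v1", "v2"),
  page    = c("/home", "/pricing", "/home", "/home")
)

# Distinct pages per visitor: repeat loads of the same page count once
distinct_pages <- tapply(page_log$page, page_log$visitor,
                         function(p) length(unique(p)))

# Total page views per visitor: every load counts
total_views <- tapply(page_log$page, page_log$visitor, length)

distinct_pages
total_views
```

Visitor v1 loads three pages but only two distinct ones, which is the number our measure would record.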
3:24
Dependent variables are the things that result from our manipulation,
sometimes called our treatment, which here would be the site they're exposed to.
The dependent variable is really the measure, and as I said before, we're
interested in the number of distinct pages that are viewed, so we can call that pages.
4:04
Then we have some independent variable, let's say X. We just have one here, so
we'll call it X, but if we had more than one, which we will
see later in the course, we might have X1 and X2 and X3 and so
on. Y is related to X, and then we have to add a plus term,
which is traditionally measurement error.
4:28
The idea here, in our case, is that the number of pages viewed, we think,
might depend on the value that X takes,
that is, whether X is website A or B, plus measurement error.
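The relation just described, pages depending on the site plus an error term, can be sketched as a small R simulation. This is only an illustration under assumed numbers: the group means, the noise standard deviation, and the name `simulate_pages` are hypothetical, and real page counts would be whole numbers rather than the continuous values produced here.

```r
# Hypothetical simulation of Y = f(X) + error for the website A/B test.
set.seed(42)  # make the random noise reproducible

simulate_pages <- function(site, mean_a = 3.5, mean_b = 4.5, noise_sd = 1.5) {
  # f(X): the part of the outcome that depends on which site was shown
  f_x <- ifelse(site == "A", mean_a, mean_b)
  # error: random, normally distributed variation across visitors
  error <- rnorm(length(site), mean = 0, sd = noise_sd)
  f_x + error
}

site  <- factor(rep(c("A", "B"), each = 250))  # 500 visitors, half per site
pages <- simulate_pages(site)

# Group means land near the assumed values, but not exactly on them,
# because every observation carries its own error
tapply(pages, site, mean)
```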
What's measurement error?
Well, this is actually a very deep issue, but you can think of it as the random
error, or noise, that's in the measurements that we're
taking over people, over subjects, for this experiment.
4:58
You might say, why is there any measurement error?
We know how many distinct pages they visit on the website.
That's true.
In that case, we know the page count presumably without error,
although there could perhaps be some error in our code that's logging that, or
maybe some edge case that's not handled or something, but
that's not all that measurement error is.
Measurement error in this sense also includes the variation that naturally
takes place when we measure things.
So it doesn't have to be that we're logging it wrong.
It could be that if I measured the same person on Tuesday,
and then measured them again on Wednesday, they may in fact have a different result.
If I measured two different people, they may have a different result
due purely to the fact that they're different people,
not because the website really is causing that.
These errors are taken to be kind of random, and
usually normally distributed, and they are part of any experiment, any measurement.
In fact, we don't know how much error may be in a measurement,
how much natural variation there may be, and
that's why we need statistical power
to draw inferences about the population that we're after.
Meaning, we want to know, is there a true difference between website A and B,
in this case, in spite of the fact that we have some error in
every single measurement because of the so-called natural variation
6:28
of any human behavior that we might be measuring? So that's what that term is, and
it's inescapable, and its exact value, of course, is unknowable.
So, in our particular experimental case, we're looking at, as I said,
the number of distinct pages being in some relation to
the value of the site, plus this error.
Now there's something else to be said about the design of this
experiment as well, and that is that these variables each have types and
it's important to be aware of variable types.
We saw in the previous section that we were recoding the subject variable as
a factor, which is R's term for a categorical or nominal variable type.
We also know that there are numeric variable types,
also sometimes called continuous or scalar, and there's even a third type called
ordinal, or ordered, which are variables that are in a sequence
7:32
that has an order, like a Likert scale, say a one-to-seven scale or
a one-to-five scale, or short, medium, tall, taller, tallest.
a one to five scale or short, medium, tall, taller, tallest.
Things like that that have an order to them are called ordinal.
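As a rough illustration, here is how these three variable types might look in R. The variable names and values (`browser`, `seconds`, `height`) are made up for this sketch and are not part of the experiment.

```r
# Three variable types as R represents them (all values are hypothetical).
browser <- factor(c("Chrome", "Firefox", "Chrome"))       # categorical / nominal
seconds <- c(12.5, 40.0, 27.3)                            # numeric
height  <- factor(c("short", "tall", "medium"),
                  levels = c("short", "medium", "tall"),
                  ordered = TRUE)                         # ordinal

is.factor(browser)       # TRUE
is.numeric(seconds)      # TRUE
is.ordered(height)       # TRUE
height[1] < height[2]    # TRUE: "short" comes before "tall" in the level order
```

Comparisons such as `<` only make sense for the ordered factor; R warns and returns `NA` if you try them on an unordered one.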
7:54
What's the variable type for pages?
It is numeric, or numerical, or scalar, or continuous, all synonyms.
I'll grab this color here and I'll make a note of that.
We'll see some analyses where this is not the case, but in our customary analysis of
variance situation, our Y value will be numeric.
It's a numeric outcome based on certain inputs, but what are those input types?
What is the type of X here?
X is the site, which can take on two values, A or B, so it is a categorical variable, a factor in R's terms.
9:05
Okay, so those are variable types, and
we'll see that through out some of our analysis.
Now, there are other terms that are relevant here, which we'll use more commonly.
We won't say independent variables, probably, much beyond this moment.
We'll say factors, because certain experiments we look at in the future
will have multiple factors, and they'll be factorial designs.
That'll be later in the class.
So independent variables can also be called, let me use our other color here,
factors, and factors can take on values, called levels,
just as site has, in this case, two values, A and B.
10:04
Now, there's one last consideration to take into account, and that is that these
factors can also be between subjects or within subjects.
Well, what does that mean?
10:47
So in our case, each subject would experience either website A or website B,
but not both, which makes site a between-subjects factor. A within-subjects factor is one for
which a participant experiences more than one level of the factor.
In this case, that would be both website A and B.
In a website A/B test, when a visitor comes to a site, they're usually assigned
to one or the other variation of the website and not both.
A piece of local storage, or a cookie, or something similar is put on the machine
to remember which site they were exposed to.
So each time they go to the site, they get the same one.
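The remember-the-assignment idea can be sketched in R as follows. This is not how a real website would do it (that would be a cookie or local storage in the browser); here an R environment named `assignments` stands in for that storage, and `get_site` and the visitor ID are hypothetical names.

```r
# Hypothetical sketch: assign a visitor to A or B once and remember it.
assignments <- new.env()  # stands in for the cookie / local storage

get_site <- function(visitor_id) {
  key <- as.character(visitor_id)
  if (!exists(key, envir = assignments, inherits = FALSE)) {
    # First visit: pick a site at random and store the choice
    assign(key, sample(c("A", "B"), 1), envir = assignments)
  }
  # Every visit: return the stored choice
  get(key, envir = assignments, inherits = FALSE)
}

first  <- get_site("visitor-001")
second <- get_site("visitor-001")
identical(first, second)  # TRUE: repeat visits see the same site
```

Because the choice is stored on first contact, a returning visitor never crosses over to the other version, which keeps the factor between subjects.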
11:29
So, that's what between-subjects and within-subjects factors are, and
when we have multiple factors, we can have some of them be between subjects and
some of them be within.
For a factor to be within subjects, a participant only needs to be exposed
to more than one level of the factor.
So if we had, say, versions A, B, C, and D of the site,
and a participant was exposed to A and B but
maybe not C and D, it would still be a within-subjects factor.
It would be a partial within-subjects factor at that point.
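One way to check which kind of factor you have in a data set is to count how many levels each participant experienced. The sketch below assumes a made-up exposure table; `visits`, `visitor`, and `site` are hypothetical names.

```r
# Hypothetical exposure table: one row per (visitor, site) exposure.
visits <- data.frame(
  visitor = c("v1", "v2", "v3", "v3"),
  site    = c("A",  "B",  "A",  "B")
)

# Count how many distinct levels of `site` each visitor experienced
levels_seen <- tapply(visits$site, visits$visitor,
                      function(s) length(unique(s)))

# Between subjects: every visitor saw exactly one level
is_between <- all(levels_seen == 1)
is_between   # FALSE here, because v3 saw both A and B
```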
12:02
So these are some of the design considerations for this website A/B test.
What are some things to keep in mind when we run such a test?
This is by no means a comprehensive list of considerations, but
it is a few things we'd want to think about.
12:16
One question is, do we measure each visitor only once?
Remember, we're measuring how many distinct pages they view.
What if they come back on the same day, or what if they come back
at a time when they're still within that group of 500 that we said we wanted?
12:31
For that matter, how many visitors do we want? Why 500?
Should we want more, or fewer?
That depends, again, on how big the difference in pages visited is
between website versions A and B.
If the differences are great, we don't need so many
subjects; if the differences are smaller, we may need more to tell the difference.
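To get a feel for how the size of the difference drives the number of visitors needed, here is a rough R sketch using `power.t.test` from base R's stats package. Note the assumptions: it uses a two-sample t-test approximation rather than the analysis of variance itself, and the standard deviation of 2 pages and the two differences (2 pages and half a page) are made-up values.

```r
# Visitors needed per site to detect a difference in mean pages viewed.
# sd = 2 pages and both delta values are assumptions for illustration.
n_large <- power.t.test(delta = 2.0, sd = 2, sig.level = 0.05, power = 0.80)$n
n_small <- power.t.test(delta = 0.5, sd = 2, sig.level = 0.05, power = 0.80)$n

# Round up, since we cannot recruit a fraction of a visitor
ceiling(c(large_difference = n_large, small_difference = n_small))
```

The smaller assumed difference calls for many more visitors per site than the larger one, which is the point made above.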
Is the split 50/50?
Do half the subjects get A and half get B?
You can run website A/B tests, of course, with any arbitrary split, say 90/10 or
80/20. In our case, for this data, we'll more or less do 50/50, but
it may depend on an algorithm that assigns people to conditions in
a way that could get slightly unbalanced, and so that's a consideration as well.
Is the design a balanced design or an unbalanced design?
Balanced designs have the same number of data points in every condition.
Unbalanced designs do not.
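A quick way to see whether a design came out balanced is to count the data points in each condition. The sketch below assumes visitors are assigned by independent coin flips, which is only one possible assignment algorithm; the seed and the name `is_balanced` are hypothetical.

```r
# Hypothetical assignment of 500 visitors by independent coin flips.
set.seed(1)
site <- factor(sample(c("A", "B"), size = 500, replace = TRUE))

counts <- table(site)
counts                     # visitors per condition

# Balanced means every condition has the same number of data points
is_balanced <- length(unique(as.vector(counts))) == 1
is_balanced
```

With coin-flip assignment the two counts usually come out close to, but not exactly, 250 each, so the design tends to be slightly unbalanced.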
So those are some of the things to think about.
For our purposes in this particular study, we will have nearly a 50/50 split,
but it comes out, as we'll see, not quite exactly 50/50, and
that's okay, and we have a total of 500 visitors.
13:42
And we do measure each visitor only once.
So, we have one measure per visitor, the number of distinct pages they viewed,
on either website A or website B.
Let's go now to look at the R code and see how we would do the analysis for
this kind of experiment.