I'm going to show you how to use Data
Analysis and the t-Test for two samples:
first to do the hypothesis testing, and then to come up with
the confidence interval for the difference we see between two populations.
The manager of a store would like to know if the average sales, measured in dollars,
through her website are any different from her in-store sales.
The data has been collected and
we want to do the analysis at the 5% level of significance.
So first of all, what she's wondering is: is there a difference between online
and in-store sales?
So the null hypothesis is that there is no difference between the two, right?
That is, the online sales are the same as the store sales, and
the alternative would be that no, they're not the same.
So this is what we're checking, and again, as I said, if you write it like
this it might be easier for you to see what we're doing.
So now we're going to look at the difference between the two:
if I take the two mus, I am saying that their difference is equal to zero,
or it is not equal to zero. So this is the online and
this is the store; my handwriting isn't so good, so I'm not going to follow it.
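The handwritten hypotheses are not captured in the transcript. A sketch of what they would look like, assuming the subscripts "online" and "store" as labels for the two population means (my notation, not necessarily what was on the board):

```latex
\begin{aligned}
H_0 &: \mu_{\text{online}} - \mu_{\text{store}} = 0 \\
H_a &: \mu_{\text{online}} - \mu_{\text{store}} \neq 0
\end{aligned}
```

Because the alternative is "not equal", both directions count as evidence against the null, which is why this is a two-tail test.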
Now let's go look at our data and do the analysis.
So here's our data: these are dollars that have been recorded at the end of the day for
several days, with observations taken from online and in-store.
So the first thing you want to do is go to Data > Data Analysis and select t-Test,
again, Unequal Variances.
I will assume that the variances are unequal; I don't want to simplify it.
So then I say OK.
And it will ask for range one, and I have shown you that you can click here and
do Ctrl+Shift+Down.
You could actually do this quicker by just selecting the column.
This will work in this type of analysis.
It works only in a few places;
you cannot use this kind of selection all the time.
But here you can, and that will speed up the selection process.
So then the second variable is in the B column.
I assume that there is no difference, so the hypothesized difference is zero.
I have no preconceived idea.
I'm going to say that we have labels.
I'm going to select the output range.
Click here, click somewhere here, and then say OK.
And here we go. Once again, I know that
I am doing a two-tail test because it was an equality versus an inequality.
So what I need to do is that I just need to focus
on the two-tail part of this output.
And that would be right here.
So now looking at the p-value: the p-value is less than 0.05, so
I will end up rejecting the null hypothesis, which was that the two are the same.
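If you want to see what the tool is doing behind the scenes, here is a minimal Python sketch of a two-sample t-test with unequal variances, using only the standard library. The function names and the `online` and `store` numbers are made up for illustration; they are not the data from the spreadsheet. The p-value comes from numerically integrating the t density, so it may differ slightly from a spreadsheet's output.

```python
import math
from statistics import mean, variance


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value from the Student t density, via Simpson's rule."""
    # Log of the normalizing constant of the t density with df degrees of freedom
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(sample1, sample2):
    """Return (t statistic, degrees of freedom, two-tailed p-value)."""
    n1, n2 = len(sample1), len(sample2)
    # Variance of each sample mean: s^2 / n
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t_stat = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df, two_tailed_p(t_stat, df)


# Hypothetical daily sales in dollars, for illustration only
online = [120, 95, 130, 110, 105, 90, 125, 115]
store = [210, 180, 250, 230, 190, 220, 240, 200]

t_stat, df, p_value = welch_t_test(online, store)
alpha = 0.05
decision = "reject" if p_value < alpha else "fail to reject"
print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.4f}: {decision} the null")
```

The decision rule is the same one used on the output: compare the two-tail p-value with 0.05.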
So now that I know that they're different, I want to know what the expected
difference is between online and store sales.
So now I can go ahead and use the same output to develop the confidence
interval for the difference that exists between the two channels of sales.
These are the means of the two.
So let me highlight this in red.
But this is not enough, so I cannot just take the difference between these two and
say this is what I expect the difference to be.
There is a margin of error.
There is some variability from sample to sample.
So what we want to do is develop the entire interval for that difference.
So right now I can look at the difference between the two means and
say this is x bar 1 minus x bar 2.
So I can develop this by saying this value minus this value.
And by the way, it doesn't matter which one you do first: your
sign will change, but the absolute value of the difference is still the same, and
that's what we care about.
Either way works. Say, right now it looks like
mean online sales are $105 less than in-store sales.
If you did it the other way around,
the positive number would say that in-store sales are $105 more than online.
So there's really no difference.
So going back to the formula that we have for the confidence interval of
the difference, the next thing that I need to find out is what my t-value is.
Now the confidence interval is always two-tailed.
The margin of error can be plus or minus.
So the two-tail value here is 1.99.
That's my t-value. So again, looking at our formula, the next thing that I
need in order to come up with the margin of error is the standard error.
And the standard error, based on our equation, is just
the square root of the variance of the first sample divided
by its sample size, which in this case was 52, plus the second
sample's variance divided by its sample size.
And once I have the sum of the two, I can take the square root and find the standard error.
Now I'm ready to calculate the margin of error.
Margin of error is simply your t value times the standard error.
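The formula being referred to is not shown in the transcript. A standard way to write it, assuming s squared for the sample variances, n for the sample sizes, and t with subscript alpha over 2 for the two-tail critical value (the symbols are my notation):

```latex
\begin{aligned}
SE &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\
ME &= t_{\alpha/2} \cdot SE \\
\text{CI} &= (\bar{x}_1 - \bar{x}_2) \pm ME
\end{aligned}
```

Here the two-tail t-value from the output is 1.99 and the first sample size is 52.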
And now I'm ready to calculate the lower bound and
the upper bound of the difference between these two channels.
The lower bound is this value minus the margin of error, and
the upper bound is this value plus the margin of error.
So essentially what this is saying is that you are 95% confident about
the difference that exists between sales online and sales in-store:
the online channel sells on average about $18.80 per day up to $192.65 per day
less than what we have as in-store sales.
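The steps above (standard error, margin of error, lower and upper bound) can be sketched in a few lines of Python. The function name is hypothetical, and so are the means, the variances, and the second sample size, which are not given in the transcript; only the first sample size of 52 and the t-value of 1.99 come from the walkthrough.

```python
import math


def difference_confidence_interval(mean1, mean2, var1, n1, var2, n2, t_crit):
    """Confidence interval for (mean1 - mean2), unequal variances assumed."""
    diff = mean1 - mean2
    # Standard error: square root of the sum of each variance over its sample size
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    margin = t_crit * std_error  # margin of error
    return diff - margin, diff + margin


# Hypothetical summary statistics, for illustration only
lower, upper = difference_confidence_interval(
    mean1=100.0, mean2=205.0,   # made-up online and in-store means
    var1=5200.0, n1=52,         # made-up variance; n1 = 52 as in the walkthrough
    var2=4800.0, n2=48,         # made-up variance and sample size
    t_crit=1.99,                # two-tail t-value read from the output
)
print(f"95% CI for the difference: ({lower:.2f}, {upper:.2f})")
```

With online entered first, both bounds come out negative, which reads as "online is lower than in-store by somewhere between these two amounts."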
Now, as I have said before, it is showing that there is a significant difference,
a statistically significant difference between the two.
But in practice that could be as little as an $18 difference between the two.
So as a manager you may decide that this is not important to you;
it's not enough of a difference.
So statistically significant does not always mean that you have
found something meaningful.
Now if this number were a difference of thousands of dollars, then nobody would
argue; even in practice you would consider that an important issue.
What is the threshold?
Where you as a businessperson decide this is significant enough
really depends on the domain and the decisions that go along with that domain.
So there is no standard answer for that.
This is where your expertise comes into play, and
it's not just about the statistical values that you're seeing.