This is a five-section course as part of a two-course sequence in Research Methods in Psychology. This course deals with experimental methods whereas the other course dealt with descriptive methods.

A course from Georgia Institute of Technology

Experimental Research Methods in Psychology

From this lesson

Evaluating Causal Claims

- Dr. Anderson D. Smith, Regents’ Professor Emeritus

School of Psychology

Hello. Anderson Smith here. We are talking about ways that we can evaluate a causal claim: that a difference in the independent variable is causing a difference in the dependent variable. Often we have to use statistics to do that, and we use inferential statistics to tell us whether the manipulation of the independent variable significantly affects the dependent variable.

So the statistical procedures tell us, if we get a difference, whether that difference is a real difference or not.

So we have a difference between the effect of one level of the independent variable and the other level of the independent variable. We want to know: is there really a difference between those two means, and is that difference significant? That is, the means are not the same; they are unequal.

But if we see that difference, it might be that the difference we are observing is simply due to chance. That is, the two means are really the same; we have not found a significant difference. And that is often called the null hypothesis.

So we want to find out whether the difference is significant or is due to chance. We often use a standard of 5%: if the difference is significant, the probability of getting a difference that large by chance alone is < 5%. And if we attribute it to chance, the probability of getting that difference is > 5%. That is an arbitrary standard, but it is the standard that we use.

So let's say we have a difference where the probability is > 5%. That could mean there is no difference; that is the null hypothesis, the means are equal, and the difference that we are observing is not a significant difference. It could mean that, but it could also mean that there is a difference, that the means are in fact not the same, but we can't find it because we don't have a powerful enough experiment.

This is why testing the null hypothesis is very difficult. We may keep the null hypothesis because there really is no difference. Or the means really are different and we just can't detect it, because we have a sloppy experiment or we don't have enough power in the statistical procedure to really see the difference.

Power: The probability of detecting a difference really depends upon the expected relationship, whether we expect to have a large difference or a small difference, and upon the sample size. If we expect to see a small difference, we have to have a much larger sample to see the difference. If we have a big difference, then we can have a smaller sample to see the difference. So the number of people that we test really determines the power that we have in detecting the differences, of getting that p < 5%.

So we have two kinds of error. The one that matters for power is Type II error, which is accepting the null hypothesis, that is, concluding there is no difference, when the difference is really present. (The other kind, Type I error, is concluding there is a difference when it is really due to chance.)

And remember, there is a usual standard for the power we expect to have, just like the p < 5%. The power needs to be about 0.80 or higher. And again, that is a statistical calculation that relates the sample size and the expected difference to the power of the experiment in detecting the difference.

So to achieve a power of 0.80, for example, if we expect the difference to be large, then we need about 52 subjects. If we expect it to be a medium difference, we need 128 subjects, and if we really think it is a small but meaningful difference, then we have to have 788 subjects. We look these up in tables from the textbook; in fact, these numbers actually come from one of your readings.
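The link between expected difference, sample size, and power can be sketched in a few lines. This is only a rough illustration, not the method behind the textbook tables: it assumes a two-tailed, two-group comparison at the 5% level, uses the normal approximation to the t-test, and takes Cohen's conventional effect sizes (0.8, 0.5, 0.2 for large, medium, small), none of which are stated in the lecture. The function name `n_per_group` is made up for this example.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed in EACH of two groups.

    Normal approximation for a two-tailed, two-group comparison.
    effect_size is the expected difference in standard-deviation units
    (an assumption of this sketch, not a value from the lecture).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the 5% two-tailed standard
    z_power = z.inv_cdf(power)          # cutoff for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Assumed conventional effect sizes for large, medium, and small differences.
for label, d in [("large", 0.8), ("medium", 0.5), ("small", 0.2)]:
    print(label, "total subjects:", 2 * n_per_group(d))
```

Under these assumptions the totals come out at 50, 126, and 786, slightly below the 52, 128, and 788 quoted from the reading. A gap of that size is expected, because exact t-based tables need a few more subjects than the normal approximation suggests.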

It is just another way of asking, about the p that we get: is the test powerful enough to really detect the difference? That is what we are trying to do in making causal claims. We want to say we believe that this independent variable manipulation is causing this difference in the dependent variable, and that result really depends upon the power of the test.

Let's talk about that significance test, the test that tells us whether the p is < 5% or > 5%. Is it significant or is it due to chance?

Well, if we only have two means that we are comparing, a very simple experiment with just one variable and two means, we can use a statistic called the t-test, which simply tells us whether or not the difference between the two means, based on the variability found in the experiment, is significant, that is, p < 0.05.

If we have more than two means, so we are testing maybe more than one variable or more than two levels of a variable, then we would use Analysis of Variance. Again, it is a test that we use when we have more than just one comparison, to tell us whether there is a difference among any of the means in the experiment.

The t Statistic: When we have a simple comparison of two means from two separate groups, that is the independent-samples t-test, and we are just comparing one mean to the other. We also have to understand: is it a one-tailed test or a two-tailed test?

We have two means. Now, is it only possible that this mean is higher than the other mean, so that is the only direction that can occur? Or is it possible that the mean can be higher or be lower? That is a two-tailed test. So if we want to know whether it is significantly lower or significantly higher in the same test, that is two-tailed. If we know the difference has to be in one direction, that is a one-tailed test.

So, what do we need to have this kind of test? We need two sample means, we need an estimate of variance for each, that is, two standard deviations, and we need a sample size, which is determined by how big we expect the difference to be.

So a t-test is really this formula. It's the difference between the two means divided by the square root of the variance, which is the square of the standard deviation, divided by the number of subjects we have, for each group. Because variance is the standard deviation squared, we take a square root of that, and that gives us the t, which is the test statistic.

Then we look that up in a statistics table, which tells us whether that t, with those degrees of freedom for that sample size, is large enough to give us a significant difference.
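Written out, the verbal formula above looks roughly like this. The symbols are chosen here for illustration and do not come from the lecture: M_1 and M_2 are the two sample means, s_1 and s_2 the two standard deviations, and n_1 and n_2 the two sample sizes. Many textbooks pool the two variances into a single estimate first; the lecture does not say which variant it uses, so treat this as one reasonable reading.

```latex
t = \frac{M_1 - M_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
df = (n_1 - 1) + (n_2 - 1)
```

A larger difference between the means, less variability, or more subjects all make t larger, which is why sample size feeds directly into whether a difference reaches significance.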

Let's use an example. This is an example, again from the text, of how psychologists want to know if calorie estimation for people who eat junk food is different from that of people who eat non-junk food, healthy food. Is the calorie estimation different? Are we good at estimating the number of calories?

So here are the results from the junk food eaters, eight of them. They guessed that the food in front of them in a picture is 180 calories, 220 calories, 150, 85, 200. So different estimates, different guesstimates I should say, made by the people who eat junk food of the number of calories in the food shown in the picture. The non-junk food eaters have these estimates; we only have seven of those.

So with the t-test, we take the difference of the two means and we divide it by the square root of the standard deviation squared divided by the n, with an n of 8 and an n of 7. The result of that t-test is 2.42. Then we can look that up in a table, which you can find on the internet in a t-test calculator or in any statistics textbook, and you look up a t-score of 2.42; that is the t-score of that comparison.

The degrees of freedom are each sample size minus 1, added together, so (8-1) + (7-1) = 13. And we know it is a two-tailed test because we believe that the junk food eaters could estimate more calories or estimate fewer calories. We get p = 0.0306, and that is less than 0.05.

So our difference between the two means is significant; there is a difference, they are not equal, and we can talk about there being a causal relationship between the independent variable, junk food eaters versus non-junk food eaters, and the dependent variable, their estimation of calories.
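The table lookup in this example can be approximated in code. The sketch below does not recompute t from the calorie estimates, because the lecture does not list all of them; it starts from the reported t of 2.42 and the degrees of freedom, and numerically integrates the t distribution to get a two-tailed p. The function names `t_pdf` and `two_tailed_p` are made up for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p: probability of a |t| at least this large by chance.

    Uses Simpson's rule on [0, |t|]; steps must be even.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t  # what is left in both tails


df = (8 - 1) + (7 - 1)       # degrees of freedom from the two sample sizes
p = two_tailed_p(2.42, df)   # t value as reported in the lecture
print(df, round(p, 4))
```

This should give 13 degrees of freedom and a p of about 0.031, in line with the 0.0306 quoted above; a small gap is expected if the reported t of 2.42 is itself rounded. For a one-tailed test, the p would be half of this value.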

If we have more than two means, we have to use an Analysis of Variance, which is a much more complicated statistic. And in fact, even though it is used when we have more than two means, it really comes in lots of different forms for different designs.

What we are testing is the null hypothesis that all the means are equal, and the expectation is that there are going to be differences in the means, that not all the means are equal. And that test, unlike the t-test, is based on the F distribution, so it is an F-test. So the statistic we are looking for is an F. Then we can look the F value up, given the degrees of freedom, in a table or on the internet and come up with a p, and see whether the F statistic is significant at p < 0.05.

As I said, there are different tests for different kinds of designs. But basically, what you are doing in all Analysis of Variance designs is getting a computation of the variance between groups, the variability between the groups that you are studying, and the variance within groups, that is, within a single group, and then you are comparing those as a ratio, and that gives you the F statistic.
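The between-groups to within-groups ratio can be sketched for the simplest case, a one-way design with independent groups. The three groups of scores below are invented purely for illustration and have nothing to do with the lecture's data; the function name `one_way_f` is also hypothetical.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way design.

    F is the variance between group means divided by the variance
    within groups.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Made-up scores for three groups (hypothetical data).
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_f(scores)
print(round(f_ratio, 2), df_between, df_within)
```

With these invented scores the ratio is 13 on 2 and 6 degrees of freedom. That F would then be looked up in an F table, just as the t was, to get the p.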

I am not going into details about the statistics; this is not a statistics course. But I want to point out that there are statistics, inferential statistics, that tell us whether there are differences among the means. And just like t, F tables can be used to determine the p for that particular experiment.

As I mentioned, there are many different Analyses of Variance, some used for repeated measures designs, some used for factorial designs, but they all have that same common way of looking at significant differences.

Analysis of Variance will only tell us that the means are different. It won't tell us which means are different from which other means, and so we have to use something called multiple comparison tests or post-hoc tests, which then tell us whether any individual mean in the Analysis of Variance is different from any other individual mean. And that is often what we have to do when we have multiple variables, for example.
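To give a feel for why post-hoc tests are a separate step, the sketch below counts how many pairwise comparisons a set of means produces and shows one common way of tightening the 5% standard for each of them. The lecture does not name a specific post-hoc procedure; the Bonferroni correction used here is simply one widely taught option, and the group labels and function name are hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(group_names, alpha=0.05):
    """List every pair of groups and a Bonferroni-adjusted threshold.

    Bonferroni divides the overall alpha by the number of comparisons.
    It is one assumed choice of procedure, not the lecture's.
    """
    pairs = list(combinations(group_names, 2))
    per_test_alpha = alpha / len(pairs)
    return pairs, per_test_alpha


# Four hypothetical conditions.
pairs, per_test_alpha = pairwise_comparisons(["A", "B", "C", "D"])
print(len(pairs), "comparisons, each tested at p <", round(per_test_alpha, 4))
```

Four means already give six pairwise comparisons, so under this correction each one has to meet a stricter standard than 5% to keep the overall chance of a false positive near 5%.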

So, statistics are used to analyze the design. The data analyses are used to determine whether or not the inferences that we are making about relationships between variables are significant. And then the interpretations and conclusions are based on this analysis. If we say A is better than B, that means we have done a test to show that the probability of getting A better than B in our experiment by chance is < 5%.

And that is then fed back into the research literature, which then allows us to increase the body of knowledge about the relationship that we are interested in researching.