Another category of threats to internal validity is
associated with the instruments that are used to measure and
manipulate the constructs in our hypothesis.
The threats of low construct validity, instrumentation, and
testing fall into this category.
I'll start with low construct validity.
Construct validity is low if our instruments contain a systematic bias or
measure another construct or property entirely.
In this case, there's not much point in further considering the internal validity
of a study.
As I discussed in an earlier video, construct validity is a prerequisite for
internal validity.
If our measurement instruments or
manipulation methods are of poor quality, then we
can't infer anything about the relation between the hypothesized constructs.
Suppose I hypothesized that loneliness causes depression.
I attempt to lower the loneliness of elderly people in a retirement home by
giving them a cat to take care of, expecting their depression to go down.
Suppose that taking care of the cat didn't affect loneliness at all, but
instead gave residents a higher social status.
They're special because they're allowed to have a cat.
The manipulation aimed at lowering loneliness, in fact,
changed a different property, social status.
Now consider my measurement of depression.
What if the questionnaire that I used actually measured feeling
socially accepted instead of feeling depressed?
If we accidentally manipulated social status or
measured feeling accepted, then we cannot conclude anything about
the relation between loneliness and depression.
The second threat that relates to measurement methods is instrumentation.
The threat of instrumentation occurs when an instrument is
changed during the course of the study.
Suppose I use a self-report questionnaire to measure depression at the start of
the study, but I switch to a different questionnaire, or
maybe to an open interview at the end of the study.
Well then, any difference in depression scores might be explained by the use of
different instruments.
For example, the scores on the post-test at the end of the study could be
lower because the new instrument measures slightly different aspects of depression.
Of course, it seems rather foolish to change your methods or instruments halfway
through, but sometimes a researcher has to depend on others for data.
For example, when using tests that are constructed by
national testing agencies or polling agencies.
A good example is the use of the standardized diagnostic tool
called the DSM.
This Diagnostic and Statistical Manual of Mental Disorders is used to
classify mental disorders such as depression, schizophrenia, and autism.
And it's updated every 10 to 15 years.
Now you can imagine the problems this can cause, for example, for
a researcher who is doing a long-term study on schizophrenia.
In the recently updated DSM,
several subtypes of schizophrenia are no longer recognized.
Now if we see a decline in schizophrenia in the coming years,
is this a real effect or is it due to the change in measurement system?
The last threat I want to discuss here is testing,
also referred to as sensitization.
Administering a test or measurement procedure can affect people's behavior.
A testing threat occurs if this sensitizing effect of
measuring provides an alternative explanation for our results.
For example, taking the depression pre-test at the start of
the study might alert people to their feelings of depression.
This might cause them to be more proactive about improving their emotional state, for
example, by being more social.
Of course this threat can be eliminated by introducing a control group.
Both groups will be similarly affected by the testing effect.
Their depression scores will both go down, but hopefully more so
in the cat companionship group.
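The subtraction logic behind that comparison can be sketched with made-up numbers (every value below is hypothetical, purely for illustration):

```python
# Hypothetical numbers illustrating how a control group cancels out
# a testing (sensitization) effect. All values are made up.

pre = 20.0                # average depression score at the pre-test, both groups
testing_effect = -2.0     # the pre-test alone nudges scores down
treatment_effect = -5.0   # assumed true effect of cat companionship

post_control = pre + testing_effect                 # affected by testing only
post_cat = pre + testing_effect + treatment_effect  # testing plus treatment

change_control = post_control - pre   # -2.0: testing effect only
change_cat = post_cat - pre           # -7.0: testing plus treatment

# Subtracting the control group's change removes the shared testing
# effect, leaving an estimate of the treatment effect itself:
estimate = change_cat - change_control
print(estimate)  # -5.0
```

Because both groups take the same pre-test, the testing effect appears in both change scores and drops out of the difference.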
Adding a control group is not always enough, though.
In some cases, there's a risk that the pre-test sensitizes people in
the control group differently than people in the experimental group.
For example, in our cat study, the pre-test in combination with getting a cat
could alert people in the experimental cat group to the purpose of the study.
They might report lowered depression on the post-test, not because they're less
depressed, but to make the study seem successful so they can keep the cat.
Suppose people in the control group don't figure out the purpose of the study,
because they don't get a cat.
They're not sensitized and not motivated to change their scores on the post-test.
This difference in sensitization provides an alternative explanation.
One solution is to add an experimental and
control group that weren't given a pre-test.
I'll discuss this solution in more detail when we
consider different experimental designs.
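To sketch why the extra groups help: with two more groups that skip the pre-test, any differential sensitization shows up as a gap between the pre-tested and un-pre-tested comparisons. All numbers below are hypothetical:

```python
# Hypothetical post-test scores for a design with four groups:
# with/without pre-test crossed with cat/control. All numbers made up.

base = 20.0
treatment = -5.0      # assumed true effect of the cat
sensitization = -3.0  # extra drop only when pre-test and cat combine
                      # (residents guess the purpose and adjust answers)

post = {
    ("pre-test", "cat"):        base + treatment + sensitization,
    ("pre-test", "control"):    base,
    ("no pre-test", "cat"):     base + treatment,
    ("no pre-test", "control"): base,
}

# The un-pre-tested groups give an estimate free of sensitization:
clean = post[("no pre-test", "cat")] - post[("no pre-test", "control")]
# The pre-tested groups give a biased estimate:
biased = post[("pre-test", "cat")] - post[("pre-test", "control")]

# The gap between the two estimates reveals the sensitization effect:
print(clean, biased, biased - clean)  # -5.0 -8.0 -3.0
```

If the two estimates disagree, the pre-test itself is influencing the results, and the un-pre-tested comparison is the safer one to trust.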
Okay. So to summarize.
Internal validity can be threatened by low construct validity,
instrumentation and testing.
Low construct validity and instrumentation can be eliminated by
using valid instruments and valid manipulation methods,
and of course by using them consistently.
Testing can be eliminated by using a special design that includes groups that
are exposed to a pre-test and groups that aren't.