0:06
Hi, welcome back. Caren Stalburg here.
We've been talking in this unit about assessments.
And so when we create assessments,
we also need to think about how we're gonna score them and what the scores mean.
So, today, we're gonna talk about how to create scoring rubrics.
And then, I'd like you to also become a little bit familiar with the techniques
that most individuals use when setting standards for exams.
So let's make sure we're all talking about the same things.
When I"m talking about score,
what I mean is the actual performance of the individual on the actual assessment.
When I"m talking about standard, or standard setting,
what I mean is the acceptable score to indicate the desired level of performance.
1:21
If we're talking about a novice learner taking a history,
that individual will obtain the most relevant points in the history, but
they may miss some key components.
Now, you can create a nine-point score, or you can create an eight-point score.
Really, it depends on whatever you feel is an appropriate
scoring domain or scoring dimension.
In this case, it would give the evaluator three areas to say,
yes, this person is a novice.
Or they're approaching intermediate, which means that they got most of
the relevant history and most of the key elements, but not all of it.
Or they were actually really competent and
included all of the relevant components of a patient history.
2:32
And we may say, okay, if you get a 3, that's a standard passing score.
So everything at a 3 or above is going to be acceptable for passing.
And it may provide gradations of excellence, which we may or
may not use to assign higher grades, like A, B, C.
Or we may say this is the cutoff, below which
you did not display the skill appropriately.
So if you were difficult to hear, had poorly positioned audio equipment,
or the audience was completely gone from your talk, then you actually did not
fulfill the necessary performance to say that you checked off on that skill.
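To make that concrete, here's a minimal sketch, in Python, of how a nine-point rubric with a passing score of 3 might be turned into a level and a pass/fail decision. This is my own illustration rather than anything from the lecture, and the three band boundaries are assumptions for the example.

```python
# Hypothetical nine-point history-taking rubric split into three bands,
# with 3 as the passing cut-off. Band boundaries are assumed for illustration.
RUBRIC_BANDS = [
    (range(1, 4), "novice"),        # most relevant points, misses some key components
    (range(4, 7), "intermediate"),  # most relevant history and key elements, not all
    (range(7, 10), "competent"),    # all relevant components included
]

PASSING_CUTOFF = 3  # a score of 3 or above counts as passing

def interpret_score(score: int) -> dict:
    """Map a raw rubric score to a performance level and a pass/fail decision."""
    for band, level in RUBRIC_BANDS:
        if score in band:
            return {"score": score, "level": level, "passed": score >= PASSING_CUTOFF}
    raise ValueError(f"Score {score} is outside the 1-9 rubric range")

print(interpret_score(3))  # {'score': 3, 'level': 'novice', 'passed': True}
print(interpret_score(8))  # {'score': 8, 'level': 'competent', 'passed': True}
```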
4:59
And, you also want to make sure that you understand the learners or
the group being tested.
So, in my EKG example, if we were looking at first year emergency medical
technician students, whether or not they could identify a right bundle or
a left bundle branch block may not be an appropriate standard to hold them to.
However, all of them, we would argue, perhaps,
should be able to recognize an acute MI.
So where do we set those cut-off scores?
So again, the cut-off score is the place or the number below which
the performance of the individual is sort of deemed unacceptable.
Now remember, where you place your cut-off score can have really significant
ramifications for you, in terms of how many people you're failing,
how many people have to go through the examination again,
what the efficiency issues are, and what the cost is.
But more importantly, the cutoff score can have significant ramifications for
the individual.
Because, if your cutoff score is too high, you're gonna have too many people failing.
And perhaps that's not really an appropriate cut off score, or
what you want for individuals.
And if it's too low, then you run the risk of saying that people are able to do
things that they actually really aren't capable of doing.
So, when people talk about cut-off scores, and
our psychometrician colleagues are looking at standards,
we talk about relative standards, which are norm-referenced, or
based on the performance of the group of individuals who take the assessment.
So we may say that the exam mean, whatever that is, is set at a C grade,
and whoever is in the bottom 10th percentile will fail.
That translates into: even if the mean is 60%,
people will still pass, because it's a relative standard.
It may be that the exam was very tough.
It may be that the individuals taking the exam were novices.
And so this was their first go around and the performance was low.
But it's a relative standard,
as opposed to an absolute standard, which is criterion-based.
And so it's going to be independent of the group's performance.
If a learner gets 70% of the exam questions correct, then that demonstrates
they have mastered enough of the material to have an adequate performance.
And if nobody in the class scores 70%, then no one passes.
You don't necessarily adjust based on the group's performance.
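To see the difference concretely, here's a small sketch with made-up scores that applies both rules to the same group: a norm-referenced rule where the bottom 10th percentile fails, and a criterion-referenced rule where 70% correct passes.

```python
import statistics

# Hypothetical exam scores (percent correct) for a small cohort.
scores = [52, 58, 61, 64, 66, 68, 71, 74, 79, 85]

# Relative (norm-referenced) standard: the bottom 10th percentile fails,
# regardless of the absolute scores.
cutoff_index = max(1, round(0.10 * len(scores)))
relative_failures = sorted(scores)[:cutoff_index]

# Absolute (criterion-referenced) standard: 70% correct passes,
# independent of how the group performed.
absolute_failures = [s for s in scores if s < 70]

print(f"Group mean: {statistics.mean(scores):.1f}%")
print(f"Norm-referenced: {len(relative_failures)} of {len(scores)} fail "
      f"(lowest score(s): {relative_failures})")
print(f"Criterion-referenced: {len(absolute_failures)} of {len(scores)} fail "
      f"(scores below 70%: {absolute_failures})")
```

With these made-up numbers the group mean sits below 70%, so the criterion-referenced rule fails most of the cohort while the norm-referenced rule fails only one person, which is exactly the contrast described above.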
So, there are actually formal ways to do this,
and some of us are familiar with those.
But I just want to mention them because sometimes people use this language to help
understand how you set the standard.
So there are two methods we're going to talk about: one is the Angoff method and
the other is the Hofstee method.
So first the Angoff method is a test-centered method for
setting your standard.
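As a rough sketch of how an Angoff-style computation is typically carried out, with made-up ratings: each judge estimates, for each item, the probability that a minimally competent examinee would answer it correctly, and the cut score is the sum of those estimates averaged across the judges.

```python
# Hypothetical Angoff-style ratings: rows = judges, columns = items on a 5-item quiz.
# Each value is a judge's estimate of the probability that a minimally competent
# examinee answers that item correctly.
judge_ratings = [
    [0.8, 0.6, 0.9, 0.5, 0.7],
    [0.7, 0.5, 0.9, 0.6, 0.8],
    [0.9, 0.6, 0.8, 0.4, 0.7],
]

# Average the judges' estimates item by item, then sum across items.
n_judges = len(judge_ratings)
item_means = [sum(col) / n_judges for col in zip(*judge_ratings)]
cut_score = sum(item_means)

print(f"Item-level expectations: {[round(m, 2) for m in item_means]}")
print(f"Angoff cut score: {cut_score:.1f} of {len(item_means)} items")
```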
10:11
And the absolute highest fail rate we could tolerate would be about 15%.
Then we would also look at the number of items that we think should be correct.
So, the absolute highest number that would be an acceptable performance
would be 70 questions, and the absolute lowest would be 60.
So you come up with this box showing the range where the cut point should be.
And then you actually plot that against the student performance curve.
And where those two intersect is basically your pass-fail cut point.
So you could say from this graph that
our panel of experts said that we don't want less than 2 to 3% of our students failing,
and we don't want more than 15% failing.
And we think that the acceptable number of items correct on a 100-point exam is
somewhere between 60 and 70.
And if we look at how our learners actually performed, the place where those
intersect is at about the 66 to 67% mark.
And that means we may have about 10 to 15% of the students failing.
And we have to decide as a group and
as a panel of experts if that's an okay number to accept.
To say, when we administer this exam,
we know there's going to be 15% who may fall below that cut point.
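Here's a minimal sketch of a Hofstee-style calculation along those lines, with simulated scores: the panel's box runs from 60 to 70 items correct and from roughly 2.5% to 15% failing, and in the usual formulation the cut point is taken where the diagonal of that box crosses the curve of the percentage of students who would fail at each candidate cut score.

```python
import random

random.seed(1)
# Hypothetical scores for 200 students on a 100-item exam.
scores = [min(100, max(0, round(random.gauss(75, 8)))) for _ in range(200)]

k_min, k_max = 60, 70        # lowest / highest acceptable cut score
f_min, f_max = 0.025, 0.15   # lowest / highest tolerable fail rate

def fail_rate(cut: float) -> float:
    """Proportion of students scoring below a candidate cut score."""
    return sum(s < cut for s in scores) / len(scores)

def diagonal(cut: float) -> float:
    """Fail rate on the box diagonal from (k_min, f_max) to (k_max, f_min)."""
    return f_max + (cut - k_min) * (f_min - f_max) / (k_max - k_min)

# Walk along candidate cut scores and take the first point where the observed
# fail-rate curve rises to meet the diagonal of the box.
for cut in [k_min + i * 0.1 for i in range(int((k_max - k_min) * 10) + 1)]:
    if fail_rate(cut) >= diagonal(cut):
        cut_point = cut
        break
else:
    cut_point = k_max  # the curves never crossed inside the box

print(f"Suggested cut score: {cut_point:.1f}, "
      f"implying a fail rate of {fail_rate(cut_point):.1%}")
```

If the student curve never crosses the box at all, you're in the situation described below, where the panel has to go back and reconsider the exam or the standard.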
11:33
So again,
you may have unique ramifications from where you set your cut point.
You actually may pass too many students and
miss a student who shouldn't have been passed.
There's always the practical aspects of administering a test.
Some people would say 15% is way too high.
I can't redo that standardized patient performance, or
I can't re-administer a test to 15% of 20,000 people.
And then remember that the cut-point for
the Hofstee is gonna depend on sort of the rectangle and
the curve of student performance, and they're actually gonna have to intersect.
So sometimes the way that people perform on the exam is not
what your panel of experts actually wanted them to or predicted them to.
And so, you may have to shift or you may have to go back and
say, is this really measuring what we think we're measuring?
Is this exam correct?
What else do we need to do?
12:29
So standard setting is very, very, difficult.
It certainly will not be clear to you,
or crystal clear anyway, from this short video.
My intent here is to make you aware of the different options.
And to just raise your awareness about things to consider when we're creating
assessments around intentional instructional methods.
Remember that the higher the stakes,
the more intentional your process of standard setting needs to be.
Your standards, no matter what you decide, need to be defensible.
They need to make sense.
They need to be reproducible and relative to the context that you're in.
And depending on how things evolve,
your standard may need to be revisited or changed over time.