Adjusting Test Scores After Finding Some Wonky Items
A Story by Larry Nelson, Curtin University
12 August 2010
The data:
The original control “cards”:
The reliability was not high enough for such an important test:
Three very weak items were spotted in Stats1b: I21, I29, and I39:
Still in Stats1b, scrolling up to get more scope on I21, I29, and I39:
Using Stats1f to focus on I21:
The quintile-a plot for I21:
The quintile-b plot for I21:
::::: The item writers’ decision for I21: key both option C and option D :::::
Using Stats1f to focus on I29:
The quintile-a plot for I29:
The quintile-b plot for I29:
::::: The item writers’ decision for I29: key options C, D, and E :::::
Using Stats1f to focus on I39:
The quintile-a plot for I39:
The quintile-b plot for I39:
::::: The item writers’ decision for I39: change key to option B :::::
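The three decisions amount to a scoring rule in which an item may have more than one keyed option. The sketch below is plain Python, not Lertap syntax. It assumes one point per item for any keyed option. The names RUN2_KEYS, score_item, and score_examinee are hypothetical, and so are the sample responses.

```python
# Keyed options after the item writers' decisions (taken from the slides above).
# Assumption: any keyed option earns one point; all other options earn zero.
RUN2_KEYS = {
    "I21": {"C", "D"},       # double-keyed
    "I29": {"C", "D", "E"},  # triple-keyed
    "I39": {"B"},            # key changed to B
}


def score_item(response, keyed_options):
    """Return 1 if the chosen option is among the keyed options, else 0."""
    return 1 if response in keyed_options else 0


def score_examinee(responses, keys):
    """Sum item scores over the items in `keys`; a missing response scores 0."""
    return sum(score_item(responses.get(item), opts) for item, opts in keys.items())


# Hypothetical examinee: picks D on I21, A on I29, and B on I39.
example = {"I21": "D", "I29": "A", "I39": "B"}
example_score = score_examinee(example, RUN2_KEYS)  # I21 and I39 earn a point
```

Holding the keyed options as a set per item means that double-keying, triple-keying, and a simple key change all go through the same rule.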
The Big Question: how to implement the item writers’ decisions in Lertap?
The answer:
• Some “rules” about CCs lines:
First empty line is where processing ends (row 15 in this case).
Lines without * at the start are ignored (rows 1, 5, 9, 11, and 13).
(Rows 16 to 20 are all ignored as they come after an empty line.)
Each *col line defines a “subtest”.
Subtest 1 is titled “Run1”.
Subtest 2 is titled “Run2”.
Subtest 3 is defined above, but will be ignored.
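The rules above can be sketched as a small Python routine. This is an illustration of the rules as stated here, not Lertap's actual code. The function name parse_ccs is hypothetical, and the sample lines are placeholders rather than real control-card syntax. Treating a whitespace-only line as empty is also an assumption.

```python
def parse_ccs(lines):
    """Group control lines into subtests, following the rules listed above."""
    subtests = []
    for line in lines:
        text = line.strip()
        if not text:
            break  # the first empty line is where processing ends
        if not text.startswith("*"):
            continue  # lines without * at the start are ignored
        if text.lower().startswith("*col"):
            subtests.append([text])  # each *col line defines a new subtest
        elif subtests:
            subtests[-1].append(text)  # other * lines belong to the current subtest
    return subtests


# Placeholder lines only; the text after each * keyword is made up.
sample = [
    "a comment about the first subtest",
    "*col first",
    "*sub first",
    "a comment about the second subtest",
    "*col second",
    "*sub second",
    "",
    "*col third",  # comes after the empty line, so it is never reached
]
parsed = parse_ccs(sample)
```

With these placeholder lines the routine finds two subtests and never reaches the third, which mirrors what happens to Subtest 3 above.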
After running the Interpret and Elmillon options, Lertap produces:
Freqs
Scores
Stats1f, Stats1b, Stats1ul
Stats2f, Stats2b, Stats2ul
New results, Stats2f:
The mean test score is now 34.73 (69.5%). It was 33.65 (67.3%) before.
Reliability is now 0.83. It was 0.81 before.
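The percentages quoted with the two means can be checked quickly. The sketch assumes a maximum possible score of 50. That figure is not stated on the slides but is inferred from the reported percentages. The names MAX_SCORE and percent_of_max are hypothetical.

```python
MAX_SCORE = 50  # assumption: inferred from the reported percentages, not stated


def percent_of_max(mean_score, max_score=MAX_SCORE):
    """Express a mean score as a percentage of the maximum, to one decimal."""
    return round(100 * mean_score / max_score, 1)


old_pct = percent_of_max(33.65)   # Run1 mean
new_pct = percent_of_max(34.73)   # Run2 mean
gain = round(34.73 - 33.65, 2)    # change in the mean, in score points
```

Under that assumption the two means come out at 67.3% and 69.5%, matching the figures above, and the mean rises by 1.08 points.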
Using Stats2f to focus on the “new” I21:
Using Stats2f to focus on the “new” I29:
Using Stats2f to focus on the “new” I39:
Looking at the new scatterplot at the bottom of Stats2b:
We did all this so that the test scores would be “fairer”.
We had three wonky items.
One of them, I39, had been mis-keyed.
The other two, I21 and I29, were plain old faulty items: both were ambiguous.
The item reviewers rescored the items for the present.
They also planned to re-write or replace them for the future.
Here are some of the scores, old (Run1), and new (Run2):
That’s all!