Adjusting Test Scores After Finding Some Wonky Items
A Story by Larry Nelson, Curtin University

12 August 2010

The data:

The original control “cards”:

The reliability was not high enough for such an important test:

Three very weak items were spotted in Stats1b: I21, I29, and I39:

Still in Stats1b, scrolling up to get more scope on I21, I29, and I39:

Using Stats1f to focus on I21:

The quintile-a plot for I21:

The quintile-b plot for I21:

::::: The item writers’ decision for I21: key both option C and option D :::::

Using Stats1f to focus on I29:

The quintile-a plot for I29:

The quintile-b plot for I29:

::::: The item writers’ decision for I29: key options C, D, and E :::::

Using Stats1f to focus on I39:

The quintile-a plot for I39:

The quintile-b plot for I39:

::::: The item writers’ decision for I39: change key to option B :::::
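Taken together, the three decisions amount to a revised scoring key in which some items accept more than one option. The sketch below is plain Python, not Lertap, and only illustrates the idea. It assumes every keyed option earns one point (the slides do not state the weights), and the student responses are hypothetical.

```python
# Revised key for the three wonky items: item -> options that earn the point.
# Assumption: each keyed option is worth exactly one point.
REVISED_KEY = {
    "I21": {"C", "D"},       # key both C and D
    "I29": {"C", "D", "E"},  # key C, D, and E
    "I39": {"B"},            # key changed to B
}


def score_items(responses, key):
    """Count one point for each item whose response is among its keyed options."""
    return sum(1 for item, keyed in key.items() if responses.get(item) in keyed)


# Hypothetical responses from one student, for illustration only.
student = {"I21": "D", "I29": "A", "I39": "B"}
print(score_items(student, REVISED_KEY))
```

A student who chose D on I21 now gets credit for that item, while a response outside the keyed set, such as A on I29, still scores zero.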

The Big Question: how to implement the item writers’ decisions in Lertap?

The answer:

• Some “rules” about CCs lines:

The first empty row is where processing ends (row 15 in this case).

Rows without * at the start are ignored (rows 1, 5, 9, 11, and 13).

(Rows 16 to 20 are all ignored, as they come after the empty row.)

Each *col line defines a “subtest”.

Subtest 1 is titled “Run1”.

Subtest 2 is titled “Run2”.

Subtest 3 is defined above, but will be ignored.
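The rules above can be mimicked in a few lines of Python. This is only a sketch of the rules as stated on this slide, not Lertap's actual code: the function names are hypothetical, and the row contents are placeholders rather than real CCs syntax.

```python
def active_cc_rows(rows):
    """Return the rows that would be processed under the rules above."""
    active = []
    for row in rows:
        if row.strip() == "":
            break  # the first empty row ends processing
        if row.startswith("*"):
            active.append(row)  # rows without a leading * are ignored
    return active


def count_subtests(rows):
    """Each processed *col row defines one subtest."""
    return sum(1 for row in active_cc_rows(rows) if row.lower().startswith("*col"))


# Placeholder rows: only the leading characters matter for this sketch.
rows = [
    "Original scoring",
    "*col (placeholder for Run1)",
    "*sub (placeholder for Run1)",
    "Adjusted scoring",
    "*col (placeholder for Run2)",
    "*sub (placeholder for Run2)",
    "",
    "*col (placeholder for a third subtest)",
]
print(count_subtests(rows))
```

The third *col row sits after the empty row, so it is never reached. This is why Subtest 3 is defined but ignored.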

After running the Interpret and Elmillon options, Lertap produces:

Freqs

Scores

Stats1f, Stats1b, Stats1ul

Stats2f, Stats2b, Stats2ul

New results, Stats2f:

The mean test score is now 34.73 (69.5%). It was 33.65 (67.3%) before.

Reliability is now 0.83. It was 0.81.

Using Stats2f to focus on the “new” I21:

Using Stats2f to focus on the “new” I29:

Using Stats2f to focus on the “new” I39:

Looking at the new scatterplot at the bottom of Stats2b:

We did all this so that the test scores would be “fairer”.

We had three wonky items.

One of them, I39, had been mis-keyed.

The other two, I21 and I29, were plain old faulty items: both were ambiguous.

The item reviewers rescored the items for the present.

They also planned to re-write or replace them for the future.

Here are some of the scores, old (Run1), and new (Run2):

That's all!