Adjusting Test Scores After Finding Some Wonky Items
A Story by Larry Nelson, Curtin University

12 August 2010

The data:


The original control “cards”:


The reliability was not high enough for such an important test:


Three very weak items were spotted in Stats1b: I21, I29, and I39:


Still in Stats1b, scrolling up to get more scope on I21, I29, and I39:


Using Stats1f to focus on I21:


The quintile-a plot for I21:


The quintile-b plot for I21:

:::::  The item writers’ decision for I21: key both option C and option D :::::


Using Stats1f to focus on I29:


The quintile-a plot for I29:


The quintile-b plot for I29:

:::::  The item writers’ decision for I29: key options C, D, and E :::::


Using Stats1f to focus on I39:


The quintile-a plot for I39:


The quintile-b plot for I39:

:::::  The item writers’ decision for I39: change key to option B :::::
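Taken together, the three decisions amount to a new scoring rule for I21, I29, and I39. The following is a minimal sketch of that rule in Python, written outside Lertap purely for illustration. It assumes each item is worth one point, and the names `REVISED_KEYS`, `score_item`, and `score_items` are hypothetical, not part of Lertap.

```python
# Revised keys for the three reviewed items, taken from the item
# writers' decisions. Each item maps to the set of options that now
# earn the point. Assumption: every item is scored 1 (right) or 0.
REVISED_KEYS = {
    "I21": {"C", "D"},       # two options accepted
    "I29": {"C", "D", "E"},  # three options accepted
    "I39": {"B"},            # key changed to B
}


def score_item(item, response, keys=REVISED_KEYS):
    """Return 1 if the response is a keyed option for the item, else 0."""
    return 1 if response in keys[item] else 0


def score_items(responses, keys=REVISED_KEYS):
    """Sum the points over the items that appear in both mappings."""
    return sum(
        score_item(item, response, keys)
        for item, response in responses.items()
        if item in keys
    )


# Hypothetical student who chose D, A, and B on the three items.
example = {"I21": "D", "I29": "A", "I39": "B"}
print(score_items(example))  # 2
```

A student who picked D on I21 earns the point under the revised rule, while A on I29 still earns nothing.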



The Big Question: how to implement the item writers’ decisions in Lertap?


The answer:


• Some “rules” about CCs lines:

    First empty line is where processing ends (row 15 in this case).

    Lines without * at the start are ignored (rows 1, 5, 9, 11, and 13).

        (Rows 16 to 20 are also ignored, as they come after the empty row.)

    Each *col line defines a “subtest”.

        Subtest 1 is titled “Run1”.

        Subtest 2 is titled “Run2”.

        Subtest 3 is defined above, but will be ignored.
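The rules above can be sketched in a few lines of Python. This is only an illustration of the three rules as stated here, not Lertap's actual parser. The function name `active_cc_lines` is hypothetical, the text after each `*col` or `*key` in the example is a placeholder rather than real CCs syntax, and treating a whitespace-only row as empty is an assumption.

```python
def active_cc_lines(rows):
    """Group the control lines that would be processed, one list per subtest.

    Rules applied, as described in the text:
      1. Processing ends at the first empty row.
      2. Rows that do not start with * are ignored.
      3. Each *col line starts a new subtest.
    """
    subtests = []
    for row in rows:
        text = row.strip()
        if not text:
            break  # rule 1: nothing after the first empty row is read
        if not text.startswith("*"):
            continue  # rule 2: treated as a comment
        if text.lower().startswith("*col"):
            subtests.append([])  # rule 3: open a new subtest
        if subtests:
            # Assumption: a control line seen before any *col is dropped.
            subtests[-1].append(text)
    return subtests


# Hypothetical worksheet rows; the text after each card name is a placeholder.
rows = [
    "Original scoring",
    "*col (placeholder for Run1)",
    "*key (placeholder for Run1)",
    "Revised scoring",
    "*col (placeholder for Run2)",
    "*key (placeholder for Run2)",
    "",
    "*col (placeholder for a third subtest, never reached)",
]
print(len(active_cc_lines(rows)))  # 2
```

The third `*col` row sits below the empty row, so it never becomes a subtest, which is the same reason Subtest 3 is ignored here.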


After running the Interpret and Elmillon options, Lertap produces:



    Stats1f, Stats1b, Stats1ul

    Stats2f, Stats2b, Stats2ul


New results, Stats2f:


The mean test score is now 34.73 (69.5%).  It was 33.65 (67.3%) before.
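The percentages can be checked with a line of arithmetic. The sketch below assumes a maximum possible score of 50, a figure inferred from the reported means and percentages rather than stated outright; `MAX_SCORE` and `percent_of_max` are hypothetical names.

```python
MAX_SCORE = 50  # assumption: inferred from 33.65 -> 67.3% and 34.73 -> 69.5%


def percent_of_max(mean_score, max_score=MAX_SCORE):
    """Express a mean score as a percentage of the maximum possible score."""
    return round(100 * mean_score / max_score, 1)


print(percent_of_max(33.65))     # 67.3
print(percent_of_max(34.73))     # 69.5
print(round(34.73 - 33.65, 2))   # 1.08
```

Under that assumption, the rescoring raised the average by about 1.08 points.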

Reliability is now 0.83.  It was 0.81.
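For readers who want to see what a reliability figure of this kind measures, here is a minimal sketch of coefficient alpha in Python. Treating the reported reliability as coefficient alpha is an assumption, since only the word "reliability" appears here. The function name `coefficient_alpha` is hypothetical, and the small score matrix is made-up toy data, not the test data from this story.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a students-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])                      # number of items
    totals = [sum(row) for row in item_scores]   # each student's test score
    item_vars = [
        pvariance([row[j] for row in item_scores]) for j in range(k)
    ]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))


# Toy data only: 4 students (rows) by 3 items (columns), scored 0 or 1.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(coefficient_alpha(toy), 2))  # 0.75
```

An item that is mis-keyed or ambiguous tends to disagree with the rest of the test, which pulls a statistic like this down; rescoring such items is one way it can rise.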


Using Stats2f to focus on the “new” I21:


Using Stats2f to focus on the “new” I29:


Using Stats2f to focus on the “new” I39:


Looking at the new scatterplot at the bottom of Stats2b:


We did all this so that the test scores would be “fairer”.

    We had three wonky items.

    One of them, I39, had been mis-keyed.

    The other two, I21 and I29, were plain-old faulty items, with ambiguity.

        The item reviewers rescored the items for the present.

        They also planned to re-write or replace them for the future.


Here are some of the scores, old (Run1), and new (Run2):


That's all!