Adjusting Test Scores After Finding Some Wonky Items
A Story by Larry Nelson, Curtin University

12 August 2010

The data:

 

The original control “cards”:

 

The reliability was not high enough for such an important test:

 

Three very weak items were spotted in Stats1b: I21, I29, and I39:

 

Still in Stats1b, scrolling up to get more scope on I21, I29, and I39:

 

Using Stats1f to focus on I21:

 

The quintile-a plot for I21:

 

The quintile-b plot for I21:

:::::  The item writers’ decision for I21: key both option C and option D :::::

 

Using Stats1f to focus on I29:

 

The quintile-a plot for I29:

 

The quintile-b plot for I29:

:::::  The item writers’ decision for I29: key options C, D, and E :::::

 

Using Stats1f to focus on I39:

 

The quintile-a plot for I39:

 

The quintile-b plot for I39:

:::::  The item writers’ decision for I39: change key to option B :::::

 

 


The Big Question: how to implement the item writers’ decisions in Lertap?

 

The answer:

 

•Some “rules” about CCs lines:

    First empty line is where processing ends (row 15 in this case).

    Lines without * at the start are ignored (rows 1, 5, 9, 11, and 13).

        (Rows 16 – 20 are all ignored as they come after the empty line.)

    Each *col line defines a “subtest”.

        Subtest 1 is titled “Run1”.

        Subtest 2 is titled “Run2”.

        Subtest 3 is defined above, but will be ignored because its lines come after the empty line.
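The three rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not Lertap's own code; the function name `parse_cc_lines` and the sample rows (with their angle-bracket placeholders) are made up for the example and are not the actual cards from this data set.

```python
def parse_cc_lines(rows):
    """Group CCs rows into subtests, following the three rules listed above."""
    subtests = []
    for row in rows:
        if row.strip() == "":
            break  # rule 1: the first empty row ends processing
        if not row.startswith("*"):
            continue  # rule 2: rows without a leading * are ignored
        if row.lower().startswith("*col"):
            subtests.append([])  # rule 3: each *col row opens a new subtest
        if subtests:
            # rows starting with * that appear before any *col row are dropped
            subtests[-1].append(row)
    return subtests


# Hypothetical rows; the <...> parts are placeholders, not real card content.
rows = [
    "Original scoring",
    "*col <item columns>",
    "*sub <title Run1>",
    "*key <original keys>",
    "Revised scoring",
    "*col <item columns>",
    "*sub <title Run2>",
    "",
    "*col <a third subtest, never reached>",
]

subtests = parse_cc_lines(rows)
print(len(subtests))  # number of subtests that would be processed
```

With these sample rows only two subtests survive: the third *col row sits below the empty row, so it is never read.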

 

After running the Interpret and Elmillon options, Lertap produces:

    Freqs

    Scores

    Stats1f, Stats1b, Stats1ul

    Stats2f, Stats2b, Stats2ul

 

New results, Stats2f:

 

The mean test score is now 34.73 (69.5%).  It was 33.65 (67.3%) before.

Reliability is now 0.83.  It was 0.81.
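The percentages quoted with the two means can be checked with a little arithmetic. The sketch below assumes a maximum possible score of 50; that figure is not stated on this slide and is inferred from the means and percentages themselves. The name `percent_of_max` is hypothetical.

```python
MAX_SCORE = 50  # assumption: inferred from the reported means and percentages


def percent_of_max(mean_score, max_score=MAX_SCORE):
    """Express a mean test score as a percentage of the maximum possible score."""
    return round(100 * mean_score / max_score, 1)


old_pct = percent_of_max(33.65)  # mean before rescoring (Run1)
new_pct = percent_of_max(34.73)  # mean after rescoring (Run2)
print(old_pct, new_pct)
```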

 

Using Stats2f to focus on the “new” I21:

 

Using Stats2f to focus on the “new” I29:

 

Using Stats2f to focus on the “new” I39:

 

Looking at the new scatterplot at the bottom of Stats2b:

 

We did all this so that the test scores would be “fairer”.

    We had three wonky items.

    One of them, I39, had been mis-keyed.

    The other two, I21 and I29, were plain-old faulty items, with ambiguity.

        The item reviewers rescored the items for the present.

        They also planned to re-write or replace them for the future.
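The rescoring summarised above comes down to letting an item accept more than one keyed option. Here is a minimal Python sketch of that idea. It is not how Lertap does it internally; the accepted options are the item writers' decisions reported earlier, while the function name `score_items` and the student's responses are invented for the example.

```python
# Accepted options per item, following the item writers' decisions.
REVISED_KEYS = {
    "I21": {"C", "D"},        # two options now earn the point
    "I29": {"C", "D", "E"},   # three options now earn the point
    "I39": {"B"},             # corrected key
}


def score_items(responses, keys):
    """Give 1 point for each item whose response is among the accepted options."""
    return {
        item: int(responses.get(item) in accepted)
        for item, accepted in keys.items()
    }


# Hypothetical responses from one student (not taken from the real data).
student = {"I21": "D", "I29": "A", "I39": "B"}

item_scores = score_items(student, REVISED_KEYS)
total = sum(item_scores.values())
print(item_scores, total)
```

A missing response scores 0 here, since `responses.get` returns `None`, which is never among the accepted options.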

 

Here are some of the scores, old (Run1), and new (Run2):

 

That's all!