|
|
Not too massive missives and musings
(more than too, in fact)
Some somewhat technical papers; not all of them are boring.
• |
Visual item analysis with quintile plots.
The matter of assessing the quality of cognitive test items has traditionally been based on tables of numeric data, embroidered with core global measures, such as estimates of test reliability.
Lertap's quintile plots provide an alternative method: pictures.
Have a look at some examples, and see if you too might not be found wearing smiles for quintiles. Click here to have an initial look; you'll branch out to a pdf document, about 400 KB. See the new "packed plots" in action here.
Then, pour yerself another cuppa something, sit back, and take in a short practical example from some quality achievement tests developed and used in Central Java ('Jateng'). (PDF file, about 110 KB.) |
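For the curious, the tabulation behind a quintile plot can be sketched in a few lines of Python. This is a minimal illustration, not Lertap's own code: it assumes examinees are ranked by total test score, cut into five nearly equal groups, and that the plot shows the proportion of each group choosing each option. The function name `quintile_table` and the data are made up for the example.

```python
from collections import Counter


def quintile_table(responses, scores, options):
    """Return, for each of five score groups, the proportion choosing each option.

    responses: the option each examinee chose on ONE item (hypothetical layout)
    scores:    each examinee's total test score, in the same order
    options:   the option labels to tabulate, e.g. "ABCD"
    """
    n = len(scores)
    if n < 5:
        raise ValueError("need at least five examinees to form five groups")
    # Rank examinees from lowest to highest total score; ties keep input order.
    ranked = sorted(range(n), key=lambda i: scores[i])
    table = []
    for q in range(5):
        # Cut the ranked list into five nearly equal slices, lowest group first.
        group = ranked[q * n // 5:(q + 1) * n // 5]
        counts = Counter(responses[i] for i in group)
        table.append({opt: counts[opt] / len(group) for opt in options})
    return table


# Made-up data: ten examinees, one item with options A and B.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
responses = ["B", "B", "B", "A", "B", "A", "A", "B", "A", "A"]
for q, row in enumerate(quintile_table(responses, scores, "AB"), start=1):
    print(q, row)
```

If A is the keyed answer, a healthy item shows the proportion for A climbing from the lowest group to the highest, which is the sort of pattern the pictures make easy to spot.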
• |
Using cut scores to denote subject mastery.
The Standards for Educational and Psychological Testing (1999), published by the American Educational Research Association, recommend the use of 'special' statistics when the measurement process involves the use of cut scores. Mastery, licensing, and certification tests are examples of applications which typically use cut scores, often on a pass-fail basis.
Starting with version 5.6.3, Lertap supports all of the 'special' statistics recommended in the Standards. We've got a whiz-bang, top-flight paper which you'll not want to miss if you hope to make the cut. Have a read (PDF file, about 500 KB).
NCCA, the National Commission for Certifying Agencies, has a special report form used to summarise results from mastery (or pass/fail) exams. A paper which indicates how Lertap's output links in with the information requested by NCCA is here.
(Caution inserted June 2007: the top-flight paper just mentioned fails to point out a potential error in version 5.6.3. Two of the three conditional standard error of measurement calculations made by version 5.6.3, CSEM1 and CSEM2, may be inaccurate if users have employed one or more *mws lines in their CCs worksheet. CSEM1 and CSEM2 assume all test items are scored on a right/wrong basis. While this is far and away the normal case, the use of *mws lines changes the picture, and can in some situations introduce error in the CSEM1 and CSEM2 figures. Less than 5% of the Lertap-using world employs *mws lines; this caution is for their eyes only.) |
• |
Differential item functioning (DIF).
Support for DIF analyses was added to the Excel 2007 version of Lertap in September, 2009.
Lertap uses Mantel-Haenszel (M-H) methods for assessing the possibility of differential item functioning.
A feature of Lertap's implementation of M-H involves the ability to make charts of group item responses, including empirical item response function plots.
Read about it / see all about it with a wee click about here (PDF file, about 800 KB).
|
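The core M-H calculation is small enough to sketch in Python. What follows is the textbook Mantel-Haenszel common odds ratio with the widely used ETS delta transform, offered as an illustration under stated assumptions rather than as Lertap's own routine: examinees are assumed to be matched on total score, each score level supplies one 2x2 table, and the function names and counts are hypothetical.

```python
import math


def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score levels.

    Each stratum is a tuple of counts for one score level:
    (ref_right, ref_wrong, focal_right, focal_wrong).
    Assumes at least one level has both ref_wrong and focal_right above zero.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip score levels with no examinees
        num += a * d / n
        den += b * c / n
    return num / den


def mh_delta(alpha):
    """ETS delta-scale transform of the odds ratio: -2.35 * ln(alpha)."""
    return -2.35 * math.log(alpha)


# Made-up counts for two score levels; both groups do equally well at each level.
strata = [(10, 10, 10, 10), (20, 5, 20, 5)]
alpha = mh_common_odds_ratio(strata)
print(alpha, mh_delta(alpha))
```

An odds ratio near 1 (delta near 0) suggests the item behaves alike for both groups once ability is matched; under this sign convention a negative delta points to an item favouring the reference group.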
• |
Response similarity analysis (RSA).
In July 2005 work started on equipping Lertap with tools to enable users to investigate the possibility of student cheating.
A description of the basics of this work may be seen by clicking here (PDF file, about 600 KB. Note: Lelp's Response similarity analysis topic is very relevant to this document).
The initial methods used in Lertap's "response similarity analysis" were tested with several data sets from two major testing centres, and a paper prepared for journal submission. Other programs, such as Integrity, Scrutiny!, and SCheck are mentioned in this paper. You may take in a near-final draft version of the paper with a click here (PDF file, about 300 KB).
In early 2006 an updated version was released, with much improved support. Read all about it (PDF file, about 350 KB).
|
• |
Iteman 4 and Lertap 5
Iteman is another item analysis program recognised (not to mention used) all over the world. These documents point to some of the differences between the latest versions of Iteman and Lertap 5. For a very general discussion see this small PDF file. For a more detailed account of one particular area of difference, item performance summaries and flags, you won't want to go without reading this PDF file.
|
• |
About eigenvalues, scree tests, and coefficient alpha.
Copy of a 2005 journal article having to do with interpreting some of Lertap's output, and suggesting a new method for guesstimating the number of factors underlying a set of test items. (PDF file, about 450 KB.) |
• |
Production mode.
We have worked a bit with an Australian university to suggest the design of a system which will accept output from a scanner, and automatically (1) reformat it so that it's Lertappable, and (2) have Lertap make reports and graphs. This document gets into macros, and more macros. Holy macro! (Word doc file, about 190 KB.)
Note: Lelp's Production mode topic is very relevant to this paper. |
• |
Programming Lertap
Talking about macros, the Macs menu may be used to link home-grown code modules to the toolbar, making it possible to customize Lertap so that it's specially tailored to local needs.
The technical aspects underpinning this capability can require some knowledge of programming, but we have an example or two ready for all users to enjoy (Word doc file, about 180 KB).
Note: to fully pursue the use of macros in Lertap, you should not be without a read of Lelp's Macs menu topic. |
• |
Scoring open-response items.
It is possible to get Lertap to score open-ended, short-answer, and free-response questions. This document explains how, using real data from a graduate student in Minnesota. (Word doc file, about 120 KB.) |
• |
Lertap's correlation coefficients.
As millions of people the world over launch their Lertap careers, the question sometimes arises as to why some of Lertap's results differ from results obtained from other programs. The answer has to do with correcting correlation coefficients for something your parents probably never mentioned: part-whole contamination. (Word doc file, about 200 KB.) |
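Part-whole contamination is easy to show with a toy example in Python. The sketch below uses the common remedy of correlating an item with the total score after the item's own points have been taken out; it is an assumption that this matches the correction described in the paper, and the function names and data are invented for the illustration.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; assumes both lists vary (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(item_scores, total_scores):
    """Correlate an item with the total score minus that item's own points."""
    rest = [t - i for i, t in zip(item_scores, total_scores)]
    return pearson(item_scores, rest)


# Made-up data: four examinees, one right/wrong item, and their total scores.
item = [1, 0, 1, 0]
total = [3, 1, 2, 2]
print(pearson(item, total))               # item is part of the total: inflated
print(corrected_item_total(item, total))  # item removed from the total first
```

With these numbers the uncorrected figure is about 0.71 while the corrected one is zero: all of the apparent relationship came from the item being counted in the very total it was correlated with. Programs that skip the correction will report the larger value.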
• |
Solutions for Excel's 255-character limit.
Users with a tendency to create very long lines in Lertap's CCs worksheet can find themselves up against an Excel limitation. This document might help you out, should you long for lengthy CCs lines some day. (Word doc file, about 230 KB.)
Note for Excel 2007 users: this limitation is gone! Excel 2007 has significant enhancements in some areas (but in other areas may not be as good as previous versions; see this paper). |
• |
Experimental features in Lertap 5.
Mentions some special statistics which are available in Lertap, but not normally output. These include biserial correlation coefficients, and classical test estimates of two item-response theory (IRT) parameters. Updated in 2005 to reference Dawber's doctoral research. (Web page, will open in a new browser window.) |
• |
Item analysis in criterion-referenced situations.
Comments on the use of item analysis for competency-based testing, by Ian Boyd, director of one of Western Australia's technical and further education colleges. The mastery test procedures discussed in this paper are supported by Lertap 5. (Word doc file, about 120 KB.)
|
• |
Rasching an achievement test (draft of 27 May 2008).
Examines the use of two Rasch (IRT) oriented systems, ConQuest 2.0 and Winsteps, using an achievement test traditionally processed with classical test theory (CTT) and Lertap. Argues that there may be little or (likely) even no gain in using Rasch scaling, pointing out that there is reason to question the Rasch assertion of "fundamental measurement" and interval scaling. (PDF file, about 450 KB.)
|
• |
Some CTT and IRT comments.
For years playground bullies have been running amuck, suggesting that classical test theory is outdated. You may have seen them: they wear strange hats, and t-shirts with "IRT is me" printed on them. This paper looks at some recent literature which compares the use of CTT and IRT methods; bastante interestante! (Web page, will open in a new browser window.) |
|
|
|