The main limitation of the macro is processing speed: it isn't fast. Not at all. When processing the BLOT data, with 35 items and 150 students, the first iteration took 14 seconds to complete, and each of the seven subsequent iterations took 11 seconds. This was with Excel 2016 on a laptop with an Intel i5-8250U processor @ 1.60 GHz -- not the fastest computer, but not the slowest either. Iteration times were almost halved when using Excel 2010 on the same laptop.
With workbooks from the FIMS study, using Excel 2016 to process the whole 14-item dataset with all 6,300 students: the first iteration took just shy of 6 minutes, with each of the eight subsequent iterations taking just under 4 minutes. An exercise mentioned on the next page suggests that data from FIMS might be used to show how Rasch results vary by country. About 250 data records from each country would be sufficient for this exercise, and processing time with Excel 2016 then comes down to about 7 seconds per iteration.
Accuracy might be another concern, but a smaller one -- when processing large datasets (50 items and 1,000 students, for example), our results show excellent agreement with those obtained from R packages such as TAM and Dexter, even better agreement than the BLOT results on a previous page.
Like Moulton's example, the macro presently does not make use of the PROX algorithm for improving starting item and student performance estimates. One might expect datasets with few items and/or students to limit the accuracy of the macro's results, but by and large we have thus far found that, if the CTT (classical test theory) statistics are okay, macro results closely align with those from TAM and Dexter. Such is the case, for example, with FIMS results from Japan based on just 14 items. (But not so much the case with FIMS results from Australia, where one or two items had poor CTT discrimination.)
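To show what PROX would add, here is an illustrative Python sketch of the normal-approximation (PROX) starting estimates -- raw logits from proportions correct, expanded by the usual PROX factors. This is not the macro's VBA code; the function name `prox_start` and the nested-list data layout (rows = students, columns = items, complete 0/1 data) are assumptions made for the example:

```python
import math

def prox_start(data):
    """PROX (normal approximation) starting estimates for a complete 0/1
    response matrix: rows = students, columns = items.  Assumes no student
    or item has a perfect or zero score.  Illustrative sketch only."""
    n_students = len(data)
    n_items = len(data[0])

    # Raw logits from proportions correct.
    item_p = [sum(row[i] for row in data) / n_students for i in range(n_items)]
    stud_p = [sum(row) / n_items for row in data]
    d = [math.log((1 - p) / p) for p in item_p]   # item difficulty logits
    b = [math.log(p / (1 - p)) for p in stud_p]   # student ability logits

    # Center item difficulties at zero, the usual Rasch convention.
    mean_d = sum(d) / n_items
    d = [x - mean_d for x in d]

    # PROX expansion factors (2.89 = 1.7 squared, 8.35 = 2.89 squared).
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    denom = 1 - var(b) * var(d) / 8.35
    xf_items = math.sqrt((1 + var(b) / 2.89) / denom)
    xf_studs = math.sqrt((1 + var(d) / 2.89) / denom)
    return [x * xf_items for x in d], [x * xf_studs for x in b]
```

Starting from these expanded logits, rather than from zeros, is what typically saves an iteration or two.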
As others have pointed out, items with negative CTT discriminations should be weeded out, and tests with a preponderance of either hard or easy items may not fare exceptionally well when it comes to Rasch.
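A quick way to spot such items is the corrected item-total (point-biserial) discrimination: correlate each item with the rest-score, i.e. the total score with that item removed. A minimal Python sketch, assuming complete 0/1 data in nested lists (the function name is hypothetical, not part of the macro):

```python
import math

def item_discrimination(data):
    """Corrected item-total (point-biserial) discriminations for a 0/1
    response matrix (rows = students, columns = items).  Items with
    negative values are candidates for removal.  Assumes every item and
    every rest-score has some variance.  Illustrative sketch only."""
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    n_items = len(data[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in data]
        rest = [sum(row) - row[i] for row in data]  # total minus this item
        results.append(pearson(item, rest))
    return results
```

Using the rest-score rather than the raw total avoids the item correlating with itself, which matters on short tests such as FIMS' 14 items.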
The macro ensures that it will have a complete data matrix by deleting those items and students having either a perfect or a zero score. In the case of BLOT, three students had perfect scores and were automatically excluded from the macro's analysis.
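The trimming has to be iterative: dropping a perfect-scoring student can leave an item that nobody remaining answered correctly, and vice versa. A hedged Python sketch of the idea, assuming 0/1 data in nested lists (`trim_extremes` is a hypothetical name, not the macro's code):

```python
def trim_extremes(data):
    """Repeatedly drop students (rows) and items (columns) with perfect
    or zero scores, since such rows and columns carry no information for
    Rasch estimation.  Illustrative sketch, not the macro's VBA code."""
    rows = [list(r) for r in data]
    changed = True
    while changed and rows and rows[0]:
        changed = False

        # Drop students with zero or perfect scores.
        n_items = len(rows[0])
        kept = [r for r in rows if 0 < sum(r) < n_items]
        if len(kept) != len(rows):
            rows, changed = kept, True
        if not rows:
            break

        # Drop items that everyone, or no one, answered correctly.
        n_studs = len(rows)
        col_sums = [sum(r[i] for r in rows) for i in range(len(rows[0]))]
        keep_cols = [i for i, s in enumerate(col_sums) if 0 < s < n_studs]
        if len(keep_cols) != len(rows[0]):
            rows = [[r[i] for i in keep_cols] for r in rows]
            changed = True
    return rows
```

The loop runs until a pass makes no deletions, which guarantees the surviving matrix has no extreme rows or columns.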
What about missing data? What happens when a student does not answer an item? This topic has a general discussion, but, in practical terms, input to the macro comes from the IStats worksheet and, by the time results have been carried by Lertap5 into IStats, missing data will have been converted to zeros, and these will then be passed on to the RaschAnalysis1 macro. In other words: as far as the macro knows, there is no missing data.
We have not yet included INFIT and OUTFIT t-tests but may do so in the future. Such tests are so highly influenced by sample size as to have, in the opinion of some, limited utility.
Critical values for INFIT and OUTFIT also depend on sample size; Bond and Fox (2015, Appendix B) suggest that INFITs/OUTFITs above 1.3 may hint at problematic items (not fitting the model) for samples of less than 500 students. This might be dropped, they say, to 1.2 for samples between 500 and 1,000, and then down to 1.1 for samples over 1,000.
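Those suggested cutoffs reduce to a simple lookup; the sketch below just encodes the Bond and Fox figures quoted above (the function name is hypothetical):

```python
def fit_flag_cutoff(n_students):
    """INFIT/OUTFIT mean-square cutoff suggested by Bond and Fox
    (2015, Appendix B) for flagging possibly misfitting items,
    as a function of sample size.  Illustrative helper only."""
    if n_students < 500:
        return 1.3
    if n_students <= 1000:
        return 1.2
    return 1.1
```

So a BLOT-sized sample (150 students) would use 1.3, while the full FIMS dataset (6,300 students) would use 1.1.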