Joint maximum likelihood (JML) estimation is iterative, so a Rasch analysis takes time to complete. How long depends on the number of test items, the number of students, and the speed of the computer used.
As an example, when processing the BLOT data with 35 items and 150 students, Excel 365, running on a Windows 10 laptop with an Intel i5-8250U processor @ 1.60 GHz, took 43 seconds to complete the five iterations needed to get the sum of squared residuals below the default cutoff value of 0.25 -- Excel 2010 running on the same laptop took 18 seconds. On a MacBook Pro, also with an i5 processor and running Excel 365, much more time was required: 350 seconds, almost six minutes.
Note: the Excel 2010 times reported here are probably closer to those that would be obtained on a current laptop or desktop computer running Excel 365 with a processor faster than an i5. (As of 2021 in Australia, Excel 2010 was no longer available for purchase. Excel "365" is a generic label for the version used by those with an annual Office 365 subscription; Microsoft updates their copy of Excel automatically.)
With workbooks from the FIMS study, using Excel 365 to process the whole 14-item dataset with all 6,300 students required about 20 minutes to work through 6 iterations. This reduced to just under 7 minutes using Excel 2010.
A dataset with 60 items and 1,767 students also took 7 minutes using Excel 2010 (compared to 16.5 minutes with Excel 365). Using data from just the first 500 of these students saw this reduce to less than 2 minutes. This dataset may be found here.
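To make the iterations behind these timings concrete, here is a minimal Python sketch of one common form of JML for the dichotomous Rasch model. It is an illustration only, not the code used in the Excel workbooks. The function names, the one-Newton-step-per-parameter update, the step clamp, and the definition of the "sum of squared residuals" (observed minus model-expected item scores) are all assumptions made for the example. The input is assumed to be a complete 0/1 matrix from which extreme scores have already been removed.

```python
import math


def p_correct(theta_v, beta_i):
    """Rasch probability that a person of ability theta_v answers item beta_i correctly."""
    return 1.0 / (1.0 + math.exp(beta_i - theta_v))


def clamp(step, limit=1.0):
    """Limit the size of a Newton step (an assumption, added for stability)."""
    return max(-limit, min(limit, step))


def jml_rasch(X, cutoff=0.25, max_iter=200):
    """Hypothetical JML sketch. X is a list of person rows of 0/1 item scores."""
    n_persons, n_items = len(X), len(X[0])
    person_scores = [sum(row) for row in X]
    item_scores = [sum(col) for col in zip(*X)]
    theta = [0.0] * n_persons   # person abilities
    beta = [0.0] * n_items      # item difficulties
    ssr = float("inf")
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # One Newton step per item, holding person abilities fixed.
        for i in range(n_items):
            ps = [p_correct(t, beta[i]) for t in theta]
            expected = sum(ps)
            info = sum(q * (1.0 - q) for q in ps)
            beta[i] -= clamp((item_scores[i] - expected) / info)
        # Centre the item difficulties at zero to fix the scale origin.
        mean_beta = sum(beta) / n_items
        beta = [b - mean_beta for b in beta]
        # One Newton step per person, holding item difficulties fixed.
        for v in range(n_persons):
            ps = [p_correct(theta[v], b) for b in beta]
            expected = sum(ps)
            info = sum(q * (1.0 - q) for q in ps)
            theta[v] += clamp((person_scores[v] - expected) / info)
        # Assumed convergence measure: squared item-score residuals, summed.
        ssr = sum(
            (item_scores[i] - sum(p_correct(t, beta[i]) for t in theta)) ** 2
            for i in range(n_items)
        )
        if ssr < cutoff:
            break
    return theta, beta, iteration, ssr
```

Each pass touches every student-by-item cell several times, which is one reason run time grows with both the number of items and the number of students.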
The program ensures that it will have a complete data matrix by deleting those items and students having either a perfect or a zero score. In the case of BLOT, three students had perfect scores and were automatically excluded from the analysis.
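The removal of extreme scores can be sketched in Python as follows. This is a hypothetical illustration, not the program's own code: the function name is invented, and the sketch assumes the check is repeated until nothing more is dropped (removing an item can leave a student with a perfect or zero score on the remaining items), which the description above does not state.

```python
def drop_extreme_scores(X):
    """Return (rows_kept, cols_kept): indices of students and items that
    have neither a zero nor a perfect score on what remains.

    X is a list of student rows, each a list of 0/1 item scores.
    """
    rows = list(range(len(X)))
    cols = list(range(len(X[0])))
    changed = True
    while changed and rows and cols:
        changed = False
        # Drop students with a zero or perfect score on the remaining items.
        keep_rows = [v for v in rows
                     if 0 < sum(X[v][i] for i in cols) < len(cols)]
        if len(keep_rows) != len(rows):
            rows, changed = keep_rows, True
        # Drop items answered correctly by none or all remaining students.
        keep_cols = [i for i in cols
                     if 0 < sum(X[v][i] for v in rows) < len(rows)]
        if len(keep_cols) != len(cols):
            cols, changed = keep_cols, True
    return rows, cols


scores = [
    [1, 1, 1],   # perfect score: dropped
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],   # zero score once the third item is dropped
]
kept_students, kept_items = drop_extreme_scores(scores)
print(kept_students, kept_items)  # [1, 2] [0, 1]
```

In the small example, the third item is answered correctly by everyone and is removed; that in turn leaves the last student with a zero score, so a second pass removes that student too.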
What about missing data? What happens when a student does not answer an item? This document has the answer.
INFIT and OUTFIT t-tests have not been included but may be added in the future. Such tests are so highly influenced by sample size as to have, in the opinion of some, limited utility.
INFIT and OUTFIT values can also relate to sample size; Bond and Fox (2015, Appendix B) suggest that INFITs/OUTFITs above 1.3 may hint at problematic items (not fitting the model) for samples of fewer than 500 students. This threshold might be lowered, they say, to 1.2 for samples between 500 and 1,000, and then to 1.1 for samples over 1,000.
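The sample-size guideline above can be written as a small lookup. The sketch below is illustrative only: the function names are invented, and treating samples of exactly 500 and exactly 1,000 as belonging to the middle band is an assumption about how "between 500 and 1,000" should be read.

```python
def fit_cutoff(n_students):
    """Suggested upper limit for INFIT/OUTFIT, by sample size,
    following the guideline quoted from Bond and Fox (2015, Appendix B)."""
    if n_students < 500:
        return 1.3
    if n_students <= 1000:   # assumed inclusive at both ends
        return 1.2
    return 1.1


def flag_items(fit_values, n_students):
    """Return the names of items whose fit value exceeds the cutoff.

    fit_values maps an item name to its INFIT or OUTFIT value.
    """
    cutoff = fit_cutoff(n_students)
    return [item for item, value in fit_values.items() if value > cutoff]


# Made-up fit values, for illustration only.
example_fits = {"Q1": 0.95, "Q2": 1.25, "Q3": 1.35}
print(flag_items(example_fits, 150))    # ['Q3']
print(flag_items(example_fits, 750))    # ['Q2', 'Q3']
```

The same set of fit values can therefore flag different items depending on how many students were tested.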
For a superior discussion of the interpretation and use of INFIT and OUTFIT statistics see the entry for a text by Wu et al. (2016) in the Lertap5 references. One of the many useful points raised in this text is that classical test reliability and item discrimination values must be used in conjunction with a Rasch analysis -- it is possible to find acceptable INFIT and OUTFIT values for test items even when the test itself has unacceptable reliability (see p. 154 in the text).