gms | German Medical Science

15. Grazer Konferenz – Qualität der Lehre: Teaching and Learning – Expanding our Resources

28-30 April 2011, Vienna, Austria

Reusing Written Test Items: Will their difficulties decline?



15. Grazer Konferenz – Qualität der Lehre: Teaching and Learning – expanding our resources. Wien, Österreich, 28.-30.04.2011. Düsseldorf: German Medical Science GMS Publishing House; 2012. Doc11grako53

doi: 10.3205/11grako53, urn:nbn:de:0183-11grako539

Published: 25 April 2012

© 2012 Wagner-Menghin et al.
This article is an Open Access article distributed under the terms of the Creative Commons license. It may be reproduced, distributed and made publicly available, provided that the author and source are credited.



While practical stations are generally reused, as exposing the case content does not appear to influence examinees [1], [2], written test items are kept confidential for some years. Knowing more about how reusing written test items influences person-scores and item-difficulties will inform decisions about reusing them in subsequent tests. The relevant literature does not indicate that person-scores increase when items are reused. However, in these studies subjects did not expect to encounter reused items, and their attitudes towards cheating discouraged leaking items, making it unlikely that students used leaked material for studying. These studies also give no figures on shifts in item-difficulty, presumably because the sample-dependence of item-difficulty statistics precludes meaningful comparison. In this pilot study, reuse effects were evaluated in a setting where students commonly encounter reused items and discussing an exam's items is not regarded as cheating. The item-difficulties and person-scores needed to quantify reuse effects were estimated using the Rasch model. It expresses item-difficulties relative to each other, so item-difficulties calibrated on different person samples can be compared.
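The property of the Rasch model relied on above can be illustrated with a minimal sketch. It uses the standard dichotomous Rasch formulation, in which the probability of a correct response depends only on the difference between person ability and item difficulty. The function names and the difficulty and ability values are hypothetical and chosen for illustration only; they are not taken from the study.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    """Log-odds (logit) of a probability."""
    return math.log(p / (1.0 - p))


def logit_gap(theta: float, b_easy: float, b_hard: float) -> float:
    """Difference in log-odds of success between two items for one person."""
    return (log_odds(rasch_probability(theta, b_easy))
            - log_odds(rasch_probability(theta, b_hard)))


# Hypothetical item difficulties (logits), not values from the study.
B_EASY, B_HARD = -0.5, 1.0

# The gap between the two items is the same for weak, average and strong
# examinees: it equals B_HARD - B_EASY regardless of theta.
gaps = [logit_gap(theta, B_EASY, B_HARD) for theta in (-2.0, 0.0, 2.0)]
```

Because the gap between two items does not depend on who answers them, difficulties estimated in different examinee groups differ only by a common shift and can be placed on one scale before they are compared.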

671 students sat the newly introduced in-course exam assessing basic clinical skills. Example items were published in the official study materials. Four test forms, experimentally combining published (8%), unused (52%) and reused items (40%), were administered on four test days within three weeks. Students were pre-assigned a test day but were allowed to reschedule.

The mean item-difficulty of the 15 reused items decreased, as expected, but not significantly (t-test: p=0.067, n.s.). Students who self-scheduled to the last shift performed worse than other students (ANOVA: p<0.05, significant). Of the 15 reused items, six item-difficulties decreased, two increased and seven remained stable.

The weaker performance of students who self-scheduled to the last exam shift is consistent with previous literature. The availability of leaked material for the in-course exam does not necessarily translate into higher person-scores when it is used for studying, as mastering old items may not transfer to the new items also included in the test. An exam's quality will not automatically deteriorate when a small proportion of randomly selected items is reused.


1. Boulet JR, McKinley DW, Whelan GP, Hambleton RK. The effect of task exposure on repeat candidate scores in a high-stakes standardized patient assessment. Teach Learn Med. 2003;15:227-232. DOI: 10.1207/S15328015TLM1504_02
2. Reiter HI, Salvatori P, Rosenfeld J, Trinh K, Eva KW. The effect of defined violations of test security on admissions outcomes using multiple mini-interviews. Med Educ. 2006;40:36-42. DOI: 10.1111/j.1365-2929.2005.02348.x