gms | German Medical Science

5th International Conference for Research in Medical Education

15.03. - 17.03.2017, Düsseldorf

Rasch analysis of OSCE data – an illustrative example as proof of concept

Meeting Abstract

  • Joy Backhaus (corresponding author, presenting) - Institute of Medical Teaching and Medical Education Research, Wuerzburg, Germany
  • Chantal Rabe - Institute of Medical Teaching and Medical Education Research, Wuerzburg, Germany; Department of General, Visceral and Paediatric Surgery, University Medical Centre, Goettingen, Germany
  • Eva Hennel - Teaching Clinic of the Medical Faculty Wuerzburg, Wuerzburg, Germany
  • Sarah Koenig - Institute of Medical Teaching and Medical Education Research, Wuerzburg, Germany

5th International Conference for Research in Medical Education (RIME 2017). Düsseldorf, 15.-17.03.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocO29

doi: 10.3205/17rime29, urn:nbn:de:0183-17rime294

Published: March 7, 2017

© 2017 Backhaus et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: The Objective Structured Clinical Examination (OSCE) is a widely used tool for assessing practical skills in medical education. To date, classical test theory (CTT) remains the gold standard for analyzing student performance. However, statistical models based on item-response theory (IRT) offer psychometric advantages in the analysis and subsequent interpretation of examination scores. This study explores the use of the Rasch model in evaluating OSCE checklists.

Methods: A monocentric OSCE (five stations) was performed with 298 third-year medical students to assess their basic medical skills (winter term 2015/2016 and summer term 2016). Statistical parameters provided by the Rasch model were computed using the R package "eRm", including differential item functioning (DIF) via the Wald test at both the global and the item level, as well as mean-square (MSQ) outfit statistics.
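The analysis itself was run with the R package "eRm"; as a minimal illustration of the underlying computations (with hypothetical person abilities, item difficulty, and responses, not the study data), the dichotomous Rasch model probability and the MSQ outfit statistic can be sketched as:

```python
import math

def rasch_p(theta, beta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def outfit_msq(responses, thetas, beta):
    """Unweighted mean-square (outfit) fit statistic for a single item.

    responses: 0/1 scores of each person on the item
    thetas:    person ability estimates
    beta:      item difficulty (scalar)
    """
    z2 = []
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, beta)
        var = p * (1.0 - p)             # Bernoulli response variance
        z2.append((x - p) ** 2 / var)   # squared standardized residual
    return sum(z2) / len(z2)

# hypothetical data: four persons answering one item of difficulty 0.0
thetas = [-1.0, 0.0, 0.5, 2.0]
responses = [0, 1, 1, 1]
print(round(outfit_msq(responses, thetas, 0.0), 2))  # → 0.53
```

Outfit values near 1.0 indicate responses that match the model's expectation; the value here is below 1.0 because the hypothetical responses are more deterministic than the model predicts.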

Results: Andersen's Likelihood Ratio Test was not significant (χ2(15)=5.45, p=.99), indicating that the OSCE data fit the Rasch model. The person separation index (PSI) varied from .163 to .704, indicating a low to sufficient level of reliability, respectively, for the individual stations' OSCE scores. MSQ outfit values revealed that the checklist items measure a unidimensional construct, i.e. medical skills competence. DIF analysis indicated that individual items of the OSCE station checklists did not meet the criteria of reliable/fair measurement (e.g. χ2(124)=235.42, p<.001). In discussions with medical teachers, examiners and students, corresponding technical problems and wording deficits in the checklist items could be identified.
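The person separation index reported above relates the spread of the person estimates to their measurement error. A common formulation (sketched here with hypothetical person estimates and standard errors, not the study data) is the share of observed variance that is not error variance:

```python
def person_separation_index(thetas, std_errors):
    """PSI as (observed variance - mean squared SE) / observed variance.

    thetas:     person ability estimates
    std_errors: standard errors of those estimates
    """
    n = len(thetas)
    mean = sum(thetas) / n
    observed_var = sum((t - mean) ** 2 for t in thetas) / n
    error_var = sum(se ** 2 for se in std_errors) / n  # mean squared SE
    return (observed_var - error_var) / observed_var

# hypothetical person estimates and their standard errors
thetas = [-1.2, -0.4, 0.1, 0.6, 1.4]
ses = [0.5, 0.45, 0.4, 0.45, 0.5]
print(round(person_separation_index(thetas, ses), 3))  # → 0.726
```

Values near 1.0 indicate that the exam separates examinees well relative to measurement noise, which is why stations with a low PSI (such as .163 above) warrant scrutiny.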

The Rasch analysis also revealed further useful findings from the data set, providing insight above and beyond an analysis based on CTT.

Conclusion: Through a retrospective analysis of OSCE data, this study exemplifies the potential insight that Rasch analysis offers clinical teachers and stakeholders alike. IRT proves to be a useful approach to the calibration and analysis of OSCE scores. In line with findings from large-scale assessments, the Rasch model provides an excellent platform for evaluating item quality.