gms | German Medical Science

Joint Annual Meeting of the Gesellschaft für Medizinische Ausbildung (GMA) and the Arbeitskreis zur Weiterentwicklung der Lehre in der Zahnmedizin (AKWLZ)

5-9 August 2024, Freiburg, Switzerland

The Identification of Guessing Patterns in Progress Testing as a Machine Learning Classification Problem

Meeting Abstract

  • presenting/speaker Iván Roselló Atanet - Charité – Universitätsmedizin Berlin, AG Progress Test Medizin, Berlin, Germany
  • Miriam Sieg - Charité – Universitätsmedizin Berlin, AG Progress Test Medizin, Berlin, Germany
  • Victoria Sehy - Charité – Universitätsmedizin Berlin, AG Progress Test Medizin, Berlin, Germany
  • Mihaela Todorova Tomova - Technische Universität Ilmenau, Fakultät für Informatik und Automatisierung, Data-Intensive Systems and Visualization Group (dAI.SY), Ilmenau, Germany
  • Maren März - Charité – Universitätsmedizin Berlin, AG Progress Test Medizin, Berlin, Germany

Joint Annual Meeting of the Gesellschaft für Medizinische Ausbildung (GMA) and the Arbeitskreis zur Weiterentwicklung der Lehre in der Zahnmedizin (AKWLZ). Freiburg, Switzerland, 5-9 August 2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocV-06-03

doi: 10.3205/24gma025, urn:nbn:de:0183-24gma0254

Published: 30 July 2024

© 2024 Roselló Atanet et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Research questions: The Progress Test Medizin (PTM) has been administered twice a year since 1999 at German, Austrian and Swiss universities, with around 10,000 medical students taking part in each administration. We use data collected from this test to determine whether machine learning methods can help identify test takers who guess. We also investigate how these methods compare with more established statistical procedures for this purpose, particularly person-fit indices.

Methods: Most universities in the PTM consortium require participants to rate their confidence in each answer on a three-option Likert scale (“very sure”, “fairly sure”, “guessed”) [1]; they also record the response time per question. From these two data sources we built a dataset with 14,897 entries after preprocessing.
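As a rough illustration of how per-question records might be condensed into one entry per test, the following Python sketch aggregates confidence ratings and response times with pandas. The column names, the toy records and the rule "more than 50% of answers marked as guessed" are assumptions made for this example (the rule mirrors the definition given in the Discussion); the actual PTM preprocessing is not described in this abstract.

```python
import pandas as pd

# Hypothetical per-question records; column names are illustrative,
# not the actual PTM data schema.
responses = pd.DataFrame({
    "participant": ["a", "a", "a", "a", "b", "b", "b"],
    "correct":     [1, 0, 0, 1, 1, 1, 0],
    "confidence":  ["guessed", "guessed", "guessed", "fairly sure",
                    "very sure", "fairly sure", "guessed"],
    "seconds":     [12.0, 8.0, 9.0, 40.0, 55.0, 61.0, 20.0],
})

# Flag answers that the participant marked as guessed.
responses["is_guess"] = responses["confidence"].eq("guessed")

# One row per participant and test: the three features named in the
# abstract, plus the share of guessed answers used to derive the label.
per_test = responses.groupby("participant").agg(
    n_answered=("correct", "size"),
    share_correct=("correct", "mean"),
    total_seconds=("seconds", "sum"),
    share_guessed=("is_guess", "mean"),
)

# Assumed labelling rule: 1 = "guessing pattern" if more than half of
# the answers were marked as guessed, else 0 = "non-guessing pattern".
per_test["guessing_pattern"] = (per_test["share_guessed"] > 0.5).astype(int)
```

Keeping the share of guessed answers as its own column makes it easy to try labelling cut-offs other than 50%.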

We framed the task as a binary classification problem with two labels: “guessing patterns” and “non-guessing patterns”. We applied the logistic regression implementation from the Python package scikit-learn [2] to this problem, with an 80%:20% train-test split. The model predicts the label from three features: the number of questions answered, the share of correct responses among the questions answered, and the total time spent on the test. During the testing phase we set the classification threshold to 50%; other thresholds are also viable.
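The setup described above can be sketched with scikit-learn as follows. This is a minimal sketch, not the authors' code: the data are synthetic stand-ins rather than PTM data, the feature distributions are invented so that the two classes differ, and the standardisation step and random seeds are assumptions added for numerical stability and reproducibility.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic labels (NOT PTM data): 1 = "guessing pattern", 0 = "non-guessing pattern".
y = rng.integers(0, 2, size=n)

# Hypothetical feature distributions, chosen only so that the classes differ.
n_answered = np.where(y == 1, rng.integers(150, 201, n), rng.integers(60, 181, n))
share_correct = np.clip(
    np.where(y == 1, rng.normal(0.30, 0.08, n), rng.normal(0.60, 0.12, n)), 0.0, 1.0
)
total_minutes = np.clip(
    np.where(y == 1, rng.normal(45, 15, n), rng.normal(110, 30, n)), 5.0, None
)

# The three features named in the abstract, one row per test taken.
X = np.column_stack([n_answered, share_correct, total_minutes])

# 80%:20% train-test split; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardising the features is an added assumption, not stated in the abstract.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Predicted probability of the "guessing pattern" class, thresholded at 50%.
p_guess = model.predict_proba(X_test)[:, 1]
y_pred = (p_guess >= 0.5).astype(int)
accuracy = (y_pred == y_test).mean()
```

Lowering the threshold below 0.5 would flag more tests as guessing patterns at the cost of more false positives; raising it does the opposite.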

Subsequently, we compared the results with those of the non-parametric person-fit indices included in R’s PerFit package [https://cran.r-project.org/web/packages/PerFit/PerFit.pdf]. The comparisons were made test by test: ROC-AUC scores were computed from the probabilities produced by the logistic regression model and from the scores produced by each person-fit index.
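A test-by-test ROC-AUC comparison of this kind could look like the Python sketch below. Everything in it is illustrative: the scores are randomly generated placeholders (in practice the person-fit scores would be exported from R's PerFit and the probabilities taken from the fitted classifier), the helper `auc_by_test` is a hypothetical name, and the assumption that lower person-fit values indicate more aberrant responding depends on the index used. The resulting numbers say nothing about the results reported here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1500

# Hypothetical setup: three test administrations, binary guessing labels.
test_id = rng.integers(0, 3, size=n)
y = rng.integers(0, 2, size=n)

# Placeholder scores (NOT real results): classifier probabilities and a
# person-fit score for which lower values are assumed to mean more aberrant.
p_logreg = np.clip(0.5 + 0.35 * (2 * y - 1) + rng.normal(0, 0.25, n), 0.0, 1.0)
person_fit = rng.normal(0, 1, n) - 0.8 * y


def auc_by_test(y_true, score, groups):
    """Return {test id: ROC-AUC} computed separately for each test."""
    return {
        int(g): roc_auc_score(y_true[groups == g], score[groups == g])
        for g in np.unique(groups)
    }


auc_logreg = auc_by_test(y, p_logreg, test_id)
# Negate the person-fit score so that higher values point towards guessing.
auc_person_fit = auc_by_test(y, -person_fit, test_id)
```

Because ROC-AUC depends only on how the scores rank the test takers, probabilities and person-fit scores can be compared directly without putting them on a common scale, provided the orientation of each score is handled as above.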

Results: Logistic regression achieved ROC-AUC scores ranging from 0.886 to 0.901, whereas the best-performing person-fit index achieved ROC-AUC scores ranging from 0.703 to 0.761. Logistic regression thus outperformed all person-fit indices tested.

Discussion: In our setting, logistic regression clearly outperformed person-fit indices. We believe this is because machine learning methods can be tailored to the classification problem they are intended to solve (in our case, whether students have guessed more than 50% of their answers), while person-fit indices cannot.

Take-home messages: Detecting guessing patterns in a low-stakes medical test can be framed as a binary classification problem and solved with machine learning methods. Experiments with PTM data show that this approach outperforms person-fit indices.


References

1. Kämmer JE, Hautz WE, März M. Self-monitoring accuracy does not increase throughout undergraduate medical education. Med Educ. 2020;54(4):320-327. DOI: 10.1111/medu.14057
2. Sperandei S. Understanding logistic regression analysis. Biochem Med. 2014;24(1):12-18. DOI: 10.11613/BM.2014.003