Article
The Identification of Guessing Patterns in Progress Testing as a Machine Learning Classification Problem
Published: July 30, 2024
Research questions: The PTM test has taken place twice a year since 1999 at German, Austrian, and Swiss universities, with around 10,000 medical students participating in each administration. We use data collected from this test to determine whether machine learning methods can help identify test takers who guess their answers. Further, we investigate how these methods compare with more established statistical procedures for this purpose, particularly person-fit indices.
Methods: Most universities in the PTM consortium require participants to give information on answer confidence by means of a three-option (“very sure”, “fairly sure”, “guessed”) Likert scale [1]; they also collect data on response time per question. From these two data sources we built a dataset with 14,897 entries after preprocessing.
We defined a machine learning binary classification problem with two data labels: “guessing patterns” and “non-guessing patterns”. During the testing phase we set a classification threshold of 50%; however, alternative thresholds are also viable. We applied the logistic regression algorithm from the Python package scikit-learn [2] to this problem, with a train-test split of 80%:20%. This algorithm predicts data labels based on three parameters: number of answered questions, share of correct responses among the questions answered, and total time spent on the test.
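The setup described above can be sketched as follows. This is a minimal illustration on synthetic data, since the PTM dataset itself is not public; the feature ranges and the label rule used here are assumptions made purely so the example runs, not the study's actual labeling procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# The three predictors named in the abstract, with assumed (illustrative) ranges:
answered = rng.integers(50, 201, size=n)        # number of answered questions
correct_share = rng.uniform(0.2, 0.95, size=n)  # share of correct responses
total_time = rng.uniform(10, 180, size=n)       # total time spent on the test (minutes)

X = np.column_stack([answered, correct_share, total_time])

# Hypothetical label rule: treat low accuracy combined with short test time as a
# "guessing pattern" (1) -- an assumption so the sketch has something to learn.
y = ((correct_share < 0.4) & (total_time < 60)).astype(int)

# 80%:20% train-test split, as in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Class probabilities; the 50% classification threshold mentioned in the
# abstract is applied explicitly here, so alternative cut-offs are easy to try.
proba = clf.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)
```

Keeping the probabilities (rather than only the 0/1 predictions) is what later allows threshold-free comparison via ROC-AUC.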
Subsequently, we compared the results to those of the non-parametric person-fit indices included in R’s PerFit package [https://cran.r-project.org/web/packages/PerFit/PerFit.pdf]. These comparisons were made on a test-by-test basis; ROC-AUC scores were computed from the probability values yielded by the logistic regression algorithm as well as from the scores given by each person-fit index.
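The comparison works because ROC-AUC only needs a continuous score per test taker, so classifier probabilities and person-fit scores can be evaluated on the same footing. The sketch below uses synthetic stand-in scores (not the study's data); the signal strengths are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 500

# Ground-truth labels: 1 = guessing pattern, 0 = non-guessing pattern.
y_true = rng.integers(0, 2, size=n)

# Stand-in scores: a well-separated "classifier probability" and a weaker
# "person-fit score" (signal sizes are illustrative assumptions only).
lr_proba = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, size=n), 0.0, 1.0)
person_fit = y_true * 0.2 + rng.normal(0.0, 0.3, size=n)

# ROC-AUC ranks the scores against the labels; no 50% threshold is needed.
# Note: if a person-fit index assigns LOWER values to aberrant response
# patterns, its sign must be flipped before computing ROC-AUC.
auc_lr = roc_auc_score(y_true, lr_proba)
auc_pf = roc_auc_score(y_true, person_fit)
```

In the study, this computation would be repeated once per test administration, yielding the per-test ROC-AUC ranges reported in the results.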
Results: Logistic regression (ROC-AUC scores ranging from 0.886 to 0.901) clearly surpassed the person-fit indices tested (ROC-AUC scores ranging from 0.703 to 0.761 for the best-performing index).
Discussion: In our setting, logistic regression outperformed person-fit indices by a clear margin. We believe this is because machine learning methods can be tailored to the specific classification problem they are intended to solve (in our case, whether students guessed more than 50% of their answers), while person-fit indices cannot.
Take-home messages: Detecting guessing patterns in a low-stakes medical test can be framed as a binary classification problem solvable by machine learning methods. Experiments with PTM data show that this approach outperforms person-fit indices.