gms | German Medical Science

GMS Journal for Medical Education

Gesellschaft für Medizinische Ausbildung (GMA)

ISSN 2366-5017

Increase in medical knowledge during the final year of undergraduate medical education in Germany

Research article | Medicine

  • corresponding author Tobias Raupach - University Medical Centre Göttingen, Department of Cardiology and Pneumology, Göttingen, Germany
  • Daniela Vogel - University Medical Centre Hamburg-Eppendorf, III. Department of Internal Medicine, Hamburg, Germany
  • Sarah Schiekirka - University Medical Centre Göttingen, Department of Cardiology and Pneumology, Göttingen, Germany; University Medical Centre Göttingen, Study Deanery, Göttingen, Germany
  • Carolina Keijsers - University Medical Centre Utrecht, Department of Geriatric Medicine, Utrecht, the Netherlands
  • Olle Ten Cate - University Medical Centre Utrecht, Center for Research and Development of Education, Utrecht, the Netherlands
  • Sigrid Harendza - University Medical Centre Hamburg-Eppendorf, III. Department of Internal Medicine, Hamburg, Germany

GMS Z Med Ausbild 2013;30(3):Doc33

doi: 10.3205/zma000876, urn:nbn:de:0183-zma0008769

This is the English version of the article.
The German version can be found at: http://www.egms.de/de/journals/zma/2013-30/zma000876.shtml

Received: November 23, 2012
Revised: March 31, 2013
Accepted: May 2, 2013
Published: August 15, 2013

© 2013 Raupach et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Abstract

Aims: In Germany, the final year of undergraduate medical education (‘practice year’) consists of three 16-week clinical attachments, two of which are internal medicine and surgery. Students can choose a specific specialty for their third 16-week attachment. Practice year students do not receive specific teaching to prepare them for the National Licensing Examination. It is unknown whether knowledge levels increase during this year. This study aimed at assessing knowledge at the beginning and the end of the final year of medical school.

Methods: Three hundred pre-selected United States Medical Licensing Examination type items from ten medical disciplines were reviewed by ten recent medical graduates from the Netherlands and Germany. The resulting test included 150 items and was taken by 77 and 79 final year medical students from Göttingen and Hamburg at the beginning and the end of their practice year, respectively.

Results: Cronbach’s α of the pre- and post-test was 0.75 and 0.68, respectively. Mean percent scores in the pre- and post-test were 63.9±6.9 and 69.4±5.7, respectively (p<0.001; effect size calculated as Cohen’s d: 0.87). In individual students, post-test scores were particularly high for items related to their specific chosen specialty.

Conclusion: The knowledge test used in this study provides a suitable external tool to assess the knowledge gains of undergraduate medical students during the practice year. The pre-test may be used to guide individual learning behaviour during this final year of undergraduate education.

Keywords: undergraduate medical education, practice, knowledge


Introduction

Medical students in Germany spend their final year of undergraduate medical studies, the so-called “practice year”, full time in a clinical environment. While all students are required to spend 16 weeks each in internal medicine and general surgery, they can choose one additional specialty rotation for the remaining 16 weeks of their practice year. Until 2012, the practice year was followed by a final National Licensing Examination (NLE) comprising 320 multiple-choice questions covering both basic and clinical knowledge. Medical schools did not routinely offer specific training for this high-stakes examination. As a consequence, German students had to find a balance between maximizing their learning gain regarding clinical skills and meeting the assessment requirements regarding basic and clinical knowledge during the final year. In the absence of formal assessments during this year, students were unable to monitor their progress.

Progress testing is designed to foster learning processes that are characterized as meaning oriented in contrast to reproduction oriented learning [1]. Several countries have developed progress tests to measure the general medical knowledge progression of their medical students [2], [3], [4]. Furthermore, some students use the feedback provided by progress testing to guide their own learning activities [5]. Thus, a formative test of knowledge administered to final year students at the beginning and the end of the practice year might be a helpful tool to provide feedback regarding knowledge gain during the final year. This study uses a 150-item multiple-choice knowledge test which had originally been designed as a suitably balanced and non-biased test for the comparison of final year students from different European countries [6]. For the purpose of the present study, the test was administered at two German universities. The primary aim of this study was to assess the increase in overall knowledge, as measured by this test, during the practice year. We hypothesized that exam scores would be significantly higher at the end of the practice year, as exposure to a clinical environment might also foster the acquisition of factual knowledge. We expected this effect to be most pronounced in those subject areas that students specifically addressed during their 16-week specialty rotation.


Methods

Use of multiple-choice questions

It has been shown that multiple-choice questions (MCQs) are a valid method of assessing cognitive knowledge [7]. MCQ examinations can yield high reliability, which is an important requirement for distinguishing between groups or individuals [8]. Furthermore, well-constructed MCQs can also assess higher-order cognitive processes such as the interpretation and application of knowledge rather than just testing the recall of facts [9]. Different MCQ formats have been designed, including variations in the number of answer options. The United States Medical Licensing Examination (USMLE) typically includes single best answer questions with five options, and the same format is used in the German NLE.

Selection of examination items

We used a 150-item knowledge test that was compiled from 1,000 freely available United States Medical Licensing Examination Step 2 type items [10]. Initially, 300 items had been selected and adapted according to the following criteria:

1.
All items had to belong to major medical disciplines: general medicine, anaesthesiology & emergency medicine, internal medicine, surgery, urology, obstetrics and gynaecology, paediatrics, neurology, psychiatry, clinical pharmacology. Small disciplines like ear-nose-and-throat or dermatology were excluded.
2.
All items were based on patient cases.
3.
Diseases specific to the Americas (e.g. Rocky Mountain spotted fever) were not included.
4.
Items including figures of any sort (e.g. X-rays or ECGs) were excluded for copyright reasons.
5.
Answers for each question all had to belong to the same category (e.g. diagnostics, therapy or other). Items including, for instance, two diagnostic and three therapeutic answers were either excluded or adapted to fit this criterion.
6.
If different items covered the same topic or disease, only one item was included.

In February 2011, the 300 selected items plus answers were reviewed by five recent medical graduates from Hamburg University and five from Utrecht University in the Netherlands (total n=10) to identify items which seemed appropriate in more than one European country. Every rater was asked to choose the 50% of items that best matched the expected content level of graduates in their country, with a fixed number per discipline to ensure content validity. The final test included 150 items chosen by at least 2 of the 5 raters from each country. Finally, a suitability score was calculated for the final test: one rater per country classified items as difficult versus easy and as basic science versus clinical knowledge, resulting in a well-balanced test with intraclass correlations of 0.85 and 0.71, respectively. A pilot test with a total of 56 students from Germany and the Netherlands yielded a Cronbach’s alpha for internal consistency of 0.79 [6].

Application of the formative pre- and post-tests

A German version of this newly developed test was used in our project as a formative pre- and post-test for final-year undergraduate medical students in Germany. It was offered to 286 students (164 in Göttingen and 122 in Hamburg) at the beginning of their practice year (April 2011). Students received an e-mail outlining the study rationale and aims and were invited to participate in the pre-test. Students were also informed that they would be released from their clinical duties on the day of the test; this had been agreed upon with all teaching hospitals at both study sites.

A second test featuring the same items as the first one was offered to the same student groups at the end of their final year, just before students started to prepare for their NLE in spring 2012. Questions and answers were not made available to study participants, and all papers were collected after completion of the test. However, students were informed about their results via e-mail. Socio-economic data as well as information on study time spent abroad and on the chosen specialty rotation were also recorded. Participation was voluntary. This study was approved by the Hamburg State Ethics Committee.

Data collection and statistical analysis

Data collected on questionnaires and examination papers were manually transferred to the statistical software package SPSS 19.0 (SPSS Inc., Chicago, Illinois, USA). Differences between the two student cohorts participating in the pre- and the post-test, respectively, were assessed using χ2-tests (dichotomous variables) and t-tests (continuous variables). Effect sizes were calculated as Cohen’s d, with values of 0.2 indicating small and values of 0.8 indicating large effects [11]. Item characteristics of the pre- and post-examination were assessed in terms of item difficulty, corrected item-total correlations and Cronbach’s α as a measure of internal consistency. In order to detect a difference in exam percent scores of 3% (e.g., 68% versus 65%, with standard deviations of 6.5% in each group) at a significance level of 5% with a statistical power of 80%, a minimum of 58 students had to be enrolled to participate in both the pre- and the post-test (equivalent to a longitudinal response rate of 20%). We considered a difference of 3% to be relevant as it usually represents approximately one third of a grade step in the German marking system. Data are presented as mean±standard deviation or percentages (n), as appropriate. Significance levels were set to p<0.05.
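For illustration only, a sample size calculation with the parameters stated above (difference of 3 percentage points, standard deviation 6.5%, α=0.05, power 80%) can be run as in the following Python sketch. This is not the authors’ original computation: the resulting number depends on whether an independent-samples or a paired design (and, for the latter, which pre/post correlation) is assumed, so it will not necessarily reproduce the reported minimum of 58 students.

```python
# Illustrative sample size calculation (not the authors' original computation).
# Assumptions: two-sided test, alpha = 0.05, power = 0.80, and a standardized
# effect size of 3 / 6.5 (3-point difference, standard deviation 6.5).
from statsmodels.stats.power import TTestIndPower, TTestPower

effect_size = 3 / 6.5

# Two independent cohorts (separate pre- and post-test samples):
n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative='two-sided')

# Paired design (the same students take both tests):
n_paired = TTestPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative='two-sided')

print(f"independent samples: {n_per_group:.0f} students per group")
print(f"paired design: {n_paired:.0f} students in total")
```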


Results

Response rate and subject characteristics

The pre-test was taken by 77 students, and the post-test was taken by 79 students (response rates 26.9% and 27.6%, respectively). The proportion of female students was 66.2% in the pre- and 73.4% in the post-test, respectively. Response rates were higher in Hamburg and differed slightly between the pre- and the post-test at both study sites (Hamburg: 45/122 (36.9%) versus 58/122 (47.5%); Göttingen: 32/164 (19.5%) versus 21/164 (12.8%)). A total of 47 students took both the pre- and the post-test. Subject characteristics of the two cohorts are displayed in Table 1 [Tab. 1]. As expected, students taking the post-test were significantly older than students taking the pre-test. There were no significant differences between the cohorts regarding gender, mother tongue, previous vocational training, spending parts of the practice year abroad and choice of the specialty rotation.

Item analysis

Cronbach’s α of the pre- and post-test was 0.75 and 0.68, respectively. Item difficulty ranged from 0.03 to 1.00 (mean 0.64) in the pre- and from 0.04 to 1.00 (mean 0.69) in the post-test. The percentage of items with a difficulty between 0.4 and 0.8 was 56.7% (n=85) and 50.7% (n=76) in the pre- and post-test, respectively. Corrected item-total correlations of exam items ranged from -0.20 to 0.39 (mean 0.13) in the pre- and from -0.32 to 0.45 (mean 0.10) in the post-test. The percentage of items with positive discriminatory power was 85.3% (n=128) and 70.7% (n=106) in the pre- and post-test, respectively.
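For illustration only, the item statistics reported above (item difficulty, corrected item-total correlation, Cronbach’s α) can be computed from a students × items matrix of correct/incorrect responses, as in the following Python sketch. It uses randomly generated placeholder data, so the printed values are not meaningful; it is not the authors’ analysis, which was performed in SPSS.

```python
# Illustrative sketch with hypothetical data (not the authors' SPSS analysis):
# item difficulty, corrected item-total correlation and Cronbach's alpha
# computed from a binary response matrix (rows = students, columns = items).
import numpy as np

rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(79, 150)).astype(float)  # placeholder data

# Item difficulty: proportion of students answering each item correctly.
difficulty = responses.mean(axis=0)

# Corrected item-total correlation: correlation of each item with the
# total score computed from all remaining items.
total = responses.sum(axis=1)
item_total_corr = np.array([
    np.corrcoef(responses[:, i], total - responses[:, i])[0, 1]
    for i in range(responses.shape[1])
])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total).
k = responses.shape[1]
alpha = k / (k - 1) * (1 - responses.var(axis=0, ddof=1).sum() / total.var(ddof=1))

print(f"mean item difficulty: {difficulty.mean():.2f}")
print(f"items with difficulty 0.4-0.8: {int(((difficulty >= 0.4) & (difficulty <= 0.8)).sum())}")
print(f"items with positive discrimination: {int((item_total_corr > 0).sum())}")
print(f"Cronbach's alpha: {alpha:.2f}")
```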

Student performance

Students achieved a mean percent score of 63.9±6.9 in the pre- and 69.4±5.7 in the post-test (T(154)=-5.376; p<0.001; t-test for independent samples). The effect size of this difference calculated as Cohen’s d was 0.87, indicating a large effect. A test for dependent samples in the subset of 47 students who took both tests yielded a similar result: 64.6±6.7 in the pre- and 69.6±5.3 in the post-test (T(46)=-7.299; p<0.001). Analysis of exam results for specific specialties (see Table 2 [Tab. 2]) revealed that students who had chosen anaesthesiology & medical emergencies as their specialty rotation performed no better on items related to anaesthesiology & medical emergencies in the post-test than students who had chosen any other specialty (11.7±1.4 versus 11.7±1.6 out of 15 points; p=0.985), and a similar pattern of results was observed for neurology (9.9±2.0 vs. 10.2±1.8 out of 15 points; p=0.639). However, students who had chosen paediatrics achieved higher post-test scores on items related to paediatrics than students who had chosen any other specialty (13.6±1.3 vs. 11.6±1.5 out of 15; p<0.001; Cohen’s d=1.40), and the same was true for obstetrics and gynaecology (7.8±1.3 vs. 6.7±1.4 out of 10 points; p=0.017; Cohen’s d=0.81). Anecdotally, five out of the 14 students who had chosen paediatrics as their specialty rotation answered all paediatrics items correctly. In contrast, only one out of the 65 students who had chosen a specialty other than paediatrics answered all paediatrics items correctly.
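As a worked check, the reported effect size of 0.87 can be recovered from the summary statistics above, assuming (plausibly, though not stated explicitly) that Cohen’s d was computed with the pooled standard deviation of the two cohorts:

```python
# Recomputing Cohen's d from the reported means, standard deviations and
# sample sizes (assumption: pooled-SD definition of d).
import math

m_pre, sd_pre, n_pre = 63.9, 6.9, 77
m_post, sd_post, n_post = 69.4, 5.7, 79

sd_pooled = math.sqrt(((n_pre - 1) * sd_pre**2 + (n_post - 1) * sd_post**2)
                      / (n_pre + n_post - 2))
d = (m_post - m_pre) / sd_pooled
print(f"Cohen's d = {d:.2f}")  # ~0.87, in line with the reported value
```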


Discussion

Using a 150-item test consisting of USMLE type items as a formative pre- and post-test for final year students in Germany, this study demonstrates a significant increase in knowledge levels after the practice year of undergraduate medical education. This increase appeared to be greatest for items related to students’ specialty choices. A critical appraisal of individual performance levels by means of a formative test might be helpful to guide students’ self-study during this year. To date, no validated knowledge test has been available for this purpose. Our newly developed test might close this gap by providing a tool for the formative assessment of medical students in their final year. In fact, individual feedback from several study participants indicated that they appreciated being able to assess their own knowledge under simulated ‘exam conditions’ and that they used their pre-test results to guide individual learning during the practice year.

Increase in exam performance levels during the practice year

Overall, performance levels as assessed in the post-test were rather low, and there are a number of potential explanations for this finding. Given that we used a formative rather than a summative exam, students might not have been sufficiently incentivized to achieve the maximum scores they were capable of [12]. On the other hand, it might be hypothesized that students participating in the study were highly motivated to learn more about their performance level and would thus have tried their best to answer the exam items. However, they might also have been unfamiliar with the wording of the USMLE type item format.

Even taking into account these potential limitations, the increase from the pre- to the post-test results observed in this study indicates that this formative test provided a valid estimate of student performance levels. In our pre-test sample (i.e., students who had just completed a five-year undergraduate medical curriculum and who self-selected to participate in the time-consuming activity of taking a 150-item examination), the average percent score was as low as 64%. As a percent score of 60% is usually needed to pass the German NLE, this would translate into a 75% pass rate (58 out of 77 students taking the pre-test) in this highly motivated sample. At the end of the practice year, the average percent score was below 70%, and five out of 79 students taking the post-test scored less than 60% of the available points. While it is somewhat reassuring to see that a majority of students would pass the exam used in this study even before entering the practice year, the moderate performance in the post-test is an important finding as the average percent score reported for the spring 2012 NLE in Germany was 79.4% [http://www.impp.de/IMPP2010/pdf/ErgMedF12.pdf].

At first glance, the increase in performance from the pre- to the post-test might seem surprising, as exposure to clinical practice during the final year of medical education would not be expected to increase factual knowledge as assessed in the test, given the lack of constructive alignment between teaching and assessment format [13]. However, it may be hypothesized that students working on the wards encounter a number of opportunities to increase their factual knowledge. Dealing with clinical cases prompts students to build on and expand their knowledge, particularly in the presence of experienced physicians who provide informal teaching during ward rounds. In addition, being involved in patient care is likely to increase student motivation to learn. Conversely, our finding that students achieved particularly high scores on items related to their specific specialties might reflect a higher motivation to engage in self-directed learning activities regarding their chosen specialty rotation (i.e., higher a priori motivation). Curricular representation of these subjects (i.e., paediatrics, obstetrics and gynaecology) might be partially responsible for this finding. Alternatively, additional teaching in these specialties during the practice year might have been of particularly high quality. However, we did not formally assess teaching quality in the seven different specialties chosen by study participants.

Strengths and limitations

We used specific quality criteria during the selection process for items to be included in the test [14]:

1.
selected MCQs were to test important material which was appropriate for the level of training,
2.
the stems of the MCQs were mostly case vignettes which contained the majority of information in a focused manner,
3.
the five answers were homogeneous in content, length, and grammar.

Even though we used USMLE type items that study participants were not familiar with, this is unlikely to have significantly impacted our results, as this ‘origin bias’ has been shown to be small when new item formats are presented to advanced undergraduate medical students [15]. Despite careful selection of test items resulting in a balanced examination, item characteristics and internal consistency of the exam were suboptimal. However, similar Cronbach’s α values have been reported in a study on different multiple-choice question formats [16], and a recent analysis of data derived from 10 years of postgraduate progress testing in obstetrics and gynaecology yielded even lower values [17]. We cannot pinpoint the reason for the relatively low internal consistency of our test and the ones referred to above. At the very least, this appears to be a problem frequently encountered with formative tests of clinical knowledge. It might be hypothesized that the student group self-selecting for participation in our study was relatively homogeneous with respect to performance levels. This might have reduced the variance in test results and thus decreased Cronbach’s alpha. More research is needed to determine the impact of the heterogeneity of participating students and/or of the items included in the test on its psychometric properties.

Testing bias (i.e., better performance in the post-test due to students having seen the same items in the pre-test) is unlikely as the pre- and post-tests were taken many months apart and answers had not been distributed after the pre-test. We initially chose a longitudinal design for this study. Power analysis was based on an anticipated response rate of 20%, but the longitudinal sample contained only 16.4% of eligible students. In order to assess the primary endpoint of the study, all students providing data at either time point were therefore included in the analysis, yielding response rates of over 25%. As a consequence, the test used in this study cannot be referred to as a true ‘progress test’ as this would have required all students to participate in both tests. The overall small response rate raises concern that selection bias might have impacted study results. Accordingly, we cannot rule out the possibility that the difference between pre- and post-test results is due to students taking the post-test having genuinely higher performance levels than students taking the pre-test. However, given the overlap between the two cohorts, this is unlikely to completely explain the difference.


Conclusions

The test of medical knowledge based on US items used in this study provides a tool for the formative assessment of knowledge gains of undergraduate medical students during their final practice year. The pre-test may be used by students to guide their individual learning behaviour during the practice year, and the post-test could help them to identify specific areas in which more thorough preparation may be necessary to improve their basic and clinical knowledge.


Acknowledgements

The authors thank all participating panel members in Utrecht and Hamburg and all participating medical students and the administrative team members in Göttingen and Hamburg.


Conflict of interest

The authors declare that they have no competing interests.


References

1.
Berkel HJ, Nuy HJ, Geerlings T. The influence of progress tests and block tests on study behaviour. Instruct Sci. 1994;22:317-333. DOI: 10.1007/BF00891784
2.
van der Vleuten CP. National, European licensing examinations or none at all? Med Teach. 2009;31(3):189-191. DOI: 10.1080/01421590902741171
3.
Coombes L, Ricketts C, Freeman A, Stratford J. Beyond assessment: feedback for individuals and institutions based on the progress test. Med Teach. 2010;32(6):486-490. DOI: 10.3109/0142159X.2010.485652
4.
Williams RG, Klamen DL, White CB, Petrusa E, Fincher RM, Whitfield CF, Shatzer JH, McCarty T, Miller BM. Tracking development of clinical reasoning ability across five medical schools using a progress test. Acad Med. 2011;86(9):1148-1154. DOI: 10.1097/ACM.0b013e31822631b3
5.
Nouns ZM, Georg W. Progress testing in German speaking countries. Med Teach. 2010;32(6):467-470. DOI: 10.3109/0142159X.2010.485656
6.
Vogel D, Gierk B, ten Cate O, Harendza S. Composition of an international medical knowledge test for medical students near graduation. Dundee: AMEE; 2011. Abstract book, p. 71.
7.
Downing SM. Assessment of knowledge with written test formats. In: Norman G, van der Vleuten C, Newble D (eds). International handbook of research in medical education. Dordrecht: Kluwer; 2002. p. 647-672. DOI: 10.1007/978-94-010-0462-6_25
8.
Schwartz PL, Crooks TJ, Sein KT. Test-retest reliability of multiple true-false questions in preclinical medical subjects. Med Educ. 1986;20(5):399-406. DOI: 10.1111/j.1365-2923.1986.tb01184.x
9.
Case SM, Swanson DB. Constructing written test questions for the basic and clinical sciences. 3rd ed. Philadelphia: National Board of Medical Examiners; 2001.
10.
Le T, Vieregger K. First aid Q & A for the USMLE step 2 CK. 2nd ed. New York: McGraw-Hill; 2010.
11.
Cohen J. A power primer. Psychol Bull. 1992;112(1):155-159. DOI: 10.1037/0033-2909.112.1.155
12.
Raupach T, Hanneforth N, Anders S, Pukrop T, ten Cate O, Harendza S. Impact of teaching and assessment format on electrocardiogram interpretation skills. Med Educ. 2010;44(7):731-740. DOI: 10.1111/j.1365-2923.2010.03687.x
13.
Kern DE, Thomas PA, Howard DM, Bass EB. Curriculum development for medical education - a six-step approach. Baltimore, London: The Johns Hopkins University Press; 1998.
14.
Boland RJ, Lester NA, Williams E. Writing multiple-choice questions. Acad Psychiatry. 2010;34(4):310-316. DOI: 10.1176/appi.ap.34.4.310
15.
Muijtjens AM, Schuwirth LW, Cohen-Schotanus J, van der Vleuten CP. Origin bias of test items compromises the validity and fairness of curriculum comparisons. Med Educ. 2007;41(12):1217-1223.
16.
Coderre SP, Harasym P, Mandin H, Fick G. The impact of two multiple-choice question formats on the problem-solving strategies used by novices and experts. BMC Med Educ. 2004;4:23. DOI: 10.1186/1472-6920-4-23
17.
Dijksterhuis MG, Scheele F, Schuwirth LW, Essed GG, Nijhuis JG, Braat DD. Progress testing in postgraduate medical education. Med Teach. 2009;31(10):e464-468. DOI: 10.3109/01421590902849545