gms | German Medical Science

GMS Journal for Medical Education

Gesellschaft für Medizinische Ausbildung (GMA)

ISSN 2366-5017

Effects and Sustainability of Trainings for the Oral and Practical Part of the German Final Exam in Medicine

Research article


  • corresponding author Wolfgang Öchsner - University Hospital Ulm, Department for Cardiac Anaesthesiology, Ulm, Germany; University of Ulm, Medical Faculty, Office of the Dean of Education, Ulm, Germany
  • author Sandra Geiler - University of Ulm, Department for Evaluation and Quality Management in Medical Education, Ulm, Germany
  • author Markus Huber-Lang - University Hospital Ulm, Clinic for Trauma Surgery, Hand Surgery, Plastic Surgery and Reconstructive Surgery, Ulm, Germany

GMS Z Med Ausbild 2013;30(3):Doc36

doi: 10.3205/zma000879, urn:nbn:de:0183-zma0008795

This is the translated version of the article.

Received: July 27, 2012
Revised: January 13, 2013
Accepted: May 2, 2013
Published: August 15, 2013

© 2013 Öchsner et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to share (copy, distribute and transmit) the work, provided the original author and source are credited.


Study Goals: It is known that the manifold limitations of oral and practical examinations can be mitigated by specific training. With the help of an online survey, the present study analyzes the effects achieved by the training conducted at the University of Ulm for examiners in the final medical examination, the long-term persistence of these effects, and differences among participant subgroups.

Method: All 367 participants in the training at Ulm (2007–2012) were contacted via email. Sixty-three persons responded to the survey, which included 28 items concerning demographic data, effectiveness, and sustainability.

Results: Six main effects of the training were identified, meaning effects rated with a grade of 1 or 2 on a six-point scale by at least two-thirds of the participants (1=“applicable”, 6=“not applicable”; cumulated percentage of 1 or 2 responses in parentheses):

  • Conscious handling of strengths and weaknesses of oral examinations (71%),
  • Knowledge of factors contributing to the reliability of oral/practical examinations (76%),
  • Knowledge of factors contributing to the validity of oral/practical examinations (75%),
  • Improvement of competence in task construction (68%),
  • Improvement of competence in respect to examination formalities (75%),
  • Implementation of the concept of “structured oral examinations” (a priori planning of examination subjects, tasks, levels of expectation and grading criteria) (86%).

The responses of participants trained more than two years ago were not significantly different from the answers given by recently trained persons. This is an argument for the sustainability of the training effects.

Furthermore, participants without relevant prior experience in oral/practical examinations profited significantly more from the training, especially in the areas of stress reduction, confidence in grading, and critical discrimination in grading.

Conclusion: The positive and sustained effects of the examiner training argue for continuing the training program, especially for inexperienced examiners. Expansion of the successful training program to include the first medical exam should be considered.

Keywords: Medical Education, Final Exam, Oral Examinations, Examiner Training


Introduction

In the study of human medicine, oral and practical examinations are an integral part of the state examinations [2]. This will remain the case under the currently pending changes to the licensing regulations [12]. The high reliability and good legal defensibility that written exams achieve through the use of multiple-choice questions [10], [13] are, however, only attained to a limited degree in oral and practical exams [14], [17]. Insufficient reliability in turn leads to a lack of validity, even though both factors are essential criteria for ensuring the good quality of examinations [10], [16].

Despite this, the grades for the two oral exams (M1 and M2) make up half of the overall grade for the medical examination; the M2 oral exam alone accounts for no less than 33% [2].

In Baden-Württemberg the Kompetenznetz Lehre in der Medizin (Competency Network for Teaching Medicine) was founded in 2007 as an alliance of all five state university medical schools [4]. In this Competency Network a basic concept encompassing eight instructional units was developed to train examiners for the oral part of the M2 exam. This concept also forms the basis for the M2 examiner training program at the medical school in Ulm. The aims and content are as follows:

  • Confident handling of the relevant formal provisions,
  • Evaluation of the significance as well as the strengths and weaknesses of oral and practical exams,
  • Mastery of “structured oral examinations” (predetermination of topics, tasks, levels of expectation and grading criteria),
  • Awareness and handling of positive and negative factors influencing the reliability and validity of oral and practical exams.

The training sessions took the form of seminars (expert input with interactive elements, individual and group work), along with offering practical elements (test simulations with peer and expert feedback).

Study Goals

It is known from the literature that the limitations of oral examinations can be favorably influenced not only by criteria-oriented selection of the examiners (e.g. subject knowledge coupled with uniformly accepted methods of thought and strategy among co-examiners, specific competence in designing and conducting oral examinations, and constructive team skills in small groups of co-examiners), but also through training and continually monitoring the examiners [14], [17], [18]. Thus, the issue investigated in this study was to identify which positive effects are actually achieved by the M2 examiner training that has been conducted since 2007 at the medical school in Ulm. In terms of the sustainability of the training, a second aim was to analyze whether the positive effects could still be determined after a longer period of time.


Methods

Since 2007, all of the examiner training sessions in Ulm have been held under the leadership of the same trainer, a physician who holds a master’s degree in Medical Education.

Up until the time of the survey, 367 people had participated in the voluntary training program. All participants were requested by email to take part in a quantitative survey, which was administered using EvaSys version 5.0. Due to the normal staff fluctuation at universities, it was expected that some former participants could no longer be reached. Sixty-three persons responded, of whom 32 serve as examiners for the oral part of the M2 state examination at the academic teaching hospitals of the medical school and 31 at the Ulm University Clinic. The sample comprised both physicians working in operative fields (n=28) and those from non-operative disciplines (n=35).

The survey covered a total of 28 items on demographic information and the effectiveness of the training, applying the criteria introduced in a 2012 position paper by the Gesellschaft für Medizinische Ausbildung, based on Griffith et al. [1], [6], for verifying the effectiveness of educational measures in medicine [3]. The evaluation of the training’s effectiveness was done using a scale for self-assessment of the acquired competencies. The items were specifically designed to elicit information not only on changes in the participants’ attitudes, knowledge and skills, but also on their willingness to incorporate these changes into future oral and practical examinations.

All data were collected and analyzed anonymously. Measures of location and dispersion were calculated and, assuming normal distribution, t tests for independent samples were performed. Significance was assumed at p<0.05.


Results

Rate of response

Sixty-three of the 367 participants contacted responded to the survey. The majority of the respondents (n=63) are deployed as M2 examiners on at least 2-4 examination days per year.

Basic effects of M2 training

A basic effect of training was defined as an effect which was rated by at least two-thirds of the participants with a 1 or 2 on a six-point scale (with 1=applicable and 6=not applicable). The survey yielded six such effects; the cumulated frequency of 1 or 2 as the response is given in parentheses (see table 1 [Tab. 1]):

  • The M2 training helped me to be able to deal more consciously with the strengths and weaknesses of oral examinations (compared to written exams). (71%)
  • As a result of the M2 training, I have become aware of factors that influence the reliability of oral/practical examinations (formal reliability: “Do all examiners come to the same conclusions?”). (76%)
  • As a result of the M2 training, I have become aware of factors that influence the validity of oral/practical examinations (validity: “Does my exam cover the skills I wish to test for?”). (75%)
  • The M2 training has increased my confidence in regard to designing tasks for oral/practical testing. (68%)
  • The M2 training has increased my confidence in respect to examination formalities. (75%)
  • When conducting M2 examinations, I strive to implement the concept of “structured oral examinations” (prior definition of topics, tasks, levels of expectation and grading criteria). (86%)

Falling below the two-thirds majority (at 47%) was the aspect concerning increased confidence in the role of examination committee chairman.

Sustainability of the training effects

A desirable characteristic of didactic training is the sustainability of the achieved effects. Hofer et al. [7] defined the effects of didactic training as sustained if they are still measurable after one year. Since examiner duties are performed much less regularly than the teaching activities studied by Hofer et al., we chose a longer timeframe of two years: we defined the training as sustainable if the attained effects can still be detected two years after its completion. Applying a t test for independent samples, we therefore analyzed the extent to which the responses of participants whose M2 training took place more than two years ago (24 months or longer, n=28) differed from those of participants trained less than two years ago (1-23 months, n=35). No significant differences arose between the two groups in terms of the six basic training effects. With the methodological limitation that data were not collected at two points in time, this means that the training effects can be assessed as persisting over this period and therefore viewed as sustained.

Analysis of the subgroups

In order to discern whether various subgroups of examiners differed in their responses, we performed a t test for independent random samples to analyze the significance of any potential difference of means in the following participant subgroups: participants with relevant prior experience as M2 examiner versus new examiners, and participants from the university teaching hospitals versus participants from the Ulm University Clinic.

Participants with relevant prior experience as M2 examiners versus new examiners

“Relevant prior experience” was defined as experience in administering M2 oral exams that amounted to at least two years and corresponded to a minimum of two to four complete M2 oral examinations or four to eight days of testing. On two of the surveys this item was left unanswered. For the subgroup of participants who did not possess any relevant prior experience as examiner (n=29), the reduction of personal stress levels as examiner in the M2 oral exams was significantly more pronounced than in the subgroup of participants with at least two years of previous experience (n=32). (Mean value for the subgroup without relevant prior experience: 3.1±1.6; mean value for the subgroup with relevant prior experience: 3.9±1.6; t(59)=2.1; p=0.04; mean difference 0.8). Moreover, the subjectively perceived level of confidence in assigning grades was significantly more pronounced in the subgroup without relevant prior experience than in the subgroup with a minimum of two years previous experience. (Mean value for the subgroup without relevant prior experience: 2.2±0.9; mean value for the subgroup with relevant prior experience: 2.9±1.4; t(55)=2.1; p=0.04; mean difference 0.7). In addition, the subgroup without prior experience indicated a significantly stronger influence of the M2 training sessions in regard to critical discrimination in grading. (Mean value for the subgroup without relevant prior experience: 2.5±1.3; mean value for subgroup with relevant prior experience: 3.3±1.4; t(59)=2.1; p=0.04; mean difference 0.8). From this it can be concluded that in addition to the six main training effects identified above, significant positive effects of the examiner training exist in the areas of stress reduction, confidence in grading, and critical discrimination of grade assignment for the new examiners in Ulm.
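As a rough cross-check, the subgroup comparisons above can be recomputed from the reported summary values using the pooled-variance t statistic for two independent samples. This is a minimal sketch (the helper name `pooled_t` is ours, not from the study); because the means and standard deviations in the text are rounded, it reproduces the reported t(59)=2.1 only approximately.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    df = n1 + n2 - 2
    # Pooled variance from the two sample standard deviations
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Stress-reduction item: no prior experience (3.1 +/- 1.6, n=29)
# vs. relevant prior experience (3.9 +/- 1.6, n=32)
t, df = pooled_t(3.1, 1.6, 29, 3.9, 1.6, 32)
# df = 59; |t| is about 1.95 from the rounded values (reported: t(59)=2.1)
```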

Participants from the university teaching hospitals versus participants from the Ulm University Clinic

Since examiners from both the Ulm University Clinic and the affiliated university teaching hospitals participate in administering the oral part of the medical exams, we investigated whether or not any significant differences between these two groups arose in terms of the responses given on the survey. It was discovered that none of the surveyed items showed a significant level of difference between the participants from the academic teaching hospitals (n=32) and the participants from the Ulm University Clinic (n=31).

Global assessment of M2 oral exams by the participants

The attitude of the participants toward the oral sections of the M2 exams is clear: 96% of participants are in favor of retaining oral components in the M2 examination. However, 93% of the participants voted to split the examination into two parts, with the written section of the M2 exam taking place before the practical year of medical study (“Praktisches Jahr”) and the oral section afterward. This corresponds to the contents of the legislative draft to amend the licensing regulations as agreed upon by the Federal Council.

The overwhelming majority of participants (86%) recommended taking part in the examiner training when first assuming responsibilities as an M2 examiner.

Discussion and Conclusions

The two main questions addressed by this study concerning the effects of the M2 examiner training in Ulm and its sustainability can be answered positively: six basic effects were detected that comply closely with the training goals and contribute to improvement in the quality of oral examinations.

For examiners without relevant prior experience there are additional significant positive effects regarding stress reduction, confidence in grading, and critical discrimination of grade assignment.

Of interest is the fact that the positive responses of the participants do not taper off even if the M2 training occurred more than two years ago. This speaks for the sustainability of the training effects, similar to Hofer et al. finding sustainability in the didactic training conducted in Düsseldorf after a latent period of one year [7].

The benefit to participants in the role of examination committee chairman was rated much lower in the self-assessments. This is not surprising, since the topic of “chief examiner” was not a defined focus of the training; expansion of the training program to include this aspect should be considered in the future. Almost all participants were in favor of retaining oral components as part of the M2 examinations, even if these were to be administered at a different point in time than the written part of the examination. Another point of near consensus concerned participation in M2 examiner training when first taking up responsibilities as examiner. This emphasizes the subjectively perceived value of oral exams in measuring and assessing medical skills as a supplement to the written sections of the M2 state examinations. Even though the participants were not asked why they voted this way, it must be assumed that the strengths of this testing format (such as the possibility to directly test practical skills and to directly observe analytical and algorithmic problem-solving) are viewed as basic elements of competence testing, and that the design and conduct of such examinations are at the same time considered demanding and in need of training. This final aspect could also be one of the reasons for the still relatively low prevalence of oral exams in school-specific graded assessments [9].

Whether the participants came from the University Clinic or one of the teaching hospitals had no influence on how the survey questions were answered. Consequently, the continuation of the M2 examiner training program is worthwhile, particularly since the normally high turnover in staff at universities often makes it necessary to recruit “inexperienced” examiners. In response to this, expanding the training program to also include preclinical examiners for the oral parts of the M1 examination, either on the state level within the scope of the Competency Network for Teaching Medicine in Baden-Württemberg or at the medical school level, would appear to be the most sensible approach.

It must be mentioned as a limitation that, despite the relatively large target group of over 300 M2 training program graduates, only a little over 60 responded to our online survey. This can be explained in part by what has been identified as “survey exhaustion” resulting from the increasing number of online questionnaires [15], and also by the fact that fluctuation among university examiners is relatively high due to job changes. Limited representativeness can lead to distortion, for instance through the participation of persons with predominantly positive or negative impressions, i.e. those who agree or disagree with the training, the oral M2 examination, or both more strongly than average. A survey conducted at the national level might have increased the absolute number of responses; however, even in this case there is a risk of primarily reaching overly engaged examiners (in a positive or negative sense). Qualitatively, despite higher validity, such a survey would also have mixed the unique aspects of the training programs specific to each location and trainer, and waived the advantages of a very homogeneous, standardized study implementation.

A further limitation results from the fact that, at least in terms of the training effects surveyed, the focus is on self-assessment by the participants and not on conducting an intervention or comparative study with the corresponding control groups. Although the quality of self-evaluations in various study scenarios is assessed very differently in the literature, it is favorably influenced by skills training paired with expert feedback (such as in the M2 training sessions here) in terms of conformity between self-evaluation and reality [5], [8], [11].

Due to the high weight of the oral exam grade in the overall grade of the medical examination, it cannot be considered desirable to deploy untrained and/or inexperienced examiners in experimental or control groups during real examinations. As a result, a pre-post study design could only be realized under simulated conditions.

In conclusion, the present study provides indications that a series of positive effects for oral examinations can be achieved through training the examiners and that these effects demonstrate a certain level of sustainability. Those who profit most strongly are the examiners who are just beginning their activities as examiners.

Competing interests

The authors declare that they have no competing interests.


References

1. Belfield C, Thomas H, Bullock A, Eynon R, Wall D. Measuring effectiveness for best evidence medical education: a discussion. Med Teach. 2001;23(2):164-170. DOI: 10.1080/0142150020031084
2. Bundesministerium für Gesundheit. Ärztliche Approbationsordnung (ÄAppO) in der Fassung vom 27.6.2002. Bundesgesetzbl. 2002;I:2405.
3. Fabry G, Lammerding-Köppel M, Hofer M, Ochsendorf F, Schirlo C, Breckwoldt J. Hochschuldidaktische Qualifizierung in der Medizin IV: Messung von Wirksamkeit und Erfolg medizindidaktischer Qualifizierungsangebote: Ein Positionspapier des GMA-Ausschusses Personal- und Organisationsentwicklung für die medizinische Lehre der Gesellschaft für Medizinische Ausbildung sowie des Kompetenzzentrums für Hochschuldidaktik in Medizin Baden-Württemberg. GMS Z Med Ausbild. 2010;27(4):Doc62. DOI: 10.3205/zma000699
4. Fegert JM, Obertacke U, Resch F, Hilzenbecher M. Die Qualität der Lehre nicht dem Zufall überlassen. Dtsch Arztebl. 2009;106(7):A290-A291.
5. Gordon MJ. A review of the validity and accuracy of self-assessments in health professions training. Acad Med. 1991;66(12):762-769. DOI: 10.1097/00001888-199112000-00012
6. Griffith CH III, Georgesen JC, Wilson JF. Six-year documentation of the association between excellent clinical teaching and improved students' examination performances. Acad Med. 2000;75(10 Suppl):S62-S64. DOI: 10.1097/00001888-200010001-00020
7. Hofer M, Jansen M, Soboll S. Effektive Didaktiktrainings für Dozenten der Medizin. GMS Z Med Ausbild. 2005;22(1):Doc07.
8. Jünger J, Schellberg D, Nikendei C. Subjektive Kompetenzeinschätzung von Studierenden und ihre Leistung im OSCE. GMS Z Med Ausbild. 2006;23(3):Doc51.
9. Möltner A, Duelli R, Resch F, Schultz JH, Jünger J. Fakultätsinterne Prüfungen an den deutschen medizinischen Fakultäten. GMS Z Med Ausbild. 2010;27(3):Doc44. DOI: 10.3205/zma000681
10. Möltner A, Schellberg D, Jünger J. Grundlegende quantitative Analysen medizinischer Prüfungen. GMS Z Med Ausbild. 2006;23(3):Doc53.
11. Nagler M, Feller S, Beyeler C. Retrospektive Anpassung der Selbsteinschätzung ärztlicher Kompetenzen – Beachtenswert bei der Evaluation praktischer Weiterbildungskurse. GMS Z Med Ausbild. 2012;29(3):Doc45. DOI: 10.3205/zma000815
12. Richter-Kuhlmann EA. Medizinstudium: Hammerexamen soll bald Geschichte sein. Dtsch Arztebl. 2011;108(40):A-2061/B-1757/C-1741.
13. Schulze J, Drolshagen S. Format und Durchführung schriftlicher Prüfungen. GMS Z Med Ausbild. 2006;23(3):Doc44.
14. Seyfarth M, Reincke M, Seyfarth J, Ring J, Fischer MR. Neue ärztliche Approbationsordnung und Notengebung beim Zweiten Staatsexamen. Eine Untersuchung an zwei bayerischen medizinischen Fakultäten. Dtsch Arztebl Int. 2010;107(28-29):500-504.
15. Sheehan K. E-mail Survey Response Rates: A Review. JCMC. 2001;6(2).
16. Van der Vleuten CP. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ. 1996;1(1):41-47. DOI: 10.1007/BF00596229
17. Wakeford R, Southgate L, Wass V. Improving oral examinations: selecting, training and monitoring examiners for the MRCGP. BMJ. 1995;311(7010):931-935. DOI: 10.1136/bmj.311.7010.931
18. Wass V, Wakeford R, Neighbour R, Vleuten CV. Achieving acceptable reliability in oral examinations: an analysis of the Royal College of General Practitioners membership examination's oral component. Med Educ. 2003;37(2):126-131. DOI: 10.1046/j.1365-2923.2003.01417.x