gms | German Medical Science

GMS Journal for Medical Education

Gesellschaft für Medizinische Ausbildung (GMA)

ISSN 2366-5017

Lectures based on cardinal symptoms in undergraduate medicine - effects of evaluation-based interventions on teaching large groups

research article medicine

  • Olaf Kuhnigk (corresponding author) - Universitätsklinikum Hamburg-Eppendorf, Klinik für Psychiatrie und Psychotherapie, Hamburg, Germany; Universitätsklinikum Hamburg-Eppendorf, Prodekanat für Lehre, Hamburg, Germany
  • Katja Weidtmann - Universitätsklinikum Hamburg-Eppendorf, Prodekanat für Lehre, Hamburg, Germany
  • Sven Anders - Universitätsklinikum Hamburg-Eppendorf, Institut für Rechtsmedizin, Hamburg, Germany
  • Bernd Hüneke - Universitätsklinikum Hamburg-Eppendorf, Klinik und Poliklinik für Geburtshilfe und Pränatalmedizin, Hamburg, Germany
  • René Santer - Universitätsklinikum Hamburg-Eppendorf, Klinik und Poliklinik für Kinder- und Jugendmedizin, Hamburg, Germany
  • Sigrid Harendza - Universitätsklinikum Hamburg-Eppendorf, III. Medizinische Klinik, Hamburg, Germany

GMS Z Med Ausbild 2011;28(1):Doc15

doi: 10.3205/zma000727, urn:nbn:de:0183-zma0007272

This is the English version of the article.
The German version can be found at:

Received: January 15, 2010
Revised: August 2, 2010
Accepted: September 23, 2010
Published: February 4, 2011

© 2011 Kuhnigk et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to share (to copy, distribute and transmit) the work, provided the original author and source are credited.


Despite critical voices, lectures are still an important teaching format in current medical curricula. With the curricular reform at Hamburg Medical Faculty in 2004, all subject-specific lectures were replaced by cardinal symptom-oriented lectures (LSV) in the new clinical curriculum. LSVs are taught throughout all six thematic blocks in years three to five. Since the regular student evaluations after each thematic block indicated that the LSVs needed improvement, this study was carried out using evaluations of individual LSVs by the participating students and by trained auditors (final-year students and academic staff). Based on these evaluations, written feedback containing the individual evaluation data was given to the lecturers, combined with information material on planning an LSV using modern didactic techniques. In a second evaluation period, the effects of this intervention were studied. Only small improvements in the LSVs’ quality were noted with regard to the marks achieved. When individual items were evaluated, especially those on didactic quality, significant improvements were noticeable. Overall, on the basis of individual items, students ranked the quality of the LSVs significantly higher than trained auditors during the first evaluation period. This effect was no longer seen in the second evaluation period. The inter-rater reliability among the auditors was very good. This study shows that regular quality assurance is needed at the structural level and for staff to accompany the process of embedding teaching formats into curricular concepts. Further investigation is needed to determine the adequate frequency of evaluation and the format of feedback to guarantee sustainable effects on the didactic quality of lectures.

Keywords: Lecture, evaluation, audit, intervention, guideline, didactic skills, faculty development, quality assurance


Lectures as a learning format

The teaching format “lecture” is, despite criticism and subsequent curricular reforms, still an important didactic element in undergraduate medical education [6]. On the one hand, traditional systematic lectures are criticised for not promoting the development of independent thinking. On the other hand, they provide an opportunity to impart information to groups of learners in an economical and resource-efficient way, to deliver an introduction to complex topics, and to describe current research results and personal, clinical or scientific experiences [5]. To benefit from the potential advantages of lectures as a teaching format, they should be blended into a curricular framework [15] and linked with other teaching formats to stimulate students to learn of their own accord [13]. In this respect, a case-based format has proved its value [10].

Cardinal symptom-based lectures in the Hamburg curriculum

A comprehensive reform of the clinical part of the undergraduate medical curriculum at the medical faculty of the University of Hamburg was carried out in 2004. The main focus of the reformed clinical curriculum in medicine (KliniCuM) was better integration of the subjects and a greater share of practical training [30]. Lessons of curricular years three to five are distributed across six thematic blocks and one elective block, and content is organized according to the Hamburg catalogue of learning objectives [29], which depicts the different dimensions of learning and levels of competence. Systematic and subject-specific lectures were abolished during the reform process and replaced by case-based lectures whose contents are geared to the cardinal symptoms of different diseases. The concept of these cardinal symptom-oriented lectures (LSV) is an integral part of KliniCuM: it runs through all thematic blocks as a common thread and reveals the content links to the other learning sessions (problem-based tutorials, bedside teaching). First measurements of participant numbers and evaluation data suggested greater student satisfaction with this new lecture format than with the lecture format prior to the reform. A need for further improvement was still visible shortly after the reform. However, the students formulated their points of criticism mostly as global comments in the evaluations held at the end of every thematic block, so the concrete shortcomings remained somewhat unclear [32].

Evaluation as basis for intervention

Students are able to evaluate the didactic quality of courses in a valid and reliable way [12], [22]. At the same time, there are strong calls not to base the evaluation of teaching on student judgement alone [11], [21]. Since the LSV was designed to use the value of the teaching format “lecture” according to the above mentioned criteria [10], [13], we performed this study for quality control. It included a detailed investigation of the criticism and shortcomings mentioned in the evaluations, and observation of how an intervention based on this analysis affected the quality of this teaching format.


Question and hypotheses

Two questions were investigated in our study. One: Are there noticeable differences in the evaluation of the LSV after an intervention based on previous evaluation results, when the new LSV is evaluated by participating students and trained auditors? Two: Is there a difference between the assessment values of participating students and trained auditors?

The main hypotheses are:

The LSV will receive more positive ratings after the intervention, especially regarding its didactic values.
The evaluation of trained auditors will be more consistent and all in all more critical than the evaluation of the participating students.


Figure 1 [Fig. 1] shows an overview of the study, which included two evaluation phases and one intervention phase. A checklist for the audits of the LSV was designed and validated in a pilot phase [32]. This checklist includes seven items regarding the structure and content of the lecture as well as nine items regarding the didactic qualities of the lecturers. The group of auditors comprised eight physicians and scientists and 14 students in the final year of their undergraduate studies. Auditors were assigned in pairs (one physician and one student) to control for possible systematic differences, e.g. diverging perspectives because of status (non-student/student). Prior to the pilot phase, all auditors underwent a three-hour training session, including an explanation of the instruments and practice in the standardised evaluation procedure.

In addition, a questionnaire was developed for the students who attended the audited LSV. This questionnaire covered central aspects of the LSV such as focus on cardinal symptoms, practical applications, and structural design of the lecture, as well as items regarding the lecturer, e.g. manner of contact with the students, comprehensibility and clarity of the lecture. Furthermore, characteristics of the students, e.g. gender and regularity of LSV attendance, were documented. The items in the questionnaires and checklists were rated on a 6-point Likert scale (1: “I strongly agree” to 6: “I strongly disagree”). In addition, free comments were possible, and an overall rating of the lecture was given following the school grade system (1=very good, 2=good, 3=satisfactory, 4=sufficient, 5=poor, 6=deficient). In the pilot, the instruments were found to be feasible and comprehensible. Only minor modifications were required for the final study, such as rearranging the question sequence and clarifying some items by giving an example in brackets.

Since the pilot phase revealed considerable differences between the two groups of auditors (physicians/students), it was assumed that the initial training of the auditors had not covered all main aspects of the study sufficiently. Therefore, a second training session took place before the first evaluation phase, including a summary of the LSV concept within KliniCuM, a delineation of the orientation along cardinal symptoms and connections between the subjects of a thematic block as well as cardinal symptom-oriented teaching of expert knowledge based upon concrete examples. The concordance of the ratings between the two groups of auditors was determined using the intra-class correlation coefficient (ICC) [34].
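The agreement measure mentioned above can be illustrated with a short sketch. The article does not state which form of the ICC was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption made here for illustration, as are the function name and the example ratings.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per rated lecture, each holding one
    score per rater (or rater group). This ICC form is an assumption;
    the article does not specify which variant was computed.
    """
    n = len(ratings)          # number of rated lectures
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ratings of one checklist item on the 6-point scale:
# each row is a lecture, columns are (physician auditor, student auditor).
item_ratings = [[2, 2], [3, 2], [1, 1], [4, 3], [2, 3], [5, 4], [3, 3], [2, 1]]
print(f"ICC(2,1) = {icc_2_1(item_ratings):.3f}")
```

Values near 1 indicate that the two auditor groups rate the same lectures almost identically, while values near 0 or below indicate no agreement beyond chance.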

Design and sampling

All lecturers participating in the LSV were informed about this study. However, they were not told whether, or which of, their lectures had been randomly selected for evaluation. For the first evaluation phase, a randomised representative sample of about one third of the LSVs per thematic block (altogether n=85) was chosen from a total of 247 individual LSVs from all six thematic blocks during the trimester April to July 2006. This approach is known as stratified random sampling (“drawing of stratified lots”) and was chosen because the basic population (all LSVs from all thematic blocks) was considered to be very heterogeneous, meaning that the characteristics of its elements could differ considerably.

To reproduce all facets of the basic population sufficiently, a simple random sample would need to be very large to ensure representativeness. To address this problem, the basic population was divided into disjoint classes (strata). It was assumed that the elements within each class behave similarly with regard to the research question, while elements from different classes differ in their characteristics [8]. In our study the classes are defined by the six thematic blocks. Following the principle of proportional stratification, a random sample of individual lectures was drawn from each class. SPSS 16.0 was used for the statistical evaluation of the data. Means were compared using the t-test for independent samples (level of significance p<0.05). The statistical tests were used descriptively.
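The proportional stratified draw described in this section can be sketched as follows. This is only an illustration: the article reports the total of 247 lectures and a sampling fraction of about one third, but not the number of lectures per thematic block, so the block sizes, the function name and the seed below are hypothetical.

```python
import random


def proportional_stratified_sample(strata, fraction, seed=None):
    """Draw about `fraction` of the items from every stratum, without replacement."""
    rng = random.Random(seed)
    sample = {}
    for name, items in strata.items():
        k = round(len(items) * fraction)   # proportional allocation per stratum
        sample[name] = rng.sample(items, k)
    return sample


# Hypothetical block sizes (the article reports only the total of 247 lectures).
block_sizes = {"block 1": 40, "block 2": 42, "block 3": 41,
               "block 4": 43, "block 5": 40, "block 6": 41}
strata = {name: [f"{name} / LSV {i + 1}" for i in range(size)]
          for name, size in block_sizes.items()}

drawn = proportional_stratified_sample(strata, fraction=1 / 3, seed=2006)
for name, lectures in drawn.items():
    print(name, len(lectures))
print("total drawn:", sum(len(v) for v in drawn.values()))
```

Because every block contributes in proportion to its size, no thematic block can be over- or under-represented by chance, which is the stated reason for stratifying.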

Based on the data from the first evaluation phase the following intervention took place. Three groups were identified to receive an intervention: the teachers who were evaluated in the random sample (group 1), all teachers and all heads of departments participating in the LSV who had not yet been evaluated (group 2), students and the general public (group 3). Components of the intervention were:

  • Letters: all members of groups 1 and 2 were sent a personal letter describing background, approach and goals of the project as well as the respective feedback components and a contact person for questions.
  • General feedback: all three groups received the analysis of the data of the first evaluation phase with the non-personal, general statistics.
  • Individual feedback: all members of group 1 received their personal feedback with the rating and the free commentaries of auditors and students.
  • LSV manual: all groups received a manual based on the data of the first evaluation phase as a gold standard for the design of an LSV including concrete hints regarding content and form.
  • Publication: groups 1 and 2 received the paper “Teaching large groups” [7], a publication about the design of teaching formats for large groups in medical education which contains concise, easy-to-implement suggestions for the design of lectures following modern didactic concepts.
  • For group 3, information regarding the LSV and the LSV manual was published on the website of the dean of education’s office.

All evaluated lectures from the first evaluation phase (n=78; the smaller number is explained by three cancelled audits and four missing lectures) were subdivided into three leagues on the basis of their marks according to German school grades for the second evaluation phase (see table 1 [Tab. 1]). The LSVs to be evaluated were selected by quota sampling [4], i.e. a deliberate selection which aims for the sample to reproduce the composition of the basic population. In our case the basic population is defined by the six thematic blocks and the three leagues, from which the lectures were selected representatively. Since our study particularly focused on the question whether the didactic quality of the LSV improved with the above mentioned intervention, this was operationalized by criteria such as the students’ evaluation of the LSV by school grades at the respective points in time. Based on considerations regarding the smallest desired difference in the students’ ratings and on a power analysis, a required sample size of n=633 was identified [20]. The following criteria were used: effect size d=0.3 (d=0.1: small effect, d=0.3: medium effect, d=0.5: large effect), minimal difference of the means Δ=+0.255, test power 1-β=0.8 and α=0.05. Since on average n=35 student ratings per LSV were collected during the first evaluation phase, n=18 lectures were required for the second evaluation phase. Fourteen lectures of the second evaluation phase were held by the same teachers as in the first evaluation phase. The topics of all 18 LSVs were exactly the same as in evaluation phase 1. To ensure that the measurements were evenly distributed over the structure of the basic population, one lecture from each of the six thematic blocks and every league was chosen.
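The step from the required number of ratings to the number of lectures is simple arithmetic and can be checked with the figures given above. The sketch does not reproduce the power analysis itself, which follows the procedure in [20]. The implied standard deviation at the end rests on the assumption that d is a standardized mean difference (difference divided by standard deviation), which the article does not state explicitly.

```python
# Figures reported in the article.
required_ratings = 633        # student ratings required by the power analysis
ratings_per_lecture = 35      # average number of ratings per LSV in phase 1
min_difference = 0.255        # smallest relevant difference in mean grade
effect_size = 0.3             # reported effect size d

lectures_needed = required_ratings / ratings_per_lecture
print(f"lectures needed: {lectures_needed:.2f} -> {round(lectures_needed)}")

# Assumption: d is a standardized mean difference, d = difference / SD.
implied_sd = min_difference / effect_size
print(f"implied SD of grades: {implied_sd:.2f}")
```

Eighteen lectures also fit the design evenly, since six thematic blocks times three leagues gives exactly 18 cells.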


Changes in the rating of LSV as per school grades

A comparison of the mean school grades given by students to the same 18 LSVs in evaluation phases 1 and 2 shows significant improvements for five lectures (28%), significant deteriorations for three (17%), and no change for ten (55%) (see table 2 [Tab. 2]). Hence, the majority of student evaluations of the LSVs in the second evaluation phase do not reveal changes. The above mentioned criterion of a minimal difference of the mean school grades of Δ=+0.255 is met by eight lectures (44%).
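For a single lecture, the comparison of mean grades between the two phases can be sketched as below. The grades are invented for illustration, and the p-value uses a normal approximation rather than the t distribution that a statistics package such as SPSS would apply, so this is not a reproduction of the published analysis.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t(a, b):
    """Student's t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical school grades (1 = very good ... 6 = deficient) for one lecture.
phase1 = [2, 3, 3, 2, 4, 3, 2, 3, 3, 4, 2, 3, 3, 2, 3, 4, 3, 2, 3, 3]
phase2 = [2, 2, 3, 2, 3, 2, 1, 2, 3, 2, 2, 3, 2, 1, 2, 3, 2, 2, 2, 3]

t, df = pooled_t(phase1, phase2)
improvement = mean(phase1) - mean(phase2)   # lower grade = better rating
# Normal approximation to the two-sided p-value; a t distribution with
# `df` degrees of freedom would give a slightly larger value.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"improvement = {improvement:.2f}, t = {t:.2f}, df = {df}, p ~ {p_approx:.4f}")
print("meets the 0.255 criterion:", improvement >= 0.255)
```

The sketch separates the two questions asked of each lecture: whether the change is statistically significant, and whether it reaches the predefined minimal difference of 0.255 grade points.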

The auditors’ evaluation of the LSV by school grade likewise does not reveal a uniform picture of the effect of the intervention (see table 3 [Tab. 3]). The percentage of improvements matches the above mentioned student ratings of the individual lectures, with four of the five improved LSVs being the same lectures. In the first evaluation phase the physician and scientist auditors rated six of the 18 lectures one school grade lower than the student auditors did; in the second evaluation phase this was the case for seven lectures.

Rating of didactics on the basis of individual items

The auditors’ ratings of individual items in the two evaluation phases give a more differentiated picture than the school grades (see table 4 [Tab. 4]). The second evaluation phase reveals six significant improvements in ratings after the intervention, and all other items except three show a positive trend. The improvements regarding the items “orientation to cardinal symptoms”, “encouragement to follow the general train of thought”, “use of LSV concept”, “interactive design”, “depictive presentation”, and “effort to support successful learning” display large effect sizes. In the first evaluation phase, the comparison between student and auditor ratings on the basis of individual items (see table 5 [Tab. 5]) shows statistically significant differences between both groups for almost all items, with the auditors rating the lectures more critically than the students. Ratings from the second evaluation phase reveal a significant difference between student and auditor ratings for only one item.

Conformity of auditor ratings

As the intra-class correlation coefficients and their significance levels show, the conformity of the ratings on the basis of individual items between the two groups of auditors (students in the final year versus physicians and scientists) lies between ICCmin=-0.030 and ICCmax=0.605 (see table 6 [Tab. 6]). The majority of the included items show a significant positive correlation. Compared to the pilot tests [32], which revealed great differences between both groups of auditors, especially regarding items referring to the LSV concept, conformity between both groups is very satisfying in the first evaluation phase. In the second evaluation phase the conformity between the two groups of auditors lies between ICCmin=-0.022 and ICCmax=0.771 and is also mostly positive and significant. Overall, the conformity between the two groups of auditors can be assessed as moderately high, although the intra-class correlation coefficients are spread quite widely.


The results of the audits and the student evaluations during the first evaluation phase paint a picture of the LSV that is, overall, more positive than expected from the results of the previous student evaluations at the end of the thematic blocks. This could indicate actual improvement. Yet it has to be taken into account that retrospective and integrated evaluations tend to yield worse results than evaluations performed directly after a course [31]. Hence, the observed improvement could have been caused by a methodological effect. On the basis of school grades, the hypothesized improvement of the overall LSV rating could only be noted to a moderate degree in the second evaluation phase. The postulated criterion for improvement was achieved in only 44% of the LSVs rated by the students, while an improvement in the auditor ratings was found in only 28% of the LSVs. In contrast, the auditors’ ratings on the basis of individual items show mostly more positive evaluations in the second evaluation phase, especially with regard to the didactic skills of the teachers. A weakness is the small total number of lectures, which is partly counterbalanced by the initial drawing of the random sample.

The chosen criterion for change, the school grade given, is a relatively abstract measure. It can be assumed that this measure is too undifferentiated to reveal potential differences in the LSV after the intervention, since the construct “teaching quality” is complex [14]. The loss of information caused by using school grades is also suggested by the discrepancy within the group of auditors between the summative school grade and the simultaneously rated individual items, which showed a clear improvement. During the first evaluation phase the trained auditors rated the LSV significantly more critically than the participating students on almost all items, as was assumed in hypothesis 2. In the second evaluation phase the auditors’ assessment of the single items turned out to be considerably better in comparison with the students’ ratings. On the one hand, this could mean that an improvement of the didactic quality of the LSV had indeed taken place, which was then observed and rated by the trained auditors in a more differentiated way. On the other hand, the possible influence of the Rosenthal effect must be taken into account [24]: the auditors’ mere expectation of an improvement of the LSV after the intervention could have led to a better rating. However, the use of trained auditors has been described as a valid and research-oriented instrument for the rating of teaching quality [1], [17]. Others also found only moderate agreement between student and peer ratings [16]. In the second evaluation phase the rating differences are less prominent, which could suggest a more homogeneous basis for the ratings according to school grade. The high inter-rater reliability verified here supports the validity of the data [34].

Furthermore, it needs to be analysed whether the intervention chosen for this project was potent enough to improve the LSV. Since there is no evidence in the literature that student evaluation alone improves university teaching [23], [28], an intervention beyond the mere feedback of the evaluation data was chosen for this study. Yet the feedback to the target group was only given in written form. Other studies show that written feedback is rarely read by the teachers and may therefore have hardly any effect [9]. More effective improvements could be reached by other interventions, e.g. counselling aimed at enhancing didactic skills [23], [33] or direct discussions with teachers about the evaluation results [2]. Feedback given as early as possible can increase the likelihood of a positive effect on the teachers [26]. In our study the time between data collection and feedback was comparatively long, at up to four months. On the other hand, the written personal feedback was, as described in the methods section, clearly edited and illustrated in detail. It is known that written feedback on evaluations without explanations is often interpreted incorrectly by the teachers and hence is not understood and has no effect [2]. Another reason for the rather weak effect of the intervention could be that the LSV is a multi-instructor event with a total of approximately 150 teachers in six thematic blocks. Such a format poses special difficulties for the realization of changes or improvements compared with courses which are taught by only a few persons or even a single person [26]. Furthermore, it is known that the information or counselling offered to teachers in evaluation projects is taken up less often if teachers are not interested in or are unwilling to improve their didactic skills [18].

Another important reason for the less than dramatic effect of the intervention could lie in the inertia inherent in the faculty system when it comes to the realization of curricular innovations [27]. Additionally, until the intervention during this study the concept for the LSV did not exist in written form and had not been sent to the teachers during the curricular planning. Thus the factor “communication within the faculty”, which has a major impact during planning processes [3], was not given enough attention when the new curriculum was implemented. It would be better to introduce a training procedure that acquaints all teaching personnel involved in the LSV with its concept [3]. In a subsequent survey of the quality of the LSV, teachers and students should be involved to gain as much acceptance as possible within the faculty [20]. To improve the overall effectiveness of courses, they have to be integrated into a general procedure to measure and support the quality of teaching and research, since the evaluation of teaching quality alone is not sufficient for its improvement [25].

Summary and outlook

Our study demonstrated that the evaluation of the newly established LSV concept revealed didactic improvements after an intervention, both in the student ratings and in the ratings of trained auditors. These improvements were more notable on the basis of individual items regarding the teachers or the concept than on the basis of the school grades awarded. Altogether, students rated the LSV more positively than the auditors, who showed good inter-rater reliability. Apparently, a three-hour training session for the auditors is not sufficient to prepare them adequately for their role as providers of analytical feedback. Furthermore, it has to be taken into account that the generalisability of our results is somewhat reduced because, owing to the methodology chosen, the random sample in the second evaluation phase comprised only 18 lectures. This study shows the need for better integration of the LSV into the overall concept of the curriculum regarding content and structure, with regular quality control. How long the effects of feedback after an evaluation last within the target group needs to be studied in further projects. Mere written information about lecture design according to modern didactic criteria seems to be an insufficient stimulus for many teachers to improve or change their lectures. Furthermore, it needs to be checked which effects occur in the indirectly affected group of students, e.g. effects on their motivation or learning success.


We thank the Medical Faculty of Hamburg University for supporting this project (L-107/2006) from their teaching funds.

Competing interests

The authors declare that they have no competing interests.


[1] Albanese MA, Schuldt SS, Case D, Brown D. The validity of lecturer ratings by students and trained observers. Acad Med. 1991;66(5):26-28. DOI: 10.1097/00001888-199101000-00008
[2] Baggott J. Reaction of lecturers to analysis results of student ratings of their lecture skills. J Med Educ. 1987;62:491-496.
[3] Bland CJ, Starnaman S, Wersal L, Moorhead-Rosenberg L, Zonia S, Henry R. Curricular change in medical schools: how to succeed. Acad Med. 2000;75(6):575-594. DOI: 10.1097/00001888-200006000-00006
[4] Bortz J, Döring N. Forschungsmethoden und Evaluation. Berlin: Springer; 2006.
[5] Brown G, Manogue M. AMEE Medical Education Guide No. 22: Refreshing lecturing: a guide for lecturers. Med Teach. 2001;23(3):231-244. DOI: 10.1080/01421590120043000
[6] Butler JA. Use of teaching methods within the lecture format. Med Teach. 1992;14(1):11-23. DOI: 10.3109/01421599209044010
[7] Cantillon P. Teaching large groups. BMJ. 2003;326:437-440.
[8] Clauß G, Ebner H. Grundlagen der Statistik für Psychologen, Pädagogen und Soziologen. Thun/Frankfurt a. M.: Harri Deutsch; 1977.
[9] Cohen PA. Effectiveness of student-rating feedback for improving college instruction: a meta-analysis of findings. Res High Educ. 1980;13(4):321-341. DOI: 10.1007/BF00976252
[10] Copeland H, Longworth D, Hewson M, Stoller J. Successful lecturing. A prospective study to validate attributes of the effective medical lecture. J Gen Intern Med. 2000;15(6):366-371. DOI: 10.1046/j.1525-1497.2000.06439.x
[11] Craig M. Facilitated student discussions for evaluating teaching. SIGCSE Bulletin. 2007;39(1):190-194. DOI: 10.1145/1227504.1227376
[12] Diehl JM. Normierung zweier Fragebögen zur studentischen Beurteilung von Vorlesungen und Seminaren. Psychol Erz Unterr. 2003;50:27-42.
[13] Fyrenius A, Bergdahl B, Silén C. Lectures in problem-based learning - why, when and how? An example of interactive lecturing that stimulates meaningful learning. Med Teach. 2005;27(1):61-65. DOI: 10.1080/01421590400016365
[14] Gordon PA. Student evaluation of college instructors: an overview. Valdosta: Valdosta State University; 1997.
[15] Grass G, Stosch C, Griebenow R. Renaissance der Vorlesung. Dtsch Ärztebl. 2005;102(23):A1642.
[16] Greenwood GE, Ramagli HJ. Alternatives to student ratings of college teaching. J High Educ. 1980;51(6):673-684. DOI: 10.2307/1981172
[17] Imseis HM, Galvin SL. Faculty and resident preference for two different forms of lecture evaluation. Am J Obstet Gynecol. 2004;191(5):1815-1821. DOI: 10.1016/j.ajog.2004.07.068
[18] Irby D, DeMers J, Scher M, Matthews D. A model for the improvement of medical faculty lecturing. J Med Educ. 1976;51(5):403-409.
[19] Leppek R, Jußen M, Berthold D, Sulzer J, Klose KJ. Windmühlenprinzip versus Uhrwerkprinzip - Tradition und Interaktion in der akademischen Vorlesung. Z Ärztl Fortbild. 1996;90:406-413.
[20] Moßig I. Stichproben, Stichprobenauswahlverfahren und Berechnung des minimal erforderlichen Stichprobenumfangs. Gießen: Universität Gießen; 1996.
[21] Reed M. Electronic module evaluation: combining quality with quantity. Conference paper, University of Leeds Inaugural Learning and Teaching Conference. Leeds: University of Leeds; 2004.
[22] Rindermann H. Methodik und Anwendung der Lehrveranstaltungsevaluation für die Qualitätsentwicklung an Hochschulen. Sozialwis Berufspraxis. 2003;26(4):401-413.
[23] Rindermann H. Quality of instruction improved by evaluation and consultation of instructors. Int J for Acad Develop. 2007;12(2):73-85. DOI: 10.1080/13601440701604849
[24] Rost DH. Handwörterbuch der Pädagogischen Psychologie. Weinheim: Beltz; 2001.
[25] Schmidt B. Warum oft wirksam? Und warum manchmal wirkungslos? – Subjektive Erklärungen zur Wirkung von Lehrveranstaltungsevaluation aus der Sicht von Nutzern und Anbietern. Z Eval. 2008;7(1):7-33.
[26] Stillman PL, Gillers MA, Heins M, Nicholson G, Sabers D. Effect of immediate student evaluations on a multi-instructor course. J Med Educ. 1983;58:172-178.
[27] Sukkar MY. Curriculum development: a strategy for change. Med Educ. 1986;20:301-306. DOI: 10.1111/j.1365-2923.1986.tb01369.x
[28] Turhan K, Yaris F, Nural E. Does instructor evaluation by students using a web-based questionnaire impact instructor performance? Adv Health Sci Educ. 2005;10(1):5-13. DOI: 10.1007/s10459-004-0943-7
[29] Universität Hamburg. Hamburger Lernzielkatalog. Hamburg: Universität Hamburg; 2009.
[30] van den Bussche H, Anders S, Ehrhardt M, Göttsche T, Hüneke B, Kohlschütter A, Kothe R, Kuhnigk O, Neuber K, Rijntjes M, Quellmann C, Harendza S. Lohnt sich eine Reform der klinischen Ausbildung? - Die Qualität des Hamburger Curriculums unter der alten und der neuen Approbationsordnung im Vergleich. Z Ärztl Fortbild Qualitätssich. 2005;99:419-423.
[31] van den Bussche H, Weidtmann K, Kohler N, Frost M, Kaduskiewicz H. Evaluation der ärztlichen Ausbildung: Methodische Probleme der Durchführung und der Interpretation von Ergebnissen. GMS Z Med Ausbild. 2006;23(2):Doc37.
[32] Weidtmann K. Analyse des Status quo der Leitsymptom-Vorlesung und Planung einer evaluationsbasierten Intervention an der Medizinischen Fakultät Hamburg. Unpublished project thesis, Master of Medical Education programme. Heidelberg: Medizinische Fakultät Heidelberg; 2007.
[33] Wilson RC. Improving faculty teaching: Effective use of student evaluations and consultants. J High Educ. 1986;57(2):196-211. DOI: 10.2307/1981481
[34] Wirtz M. Bestimmung der Güte von Beurteilereinschätzungen mittels der Intraklassenkorrelation und Verbesserung von Beurteilereinschätzungen. Rehabilitation. 2004;43:384-389. DOI: 10.1055/s-2003-814935