gms | German Medical Science

GMS Journal for Medical Education

Gesellschaft für Medizinische Ausbildung (GMA)

ISSN 2366-5017

The Questionnaire "SFDP26-German": a reliable tool for evaluation of clinical teaching?

research article medicine

  • corresponding author Peter Iblher - Universität zu Lübeck, Klinik für Anästhesiologie, Lübeck, Deutschland; Universität Witten/Herdecke, Fakultät für Gesundheit, Institut für Didaktik und Bildungsforschung im Gesundheitswesen (IDBG), Witten, Deutschland
  • author Michaela Zupanic - Universität Witten/Herdecke, Fakultät für Gesundheit, Institut für Didaktik und Bildungsforschung im Gesundheitswesen (IDBG), Witten, Deutschland
  • author Christoph Härtel - Universität zu Lübeck, Klinik für Kinder- und Jugendmedizin, Lübeck, Deutschland
  • author Hermann Heinze - Universität zu Lübeck, Klinik für Anästhesiologie, Lübeck, Deutschland
  • author Peter Schmucker - Universität zu Lübeck, Klinik für Anästhesiologie, Lübeck, Deutschland
  • author Martin R. Fischer - Universität Witten/Herdecke, Fakultät für Gesundheit, Institut für Didaktik und Bildungsforschung im Gesundheitswesen (IDBG), Witten, Deutschland

GMS Z Med Ausbild 2011;28(2):Doc30

doi: 10.3205/zma000742, urn:nbn:de:0183-zma0007429

This is the English version of the article.

Received: November 30, 2010
Revised: February 7, 2011
Accepted: March 23, 2011
Published: May 16, 2011

© 2011 Iblher et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to share (to copy, distribute and transmit) the work, provided the original author and source are credited.


Aims: Evaluating the effectiveness of clinical teaching is an important contribution to quality control in medical teaching. It should be carried out with a reliable instrument so that both the status quo and the effects of instruction can be gauged. In the Stanford Faculty Development Program (SFDP), seven categories have proven to be appropriate:

Establishing the Learning Climate,
Controlling a Teaching Session,
Communication of Goals,
Encouraging Understanding and Retention,
Evaluation,
Feedback and
Self-directed Learning.

Since 1998, the SFDP26 questionnaire has established itself as an evaluation tool in English-speaking countries. To date, no equivalent German-language questionnaire is available which evaluates the overall effectiveness of teaching.


Development and theoretical testing of a German-language version of SFDP26 (SFDP26-German),
Examination of the correlation of the SFDP26-German subscales with overall effectiveness of teaching.

Methods: 19 anaesthetists (7 female, 12 male) from the University of Lübeck were evaluated at the end of a teaching seminar on emergency medical care using SFDP26-German. The sample consisted of 173 medical students (119 female (68.8%) and 54 male (31.2%)), mostly from the fifth (8.7%) and sixth semester (80.3%). The mean age of the students was 23±3 years.

Results: The discriminatory power of all items ranged between good and excellent (rit=0.48-0.75). All subscales displayed good internal consistency (α=0.69-0.92) and significant positive inter-scale correlations (r=0.40-0.70). The subscales and “overall effectiveness of teaching” were significantly correlated, with the highest correlation for the subscale “communication of goals” (p<0.001; r=0.61).

Conclusion: The analysis of SFDP26-German confirms high internal consistency. Future research should investigate the influence of the individual categories on the overall effectiveness of teaching and validate the questionnaire against external criteria.

Keywords: SFDP26, clinical teaching, teaching effectiveness, evaluation, questionnaire


Introduction of a new method

The improvement of teaching undoubtedly plays a central role at German medical universities [1]. Raising awareness in teaching institutes and clinics through incentive programs, such as performance-based funding (LOM) and teaching awards, plays an essential role to this end [2]. Feedback systems which report back to teaching staff and departments on their teaching performance using standardised evaluation tools are an essential prerequisite for a reproducible qualitative assessment of teaching performance. The discussion, especially in specialist committees, has helped to ensure that, alongside research and patient care, the formerly somewhat neglected field of teaching is moving to the fore and is perceived as a locational advantage for team motivation and the recruitment of young talent [1], [3], [4].

Alongside the establishment of curricula based on learning objectives and the use of appropriate forms of examination [5], the training of teachers undoubtedly plays a crucial role [6], [7]. A survey of doctors at the clinic of the University of Lübeck found that the perception of good teaching was almost always associated with exceptional individuals who were perceived as good teachers for a wide variety of reasons [8]. Despite this diversity, these reasons could be structured using the “non-cognitive attributes” of a good teacher defined by Sutkin et al., such as relationship skills, mood and personality [9]. This raises the question of which educational competencies are particularly relevant for good and effective teaching performance. The number and acceptance of local and national training courses have increased significantly in recent years (train-the-trainer concepts such as the national Master of Medical Education programme of the Medizinischer Fakultätentag (MFT) at the University of Heidelberg [10]), and such courses have found their way into many post-doctoral training programs. However, training in teaching skills is often neglected for physicians working in clinical teaching, leaving them to acquire these skills on their own through role models, experience and personal initiative [7], [11]. As a result, methodological, rhetorical or didactic weaknesses can negatively affect the academic performance of students, even when the teacher has sound subject-specific knowledge of the course content [12]. Consequently,

specific teaching skills of staff should be made accessible to review, so that weaknesses can be identified and, where necessary, improved through training, and
it should be studied which categories of teaching performance are particularly relevant to good teaching in student evaluations.

It would therefore be desirable to have an established instrument which maps crucial areas of teaching competence, revealing individual strengths and weaknesses in teaching and which also can be used in the context of educational research projects. To our knowledge, to date there is no German-language instrument which meets these requirements.

Stanford Faculty Development Program (SFDP)

As a faculty development initiative, Stanford University (Palo Alto, USA) established the SFDP, a training program based on learning theory, in the early 1980s. Under this program, teaching staff were observed in class, and their strengths and weaknesses in teaching were grouped under seven core competencies:

Establishing the learning climate,
Control of session,
Communication of goals,
Facilitating understanding and retention,
Evaluation,
Feedback,
Promoting self-directed learning.

This is one of the early development programs [13], [14]. It derives from a conceptual framework developed from the late 1970s onwards by Dr. Kelley Skeff, an internist at Stanford University, and Dr. Georgette Stratos as part of their research on improving teaching. Skeff initially filmed hundreds of hours of classroom teaching and correlated certain teaching methods and teacher behaviours with student assessments of teaching effectiveness. Over the course of numerous studies, a faculty development program emerged which enabled the faculty to analyse its teaching and, at the same time, to improve the teaching competence of its teachers through standardised events [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]. As part of this program, a reliable English-language instrument for assessing teaching performance was developed on the basis of the seven categories (SFDP26); it has been widely introduced in training programs and medical education research in the English-speaking world [16].


This study investigates two aspects of the SFDP26 questionnaire:

Development and theoretical testing of the German-language version of SFDP26 (SFDP26-German) for reviewing teaching skills, using item and reliability analyses to determine internal consistency (Cronbach’s α);
Correlation analyses between the subscales and overall teaching effectiveness to verify the relevance of the individual subscales.


The SFDP26 questionnaire consists of 25 items in seven subscales

Establishing the learning climate,
Control of session,
Communication of goals,
Facilitating understanding and retention,
Evaluation,
Feedback,
Promoting self-directed learning and

one item on “Overall teaching effectiveness” as a global measure for teaching.

Items 1-25 are rated on a five-point Likert scale (values 1 to 5, captioned No - Somewhat - Yes), as is the item “Overall teaching effectiveness” (1=very weak to 5=excellent). The original English questionnaire SFDP26 was first translated from English into German by native speakers and then translated back into English (see Table 1 [Tab. 1]).

Seven female and twelve male physicians of the Clinic of Anaesthesiology at the University of Lübeck who were actively teaching in the emergencies course for students in the summer semester of 2009 were evaluated by students at the end of their course using the questionnaire SFDP26-German. The sample was composed of 173 medical students, mostly from the fifth (8.7%) and sixth semester (80.3%) (6.6% higher semesters, 4.6% unspecified), who were randomly assigned to the instructors in course groups. The average age of the 119 women and 54 men was 23±3 years.

The Ethics Committee of the University of Lübeck had no ethical concerns regarding this study. Both students and teachers were informed about the study in advance and had the opportunity to refuse participation. Consent was taken as implied through action. All data were analysed anonymously.

To determine the item parameters, the mean, standard deviation and discriminatory power (according to Pearson) were calculated. According to Lienert, the discrimination coefficient is the correlation between the response to an item and the total score of the scale; it indicates how closely an individual's agreement with the item matches agreement with the scale as a whole [25]. Discrimination coefficients above 0.3 are regarded as good, 0.2-0.3 as acceptable, 0.1-0.2 as marginal and below 0.1 as poor [26]. To determine the scale parameters and inter-correlations, scale sums, standard deviations, internal consistency coefficients (Cronbach's α) and corrected inter-scale correlations according to Pearson (r) were calculated using SPSS software Version 18 (SPSS Inc., Chicago, USA). For group comparisons, a Cronbach's α above 0.6 can be regarded as adequate and one of 0.8 or above as good [27]. An error probability of <5% (p<0.05) was taken as significant, <0.1% (p<0.001) as highly significant.
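The item and scale statistics described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the function names and the small response matrix are hypothetical, and only the standard formulas for Cronbach's α and the corrected item-total correlation are assumed.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: one list of item scores per respondent (complete cases only).
    """
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_item_total(rows):
    """Correlation of each item with the sum of the remaining items."""
    totals = [sum(row) for row in rows]
    result = []
    for col in zip(*rows):
        rest = [t - v for t, v in zip(totals, col)]
        result.append(pearson(col, rest))
    return result


# Hypothetical ratings of five students on a three-item subscale (1-5 Likert).
demo = [[4, 5, 4], [3, 4, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(demo), 2))
print([round(r, 2) for r in corrected_item_total(demo)])
```

In this sketch the item is removed from the scale total before correlating, so that an item does not inflate its own discrimination coefficient; whether the study applied the same correction to the item statistics is not stated in the text.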


Theoretical Review of SFDP26-German

The results of the item analyses are shown in Table 1 [Tab. 1]. The highest agreement was given to items 1 (“He/she listened to the students”) and 3 (“He/she was respectful toward students”) of the “Learning climate” scale, with mean values of M=4.8±0.7 and 4.7±0.7. Item 25 (“He/she encouraged students to read up on topics outside of class”) of the “Self-directed learning” scale and item 15 (“He/she checked prior medical knowledge of students”) of the “Evaluation” scale were rated lowest, with mean values of M=3.4±0.9 and 3.7±1.2.

In no case was the discriminatory power of an item below the critical value of 0.20; all values were in the good to excellent range (r≥0.60).

The results of the scale analyses are shown in Table 2 [Tab. 2]. Students gave their strongest endorsement on the “Learning climate” scale, with a mean of 4.6±0.6. Endorsement was lowest on the “Self-directed learning” scale, with a mean of 3.6±1.1. Almost all scales showed high internal consistency, which indicates reliability and can be rated as good (Cronbach's α: 0.80-0.89). The scale “Understanding and retention” showed moderate internal consistency (α=0.69), and the “Feedback” scale was in the very good range (α=0.92). Highly significant positive correlations (r) were found between all scales of the German-language SFDP26 (see Table 3 [Tab. 3]). The highest correlations were observed between the scales “Learning climate” and “Communication of goals” (r=0.73) and between “Understanding and retention” and “Communication of goals” (r=0.72). The shared variance (r² × 100) of these two scale pairs was 53% and 52%, respectively. The smallest but still significant positive correlations were found between the scale “Self-directed learning” and the scales “Control of session” (r=0.21) and “Learning climate” (r=0.28). The shared variance here was 4% and 8%, respectively.
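The shared-variance figures follow directly from squaring the correlation coefficient. A minimal sketch of that arithmetic, using only the correlations quoted above (the function name is illustrative):

```python
def shared_variance_percent(r):
    """Shared variance of two scales in percent: r squared times 100."""
    return r * r * 100


# Inter-scale correlations quoted in the text above.
for r in (0.73, 0.72, 0.21, 0.28):
    print(f"r = {r:.2f} -> {shared_variance_percent(r):.0f}% shared variance")
```

Rounded to whole percentages, this reproduces the reported values of 53%, 52%, 4% and 8%.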

Correlations of subscales with global rating (Overall teaching effectiveness)

The correlations of the subscales with the global overall assessment of teaching performance (Overall teaching effectiveness) are shown in Table 2 [Tab. 2]. All correlations were highly significant (p<0.001). Here, the highest correlations with overall teaching effectiveness were found for the subscales “Communication of goals” (r=0.61), “Facilitating understanding and retention” (r=0.58) and “Learning climate” (r=0.51).
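As a rough plausibility check on the reported significance level, a Pearson correlation can be converted into a t statistic with n - 2 degrees of freedom. The sketch below assumes that all 173 students contributed to each correlation, which the text does not state explicitly; the function and constant names are illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation r from n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))


N_STUDENTS = 173  # assumption: no missing ratings
for name, r in [("Communication of goals", 0.61),
                ("Facilitating understanding and retention", 0.58),
                ("Learning climate", 0.51)]:
    print(f"{name}: r = {r:.2f}, t = {t_from_r(r, N_STUDENTS):.1f}")
```

Under this assumption, each of the three correlations yields a t value above 7, far beyond the two-sided critical value of roughly 3.3 to 3.4 for p<0.001 at this sample size.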


SFDP26 is a well-researched evaluation instrument [16], [28], so transferring the English SFDP26 to German-speaking countries is an obvious and sensible step. Reliability is a crucial quality criterion for such an instrument. The results of this study confirm the high internal consistency of the German SFDP26, so the reliability of the questionnaire can be regarded as established. Discriminatory power was good throughout, so the different aspects of staff teaching performance were captured well. Furthermore, all subscales showed significant positive correlations with overall teaching effectiveness. This can be interpreted as meaning that the subscales capture relevant aspects of overall teaching effectiveness.

Beckman and colleagues, using a questionnaire based on SFDP26 but with 16 new items (Mayo Teaching Evaluation Form/MTEF-28), investigated the evaluation of the teaching performance of junior doctors in clinical teaching at the Mayo Clinic [30]. For the subscales “Control of session”, “Facilitating understanding and retention” and “Feedback”, low internal consistencies (α: 0.147, 0.570, 0.648) were found. Litzelman and colleagues, who used the original SFDP26 to evaluate teaching performance as rated by junior doctors, found high internal consistencies (α>0.85) for all scales [17]. This indicates that SFDP26 is a very reliable evaluation instrument whose reliability may indeed be reduced by adding further items, as in the study by Beckman and colleagues.

In a study by Williams and colleagues, the subscales of SFDP26 were compared with the “Global Rating Scale” of the University of Michigan (“Please rate the educational value of time with an attending physician.”), showing high correlations (r: 0.86-0.98) [29], with the highest values for the subscales “Communication of goals”, “Facilitating understanding and retention”, “Evaluation”, “Feedback” and “Promoting self-directed learning” (0.94-0.98). These results are surprising, as they imply an almost perfect positive linear relationship both among the scales and between the SFDP26 subscales and a global single-item scale. Unfortunately, the correlations of the SFDP26 subscales with overall teaching effectiveness, and between SFDP26 overall teaching effectiveness and the “Global Rating Scale” of the University of Michigan (a validity measure), are not reported.

In this study of the German-language SFDP26, the scale “Communication of goals” correlated most highly with overall teaching effectiveness (r=0.61), explaining 37% of the variance. Of all subscales of the German-language SFDP26, this scale therefore shows the strongest association with overall teaching effectiveness. The issue of causality cannot be clarified in this study; however, the high correlation indicates that “Communication of goals” is a relevant determinant and could have a major impact on overall teaching effectiveness. The importance of the individual subscales of the German-language SFDP26 for effective teaching should therefore be examined in more detail in future studies, for example to enable the development of targeted training measures for teaching effectiveness. It would also be beneficial to focus on actual measurable gains in student competence, which cannot be determined using the questionnaire alone. Comparing the results of the German-language SFDP26 with external criteria, such as students' examination success and their performance in the clinical context, would then also allow conclusions about the validity of the questionnaire.

As a limitation, SFDP26 has no built-in safeguards against response tendencies among students, for example ticking the same number throughout. In our research we found no indication that this played a role, but the issue needs to be examined critically in analyses. The fact that the data are collected only by means of quantitative elements must also be viewed critically. Although the analyses confirm the high reliability of the SFDP26 scales, adding qualitative evaluation elements, such as the option of free-text answers, could be useful to allow more detailed assessments. The arrangement of the steps on the Likert scale from one to five is also not without problems. While the captions of the full scale (“No - Somewhat - Yes”) provide some guidance, it is nonetheless possible that German students unconsciously follow the school grading system used in German-speaking countries, leading to an accidental reversal when ticking (i.e. 1=very good, etc.). This could be avoided by reversing the labelling. Despite the good results of this study, it must be noted that it examined a specific group of medical students. Studies on the application of the German-language SFDP26 in different subjects and at different points in time are therefore required to verify its general applicability.
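Two of the points raised above, uniform response patterns and a reversed scale orientation, can be handled with simple data checks. The following sketch is one possible approach and not part of SFDP26 or of this study's analysis; the function names and response sheets are hypothetical.

```python
def is_straight_lined(answers):
    """True if a respondent ticked the same value for every item."""
    return len(set(answers)) == 1


def reverse_code(answers, low=1, high=5):
    """Mirror Likert responses, e.g. 1 <-> 5 and 2 <-> 4 on a five-point scale."""
    return [low + high - a for a in answers]


# Hypothetical responses of three students to the 25 items.
sheets = {
    "student_a": [5] * 25,
    "student_b": [4, 5, 3, 4, 5] * 5,
    "student_c": [1] * 25,
}
flagged = [sid for sid, answers in sheets.items() if is_straight_lined(answers)]
print(flagged)
print(reverse_code([1, 2, 3, 4, 5]))
```

A flagged sheet is only a candidate for closer inspection, since a student may genuinely rate every item identically. If the labelling were reversed as suggested, `reverse_code` would realign such data with results collected under the original orientation.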


This study is the first to investigate the German-language version of SFDP26 in the clinical teaching of medical students and shows that the instrument is sufficiently reliable for use in student evaluation of teaching. Follow-up studies in other clinical areas are necessary to clarify the importance of the subscales and to establish the validity of the questionnaire by comparison with external criteria.


The authors would like to thank all the teaching staff and the students who participated in this study, in particular Joe Letkeman for the translation of the questionnaire.

Competing interests

The authors declare that there are no financial or other types of conflict of interest.


Wissenschaftsrat. Empfehlungen zur Qualitätsverbesserung von Lehre und Studium. Berlin: Wissenschaftsrat; 2008.
Wissenschaftsrat. IV.2. Anerkennung von besonderen Leistungen in der Lehre. In: Empfehlungen zur Qualitätsverbesserung von Lehre und Studium. Köln: Geschäftsstelle des Wissenschaftsrates; 2008.
Hahn EG, Fischer MR. Nationaler Kompetenzbasierter Lernzielkatalog Medizin (NKLM) für Deutschland: Zusammenarbeit der Gesellschaft für Medizinische Ausbildung (GMA) und des Medizinischen Fakultätentages (MFT). GMS Z Med Ausbild. 2009;26(3):Doc35. DOI: 10.3205/zma000627
Lammerding-Köppel M, Fabry G, Hofer M, Ochsendorf F, Schirlo C. Hochschuldidaktische Qualifizierung in der Medizin: I. Bestandsaufnahme: Ein Positionspapier des GMA-Ausschusses Personal- und Organisationsentwicklung für die medizinische Lehre der Gesellschaft für Medizinische Ausbildung sowie des Kompetenzzentrums für Hochschuldidaktik in Medizin Baden-Württemberg. GMS Z Med Ausbild. 2006;23(4):Doc73.
Fischer MR, Gesellschaft für Medizinische Ausbildung, GMA-Ausschuss Prüfungen, Kompetenzzentrum Prüfungen Baden-Württemberg. Leitlinie für Fakultätsinterne Leistungsnachweise während des Medizinstudiums: Ein Positionspapier des GMA-Ausschusses Prüfungen und des Kompetenzzentrums Prüfungen Baden-Württemberg. GMS Z Med Ausbild. 2008;25(1):Doc74.
Geraci SA, Kovach RA, Babbott SF, Hollander H, Buranosky R, Devine DR, Berkowitz L. AAIM Report on Master Teachers and Clinician Educators Part 2: faculty development and training. Am J Med. 2010;123(9):869-872 e6.
McLean M, Cilliers F, Van Wyk JM. Faculty development: yesterday, today and tomorrow. Med Teach. 2008;30(6):555-584. DOI: 10.1080/01421590802109834
Iblher P. Lernen zu Lehren – Implementierung eines vorklinischen Wahlfaches zur frühen Förderung von Lehrkompetenzen an der Universität zu Lübeck-Unveröffentlichte Projektarbeit. Lübeck: Universität zu Lübeck; 2008.
Sutkin G, Wagner E, Harris I, Schiffer R. What makes a good clinical teacher in medicine? A review of the literature. Acad Med. 2008;83(5):452-466. DOI: 10.1097/ACM.0b013e31816bee61
Jünger J, Fischer MR, Duelli R, Putz R, Resch F. Implementierung und Evaluation eines interfakultären Master of Medical Education Programms. Z Evid Fortbild Qual Gesundhwes. 2008;102(10):620-627.
Busari JO, Prince KJ, Scherpbier AJ, Van Der Vleuten CP, Essed GG. How residents perceive their teaching role in the clinical setting: a qualitative study. Med Teach. 2002;24(1):57-61. DOI: 10.1080/00034980120103496
Morrison EH, Hafler JP. Yesterday a learner, today a teacher too: residents as teachers in 2000. Pediatrics. 2000;105(1 Pt 3):238-241.
Searle NS, Thompson BM, Friedland JA, Lomax JW, Drutz JE, Coburn M, Nelson EA. The prevalence and practice of academies of medical educators: a survey of U.S. medical schools. Acad Med. 2010;85(1):48-56. DOI: 10.1097/ACM.0b013e3181c4846b
Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, Prideaux D. A systematic review of faculty development initiatives designed to improve teaching effectiveness in medical education: BEME Guide No. 8. Med Teach. 2006;28(6):497-526. DOI: 10.1080/01421590600902976
Litzelman DK, Stratos GA, Marriott DJ, Lazaridis EN, Skeff KM. Beneficial and harmful effects of augmented feedback on physicians' clinical-teaching performances. Acad Med. 1998;73(3):324-332. DOI: 10.1097/00001888-199803000-00022
Litzelman DK, Stratos GA, Marriott DJ, Skeff KM. Factorial validation of a widely disseminated educational framework for evaluating clinical teachers. Acad Med. 1998;73(6):688-695. DOI: 10.1097/00001888-199806000-00016
Litzelman DK, Westmoreland GR, Skeff KM, Stratos GA. Factorial validation of an educational framework using residents' evaluations of clinician-educators. Acad Med. 1999;74(10 Suppl):S25-S27.
Skeff KM. Evaluation of a method for improving the teaching performance of attending physicians. Am J Med. 1983;75(3):465-470. DOI: 10.1016/0002-9343(83)90351-0
Skeff KM, Campbell M, Stratos G, Jones HW 3rd, Cooke M. Assessment by attending physicians of a seminar method to improve clinical teaching. J Med Educ. 1984;59(12):944-950.
Skeff KM, Stratos G, Campbell M, Cooke M, Jones HW 3rd. Evaluation of the seminar method to improve clinical teaching. J Gen Intern Med. 1986;1(5):315-322. DOI: 10.1007/BF02596211
Skeff KM, Stratos GA, Bergen MR, Regula DP Jr. A pilot study of faculty development for basic science teachers. Acad Med. 1998;73(6):701-704. DOI: 10.1097/00001888-199806000-00018
Skeff KM, Stratos GA, Bergen MR, Sampson K, Deutsch SL. Regional teaching improvement programs for community-based teachers. Am J Med. 1999;106(1):76-80. DOI: 10.1016/S0002-9343(98)00360-X
Skeff KM, Stratos GA, Berman J, Bergen MR. Improving clinical teaching. Evaluation of a national dissemination program. Arch Intern Med. 1992;152(6):1156-1161.
Skeff KM, Stratos GA, Mygdal W, DeWitt TA, Manfred L, Quirk M, Roberts K, Greenberg L, Bland CJ. Faculty development. A resource for clinical teachers. J Gen Intern Med. 1997;12(Suppl 2):S56-S63. DOI: 10.1046/j.1525-1497.12.s2.8.x
Lienert GA, Raatz U. Berechnung von Schwierigkeitsindex, Trennschärfenkoeffizient und Aufgabeninterkorrelation. In: Lienert GA, Raatz U, editors. Testaufbau und Testanalyse. Weinheim: Beltz, Psychologie-Verl.-Union; 1994. p.73-113.
Möltner A, Schellberg D, Jünger J. Grundlegende quantitative Analysen medizinischer Prüfungen. GMS Z Med Ausbild. 2006;23(3):Doc53.
Bortz JD, Döring N. Hypothesengewinnung und Theoriebildung. In: Bortz JD, Döring N, editors. Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler. Berlin, Heidelberg, New York, Tokio: Springer; 2006. p.355-936.
Litzelman DK, Westmoreland GR, Skeff KM, Stratos G. Student and resident evaluations of faculty-how dependable are they? Acad Med. 1999;74(Suppl.):s25-s27.
Williams BC, Litzelman DK, Babbott SF, Lubitz RM, Hofer TP. Validation of a global measure of faculty's clinical teaching performance. Acad Med. 2002;77(2):177-180. DOI: 10.1097/00001888-200202000-00020
Beckman TJ, Lee MC, Rohren CH, Pankratz VS. Evaluating an instrument for the peer review of inpatient teaching. Med Teach. 2003;25(2):131-135. DOI: 10.1080/0142159031000092508