gms | German Medical Science

GMS Journal for Medical Education

Gesellschaft für Medizinische Ausbildung (GMA)

ISSN 2366-5017

Review of multiple-choice-questions and group performance - A comparison of face-to-face and virtual groups with and without facilitation

research article medicine


  • author Edda Kazubke - Humboldt-Universität zu Berlin, Institut für Psychologie, Berlin, Germany
  • corresponding author Katrin Schüttpelz-Brauns - Charité - Universitätsmedizin Berlin, Dieter Scheffner Fachzentrum, Assessment-Bereich / Progress Test Medizin, Berlin, Germany

GMS Z Med Ausbild 2010;27(5):Doc68

doi: 10.3205/zma000705, urn:nbn:de:0183-zma0007053

This is the English version of the article.
The German version can be found at:

Received: March 9, 2010
Revised: July 9, 2010
Accepted: July 20, 2010
Published: November 15, 2010

© 2010 Kazubke et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to share (copy, distribute and transmit) the work, provided the original author and source are credited.

Abstract


Background: Multiple choice questions (MCQs) are often used in medical education exams and require careful quality management, for example through review committees. This study investigates whether groups that communicate virtually by email perform the review process as well as face-to-face groups, and whether a facilitator has a positive effect.

Methods: Sixteen small groups of students evaluated and corrected MCQs under four different conditions. In the second part of the investigation, the revised questions were given to a new random sample in order to assess item quality.

Results: Neither the variable “form of review committee” nor the variable “facilitation” had a significant influence. However, face-to-face and virtual groups clearly differed in the required processing times. The test condition “face-to-face without facilitation” was generally rated most positively with regard to taking over responsibility, approach to work, sense of well-being, motivation and concentration on the task.

Discussion: Face-to-face and virtual groups are equally effective in the review of MCQs but differ in their efficiency. Electronic review seems to be possible but can hardly be recommended because of the long processing time and technical problems.

Keywords: multiple choice questions, MCQ, face-to-face, virtual, facilitation, review-committee

Introduction


The regulations for medical licensure require that all courses of medical training be graded. In many cases multiple-choice questions are used to test students’ achievements, so the quality of these questions is of high importance. To meet this requirement, multiple-choice questions must be evaluated and, where applicable, improved before they are used in exams. Usually, quality control is carried out by review committees (RCs), which consist of experts from various faculties who are trained to recognize formal errors in questions. The task is performed in groups so that the experience of many people can be used. In order to minimise possible disturbances caused by group dynamic processes, the regular sessions are facilitated. Facilitation has a positive influence on the quality of communication and cooperation [1]. Scheduling the members of the RCs is a problem that could be solved by introducing electronic reviews. Virtual cooperation of the RC could be implemented via email, for example. The self-determined choice of when to communicate and how long to work on a task is an advantage [2]. Previous research comparing the work of face-to-face groups and virtual groups could not show differences in achievement. However, virtual groups needed significantly more time to process the same task, and their members were significantly less satisfied with the teamwork. A positive effect of facilitation could also be demonstrated for virtual cooperation [3]. Whether face-to-face groups and virtual groups achieve comparable results in the specific task of reviewing multiple-choice questions is still largely unexplored. This leads to the question of whether the findings of social psychological research can be applied to medical education.

To evaluate the achievements, both effectiveness and efficiency criteria are applied. Furthermore, the study investigates to what extent face-to-face RC groups differ from virtual RC groups with respect to subjective ratings of taking over responsibility, teamwork, sense of well-being, motivation and concentration on the task.

Methods


The study was conducted in two stages: first, RCs evaluated and improved deficient MCQs; second, the modified questions were tested in order to determine their quality.

Sample survey

Forty-one female and eight male first- and second-year psychology students from the Humboldt University in Berlin participated in the RCs; their average age was 23.17 years (SD=4.21). They were recruited via notices and by being approached individually in lectures, and they were credited with test person hours by the Chair of Social Psychology. They worked in 16 small groups of two to four persons. The number of groups was determined with G*Power so that a medium effect could be detected with the chosen test statistics. Half of the groups were homogeneous with regard to sex. Persons were excluded from the study if they were not first- or second-year psychology students, were unskilled in using computers or had no email account at their disposal.

In the second part of the investigation, 80 students from other faculties of the Humboldt University in Berlin participated; they were approached directly by the investigators on the university campus.


The RCs were presented with 32 general-knowledge MCQs containing intentionally inserted defects [4]. The number of inserted formal defects per question varied from zero to six and was confirmed by expert opinion: two trained persons independently rated the type and number of the errors and then formed a joint expert opinion through discussion. To make the teamwork easier, the groups used a scheme for evaluating and improving multiple-choice questions that was developed by the assessment center of the Charité - Universitätsmedizin Berlin. The scheme checks compliance with item-writing guidelines [4] (e.g. implausible distractors or homogeneity of the answer options). In stage one of the study, a group’s achievement was measured as the difference between the number of errors detected by the RC and the number of errors determined by the experts. In the second part of the investigation, achievement was determined by the increase or decrease in the coefficients of item discrimination in the testing sessions. These achievement measures were used to determine effectiveness, while the recorded processing time served as an efficiency criterion. Efficiency was further assessed with a 10-item questionnaire recording the aspects taking over responsibility, approach to work (teamwork vs. working on one’s own), sense of well-being, motivation and concentration on the task. Some items were taken from established questionnaires such as the German “Fragebogen zur Arbeit im Team” (FAT, Kauffeld, 2004; Questionnaire on Working in a Team), and others were designed as single-item measures. All items are six-point Likert items.


The research question was investigated with a two-factor repeated-measures design with the variables form of review committee (face-to-face/virtual) and facilitation (yes/no). The participants were randomly assigned to the RCs, depending on their arrival at the training. The groups finally consisted of two to four participants because some test persons failed to appear or had fallen ill on the scheduled date. The sequence of the test conditions was permuted to prevent sequence effects. Prior to testing, the test persons were familiarized with the material and trained in the criteria for judging multiple-choice questions. Then the MCQs (8 per condition) and the instructions were sent to the participants via email. The questions were assigned to the test conditions at random. Before the teamwork started, the participants had half an hour for preparation. Half an hour was scheduled for working on the MCQs in the face-to-face conditions and one week in the virtual conditions. Facilitation was provided by the investigators. In each condition the multiple-choice questions were to be evaluated and, where applicable, improved using the above-mentioned scheme. After each test condition the participants answered the questionnaire on the aspects taking over responsibility, approach to work, sense of well-being, motivation and concentration on the task.

From the set of modified multiple-choice questions, one version of each of the 32 questions was randomly selected, and the selected versions were assembled into four knowledge tests. Each test contained eight questions from each test condition. A fifth test contained the 32 original questions as a baseline for measuring the improvements. Finally, the five tests were presented to other students, who completed them in the presence of the investigators.
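The random allocation of questions to test conditions and the permutation of condition sequences can be illustrated with a short Python sketch. It is not the procedure actually used by the investigators, which the text does not specify in detail; the condition labels, the function names `assign_questions` and `draw_condition_orders`, and the seed are assumptions made for illustration only.

```python
import itertools
import random

# Labels for the 2x2 design (form of review committee x facilitation).
# The wording of the labels is an assumption for this illustration.
CONDITIONS = (
    "face-to-face, with facilitation",
    "face-to-face, without facilitation",
    "virtual, with facilitation",
    "virtual, without facilitation",
)


def assign_questions(question_ids, conditions=CONDITIONS, rng=None):
    """Randomly split the questions into equally sized sets, one per condition."""
    rng = rng or random.Random()
    shuffled = list(question_ids)
    rng.shuffle(shuffled)
    per_condition, remainder = divmod(len(shuffled), len(conditions))
    if remainder:
        raise ValueError("questions cannot be split evenly across conditions")
    return {
        condition: sorted(shuffled[i * per_condition:(i + 1) * per_condition])
        for i, condition in enumerate(conditions)
    }


def draw_condition_orders(n_groups, conditions=CONDITIONS, rng=None):
    """Draw a different sequence of the conditions for every group."""
    rng = rng or random.Random()
    all_orders = list(itertools.permutations(conditions))  # 4! = 24 sequences
    if n_groups > len(all_orders):
        raise ValueError("more groups than distinct condition sequences")
    return rng.sample(all_orders, n_groups)


# Fixed seed only so that the illustration is reproducible.
rng = random.Random(2010)
allocation = assign_questions(range(1, 33), rng=rng)   # 32 MCQs, 8 per condition
orders = draw_condition_orders(16, rng=rng)            # 16 review committees
```

With four conditions there are 24 possible sequences, so each of the 16 groups can receive a distinct one; whether the study used distinct sequences per group is not stated.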

Statistical analyses

The teamwork was evaluated with a repeated-measures analysis of variance (ANOVA) computed on the difference values between the group’s and the expert’s opinion. In addition to the test statistic and the p-value, the effect size is reported as η2. According to Cohen [5], an effect size from η2=0.0099 is a small effect, from η2=0.0588 a medium effect and from η2=0.1379 a large effect. The processing times of the virtual test conditions were compared with the Wilcoxon signed-rank test because the variable was not normally distributed in the condition without facilitation. For the aspects taking over responsibility, approach to work, motivation, sense of well-being and concentration on the task, a Friedman rank analysis of variance for dependent samples was computed on the respective group means. The coefficients of item discrimination (point-biserial correlation coefficient with part-whole correction) of the original questions and of the modified questions from each test condition, both collected in the second part of the investigation, were compared with the Wilcoxon rank-sum test. Differences were considered statistically significant at p≤0.05. All data were analyzed with the SPSS statistics software 15.0.
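The thresholds quoted from Cohen correspond to his conventions for the effect size f (0.10, 0.25 and 0.40) converted with η2 = f²/(1 + f²). The following Python sketch shows the conversion and a simple classification; the function names are hypothetical, and the label "negligible" for values below the smallest threshold is an assumption, not a term used in the article.

```python
def eta_squared_from_f(f):
    """Convert Cohen's effect size f into eta squared: f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)


# Thresholds as quoted in the text (rounded to four decimals).
SMALL, MEDIUM, LARGE = 0.0099, 0.0588, 0.1379


def classify_eta_squared(eta_sq):
    """Label an eta squared value using the thresholds quoted in the text."""
    if eta_sq >= LARGE:
        return "large"
    if eta_sq >= MEDIUM:
        return "medium"
    if eta_sq >= SMALL:
        return "small"
    return "negligible"  # label chosen for this sketch


# Cohen's f conventions reproduce the quoted thresholds.
thresholds = [round(eta_squared_from_f(f), 4) for f in (0.10, 0.25, 0.40)]
```

Under this classification, an η2 of 0.03 counts as a small effect and values of 0.09 or 0.13 as medium effects.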


Results

Effectiveness of face-to-face and virtual groups

1. Expert’s opinion

The prerequisites for computing the ANOVA were verified and are met. In the test condition virtual review without facilitation, the mean difference to the expert’s opinion is M=7.69 (SD=3.11) errors; in the test condition virtual review with facilitation it is M=7.50 (SD=2.73) errors. In the condition face-to-face review without facilitation, an average of M=8.12 (SD=2.75) of the experts’ errors are not detected, and with facilitation M=6.75 (SD=2.77) (see figure 1 [Fig. 1]). Neither the RC form (F(1)=0.04, p=0.84, η2=0.00) nor facilitation (F(1)=1.54, p=0.23, η2=0.09) nor the interaction of both variables (F(1)=0.73, p=0.41, η2=0.05) has a significant influence on the size of the difference to the expert’s opinion. The proportion of explained variation is highest for the variable facilitation, at 9%.

The influencing factors group size (F(2)=0.18, p=0.84, η2=0.03) and sex composition (F(1)=2.06, p=0.17, η2=0.13), which were constant over all test conditions, do not show any significant effects either. In the face-to-face conditions, the mean difference to the expert’s opinion is M=8.38 (SD=2.05) for same-sex groups and M=6.50 (SD=1.04) for mixed-sex groups. In the virtual RCs, the mean difference is M=7.69 (SD=3.10) for same-sex groups and M=7.59 (SD=2.40) for mixed-sex groups (see figure 2 [Fig. 2]).
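The reported effect sizes can be checked for plausibility from the F values. The sketch below assumes that the reported η2 is a partial η2 (the effect size that SPSS reports for general linear models) and that the error degrees of freedom, which the text does not report, follow from the 16 groups: 15 for the within-group factors, 14 for sex composition and 13 for group size. These degrees of freedom and the function name are assumptions.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, assumed df_error); the error df are NOT given in the article.
reported = {
    "RC form": (0.04, 1, 15),
    "facilitation": (1.54, 1, 15),
    "RC form x facilitation": (0.73, 1, 15),
    "group size": (0.18, 2, 13),
    "sex composition": (2.06, 1, 14),
}

recomputed = {
    name: round(partial_eta_squared(*values), 2)
    for name, values in reported.items()
}
```

Under these assumptions the recomputed values agree with the effect sizes stated in the text to two decimal places.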

2. Reliability

The reliabilities of the tests modified by the RCs are higher than that of the original test (see table 1 [Tab. 1]). Two of the four tests show considerably higher reliability.

3. Item discrimination

The Wilcoxon test shows that, after modification by the RCs, the number of multiple-choice questions whose coefficient of discrimination increased differs only slightly from the number of questions whose coefficient of discrimination decreased. Each of the two directions accounts for approximately 50% of the questions per test condition. For the questions of a test condition as a whole, a change in discriminatory power cannot be demonstrated statistically (see table 2 [Tab. 2]).
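For readers unfamiliar with the measure, a part-whole corrected item discrimination is the correlation between the score on one item and the total score of the remaining items. The following Python sketch shows one common way to compute it for dichotomously scored items; the function names and the toy response matrix are invented for illustration and are not the study's data or SPSS output.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def corrected_item_discrimination(responses):
    """Point-biserial item-rest correlation for each item.

    `responses` holds one row per test taker with 1 (correct) or 0 (incorrect)
    per item. The part-whole correction removes the item itself from the total
    score before correlating.
    """
    totals = [sum(row) for row in responses]
    coefficients = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [total - score for total, score in zip(totals, item)]
        coefficients.append(pearson(item, rest))
    return coefficients


# Invented toy data: four test takers, three items.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
discrimination = corrected_item_discrimination(toy_responses)
```

In the toy data the first two items correlate positively with the rest score, while the third item does not discriminate at all.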

Efficiency of face-to-face and virtual groups

1. Processing time

In both face-to-face conditions the processing time is a constant 30 minutes. The teamwork in the virtual test condition with facilitation requires between 5 and 28 days (M=14.31; SD=7.54), while the groups in the virtual test condition without facilitation complete the task within 6 to 55 days (M=12.06; SD=11.69). With regard to the efficiency criterion processing time, the two virtual conditions do not differ significantly (Z=-1.3, p=0.20).

2. Further efficiency criteria

The Friedman test yields the same ranking of the test conditions for all investigated aspects: taking over responsibility, approach to work, sense of well-being, motivation and concentration on the task (see table 3 [Tab. 3]). The condition face-to-face without facilitation reaches the highest scores in each case, followed by the conditions face-to-face with facilitation and virtual without facilitation. The virtual facilitated condition reaches the lowest score in each case. The differences between the test conditions are statistically significant for all aspects except motivation.


Discussion

Effectiveness of face-to-face and virtual groups

With regard to detecting errors in MCQs, no difference between the face-to-face and the virtual groups could be demonstrated (0% explained variation). Likewise, no difference between the groups was found in the increase of the discrimination coefficients after the review. This result is consistent with other studies comparing achievement [3], [6], [7].

The meta-analysis by Baltes et al. [6] could not identify any effect of group size. This result is only partly supported by the present study, which found a small, though non-significant, effect (3% explained variation). The influencing factor sex composition showed a medium effect in the ANOVA (13% explained variation). Interestingly, the processing of the questions by a review committee of any kind considerably increased the reliability of some of the tests.

Efficiency of face-to-face and virtual groups

The face-to-face groups always met the allowed processing time limit, and coordinating the scheduled dates was uncomplicated. The processing period in the virtual conditions, however, turned out to be a major problem for the teamwork: the time limit of two weeks agreed with the participants was almost never met, which in turn increased the need for coordination by the investigator.

The participants of this investigation felt more comfortable with face-to-face, unfacilitated cooperation than in the other working conditions. In addition, they rated themselves as more concentrated and more responsible and worked together more consistently as a group. Computer-mediated communication is always somewhat cumbersome [8], which explains the lower satisfaction in the virtual conditions.

Impact of facilitation on the group’s achievement

Facilitating groups has a medium effect on the achievements of the review committees. This result is consistent with other empirical investigations on increasing achievement in groups via facilitation [3], [9], [10].

Critical aspects

The two-stage training process (3 hours for the investigators followed by 30 minutes for the test persons) might have led to a loss of information and to problems in understanding. In addition, members of real RCs have more practice, and their regular meetings lead to a different kind of group dynamic. The scheme used for judging the multiple-choice questions does not cover all possible defects listed in the guidelines of Haladyna [4]. Another methodological problem is the use of repeated measurements, which might lead to a loss of motivation. With regard to the virtual teamwork, it could not be controlled how often the test persons checked their email accounts and how quickly they answered the emails they received. Furthermore, unforeseeable technical problems occurred when sending the emails: email addresses had been misstated by the test persons, and received emails disappeared in spam folders or could not be opened on computers with a Linux operating system.

The complete facilitation method, which Schimansky [10] showed to be superior to partial aspects of the method in increasing effectiveness, was not used in the present study. Another limiting factor is that no trained facilitators were available for this investigation. According to the literature, professional facilitators proceed more methodically and in a more task-oriented way than those who perform this role only occasionally [11].

Since motivation was generally rated lower than all other aspects, the lower score might be attributable not to the test conditions but to a low initial motivation of the students [12]. Another problem of the investigation is that the quality of the questionnaire was insufficiently examined.

Conclusions


Electronic review via email seems to be possible but can hardly be recommended for reasons of efficiency. Virtual teamwork is considerably more time-consuming, is often associated with technical problems and requires more coordination by the facilitator. A possible alternative could be a virtual application of the nominal group technique. Virtual teamwork in internet forums is also conceivable. It remains to be verified whether the above problems can be eliminated in this way. Further investigation of real RCs is necessary, for example to permit more differentiated achievement comparisons or to substantiate the role of the RC leader (facilitator).

Competing interests

The authors declare that they have no competing interests.

References


1. Scholl W. Grundprobleme der Teamarbeit und ihre Bewältigung - Ein Kausalmodell. In: Gemünden HG, Högl M (eds). Management von Teams. Theoretische Konzepte und empirische Befunde. 3rd edition. Wiesbaden: Gabler; 2005.
2. Döring N. Sozialpsychologie des Internets. Die Bedeutung des Internets für Kommunikationsprozesse, Identitäten, soziale Beziehungen und Gruppen. Internet und Psychologie. Neue Medien in der Psychologie, Band II. Göttingen: Hogrefe; 1999.
3. Unger D, Witte EH. Virtuelle Teams – Geringe Kosten, geringer Nutzen? Zur Leistungsverbesserung von Kleingruppen beim Problemlösen durch elektronische Moderation. Hamburger Forschungsbericht zur Sozialpsychologie (HAFOS), 73. Hamburg: Universität Hamburg, Arbeitsbereich Sozialpsychologie; 2007.
4. Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ. 2002;15(3):309-334. DOI: 10.1207/S15324818AME1503_5
5. Cohen J. Statistical power analysis for the behavioral sciences. 2nd edition. Hillsdale, NJ: Lawrence Erlbaum; 1988. p. 285.
6. Baltes BB, Dickson MW, Sherman MP, Bauer CC, LaGanke JS. Computer-mediated communication and group decision making: A meta-analysis. Organ Behav Hum Decis Process. 2002;87:156-179. DOI: 10.1006/obhd.2001.2961
7. Li SS. Computer-mediated communication and group decision making. Small Group Res. 2007;38(5):593-614. DOI: 10.1177/1046496407304335
8. Becker-Beck U, Wintermantel M, Borg A. Principles of regulating interaction in teams practicing face-to-face communication versus teams practising computer-mediated communication. Small Group Res. 2005;36(4):499-536. DOI: 10.1177/1046496405277182
9. Kramer TJ, Fleming GP, Mannis SM. Improving face-to-face brainstorming through modeling and facilitation. Small Group Res. 2001;32(5):533-557. DOI: 10.1177/104649640103200502
10. Schimansky A. Die Moderationsmethode als Strukturierungsansatz effektiver Gruppenarbeit. Berlin: Papst; 2007.
11. Immig S, Bachmann T. Was professionelle Moderation leistet. Berlin: artop – Institut an der Humboldt-Universität zu Berlin; 2001. Zugänglich unter/available under:
12. Geister S, Konrad U, Hertel G. Effects of process feedback on motivation, satisfaction and performance in virtual Teams. Small Group Res. 2006;37(5):459-489. DOI: 10.1177/1046496406292337