gms | German Medical Science

GMS Journal for Medical Education

Gesellschaft für Medizinische Ausbildung (GMA)

ISSN 2366-5017

The Effects of Anonymity on Student Ratings of Teaching and Course Quality in a Bachelor Degree Programme

research article medicine

  • corresponding author Theresa Scherer - Bern University of Applied Sciences, degree programme Nursing, Bern, Switzerland
  • author Jan Straub - Bern University of Applied Sciences, degree programme Nursing, Bern, Switzerland
  • author Daniel Schnyder - Bern University of Applied Sciences, degree programme Nursing, Bern, Switzerland
  • author Noemi Schaffner - Bern University of Applied Sciences, degree programme Nursing, Bern, Switzerland

GMS Z Med Ausbild 2013;30(3):Doc32

doi: 10.3205/zma000875, urn:nbn:de:0183-zma0008755

This is the English version of the article. A German version is also available.

Received: November 12, 2012
Revised: January 31, 2013
Accepted: April 7, 2013
Published: August 15, 2013

© 2013 Scherer et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Research Question: Are there any clear differences between the outcomes of anonymous and personalised student evaluations of teaching quality?

Methods: During a two-year period students were randomly divided into two separate groups, “anonymous” and “personalised”, for end-of-module evaluations. The quality of the module was assessed using a standardised questionnaire. Additionally, students were given the option to add “further comments” if they wanted to highlight specifics.

These optional comments were independently assessed by three people, using a five-dimensional rating instrument: positive/negative; differentiated/absolute; naming a person/general; containing an order/neutral; visually accentuated/blank.

The database consisted of 615 evaluation forms, of which 306 were completed anonymously. In order to identify whether there were any differences between the anonymous and personalised data, a multivariate analysis of variance (MANOVA) was performed.

Based on the scale, the answers to the questions and the quality of the comments were evaluated. Furthermore, an assessment was made to determine if there were any differences in the number of optional comments between the two groups.

Results: No significant differences were identified in the informative quality of data between the anonymous and personalised student evaluations. However, students in the personalised group had a tendency to include more details in their written answers.

Conclusion: Personalised evaluations do not generate more biased results in terms of social desirability, as long as the evaluation concept is characterised by a closed-circle process and is transparent. In other words, it is imperative that the outcomes of the evaluation are reported back to the students. Moreover, there has to be an opportunity for students to discuss any further suggestions and/or future desires in an open environment. In this way the students respect and understand that their feedback is being taken seriously; consequently, they feel able to provide a constructive and honest evaluation.

Keywords: Education, curriculum development, programme evaluation, respondent anonymity


Initial Situation

In 2006, the Bachelor of Science in Nursing at the Bern University of Applied Sciences (BFH) was redesigned in accordance with the requirements of the Bologna Process. The degree programme is a competence-based general studies course, combining a scientific basis and consistent practical focus. The structure is modular; the pedagogic-didactic concept is Problem-based Learning (PBL). The teaching staff consists of twenty lecturers who have generally completed basic training in nursing and hold an academic degree in Nursing Science, Educational Science or Psychology. On average, 100 students are educated in Nursing Science every academic year.

Based on evaluation literature that has proven itself for PBL curricula, the evaluation process of the degree programme was developed in parallel with the curriculum [1], [2]. The objective of the teaching evaluation (evaluations of modules, lecturers, etc.) was the continuous optimisation of the degree programme. This led to a corresponding evaluation concept, which can be characterised as follows:

Continuous evaluation of all modules by students and lecturers for structural-organisational adaptations within the single modules.
Continuous curriculum adaptation by the study group. This is the evaluation group, consisting of the head of the degree programme, the head of the training programme and the research associate. It sees to structural, content-orientated adaptations within the whole curriculum.
Continuous quality development by the quality circle. The quality circle is composed of the entire teaching staff. It sees to the systemic-attitudinal adaptations, for example, concerning the basic educational attitude, for the realisation of the future professional profile.

At the end of every module, the students evaluated the module using a standardised questionnaire. The questions were designed to monitor the module quality. The students completed the questionnaire during a teaching session at the end of every module. This resulted in the very high return rate of 95 per cent. All evaluations were carried out anonymously.

The collected data was processed statistically and summarised. In the following module, the head of the degree programme presented the summary to the students and the results were discussed. The students received feedback as to whether and how their proposed changes or suggestions would be put into practice.

This approach has proven itself well in the past years. However, the question of whether the student survey should still be carried out anonymously led to controversial discussions among the staff. The reason for this was the appearance of single, noticeably negative numerical assessments, for example, rating all the questions with 1¹ or giving offensive and destructive comments such as “Mrs Müller² is completely incompetent! Send her to a further education course!” In addition, the very negative comments from the students were often not consistent with the corresponding numerical assessments. In such cases, it was not clear to the teaching staff how the results should be interpreted. Since the students evaluated the modules without giving their names, the lecturers could not check back with them, which led to great irritation. Part of the teaching staff suspected that anonymous evaluation favoured such very negative student responses. However, the amount of data available at the time was very small and therefore no well-founded conclusions could be drawn.

Supporters of the anonymous approach argued that students might fear that negative criticism could result in unpleasant consequences from lecturers. The research literature also points out that, particularly with sensitive or threatening answers that could have negative effects, there is a risk that the questionnaire will not be completed honestly but rather in accordance with social desirability [3]. Moreover, various studies have provided evidence that respondents tend to give more obliging answers if they know or suppose that the lecturers will be able to see their names alongside their answers [4], [5]. According to another study, carried out by Fries and McNinch [6], students do not give their names, even when asked to do so, whenever they have something negative to say. Therefore the specialised literature recommends that questionnaires should be completed anonymously in order to ensure the accuracy and reliability of the data [3].

However, there have been just as many studies that could not prove that there are any differences between anonymous and personalised data [7], [8]. Furthermore, the teaching staff adopts an attitude, in accordance with the basic educational concept, that students should be treated as adult partners and that an objective and open exchange regarding the performance quality must be possible. In the course of their studies, the students should learn to be able to give critical yet respectful and responsible feedback.

Research Question

Previous research results on the subject of anonymous and personalised evaluation are inconsistent and some of the specialised literature is more than ten years old. This means that the question has not received much attention in recent years. The common approach to carrying out surveys anonymously – particularly when using psychometric methods – was therefore applied indiscriminately to programme evaluations. To be able to base the evaluation process on evidence and to close the current research gap regarding anonymous and personalised evaluation, those responsible decided to carry out a study. Its aim was to discuss the following question: Are there any differences between the outcomes of anonymous and personalised evaluations?

To answer this core research question, three sub-questions were defined, allowing a differentiated analysis regarding the research objective:

Do the modules receive better quantitative assessment if the evaluations are anonymous or if they are personalised?
Are more optional comments given in anonymous or personalised evaluation?
Is the quality of the optional comments different in anonymous and personalised evaluation (see Chapter 2.3 “Data Processing”)?

Methods



The module evaluation questionnaires, specially designed for the curriculum, contain six items for the aspect Overall Impression of the module (objectives, relevance, specialism, subject matter, organisation, quality), four items for the aspect Lectures (didactic questions, lecturers, structure, quality; see Figure 1) and four items for the aspect Exams (level, scale, content, quality). The students assessed all of these items on a scale from 1 (completely disagree) to 6 (completely agree). This scaling was chosen because it is consistent with the Swiss grading system and therefore demands little cognitive effort from the students while evaluating, and because it allowed a tendency towards sufficient/insufficient to be observed. In addition, depending on the structure of the individual modules, additional items were included, e.g. concerning tutorials, skills training, practical courses or seminars (see Figure 1) [1], [2], [9].

To be able to subsequently analyse the research question, an additional six-level scale was designed for every aspect by averaging its items.
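The construction of the aspect scales can be illustrated with a minimal sketch. The item identifiers below are hypothetical labels derived from the item lists in the text, and the handling of unanswered items (they are skipped) is an assumption; the article does not state how missing answers were treated.

```python
from statistics import fmean

# Hypothetical item identifiers, grouped by aspect as described in the text.
ASPECTS = {
    "overall_impression": ["objectives", "relevance", "specialism",
                           "subject_matter", "organisation", "quality_overall"],
    "lectures": ["didactic_questions", "lecturers", "structure",
                 "quality_lectures"],
    "exams": ["level", "scale", "content", "quality_exams"],
}


def aspect_scores(ratings):
    """Average the 1-6 item ratings of one questionnaire per aspect.

    Assumption: items that are missing or None are skipped; an aspect
    without any answered item yields None.
    """
    scores = {}
    for aspect, items in ASPECTS.items():
        answered = [ratings[item] for item in items
                    if ratings.get(item) is not None]
        for value in answered:
            if not 1 <= value <= 6:
                raise ValueError(f"rating {value!r} is outside the 1-6 scale")
        scores[aspect] = fmean(answered) if answered else None
    return scores
```

Calling `aspect_scores` on one questionnaire returns one value per aspect on the same 1 to 6 range as the items, which is what allows the aspects to serve as dependent variables later on.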

In addition to the module assessment using the given scale, the students had the possibility to make comments and/or suggestions about every aspect. This means that the students could make more than one optional comment in the questionnaire.

Sample and Database

For the analysis of the research question the evaluation setting was changed: one half of the students still completed the questionnaires anonymously, whereas the other half completed questionnaires with their names printed on them. The data collection for this study took two years. The sample was composed of students from four cohorts. For every module evaluation the students were randomly re-assigned either to the “anonymous” group or the “personalised” group.
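The per-module re-randomisation can be sketched as follows. This is only an illustration: the article does not describe the actual randomisation procedure, so the shuffle-and-split approach, the function names and the `seed` parameter are assumptions.

```python
import random


def assign_groups(student_ids, rng):
    """Randomly split the students into an anonymous and a personalised half."""
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"anonymous": shuffled[:half], "personalised": shuffled[half:]}


def assign_for_modules(student_ids, modules, seed=None):
    """Draw a fresh random split for every module evaluation.

    Because each module gets its own draw, the same student may evaluate
    one module anonymously and another one with a personalised form.
    """
    rng = random.Random(seed)  # seed is only there to make runs repeatable
    return {module: assign_groups(student_ids, rng) for module in modules}
```

With an odd number of students the personalised group receives the extra student in this sketch; any other tie-breaking rule would serve equally well.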

During the study period a total of 27 modules were evaluated. All the practice and communication modules as well as the Clinical Assessment modules were excluded from the study because other questionnaires were used to evaluate these. The people in charge of the study chose five out of the 27 evaluated modules that suitably represent the multifaceted Bachelor in Nursing: “The study of nursing (introductory module)”, “Dealing with emergency situations”, “Understanding research”, “Acquiring basic statistical knowledge” and “Ensuring nursing quality”. Altogether 615 completed questionnaires, of which 306 were completed anonymously, were used in the study.

The analyses of the answers were carried out individually and independently for each module. Since the groups for every module evaluation had been randomised anew, it was quite likely that the same student assessed one module anonymously and another module with a personalised questionnaire.

Data Processing

For the first two sub-questions, the data was entered into the statistical programme SPSS and the analysis carried out as described in Chapter 3 “Results”.

For the third sub-question the data had to be processed separately. In order to do this, all the comments were assessed using a rating instrument developed by one of the authors. The instrument was developed with the objective of coding statements in a way that shows whether they are offensive or destructive. First, the common features of offensive or destructive comments had to be considered. Based on this, it was defined that a statement is offensive or destructive only if it meets all of the following five conditions: it

is negative,
is absolute,
refers to a person,
contains an order and
is visually accentuated.

The five resulting dimensions were defined as follows:

In the dimension Assessment the statements were coded as positive, negative or both, e.g. “The module was poorly organised, but the topics were all very interesting”.
In the dimension Differentiation it was assessed whether the statements were differentiated or absolute. A statement, for example, was considered differentiated if it contained words like “partly”, “sometimes”, “but”, etc. Statements like: “The topic of normal distribution is unimportant” were coded as absolute.
The dimension Individual showed whether a statement referred to a person, namely a lecturer, and whether her/his name was mentioned, e.g. “Mrs Müller is incompetent!”
In the dimension Order it was assessed whether the responses contained an order, e.g. “Explain to the associate lecturers once and for all how the digital projector works”.
In the dimension Visual Accentuation, statements were coded as to whether they were accentuated with punctuation like an exclamation mark or highlighting, e.g. “Super!!!” or “HELLO PLANNING!”.
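The decision rule of the rating instrument can be written down as a small sketch. The class and field names are hypothetical, and reducing each dimension to a yes/no flag is a simplification: in particular, how statements coded as both positive and negative were treated is not stated in the article, so here `negative` is assumed to mean a purely negative statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedComment:
    """One optional comment, coded on the five dimensions (names are illustrative)."""
    negative: bool              # Assessment: purely negative (assumption for mixed statements)
    absolute: bool              # Differentiation: absolute rather than differentiated
    names_person: bool          # Individual: refers to a person
    contains_order: bool        # Order: contains an order
    visually_accentuated: bool  # Visual Accentuation: e.g. "!!!" or capitals


def is_offensive_or_destructive(comment):
    """A statement counts as offensive or destructive only if all five conditions hold."""
    return all((
        comment.negative,
        comment.absolute,
        comment.names_person,
        comment.contains_order,
        comment.visually_accentuated,
    ))
```

Under this coding, the example comment quoted earlier (“Mrs Müller is completely incompetent! Send her to a further education course!”) would set all five flags and be classified as offensive or destructive, whereas “Super!!!” sets only the last one and would not.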

Using the rating instrument presented above, three people independently coded all comments and suggestions into categorical variables and analysed and assessed them. During the coding process the assessors did not know whether a statement came from the anonymous or the personalised group. To verify the assessors’ consistency, the Pearson correlation coefficient was calculated. The value of the consistency for the single modules was between r=.74 and r=.98.
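A consistency check of this kind can be sketched as pairwise Pearson correlations between the assessors' codes. The article does not specify how the three assessors were compared or how the values were aggregated per module, so the pairwise approach and the function names below are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def pairwise_consistency(codes_by_rater):
    """Correlate every pair of raters; keys are (rater_a, rater_b) tuples."""
    return {
        (a, b): pearson(codes_by_rater[a], codes_by_rater[b])
        for a, b in combinations(sorted(codes_by_rater), 2)
    }
```

For three assessors this yields three coefficients per coded variable; values close to 1 indicate that the assessors coded the same statements in the same way.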


Results

Sub-Question 1: Differences in the Quantitative Assessment

To analyse whether anonymous evaluations tended to be more negative than personalised evaluations, a multivariate analysis of variance (MANOVA) was carried out for every single module. The scales developed according to the described aspects of the module evaluation were the dependent variables; the grouping variable (“anonymous” vs. “personalised”) was the independent variable.

No significant difference between anonymous and personalised data could be identified for any of the modules (see Table 1).

Sub-Question 2: Frequency of Optional Comments

After a simple frequency count of the optional comments, a Mann-Whitney test for independent samples was carried out for every module to verify whether the students evaluating anonymously tended to make optional comments more frequently. The Mann-Whitney test was chosen because the data did not meet the requirements for a t-test.
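For illustration, the Mann-Whitney comparison of comment counts can be sketched in plain Python. The study itself used SPSS; the version below is a stand-in that uses the normal approximation with tie correction and no continuity correction, so its p-values need not match the SPSS output exactly. All function names are illustrative.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) for two independent samples.

    p comes from the normal approximation with tie correction and
    without continuity correction (a simplifying assumption).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical: no evidence of a difference
        return u1, u2, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    return u1, u2, erfc(abs(z) / sqrt(2))
```

Here `x` and `y` would hold the number of optional comments per questionnaire in the anonymous and the personalised group of one module. Because such counts contain many ties and are far from normally distributed, a rank-based test is a natural choice over a t-test.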

As can be seen in Table 2, the students in the personalised group tended to make suggestions and comments more often. However, the frequency does not differ significantly from one group to the other. The optional comments in the module “Understanding research” can be considered as an exception: in this module, the personalised group made suggestions or expressed opinions significantly more often on average.

Sub-Question 3: Quality of Optional Comments

The students formulated a total of n=2152 statements about the five modules relevant for the study; n=6 of these were offensive or destructive, meaning they met all of the above-mentioned conditions (see Chapter 2.3 “Data Processing”). Due to this small number of statements meeting the conditions of an offensive or destructive statement, it was not possible to carry out a statistical analysis to determine whether the students evaluating anonymously made offensive or destructive statements more often.

As a first step for the statistical analysis, the cumulative value per dimension was established for all the optional comments in the questionnaire. This means that if a student made three non-personal comments in one evaluation, he or she received a cumulative value of 3 on the dimension “Individual”. Thus the lower the cumulative value, the more often the students wrote responses that were negative, absolute or personal or that contained an order or a visual accentuation.
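The cumulative values can be sketched as a simple sum per dimension. The 0/1 coding used here is an assumption inferred from the example in the text (each non-personal comment adds 1 on the dimension “Individual”); the article does not state how statements coded as both positive and negative entered the sum, and the dimension identifiers are illustrative.

```python
DIMENSIONS = ("assessment", "differentiation", "individual",
              "order", "visual_accentuation")


def cumulative_values(coded_comments):
    """Sum the per-dimension codes over all comments of one questionnaire.

    Assumed coding: 1 = unproblematic pole (positive, differentiated,
    non-personal, no order, not accentuated), 0 = problematic pole.
    """
    totals = dict.fromkeys(DIMENSIONS, 0)
    for comment in coded_comments:
        for dimension in DIMENSIONS:
            code = comment[dimension]
            if code not in (0, 1):
                raise ValueError(f"unexpected code {code!r} for {dimension}")
            totals[dimension] += code
    return totals
```

A questionnaire with three non-personal comments thus receives the value 3 on “Individual”, matching the example in the text. Note that under this coding the cumulative value also depends on how many comments a student wrote.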

As a second step, the data was checked for possible differences between the groups in the quality of the optional comments using MANOVA (IV=grouping variable anonymous vs. personalised, DVs=cumulative values of all dimensions) (see Table 3).

No significant qualitative differences between anonymous and personalised comments could be identified in any of the five dimensions.

Discussion


The objective of the study was to determine whether there is a difference between anonymous and personalised programme evaluation by students. As a result it could be shown that, regarding the quantitative assessments, there are no differences between the two analysed groups. Due to the small amount of data it could not be determined whether there is a difference concerning offensive or destructive statements, but an analysis of the characteristics of the five qualitative dimensions shows no significant difference between anonymous and personalised evaluation. The frequency of the optional comments differed significantly only for one module; for all the other modules no difference between the anonymous and personalised group could be found. These results leave much room for interpretation. The single significant result might be explained by the fact that students who make an effort to formulate an elaborate personal comment would like to be perceived as individuals. They would like to initiate a further exchange.

The consistent feedback cycle presented in Chapter 1.1 “Initial Situation” and the high degree of transparency in the handling of the results give the students the certainty that their feedback will be taken seriously and considered carefully. The students therefore do not abuse questionnaires to vent frustration about other issues.

In addition to the standardised module evaluation other feedback possibilities exist, notably a mentoring concept that facilitates individual exchange between students and lecturers. Possible dissatisfactions or problems can thus be addressed specifically and in various ways. Another explanation for the present results might be that the lecturers signalled that feedback and suggestions for the development of the programme would be welcome.

This evaluation concept affects lecturers as well as students. Both still see themselves, to a certain extent, as pioneers of the new degree programme and would like to contribute to the curriculum design. Furthermore it can be assumed that rules for communication and feedback are fully integrated in the curricula of the feeder schools, where the higher education entrance qualification is obtained. The generation studying today is used to communication and discussion.

Based on the results of the study, offensive and destructive statements must be viewed as isolated cases. The lecturers were unaware of this fact when the evaluation concept was first used (see Chapter 1.1 “Initial Situation”), and the amount of data available at the time did not allow any universally applicable conclusions to be drawn. Outliers were not recognised as such and therefore overrated. It was suspected that they came about because of the use of anonymity.

The theory referred to as “Negativity Bias” in the literature confirms that in most situations something negative will be perceived more strongly, dominantly and drastically than something positive [10]. Rozin and Royzman [11] illustrate this theory with the comparison that the slightest contact with a cockroach makes a delicious meal inedible. According to the principle of “Negativity Dominance” [11], the perception and judgement of events that feature both positive and negative aspects are more negative than the arithmetical sum of these subjective values.

Numerous other reasons may be accountable for single negative statements: professional overload, private problems, lack of interest in a subject or a “Negativity Bias” on the students’ side, for example, one exam question that is too difficult makes the whole module bad.

As a limitation of the present study one could name the self-elaborated design of the five-dimensional instrument to determine the statement quality of the optional comments, since this categorisation is subjective and cannot be exhaustive. Nevertheless, the high assessment consistency between the independent assessors is indicative of the validity of the instrument. It should also be noted that it is precisely the use of this rating instrument and the assessment of the optional comments that made it possible to gain an insight into the students’ critical engagement with the curriculum.

A strength of the study lies in the large amount of data available: due to the comprehensive implementation of the module evaluation, the response rate was almost 100 per cent. This in turn was only made possible by the well-thought-out concept. Over the years, the evaluation questions have proven to be relevant. All the careful preliminary work and the well-practised setting made a smooth implementation of the research project possible.

Conclusion


It is worthwhile to invest in the development of an evaluation concept whose central aspect is a closed and transparent feedback cycle. This provides the students with the possibility of participation and shows them how the evaluated data is integrated into the continuous development of the course offer. What is crucial is not the number of evaluations – although a minimum should certainly be ensured – but rather the consistent reporting of the results to the students and the discussing of possible measures. The study shows that it does not matter whether an evaluation is carried out anonymously or in a personalised way, as long as these conceptual evaluation conditions are met. With this in mind, the results of the present study can be applied to other academic institutions.

Extremely negative responses must be put into perspective, since negativity seems to have a stronger effect than positivity. This is a further, possibly relieving, conclusion.

Notes


1 Swiss grading system: 6 = excellent, 5 = good, 4 = pass, 3 = fail, 2 = very poor, 1 = no performance

2 Real name withheld.

Competing interests

The authors declare that they have no competing interests.

References


1. Kern DE, Thomas PA, Hughes MT. Curriculum Development for Medical Education - A Six-Step Approach. Baltimore: The Johns Hopkins University Press; 1998.
2. Kromrey H. Evaluation - ein vielschichtiges Konzept: Begriff und Methodik von Evaluierung und Evaluationsforschung. Sozialwiss Berufspraxis. 2001;24(2):105-131.
3. Borg WR, Gall MD. Educational research: An introduction. New York: Longman; 1983.
4. Braskamp LA, Ory JC. Assessing faculty work: Enhancing individual and institutional performance. San Francisco: Jossey-Bass; 1994.
5. Seldin P. How administrators can improve teaching: Moving from talk to action in higher education. San Francisco: Jossey-Bass; 1990.
6. Fries CJ, McNinch RJ. Signed versus unsigned student evaluations of teaching: A comparison. Teach Sociol. 2003;31(3):333-344. DOI: 10.2307/3211331
7. Orpen C. The susceptibility of student evaluation of lecturers to situational variables. High Educ. 1980;9(3):293-306. DOI: 10.1007/BF00138519
8. Goh JW, Lee OK, Salleh H. Self-rating and respondent anonymity. Educ Res. 2010;52(3):229-245. DOI: 10.1080/00131881.2010.504060
9. Baartmans P. Qualität nach Mass: Entwicklung und Implementierung von Qualitätsverbesserungen im Gesundheitswesen. 2nd ed. Bern: Huber Verlag; 2006.
10. Baumeister RF, Bratslavsky E, Finkenauer C, Vohs KD. Bad is Stronger Than Good. Rev Gen Psychol. 2001;5(4):323-370. DOI: 10.1037/1089-2680.5.4.323
11. Rozin P, Royzman EB. Negativity Bias, Negativity Dominance, and Contagion. Pers Soc Psychol Rev. 2001;5(4):296-320. DOI: 10.1207/S15327957PSPR0504_2