gms | German Medical Science

50. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds)
12. Jahrestagung der Deutschen Arbeitsgemeinschaft für Epidemiologie (dae)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie
Deutsche Arbeitsgemeinschaft für Epidemiologie

12. bis 15.09.2005, Freiburg im Breisgau

Planning and analysis of clinical trials with binary endpoints and “gold standard” design

Meeting Abstract


  • Meinhard Kieser - Department of Biometry, Dr. Willmar Schwabe Pharmaceuticals, Karlsruhe, Germany
  • Tim Friede - Biostatistics and Statistical Reporting, Novartis Pharma AG, Basel, Switzerland

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. Deutsche Arbeitsgemeinschaft für Epidemiologie. 50. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds), 12. Jahrestagung der Deutschen Arbeitsgemeinschaft für Epidemiologie. Freiburg im Breisgau, 12.-15.09.2005. Düsseldorf, Köln: German Medical Science; 2005. Doc05gmds036

The electronic version of this article is the complete one and can be found online at:

Published: September 8, 2005

© 2005 Kieser et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to share (to copy, distribute and transmit the work), provided the original author and source are credited.



In an increasing number of clinical trials, an experimental treatment is compared to an active control for which efficacy has already been established. The objective of these trials is usually to demonstrate non-inferiority of the new treatment compared to the reference. This is done by establishing that the efficacy of the new treatment falls short of that of the reference by no more than a pre-specified, clinically irrelevant amount. Two-arm non-inferiority trials with a head-to-head comparison of a test and a well-established reference treatment are an attractive option because they avoid exposing patients to placebo in situations where a proven active treatment exists. Furthermore, they help to clarify the role of the new treatment relative to already marketed drugs. However, two-arm trials with active control exhibit some major problems in design, analysis and interpretation (see, for example, [1]). Comparison to an active treatment can provide only an indirect proof of efficacy. To achieve this goal, it is necessary to establish assay sensitivity of the study, i.e., to demonstrate that the trial was able to detect a difference between treatments if one exists. Furthermore, the non-inferiority margin has to be chosen such that non-inferiority of the test treatment versus the reference assures that the experimental treatment would also have been superior to placebo. For this, the effect of the reference treatment as compared to placebo has to be estimated from historical data. This can be done reliably only if rather strong assumptions hold, such as availability of placebo-controlled historical trial data, absence of publication bias, avoidance of selection bias when choosing suitable studies, and constancy of trial design, clinical practice and effects over time [1], [2]. Establishing these crucial points is usually an extremely difficult task.
As a consequence, it is proposed to include a placebo group in trials comparing active treatments whenever ethically justifiable. These three-arm non-inferiority trials are sometimes referred to as the "gold standard" [3] because they avoid the difficulties described above. Efficacy of the test treatment can here be demonstrated by direct comparison to placebo. Moreover, non-inferiority of the new treatment compared to the reference can be assessed. The hypotheses can be formulated in such a way that proof of non-inferiority assures that at least a pre-defined portion of the effect the reference shows versus placebo is preserved by the experimental treatment.
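
One common way to write down such a "retention of effect" hypothesis is sketched below. The notation is an assumption made for illustration and is not taken from the abstract: $p_E$, $p_R$ and $p_P$ denote the response probabilities under the experimental treatment, the reference and placebo (higher values being better, with $p_R > p_P$), and $\theta \in (0,1)$ is the pre-defined portion of the reference effect that is to be preserved.

```latex
% Retention-of-effect formulation (illustrative notation, see lead-in)
\begin{align*}
H_0 &: \; p_E - p_P \le \theta \,(p_R - p_P)
  \quad\text{versus}\quad
  H_1 : \; p_E - p_P > \theta \,(p_R - p_P), \\[4pt]
\text{equivalently}\quad
H_0 &: \; p_E - \theta\, p_R - (1-\theta)\, p_P \le 0 .
\end{align*}
% The same null hypothesis expressed with a non-inferiority margin that is
% a fraction of the reference-versus-placebo effect:
\begin{equation*}
H_0 : \; p_E - p_R \le -\Delta ,
\qquad \Delta = (1-\theta)\,(p_R - p_P) .
\end{equation*}
```

In this form the margin $\Delta$ is not fixed in advance from historical data but is tied to the reference effect observed within the same trial, which is what the placebo arm makes possible.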

In our presentation, we give methods that address this test problem for binary endpoints. Test procedures are given and their actual type I error rates are investigated. In particular, it turns out that the results and conclusions given by Tang and Tang [4] are disputable. Furthermore, formulas for sample size calculation are presented and their characteristics are compared. We illustrate the application of these methods with a clinical trial example.
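
As a rough illustration of what a test procedure and a sample size formula for this problem can look like, the following sketch implements a one-sided Wald-type test with unrestricted variance estimates, together with the corresponding normal-approximation sample size. It is not claimed to be the procedure presented by the authors; the retention fraction `theta`, the function names and the example numbers are all assumptions made for illustration, and the sketch covers only the non-inferiority part of the three-arm problem, not the comparison of the test treatment with placebo.

```python
from math import ceil, sqrt
from statistics import NormalDist


def wald_retention_test(x_e, n_e, x_r, n_r, x_p, n_p, theta, alpha=0.025):
    """One-sided Wald-type test of H0: p_E - theta*p_R - (1-theta)*p_P <= 0.

    x_* are responder counts and n_* are group sizes for the experimental
    (e), reference (r) and placebo (p) arms. theta is the assumed fraction
    of the reference-versus-placebo effect that must be preserved.
    Returns (test statistic, one-sided p-value, reject H0?).
    """
    p_e, p_r, p_p = x_e / n_e, x_r / n_r, x_p / n_p
    estimate = p_e - theta * p_r - (1 - theta) * p_p
    # Unrestricted (sample-proportion) variance estimate of the contrast.
    variance = (
        p_e * (1 - p_e) / n_e
        + theta**2 * p_r * (1 - p_r) / n_r
        + (1 - theta) ** 2 * p_p * (1 - p_p) / n_p
    )
    if variance <= 0:
        raise ValueError("variance estimate is zero; Wald statistic undefined")
    z = estimate / sqrt(variance)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


def total_sample_size(p_e, p_r, p_p, theta, alpha=0.025, power=0.8,
                      weights=(1 / 3, 1 / 3, 1 / 3)):
    """Approximate total sample size for the Wald-type test above.

    p_e, p_r, p_p are the response rates assumed under the alternative;
    weights are the allocation fractions for the (e, r, p) arms.
    """
    w_e, w_r, w_p = weights
    delta = p_e - theta * p_r - (1 - theta) * p_p
    if delta <= 0:
        raise ValueError("assumed rates do not lie in the alternative")
    sigma2 = (
        p_e * (1 - p_e) / w_e
        + theta**2 * p_r * (1 - p_r) / w_r
        + (1 - theta) ** 2 * p_p * (1 - p_p) / w_p
    )
    z_sum = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return ceil(z_sum**2 * sigma2 / delta**2)


if __name__ == "__main__":
    # Hypothetical data: 70/100, 70/100 and 40/100 responders, theta = 0.8.
    print(wald_retention_test(70, 100, 70, 100, 40, 100, theta=0.8))
    # Hypothetical planning assumptions with equal allocation.
    print(total_sample_size(0.7, 0.7, 0.4, theta=0.8))
```

Wald-type statistics of this kind rely on a normal approximation, so their actual type I error rate can deviate from the nominal level for small samples or for response rates near 0 or 1. This is one reason why the actual level of such tests needs to be investigated.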


[1] D'Agostino RB, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues - the encounters of academic consultants in statistics. Stat Med 2003; 22:169-186.
[2] CPMP Points to Consider on the choice of non-inferiority margin (CPMP/EWP/2158/99 draft). London: EMEA, 2004. At ewp/ 215899en.pdf.
[3] Koch A, Röhmel J. Hypothesis testing in the "Gold Standard" design for proving the efficacy of an experimental treatment relative to placebo and a reference. J Biopharm Stat 2004; 14:315-325.
[4] Tang M-L, Tang N-S. Tests of noninferiority via rate difference for three-arm clinical trials with placebo. J Biopharm Stat 2004; 14:337-347.