gms | German Medical Science

GMS Medizinische Informatik, Biometrie und Epidemiologie

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)

ISSN 1860-9171

Multiple testing procedures for identifying desirable dose combinations in bifactorial designs

Multiple Testprozeduren zur Identifikation sinnvoller Dosiskombinationen in bifaktoriellen Plänen

Original Article


  • Bettina Buchheister - Institute for Medical Statistics, Informatics and Epidemiology, University of Cologne, Cologne, Germany
  • Walter Lehmacher (corresponding author) - Institute for Medical Statistics, Informatics and Epidemiology, University of Cologne, Cologne, Germany

GMS Med Inform Biom Epidemiol 2006;2(2):Doc07

The electronic version of this article is the complete one and can be found online at:

Published: June 29, 2006

© 2006 Buchheister et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to Share (to copy, distribute and transmit the work), provided the original author and source are credited.


Hung, Chi, and Lipicky proposed the AVE and MAX tests to analyse, in a bifactorial design, whether combinations of two drugs at several doses fulfil the desirable property of superiority to both of their single drug components. These are global tests and do not identify the specific combinations which are more effective than their respective single components. Here multiple testing procedures based on linear contrast tests and on the closed testing principle are presented. They are compared with the simultaneous Min tests of Laska and Meisner. The performance of these approaches is investigated by simulation studies.

Keywords: drug combination, Min test, bifactorial design, AVE- and MAX test, closed testing procedure, linear contrast test, experimentwise error rate


The AVE and MAX tests proposed by Hung, Chi, and Lipicky make it possible to analyse, in a bifactorial design, whether there are combinations of two drugs at several doses that have the desirable property of superiority to both single components. These tests are global tests, however, and cannot identify the specific dose combinations that are more effective than both of their single components. This paper proposes multiple testing procedures based on linear contrast tests and the closed testing principle. They are compared with simultaneous Min tests according to Laska and Meisner, and the behaviour of these approaches is investigated by simulation studies.

Keywords: drug combination, Min test, bifactorial designs, AVE and MAX test, closed testing procedure, linear contrast test, experimentwise significance level


According to the guidelines of the U.S. Food and Drug Administration (FDA, 21 CFR 300.50), one of the requirements for approving a drug combination is that each component must make a contribution to the claimed effects. Analogously, the guideline for combination drugs of the European Agency for the Evaluation of Medicinal Products (EMEA, CPMP/EWP/240/95) requires that the benefit/risk assessment of the fixed combination equals or exceeds that of each of its substances taken alone. That is, the combination needs to be more effective than each of its single components simultaneously.

This property of superiority may be tested with the Min test of Laska and Meisner [1], [2] in (2x2) factorial design trials where each component of the combination is given at a fixed dose level chosen from prior information. Because potential interactions between the components are unknown, preselecting the dose combination is often difficult. Therefore, multi-level factorial designs that study several dose combinations simultaneously are needed.

When more than one combination drug is studied, however, a multiple testing problem arises and two questions are posed: (1) Globally: Is there any combination which is superior to its single components? Here one global hypothesis is tested. (2) Locally: Which specific combinations have this property? In this case test procedures controlling the experimentwise error rate α are required. That is, the probability of at least one wrong inference must not exceed α.

Neither dose response analyses nor general analyses of variance with interactions are suited to answering these two questions. Of interest are approaches which compare the effects of the combinations solely with the effects of their components. Hung, Chi, and Lipicky [3] and Hung [4] developed two global test procedures which protect the overall type I error rate. For the local problem they only recommended adjusted simultaneous Min tests according to the Hochberg [5] procedure.

In this article new procedures based on the closed testing principle are presented. They are built on different families of elementary hypotheses and allow multiple hypotheses to be tested in a step down manner, using specific linear contrast tests. Furthermore, a local maximum test based on the global MAX test of Hung et al. [3] is developed and a modification of simultaneous Min tests is suggested. All procedures control the experimentwise error rate α. The performance of the proposed test procedures is investigated by simulation studies, and suggestions for practical applications are formulated.


Consider a two factorial design with I dose levels of drug A and J dose levels of drug B. Let µij, i=0, …, I and j=0, …, J, denote the true mean responses of the dose combination (i, j), whereby high values of µij's indicate benefit. (i, 0) and (0, j) denote the single drug components (Table 1 [Tab. 1]).

Let µij be estimated by the group mean Equation 1, where Xijk, k=1, …, nij, is the observed effect of the k-th subject in the (i, j)-th dose combination group and nij is the sample size of that group. Assuming variance homogeneity, the pooled estimator of σ² is given by

Equation 2
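As a minimal sketch of these two estimators, the following Python snippet computes the group means and a pooled variance from raw observations. The dictionary layout, the function name and the toy data are illustrative assumptions, and pooling over every group present in the data is assumed as well.

```python
from statistics import fmean

def group_means_and_pooled_variance(data):
    """data maps a dose pair (i, j) to the list of responses observed in that group."""
    means = {cell: fmean(xs) for cell, xs in data.items()}
    # Within-group sums of squares, pooled over all groups.
    ss = sum((x - means[cell]) ** 2 for cell, xs in data.items() for x in xs)
    # One degree of freedom is lost per estimated group mean.
    df = sum(len(xs) - 1 for xs in data.values())
    return means, ss / df, df

# Toy (2x2) layout: placebo (0, 0), single components (1, 0) and (0, 1), combination (1, 1).
data = {
    (0, 0): [1.0, 3.0],
    (1, 0): [2.0, 4.0],
    (0, 1): [3.0, 5.0],
    (1, 1): [5.0, 7.0],
}
means, s2, df = group_means_and_pooled_variance(data)
```

The square root of `s2` plays the role of the pooled estimator S that appears in the test statistics further on.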

There are 2 • IJ marginal hypotheses, each of which compares a combination with one of its single components:

Equation 3 versus Equation 4

Equation 5 versus Equation 6.

Examining the claimed combination superiority, IJ local combination hypotheses can be formulated as union hypotheses of the two marginal hypotheses:

Equation 7

versus Equation 8

These IJ local combination hypotheses should be tested while controlling the experimentwise error rate α. If the global testing problem is considered, the global hypothesis is

Equation 9.

Previous approaches

Up to now, two approaches have been published. In case of only one combination drug Laska and Meisner [1], [2] suggest the Min test. For the general (I+1) x (J+1) design, two global tests are proposed by Hung, Chi, Lipicky [3].

Laska-Meisner Min test

In the simple case (I = J = 1) only one combination drug is observed and the hypothesis of interest is the following union hypothesis:

Equation 10

versus Equation 11

HAB will be rejected if both marginal hypotheses HA and HB are rejected at level α, using appropriate test statistics. This so-called Min test is a test for the simple combination drug problem with experimentwise level α.

Under rather mild conditions Laska and Meisner [1], [2] showed that this test is uniformly most powerful within the class of monotone level α tests. A generalization to union hypotheses consisting of more than two hypotheses is possible. With the extended Min test procedure a union hypothesis is rejected if each partial hypothesis can be rejected at level α.
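The decision rule of the Min test can be sketched in a few lines of Python. This is only an illustration under assumptions: the marginal tests are taken to be one-sided two-sample t statistics with a pooled standard deviation `s`, the critical value `crit` is supplied by the caller, and all names and numbers are hypothetical.

```python
from math import sqrt

def min_test(mean_ab, mean_a, mean_b, n_ab, n_a, n_b, s, crit):
    """Reject the union hypothesis only if both marginal comparisons exceed crit."""
    # Combination versus component A and versus component B.
    t_a = (mean_ab - mean_a) / (s * sqrt(1 / n_ab + 1 / n_a))
    t_b = (mean_ab - mean_b) / (s * sqrt(1 / n_ab + 1 / n_b))
    # Both statistics exceed crit exactly when their minimum does.
    t_min = min(t_a, t_b)
    return t_min, t_min > crit

# Illustrative numbers; crit = 1.66 is roughly the one-sided 5% t quantile
# for about 90 degrees of freedom.
t_min, reject = min_test(mean_ab=1.0, mean_a=0.5, mean_b=0.2,
                         n_ab=32, n_a=32, n_b=32, s=1.0, crit=1.66)
```

Because only the smaller of the two statistics matters, no α adjustment between the two marginal tests is needed.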

Global tests from Hung, Chi and Lipicky

In the general (I + 1) x (J + 1) case, a multiple testing problem arises. Two global tests are presented by Hung, Chi, Lipicky [3]. Their test statistics are based on the minimum gains over all dose combinations Equation 12. The tested global hypothesis Equation 13 versus Equation 14 is equivalent to H0.

The "AVErage" global test statistic TAVE is defined by the average of the observed minimum gains, and the "MAXimum" global test statistic TMAX is the maximum of the observed minimum gains. That is,

Equation 15

where S is the pooled estimator of σ and Equation 16 the observed mean effect of the combination (i, j). Both tests are one-sided level α tests, requiring a balanced design and normally distributed data with homogeneous variances. The distributions of TAVE and TMAX are derived by Hung et al. [3]. A more precise and more extensive table of critical values Equation 17 than that given by Hung et al. [3] is presented in Table 2 [Tab. 2]. The tests, however, do not address the multiplicity of the testing problem posed by the IJ local combination union hypotheses.
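To make the two global statistics concrete, the following Python sketch computes the standardized minimum gain of every combination and then their average and maximum. The standardization by S·sqrt(2/n) for a balanced design is an assumption of this sketch, as are the function name and the toy means; critical values have to be taken from the tables and are not computed here.

```python
from math import sqrt

def min_gain_statistics(means, n, s, I, J):
    """Return the local minimum-gain statistics and their average and maximum."""
    scale = s * sqrt(2.0 / n)  # assumed standard error of a difference of two group means
    t = {}
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            # Minimum gain of combination (i, j) over its components (i, 0) and (0, j).
            gain = min(means[(i, j)] - means[(i, 0)], means[(i, j)] - means[(0, j)])
            t[(i, j)] = gain / scale
    t_ave = sum(t.values()) / (I * J)
    t_max = max(t.values())
    return t, t_ave, t_max

# Toy example with two doses of drug A and one dose of drug B.
means = {(1, 0): 1.0, (2, 0): 2.0, (0, 1): 1.5, (1, 1): 2.5, (2, 1): 2.5}
t_local, t_ave, t_max = min_gain_statistics(means, n=8, s=1.0, I=2, J=1)
```

In this toy example the combination (1, 1) carries the maximum, while the average is pulled down by the weaker combination (2, 1).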

Note that these two global tests are developed for balanced designs; two modified global tests for unequal sample sizes are provided by Hung [6].

New approaches

In practice, one is often interested in the local question: Which dose combination(s) are superior to their respective components? Therefore, multiple procedures which identify desirable combinations while controlling the experimentwise error rate are required.

Closed testing procedure of IJ local combination hypotheses

Consider the IJ local combination hypotheses Equation 18 as elementary hypotheses. When a closed system of hypotheses is constructed from them, its global hypothesis, the intersection of all IJ local combination hypotheses, is H0. Therefore each global test for H0 is a competitor to the AVE and MAX global tests. The family of all intersections of the IJ local combination hypotheses can be tested at level α by using a step down procedure.

Two examples of closed systems of hypotheses with local combination hypotheses are given in Figure 1 [Fig. 1] and Figure 2 [Fig. 2]. In the latter only the hierarchy of hypotheses is presented; arrows are omitted for the sake of clarity.

Note that all hypotheses in this system are intersections of union hypotheses, whereas most commonly used level α test procedures are constructed for unions of intersection hypotheses. Thus each intersection of union hypotheses must first be transformed into a union of intersection hypotheses by the rules of elementary set algebra. Afterwards generalized Min tests may be used. Specific level α tests for the intersection hypotheses, based on linear contrast tests, are specified below.

Closed testing procedure of 2 • IJ marginal hypotheses

Consider the 2 • IJ marginal hypotheses Equation 19 and Equation 20, which compare the effect of the combination with the effect of one of its components, as elementary hypotheses. Then, a system of hypotheses closed under intersection can be constructed (e. g. Figure 3 [Fig. 3]). This system of hypotheses contains Equation 21 hypotheses and is substantially larger than the system of hypotheses constructed by the IJ elementary local combination hypotheses (cf. Figure 3 [Fig. 3]). However, it contains only intersection hypotheses without unions which are easier to test. The family of all intersections of the 2 • IJ marginal hypotheses will be tested by a step down procedure. Subsequently a local combination hypothesis Equation 18 can be rejected by the Min test principle, if both of its marginal hypotheses Equation 19 and Equation 20 are rejected by the step down procedure using level α tests.
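The closure principle over the marginal hypotheses can be illustrated with a brute-force Python sketch. Two things are assumptions made only for illustration: each intersection hypothesis is tested with a plain Bonferroni test on marginal p-values (a stand-in, not the linear contrast tests used in this article), and the labels "A" and "B" for the two kinds of marginal hypotheses are hypothetical. With m elementary hypotheses the closure contains at most 2^m - 1 intersections.

```python
from itertools import combinations

def bonferroni(pvals, alpha):
    # Level-alpha test of an intersection hypothesis (stand-in for a contrast test).
    return min(pvals) <= alpha / len(pvals)

def closed_test(pvalues, alpha, intersection_test=bonferroni):
    names = list(pvalues)
    n_intersections = 2 ** len(names) - 1  # upper bound on the size of the closure
    rejected = []
    for h in names:
        others = [g for g in names if g != h]
        # h is rejected only if every intersection that contains h is rejected.
        if all(
            intersection_test([pvalues[h]] + [pvalues[g] for g in rest], alpha)
            for size in range(len(others) + 1)
            for rest in combinations(others, size)
        ):
            rejected.append(h)
    return rejected, n_intersections

def superior_combinations(rejected_marginals):
    # Min test principle: both marginal hypotheses of (i, j) must be rejected.
    rejected = set(rejected_marginals)
    cells = {(i, j) for (_, i, j) in rejected}
    return sorted(c for c in cells if ("A",) + c in rejected and ("B",) + c in rejected)

# Hypothetical marginal p-values for a design with two combinations (1, 1) and (1, 2).
p = {("A", 1, 1): 0.001, ("B", 1, 1): 0.004, ("A", 1, 2): 0.012, ("B", 1, 2): 0.30}
rejected, n_intersections = closed_test(p, alpha=0.05)
superior = superior_combinations(rejected)
```

Enumerating all subsets is feasible only for small designs, which is why step down shortcuts matter in practice; with Bonferroni tests for the intersections this closure reduces to the Holm procedure.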

The global hypothesis Equation 22 of this closed system of hypotheses is the intersection of all marginal hypotheses and differs from H0:

Equation 23.

Accordingly, a global test for Equation 22 is not a competitor to the AVE and MAX global tests. This test procedure allows one only to answer the local question.

Two simultaneous closed testing procedures for each drug

Consider two simultaneous closed testing procedures generated by the IJ marginal hypotheses Equation 19 and the IJ marginal hypotheses Equation 20 (e. g. Figure 4 [Fig. 4]).

This procedure includes the advantages of both approaches mentioned above: Both systems of hypotheses are as small as in the first approach, and there are only intersection hypotheses without unions as in the second approach. But the disadvantage is that an α adjustment is required in order to control the overall error rate α. The two families of hypotheses will be tested separately for drug A and for drug B by step down procedures at level α/2. Finally Min tests can be applied to test the local combination hypotheses.

As in the previous approach, both global hypotheses Equation 24 and Equation 25, as well as their union and their intersection, differ from H0. This approach therefore does not address the global question.

Simultaneous Min tests and a modification

Simultaneous Min tests are a common way to answer the local question while controlling the overall error rate. Each local combination hypothesis is tested simultaneously with the Min test of Laska and Meisner [2] at an adjusted level α* ≤ α. Several α adjustments are described in the literature (e.g. Bonferroni-Holm [7], Hochberg [5] or Hommel [8], [9]). Simultaneous Min tests also belong to the class of closed testing procedures. For example, the α adjustment by Holm is a closed testing procedure that uses the Bonferroni inequality at each step.

Lehmacher [10] and Lehmacher, Wassmer, Reitmeir [11] propose a modification which is a short cut version of a closed testing procedure. Their suggested approach is a two step procedure where both the global and the local combination hypotheses are tested. A combination drug fulfils the property of superiority over its components if the global hypothesis can be rejected at level α and the corresponding local combination hypothesis can be rejected using the modified Bonferroni-Holm procedure with modified levels α/(IJ-1), α/(IJ-1), α/(IJ-2), …, α/2, α.
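The two step procedure can be sketched as follows in Python. The sketch assumes that each local combination hypothesis is summarized by a Min test p-value (the larger of its two marginal p-values) and that the outcome of the global test is passed in as a flag; the function names and the p-values are hypothetical.

```python
def modified_holm_levels(m, alpha):
    """Levels alpha/(m-1), alpha/(m-1), alpha/(m-2), ..., alpha/2, alpha."""
    if m == 1:
        return [alpha]
    return [alpha / (m - 1)] + [alpha / (m - k) for k in range(1, m)]

def two_step_min_tests(min_pvalues, global_rejected, alpha):
    # Step 1: nothing is rejected unless the global hypothesis is rejected.
    if not global_rejected:
        return []
    # Step 2: step down through the ordered p-values with the modified levels.
    ordered = sorted(min_pvalues, key=min_pvalues.get)
    levels = modified_holm_levels(len(ordered), alpha)
    rejected = []
    for cell, level in zip(ordered, levels):
        if min_pvalues[cell] > level:
            break
        rejected.append(cell)
    return rejected

# Hypothetical Min test p-values for a design with IJ = 4 combinations.
p_min = {(1, 1): 0.015, (1, 2): 0.016, (2, 1): 0.030, (2, 2): 0.20}
levels = modified_holm_levels(4, 0.05)
rejected = two_step_min_tests(p_min, global_rejected=True, alpha=0.05)
```

In this example the smallest p-value 0.015 would fail the ordinary Holm level α/4 = 0.0125 but passes the modified first level α/3, which shows where the modification gains power.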

Special linear contrast tests

When superiority of combination drugs is tested in a multi-level two factorial design, not all pairwise comparisons of treatments are considered, but only the 2 • IJ comparisons of the combination drugs with their components. Therefore, the closed testing procedures described above involve partition hypotheses in which two or more disjunctive hypotheses are intersected (e.g. Equation 26). Usual test statistics such as F-tests do not apply. In order to control the experimentwise error rate, partition hypotheses can be tested by multiple tests with α adjustment. Another possibility is to use a special linear contrast test, which can be less conservative and easier to apply.

Thus, each hypothesis in the closed test procedures is tested by a specific linear contrast statistic. To test an intersection of union hypotheses, it must first be transformed into a union of intersection hypotheses. The test statistics of the partition hypotheses are constructed by averaging over the corresponding marginal hypotheses. That is, the contrast coefficients cij are obtained by summing the differences between the effects of the combinations and the effects of their single components. The test statistic is given by

Equation 27


Equation 28

The index sets π1 and π2 are subsets of {(i, j) | i=1, …, I; j=1, …, J}, and |π1| and |π2| denote the number of elements in π1 and π2, respectively. S is the pooled estimator of σ over all treatment groups with cij ≠ 0.

In case of normally distributed data with homogeneous variances T is t-distributed with Equation 31 degrees of freedom.
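A hedged Python sketch of such a contrast statistic is given below. It assumes that π1 collects the comparisons with the component (i, 0) and π2 those with the component (0, j), that the statistic has the usual form of a contrast of group means divided by its estimated standard error, and that the degrees of freedom are summed over the groups with non-zero coefficient. Any common scaling of the coefficients (such as averaging) cancels in the ratio. All names and numbers are illustrative.

```python
from math import sqrt

def contrast_coefficients(pi1, pi2):
    """Sum the differences 'combination minus component' over both index sets."""
    c = {}
    for (i, j) in pi1:  # combination (i, j) versus component (i, 0)
        c[(i, j)] = c.get((i, j), 0) + 1
        c[(i, 0)] = c.get((i, 0), 0) - 1
    for (i, j) in pi2:  # combination (i, j) versus component (0, j)
        c[(i, j)] = c.get((i, j), 0) + 1
        c[(0, j)] = c.get((0, j), 0) - 1
    return {cell: v for cell, v in c.items() if v != 0}

def contrast_statistic(means, sizes, s, c):
    numerator = sum(v * means[cell] for cell, v in c.items())
    std_error = s * sqrt(sum(v * v / sizes[cell] for cell, v in c.items()))
    df = sum(sizes[cell] - 1 for cell in c)  # groups with non-zero coefficient only
    return numerator / std_error, df

# Intersection of both marginal hypotheses of the combination (1, 1).
c = contrast_coefficients(pi1=[(1, 1)], pi2=[(1, 1)])
means = {(1, 1): 3.0, (1, 0): 2.0, (0, 1): 1.0}
sizes = {(1, 1): 6, (1, 0): 6, (0, 1): 6}
t_stat, df = contrast_statistic(means, sizes, s=1.0, c=c)
```

The resulting statistic would be compared with a one-sided t quantile with `df` degrees of freedom.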

Local MAX test

The MAX test of Hung, Chi, and Lipicky [3] is a global test. Here an extension to the local question, the local MAX test, is developed. The test statistic TMAX of the global MAX test is based on the combination drug with the maximum observed minimum gain over its components (called the "MAX-combination" in the following). If the global hypothesis is rejected (TMAX > Equation 32), at least the MAX-combination is superior to its respective components. But there may be other combinations which have this property as well.

The idea of the local MAX test is to test all local combination hypotheses against the critical value Equation 32. Thus, each combination drug whose local test statistic Equation 33 is greater than or equal to Equation 32 fulfils the property of combination superiority.

The local MAX test is a step up procedure (cf. [12]) based on the ordered local test statistics T(1) ≤ … ≤ T(IJ) = TMAX and a fixed critical value C:

step 1: test the local combination hypothesis belonging to T(1) against the critical value C:

T(1) < C: go to step 2

T(1) ≥ C: Stop! Reject all local combination hypotheses.

step 2: test the local combination hypothesis belonging to T(2) against the critical value C:

T(2) < C: go to step 3

T(2) ≥ C: Stop! Retain the hypothesis belonging to T(1) and reject all hypotheses belonging to T(i), i = 2, ..., IJ.

step n (n = 3, ..., IJ): test the local combination hypothesis belonging to T(n) against the critical value C:

T(n) < C: go to step n + 1

T(n) ≥ C: Stop! Retain the hypotheses belonging to T(1), ..., T(n-1) and reject all hypotheses belonging to T(i), i = n, ..., IJ.
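The step up scheme can be written compactly in Python. This is a sketch under the assumption that the local test statistics are already available as a mapping from combinations to values; the critical value 2.5 in the example is made up for illustration and is not a tabulated value.

```python
def local_max_test(stats, crit):
    """Step up through the ordered statistics T(1) <= ... <= T(IJ)."""
    ordered = sorted(stats, key=stats.get)
    for step, cell in enumerate(ordered):
        if stats[cell] >= crit:
            # Stop: retain everything below this step, reject this and all larger ones.
            return ordered[:step], ordered[step:]
    # No statistic reached the critical value: retain all hypotheses.
    return ordered, []

# Hypothetical local statistics for a (2x2) grid of combinations.
stats = {(1, 1): 0.8, (1, 2): 2.9, (2, 1): 1.7, (2, 2): 3.4}
retained, rejected = local_max_test(stats, crit=2.5)
```

Because a single fixed critical value is used, the procedure ends up rejecting exactly those hypotheses whose statistic is at least C.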

It is obvious that the local MAX test controls the experimentwise error rate α when the critical value C is Equation 32. Consider any true hypothesis Equation 45, then

error rate

= P (Reject Equation 45)

= 1 - P (Retain Equation 45)

= 1 - P (T(1) < C, …, T(i) < C).

Under Equation 46 this error rate is clearly maximal, from which it follows that α = 1 - P (T(1) < C, …, TMAX < C) and consequently C = Equation 32.

Simulation studies of statistical power

All approaches described above control the experimentwise error rate α. The power of the following methods for the two posed questions is compared by simulation.

The global hypothesis H0 is tested by:

• Average (AVE) and maximum (MAX) test,

• Linear contrast test for the transformed global hypothesis H0 (GCo) and

• Multiple test of Simes [13] using Min tests (SIM).

The IJ local combination hypotheses are tested by:

• Simultaneous Min tests according to the Hochberg procedure (simMin),

• Simultaneous Min tests according to the modified Holm procedure after rejection of the global hypothesis (MinAVE, MinMAX, MinGCo, MinSIM),

• Closed test procedure of IJ local combination hypotheses (CTPloc),

• Closed test procedures of 2 • IJ marginal hypotheses (CTPmar),

• Two simultaneous closed test procedures at level α/2 (TwoCTP) and

• Local MAX test (loMAX).
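Two of the p-value based procedures in these lists, the global test of Simes and the Hochberg step up procedure, can be sketched in Python as follows. The sketch assumes that each combination is represented by its Min test p-value (the larger of its two marginal p-values); the p-values are hypothetical.

```python
def simes_global_test(pvalues, alpha):
    """Reject the global hypothesis if p(k) <= k * alpha / m for some k."""
    ordered = sorted(pvalues)
    m = len(ordered)
    return any(p <= k * alpha / m for k, p in enumerate(ordered, start=1))

def hochberg(pvalues, alpha):
    """Step up procedure: find the largest k with p(k) <= alpha / (m - k + 1)."""
    ordered = sorted(pvalues, key=pvalues.get)
    m = len(ordered)
    for k in range(m, 0, -1):  # start with the largest p-value
        if pvalues[ordered[k - 1]] <= alpha / (m - k + 1):
            return ordered[:k]  # reject the k hypotheses with the smallest p-values
    return []

# Hypothetical Min test p-values for four combinations.
p_min = {(1, 1): 0.010, (1, 2): 0.016, (2, 1): 0.045, (2, 2): 0.30}
global_reject = simes_global_test(list(p_min.values()), alpha=0.05)
local_reject = hochberg(p_min, alpha=0.05)
```

Here the Simes test rejects the global hypothesis, and the Hochberg procedure identifies the combinations (1, 1) and (1, 2).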

The simulations are based on normally distributed data with different means, homogeneous variances (σ = 1), a balanced design with nij = 30, and significance level α = 0.05. All approaches are very conservative. No procedure is uniformly more powerful than the others. Nevertheless, depending on the kind of design and on a priori information, suggestions for practical applications can be formulated.
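A strongly simplified Monte Carlo sketch in Python shows how rejection rates of this kind can be estimated. It is not the simulation set-up of this article: it treats a single combination only, assumes a known σ so that z statistics replace t statistics, and draws the group means directly from their normal sampling distribution. Function name, seed and number of replications are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def min_test_rejection_rate(mu_ab, mu_a, mu_b, n=30, sigma=1.0,
                            alpha=0.05, reps=2000, seed=1):
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha)  # one-sided normal critical value
    se_mean = sigma / sqrt(n)               # standard error of one group mean
    se_diff = sigma * sqrt(2.0 / n)         # standard error of a difference of means
    hits = 0
    for _ in range(reps):
        x_ab = rng.gauss(mu_ab, se_mean)
        x_a = rng.gauss(mu_a, se_mean)
        x_b = rng.gauss(mu_b, se_mean)
        z_min = min(x_ab - x_a, x_ab - x_b) / se_diff
        if z_min > crit:
            hits += 1
    return hits / reps

# All three means equal: the combination is not superior to either component.
rate_null = min_test_rejection_rate(0.0, 0.0, 0.0)
# Combination one standard deviation better than both components.
rate_alt = min_test_rejection_rate(1.0, 0.0, 0.0)
```

When all three means are equal, the estimated rejection rate should stay clearly below α, which illustrates in the simplest case why procedures built on Min tests tend to be conservative.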

In this paper the results of simulation are mainly qualitatively described. Some quantitative results for a (2x3) as well as for a (3x3)-design (Table 3 [Tab. 3]) are presented as an example. Detailed power analyses are given by Buchheister [14].

For the global question the results clearly show that the global contrast test (GCo) may substitute for the global AVE test. GCo is the most powerful test in situations where, until now, the global AVE test was more powerful than the global MAX test (see Figure 5 [Fig. 5]).

An example comparing the power across different situations for the (2x3)-design is given in Figure 6 [Fig. 6]. There the black areas mark situations where the global contrast test is more powerful than the global MAX test, and in the grey areas power(GCo) < power(MAX).

The global contrast test (GCo) is recommended when all combination drugs fulfil the property of superiority, or when only a few combinations do not and the others exceed their components to a similar degree. Otherwise the global MAX test of Hung, Chi, and Lipicky [3] is suggested. For very large multi-level two factorial designs, where the tables of critical values for the global MAX test (cf. Table 2 [Tab. 2] or [3]) do not suffice, the multiple test of Simes (SIM) is recommended for practical reasons. The loss of power is negligible.

The results of the simulations concerning the local question are more difficult to summarize. A comparison of the three new closed testing procedures shows that the closed testing procedure of IJ local combination hypotheses (CTPloc) is often the most powerful. In large multi-level designs, however, this procedure is hardly practicable because many intersections of union hypotheses have to be transformed into unions of intersection hypotheses. In contrast, the closed testing procedures CTPmar and TwoCTP are much easier to apply. Therefore the closed testing procedure of 2 • IJ marginal hypotheses is preferable, unless there is interest in the global hypothesis. The loss of power in relevant situations is negligible here as well. The approach of two simultaneous closed testing procedures at level α/2 is very conservative in small designs. However, it becomes attractive in larger designs because of its manageable systems of hypotheses.

Depending on the number of dose combinations which fulfil the property of superiority and the size of their effectiveness, different most powerful procedures can be specified:

1) If all or nearly all combinations are more effective than their components to a similar degree, the three new closed testing procedures have the largest power. In this situation the closed testing procedure of the 2 • IJ marginal hypotheses is suggested for small designs and the two simultaneous closed testing procedures at level α/2 are suggested for large designs.

2) If the sizes of effectiveness of the dose combinations differ a lot, but there are only few combinations which are not simultaneously more effective than all of their single components, simultaneous Min tests like simMin, MinMAX or MinSIM are more powerful. The power of these three procedures is similar.

3) Otherwise, if only few dose combinations fulfil the property of superiority the local maximum test has the largest power. Using the tabulated level α critical value (cf. Table 2 [Tab. 2] or [3]) it is easy to apply.

These suggestions are simplified and summarized in Table 4 [Tab. 4] and an example is given in Figure 7 [Fig. 7].

If no a priori information about the effects of the combinations is available, it will be difficult to assume that all dose combinations exceed their components to a similar degree, especially in larger multi-level two factorial designs. Therefore, with regard to simplicity, robustness and power, the new global linear contrast test followed by α adjusted simultaneous Min tests according to the modified Holm procedure (MinGCo) is a good compromise and thus the procedure of choice in this situation. The loss of power from using simultaneous Min tests when the effects are similarly strong is small.


When several dose combinations are to be tested for being more effective than each of their single components simultaneously, a multiple testing problem arises. Different test procedures addressing the global or the local question can be used. Due to the complexity of the problem, no single procedure can be recommended as optimal in general (cf. [14]).

Continuing to use the global tests of Hung et al. [3], the most widely used so far, is not a bad choice, and the local MAX test introduced here is a quick and very simple extension of the global MAX test to the local question. Nevertheless, the closed testing procedures have some advantages. They place fewer restrictions on the data than the two global tests of Hung et al. [3], which require normally distributed data with homogeneous variances. Each intersection hypothesis may be tested by a level α test suited to the nature of the data. Because of the numerous partition hypotheses that arise in the closed testing procedures for combination superiority, special contrast tests are recommended. In case of variance heterogeneity, Welch-type modifications of the tests can easily be applied. In case of binary data, Gauss tests can be used. Note also that closed testing procedures do not require a placebo dose combination to be included in the study. This can be important when administering a placebo raises ethical or medical problems.

In view of the complexity and the multiplicity of the problem (cf. the application in [15]) a sequential design as presented by Lehmacher, Kieser, Hothorn [16] could be more advantageous, because during the conduct of the study drug combinations can be skipped.

This paper focuses on the multiple identification of desirable dose combinations. The procedures presented are multiple decision procedures; related simultaneous confidence intervals are not available. Another related topic is the identification of minimum effective doses; multiple testing procedures for this problem are proposed by Hellmich and Lehmacher [17].


[1] Laska EM, Meisner MJ. Testing whether an identified treatment is best: The combination problem. Proceedings of the Biopharmaceutical Section of the American Statistical Association. 1986;163-70.
[2] Laska EM, Meisner MJ. Testing whether an identified treatment is best. Biometrics. 1989;45:1139-51.
[3] Hung HMJ, Chi GYH, Lipicky RJ. Testing for the existence of a desirable dose combination. Biometrics. 1993;49:85-94.
[4] Hung HMJ. Testing for existence of desirable dose combination (Correspondence). Biometrics. 1994;50:307-8.
[5] Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800-2.
[6] Hung HMJ. Evaluation of a combination drug with multiple doses in unbalanced factorial design clinical trials. Statist Med. 2000;19:2079-87.
[7] Holm S. A simple sequentially rejective multiple test procedure. Scand J Statist. 1979;6:65-70.
[8] Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75:383-6.
[9] Hommel G. A comparison of two Bonferroni procedures. Biometrika. 1989;76:624-5.
[10] Lehmacher W. Verlaufskurven und Crossover. Heidelberg: Springer; 1987.
[11] Lehmacher W, Wassmer G, Reitmeir P. Procedures for two-sample comparisons with multiple endpoints controlling the experimentwise error rate. Biometrics. 1991;47:511-21.
[12] Tamhane AC, Hochberg Y, Dunnett CW. Multiple test procedures for dose finding. Biometrics. 1996;52:21-37.
[13] Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751-4.
[14] Buchheister B. Statistische Methoden zum Nachweis der Effektivität von Kombinationspräparaten [Dissertation]. Köln: Medizinische Fakultät der Universität zu Köln; 2001.
[15] Letzel H, Blümner E. Bivariate Dosis-Wirkungs-Beziehungen für ein Kombinationsantihypertensivum: Biometrische Erfahrungen mit einem komplexen Studienmodell. In: Baur MP et al.: Medizinische Informatik, Biometrie und Epidemiologie, 41. Jahrestagung der GMDS, Bonn. München: Urban und Vogel; 1997. p. 382-6.
[16] Lehmacher W, Kieser M, Hothorn L. Sequential and multiple testing for dose-response analysis. Drug Inf J. 2000;34:591-7.
[17] Hellmich M, Lehmacher W. Closure Procedures for Monotone Bi-Factorial Dose-Response Designs. Biometrics. 2005;61:270-7.