gms | German Medical Science

GMDS 2014: 59. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

7-10 September 2014, Göttingen

An improved method of power calculation for the one-sample log-rank test

Meeting Abstract

  • R. Schmidt - WWU/UKM Münster, Münster
  • R. Kwiecien - WWU/UKM Münster, Münster
  • A. Faldum - WWU/UKM Münster, Münster
  • S. Ligges - WWU/UKM Münster, Münster

GMDS 2014. 59. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Göttingen, 07.-10.09.2014. Düsseldorf: German Medical Science GMS Publishing House; 2014. DocAbstr. 84

doi: 10.3205/14gmds167, urn:nbn:de:0183-14gmds1677

Published: 4 September 2014

© 2014 Schmidt et al.
This is an open access article distributed under the terms of the Creative Commons license (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.de). It may be reproduced, distributed and made publicly available, provided that the author and source are credited.


Text

Introduction: Traditional phase II cancer trials use single-arm designs with a binary primary endpoint, e.g. tumor response, defined by whether a response to treatment is observed after a fixed time span. The success rate of the (small) population under the new treatment is then compared to that of a historic control. However, there are trial settings in which a binary endpoint is either undesirable or inadequate, e.g. due to potential loss to follow-up. In such settings, a design that allows for a time-to-event endpoint may be more appropriate.

For such settings, Finkelstein et al. [1] and Sun et al. [2] consider single-stage designs based on the one-sample log-rank test, which was first proposed by Breslow [3]. This test compares the survival curve of a new treatment arm to that of a historic control. Both Finkelstein et al. [1] and Sun et al. [2] give a sample size formula for the one-sample log-rank test in terms of the number D of events to be observed: to achieve a desired power for a given significance level and treatment effect, the analysis is performed as soon as a critical number of events has been reached. Here, we propose an improved method of power calculation for the one-sample log-rank test.

Methods: Based on a counting process approach to the one-sample log-rank test, we derive and discuss a new stopping criterion that approximately achieves a desired power: one can monitor either the number of events D, as proposed by Finkelstein et al. [1], or the sum E of the patients' cumulative hazards. The analysis is performed as soon as D or E reaches an a priori determined critical value d or e, respectively.
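
The abstract does not spell out formulas, so the following is only a sketch of the two monitored quantities and stopping rules in commonly used notation. The symbols X_i(t), \delta_i(t), \Lambda_0, \tau_D and \tau_E are assumptions introduced here for illustration: X_i(t) is the follow-up time of patient i observed up to calendar time t, \delta_i(t) indicates whether that patient's event has been observed by t, and \Lambda_0 is the cumulative hazard of the historic control. The statistic Z is written in the usual one-sample log-rank form; sign conventions differ between authors.

```latex
\begin{align*}
D(t) &= \sum_{i=1}^{n} \delta_i(t), &
E(t) &= \sum_{i=1}^{n} \Lambda_0\bigl(X_i(t)\bigr), &
Z(t) &= \frac{D(t) - E(t)}{\sqrt{E(t)}}, \\
\tau_D &= \inf\{\, t : D(t) \ge d \,\}, &
\tau_E &= \inf\{\, t : E(t) \ge e \,\}.
\end{align*}
```

Under this reading, the analysis takes place at \tau_D when events are monitored and at \tau_E when the summed cumulative hazard is monitored. D(t) increases in jumps, whereas E(t) grows continuously as follow-up accumulates.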

We focus on the differences between these two sample size approaches. Their asymptotic properties (as the sample size formally tends to infinity) are explored theoretically, and their properties for (small) finite sample sizes are studied by means of simulations. Among other things, we compare both approaches with regard to power, type I error control, study duration and appropriateness of the normal approximation for the one-sample log-rank test statistic. To assess the impact of sample size, the simulations cover a range of scenarios with different underlying sample sizes.
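
A simulation of this kind might be set up roughly as follows. This is a minimal sketch, not the authors' simulation code. It assumes uniform accrual, exponential survival, an exponential historic control with hazard `lam0`, a proportional hazards alternative with hazard ratio `hr`, and a one-sided test that rejects for small values of (D - E)/sqrt(E). All function names and parameter values are hypothetical, and the critical values passed as `crit` are placeholders that are not derived from any sample size formula.

```python
import math
import random
from statistics import NormalDist


def simulate_trial(n, accrual, lam0, hr, rng):
    """Draw calendar entry times and survival times for one single-arm trial."""
    entry = [rng.uniform(0.0, accrual) for _ in range(n)]
    surv = [rng.expovariate(lam0 * hr) for _ in range(n)]
    return entry, surv


def d_and_e(entry, surv, lam0, t):
    """Observed events D and summed reference cumulative hazard E at calendar time t."""
    d, e = 0, 0.0
    for u, s in zip(entry, surv):
        if u >= t:
            continue  # patient not yet enrolled at time t
        d += (u + s) <= t             # event already observed?
        e += lam0 * min(s, t - u)     # Lambda_0(x) = lam0 * x for an exponential reference
    return d, e


def time_when_d_reaches(entry, surv, d_crit):
    """Calendar time of the d_crit-th event (the last event if fewer occur)."""
    event_times = sorted(u + s for u, s in zip(entry, surv))
    return event_times[min(d_crit, len(event_times)) - 1]


def time_when_e_reaches(entry, surv, lam0, e_crit):
    """Earliest calendar time with E(t) >= e_crit, located by bisection.

    E(t) is continuous and non-decreasing, so bisection applies.
    If E never reaches e_crit, the time of the last event is returned.
    """
    hi = max(u + s for u, s in zip(entry, surv))
    if d_and_e(entry, surv, lam0, hi)[1] < e_crit:
        return hi
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if d_and_e(entry, surv, lam0, mid)[1] >= e_crit:
            hi = mid
        else:
            lo = mid
    return hi


def rejection_rate(rule, crit, n, accrual, lam0, hr, alpha, n_sim, seed):
    """Monte Carlo rejection rate when the analysis is timed by rule 'D' or 'E'."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value (negative)
    rejections = 0
    for _ in range(n_sim):
        entry, surv = simulate_trial(n, accrual, lam0, hr, rng)
        if rule == "D":
            t = time_when_d_reaches(entry, surv, crit)
        else:
            t = time_when_e_reaches(entry, surv, lam0, crit)
        d, e = d_and_e(entry, surv, lam0, t)
        rejections += (d - e) / math.sqrt(e) <= z_crit
    return rejections / n_sim


if __name__ == "__main__":
    # Placeholder critical values, chosen only to make the sketch run.
    for rule, crit in (("D", 20), ("E", 20.0)):
        rate = rejection_rate(rule, crit, n=40, accrual=24.0, lam0=0.1,
                              hr=0.6, alpha=0.05, n_sim=500, seed=1)
        print(rule, rate)
```

Setting `hr=1.0` gives an empirical type I error rate under these assumptions, and varying `n` shows how the two timing rules behave as the sample size changes.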

Results: It can be proven theoretically that the two underlying power approximations of the one-sample log-rank test are asymptotically equivalent. Simulations confirm this asymptotic equivalence for large sample sizes. For (small) finite sample sizes, however, the two approximations perform differently. In our simulations, the power curve of the test procedure based on monitoring E stays close to the expected power curve. In contrast, the power curve of the procedure based on monitoring D lies considerably below the expected power curve when the sample size is small. Moreover, the procedure based on D is quite conservative in this setting, whereas the procedure based on E comes much closer to exhausting the nominal significance level.

Discussion: We compared two approaches to calculating power for the one-sample log-rank test. Standard approaches, as proposed in [1], provide the number of events needed to obtain a desired power for a fixed effect size. At first sight, monitoring the number of events during a trial appears more practicable, and thus more convenient, than monitoring the value of the statistic E. However, simulations indicate that the power performance of the one-sample log-rank test is better if the timing of the analysis is based on E rather than on the number D of observed events. The robustness of our results across a wider range of simulation scenarios will be the subject of further research.


References

1.
Finkelstein DM, Muzikansky A, Schoenfeld DA. Comparing survival of a sample to that of a standard population. Journal of the National Cancer Institute. 2003;95:1434-9.
2.
Sun X, Peng P, Tu D. Phase II cancer clinical trials with a one-sample log-rank test and its corrections based on the Edgeworth expansion. Contemporary Clinical Trials. 2011;32:108-13.
3.
Breslow NE. Analysis of survival data under the proportional hazards model. International Statistical Review. 1975;43:45-58.