gms | German Medical Science

MAINZ//2011: 56th GMDS Annual Meeting and 6th DGEpi Annual Meeting

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V.
Deutsche Gesellschaft für Epidemiologie e. V.

26-29 September 2011 in Mainz

Measuring inter-observer agreement in contour delineation of PET-CT imaging using Fleiss’ Kappa

Meeting Abstract

  • Gerta Rücker - Institut für Medizinische Biometrie und Medizinische Informatik, Freiburg

Mainz//2011. 56th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds), 6th Annual Meeting of the Deutsche Gesellschaft für Epidemiologie (DGEpi). Mainz, 26-29 September 2011. Düsseldorf: German Medical Science GMS Publishing House; 2011. Doc11gmds024

doi: 10.3205/11gmds024, urn:nbn:de:0183-11gmds0249

Published: 20 September 2011

© 2011 Rücker.
This is an Open Access article distributed under the terms of the Creative Commons license. It may be reproduced, distributed and made publicly available, provided that the author and source are credited.



Background: In PET-CT imaging, observers delineate the contours of a given tumor volume on a series of images of uniform slice thickness. An observer characterises each pixel in a plane by allocating it to one of two categories: inside (category 1) or outside (category 0) the contoured region. The area of each region corresponds to the number of pixels in that category. For the special case of two observers, Zijdenbos et al. [1] proposed a version of Cohen’s kappa for measuring agreement in contour delineation, called the Kappa Index.
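
For the two-observer case, the Kappa Index is commonly written as twice the overlap area divided by the sum of the two contoured areas (the Dice overlap). The following is a minimal Python sketch of that form, assuming each contour has already been rasterised to a flat 0/1 sequence over the same pixel grid. The function name kappa_index and the toy masks are illustrative and are not taken from the study.

```python
def kappa_index(mask_a, mask_b):
    """Kappa Index (Dice overlap) of two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same pixels")
    area_a = sum(1 for p in mask_a if p)          # pixels inside contour A
    area_b = sum(1 for p in mask_b if p)          # pixels inside contour B
    overlap = sum(1 for p, q in zip(mask_a, mask_b) if p and q)
    if area_a + area_b == 0:
        raise ValueError("both contours are empty")
    return 2.0 * overlap / (area_a + area_b)


# Toy 2 x 3 image, flattened row by row (illustrative data only).
observer_1 = [0, 1, 1,
              0, 1, 1]
observer_2 = [0, 0, 1,
              0, 1, 1]
ki = kappa_index(observer_1, observer_2)
```

Here the contours cover 4 and 3 pixels and share 3, so ki equals 6/7. Pixels that both observers place outside the contour do not enter the index.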

Statistical methods: We propose a generalisation of this method that replaces Cohen’s kappa with Fleiss’ kappa. Fleiss’ kappa, denoted by Kappa(n), is a chance-corrected measure of agreement that generalises Cohen’s kappa to n≥2 indistinguishable observers. First, for each pixel we determined the number of observers (0, 1, 2, ..., n) who delineated the pixel. We then stratified the pixels to obtain A(i), the total area (number of pixels) selected by exactly i observers (i = 1, ..., n). We show that Kappa(n) can be written as a quotient of two weighted sums of the A(i) (i = 1, ..., n).
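
The abstract does not state the quotient explicitly. One way to arrive at such a form, offered here as an assumption rather than as the author's formula, is to take standard Fleiss' kappa for two categories and let the background area A(0) grow large, which gives Kappa(n) ≈ Σ i(i-1)A(i) / ((n-1) Σ i A(i)), with both sums running over i = 1, ..., n. For n = 2 this reduces to the Kappa Index 2A(2) / (A(1) + 2A(2)). The Python sketch below computes the strata A(i) from flat 0/1 masks and evaluates both the standard Fleiss' kappa and this large-background form. All function names and the toy masks are hypothetical.

```python
def area_strata(masks):
    """Return A, where A[i] is the number of pixels selected by exactly i observers.

    masks: list of n flat 0/1 sequences of equal length, one per observer.
    """
    n = len(masks)
    counts = [0] * (n + 1)
    for pixel_labels in zip(*masks):              # one tuple of n labels per pixel
        counts[sum(1 for p in pixel_labels if p)] += 1
    return counts


def fleiss_kappa_binary(A):
    """Standard Fleiss' kappa for two categories, from A[0..n] (needs n >= 2).

    Undefined (division by zero) if no pixel or every pixel is selected by all.
    """
    n = len(A) - 1
    N = sum(A)                                    # all pixels, background included
    p1 = sum(i * a for i, a in enumerate(A)) / (N * n)   # overall share "inside"
    # Half of the mean per-pixel disagreement, i.e. (1 - P_bar) / 2.
    disagree = sum(i * (n - i) * a for i, a in enumerate(A)) / (N * n * (n - 1))
    # Half of the chance disagreement (1 - P_e) / 2 equals p1 * (1 - p1).
    return 1.0 - disagree / (p1 * (1.0 - p1))


def kappa_n_limit(A):
    """Assumed large-background form: uses only A[1..n], A[0] has weight zero."""
    n = len(A) - 1
    numerator = sum(i * (i - 1) * a for i, a in enumerate(A))
    denominator = (n - 1) * sum(i * a for i, a in enumerate(A))
    return numerator / denominator


# Three observers on a 6-pixel toy image (illustrative data only).
masks = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
A = area_strata(masks)
exact = fleiss_kappa_binary(A)
limit = kappa_n_limit(A)
```

On this toy input A is [3, 1, 1, 1], the standard Fleiss' kappa is 0.5 and the large-background form is 2/3. The two values approach each other as more unselected background pixels are added, which is why the second form does not depend on how tightly the image is cropped around the tumor.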

Application and results: Kappa(n) was applied and compared with other methods, such as an average pairwise Kappa Index and Average Sensitivity, to analyse inter-centre variation in a multicentre trial on radiotherapy planning in locally advanced lung cancer (PET-Plan). A contouring Dummy Run was performed as part of the quality assurance program. Participating study centres were asked to define protocol-compliant clinical target volumes of the primary tumor and the involved mediastinal lymph node areas. Contouring was done twice, before and after a training program, although the observers were not necessarily the same in both runs. Contours were imported into the ARTIVIEW software (AQUILAB SAS) for evaluation. Imaging data were then exported from the software, and observer agreement was analysed using the freely available software R. Observer agreement improved from Kappa(n) = 0.59 before training to Kappa(n) = 0.69 after training. Confidence intervals remain to be developed.
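
The abstract does not define the average pairwise Kappa Index. A natural reading, assumed here, is the mean of the two-observer Kappa Index over all unordered pairs of observers. A minimal Python sketch under that assumption, with hypothetical function names and toy masks that are not study data:

```python
from itertools import combinations


def kappa_index(mask_a, mask_b):
    """Two-observer Kappa Index (Dice overlap) for flat 0/1 masks.

    Assumes the two contours are not both empty.
    """
    overlap = sum(1 for p, q in zip(mask_a, mask_b) if p and q)
    total = sum(1 for p in mask_a if p) + sum(1 for q in mask_b if q)
    return 2.0 * overlap / total


def average_pairwise_kappa_index(masks):
    """Mean Kappa Index over all unordered pairs of observers."""
    pairs = list(combinations(masks, 2))
    return sum(kappa_index(a, b) for a, b in pairs) / len(pairs)


# Three observers on a 6-pixel toy image (illustrative data only).
centres = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
avg_ki = average_pairwise_kappa_index(centres)
```

For these masks the three pairwise values are 4/5, 1/2 and 2/3, so avg_ki is 59/90. Each pair is scored separately before averaging, so this index has to keep track of which contour belongs to which observer.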

Conclusion: In contrast to average pairwise indices, Kappa(n) measures observer agreement for more than two observers using the full information about overlapping areas, without distinguishing between observers. It is therefore particularly suitable for measuring observer agreement when identifying individual observers is not possible or not desirable.


[1] Zijdenbos AP, Dawant BM, Margolin RA, Palmer AC. Morphometric analysis of white matter lesions in MR images: method and validation. IEEE Transactions on Medical Imaging. 1994;13(4):716-724.
[2] Krummenauer F. Methoden zur Evaluation bildgebender Verfahren von begrenzter Reproduzierbarkeit. Aachen: Shaker Verlag; 2005. ISBN 3-8322-4000-4