gms | German Medical Science

21. Jahrestagung des Deutschen Netzwerks Evidenzbasierte Medizin e. V.

Deutsches Netzwerk Evidenzbasierte Medizin e. V.

13-15 February 2020, Basel, Switzerland

The reliability, usability, and applicability of tools to appraise quality and risk of bias in systematic reviews: a prospective evaluation of AMSTAR, AMSTAR 2 and ROBIS

Meeting Abstract

  • Michelle Gates - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada
  • Allison Gates - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada
  • Barbara Prediger - Universität Witten/Herdecke, Institut für Forschung in der Operativen Medizin, Department für Humanmedizin, Germany
  • Monika Becker - Universität Witten/Herdecke, Institut für Forschung in der Operativen Medizin, Department für Humanmedizin, Germany
  • Gonçalo Duarte - University of Lisbon, Clinical Pharmacology Unit, Instituto de Medicina Molecular, Lisbon, Portugal
  • Maria Cary - University of Lisbon, Clinical Pharmacology Unit, Instituto de Medicina Molecular, Lisbon, Portugal
  • Ben Vandermeer - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada
  • Ricardo Fernandes - University of Lisbon, Clinical Pharmacology Unit, Instituto de Medicina Molecular, Lisbon, Portugal; Santa Maria Hospital, Department of Pediatrics, Portugal
  • Dawid Pieper - Universität Witten/Herdecke, Institut für Forschung in der Operativen Medizin, Department für Humanmedizin, Germany
  • Lisa Hartling - University of Alberta, Alberta Research Centre for Health Evidence, Department of Pediatrics, Alberta, Canada

Nützliche patientenrelevante Forschung. 21. Jahrestagung des Deutschen Netzwerks Evidenzbasierte Medizin. Basel, Schweiz, 13.-15.02.2020. Düsseldorf: German Medical Science GMS Publishing House; 2020. Doc20ebmPP8-03

doi: 10.3205/20ebm106, urn:nbn:de:0183-20ebm1062

Published: February 12, 2020

© 2020 Gates et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Background/research question: Readers of systematic reviews (SRs) and overview authors require valid, reliable, and practical means to evaluate the methodological quality and risk of bias of SRs.

We aimed to evaluate and compare the inter-rater and inter-centre reliability, usability, and applicability of three tools available for appraising SRs: AMSTAR, AMSTAR 2, and ROBIS.

Methods: Using a random sample of 30 SRs of randomized trials, two reviewers at each of three collaborating centres (Canada, Germany, and Portugal) independently applied AMSTAR, AMSTAR 2, and ROBIS and then reached consensus. We used Gwet’s AC1 statistic to assess inter-rater reliability between the pairs of reviewers and inter-centre reliability between the centres’ consensus decisions. To estimate usability, we calculated the median (interquartile range, IQR) time to complete the appraisal and to reach consensus for each tool.
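As an illustration of the statistics named in the Methods, the following Python sketch computes Gwet's AC1 for two raters on a single item and a median (IQR) summary of appraisal times. It is not the authors' analysis code: the function names `gwet_ac1` and `median_iqr`, the example ratings and times, and the quartile method are assumptions made here for illustration. The abstract does not say how the confidence intervals were obtained, so none are computed.

```python
from collections import Counter
from statistics import median, quantiles


def gwet_ac1(rater_a, rater_b, categories=None):
    """Gwet's AC1 for two raters rating the same subjects on one nominal item.

    AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and
    p_e = 1/(Q-1) * sum_k pi_k * (1 - pi_k), with pi_k the mean proportion of
    ratings in category k across both raters and Q the number of categories.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    if categories is None:
        # Assumption: if not given, use only the categories actually observed.
        categories = sorted(set(rater_a) | set(rater_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two possible categories")

    # Observed proportion of subjects on which the two raters agree.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement based on the average category prevalence.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = 0.0
    for k in categories:
        pi_k = (counts_a[k] + counts_b[k]) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


def median_iqr(values):
    """Return (median, IQR width); the 'inclusive' quartile method is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


# Hypothetical ratings of 10 reviews on one yes/no item (not study data).
reviewer_1 = ["yes"] * 8 + ["no"] * 2
reviewer_2 = ["yes"] * 7 + ["no"] * 3
ac1 = gwet_ac1(reviewer_1, reviewer_2, categories=["yes", "no"])

# Hypothetical appraisal times in minutes (not study data).
times = [10, 12, 15, 20, 30]
time_median, time_iqr = median_iqr(times)
```

Passing `categories` explicitly keeps AC1 defined even when both reviewers give every review the same rating, a situation in which chance-corrected statistics based on the marginal totals can break down.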

Results: The median (IQR) time for reviewers to complete the assessments was 15.7 (11.3), 19.7 (12.1), and 28.7 (17.4) minutes for AMSTAR, AMSTAR 2, and ROBIS, respectively. The median (IQR) time to reach consensus was 2.6 (3.2), 4.6 (5.3), and 10.9 (10.8) minutes for AMSTAR, AMSTAR 2, and ROBIS, respectively. Inter-rater reliability varied by centre, but across all centres it was substantial to almost perfect for 8/11 (73%) AMSTAR, 8/16 (50%) AMSTAR 2, and 12/24 (50%) ROBIS items. Inter-centre reliability was substantial to almost perfect for 6/11 (55%) AMSTAR, 10/16 (63%) AMSTAR 2, and 7/24 (29%) ROBIS items. Agreement on confidence in the results of the review (AMSTAR 2) ranged from slight (AC1 0.05, 95% CI -0.17 to 0.27) to perfect (AC1 1.00) between reviewers and from moderate (AC1 0.58, 95% CI 0.30 to 0.85) to substantial (AC1 0.74, 95% CI 0.30 to 0.85) between centres. Agreement on overall risk of bias in the SR (ROBIS) ranged from moderate (AC1 0.47, 95% CI 0.17 to 0.77) to almost perfect (AC1 0.96, 95% CI 0.89 to 1.00) between reviewers and from poor (AC1 -0.21, 95% CI -0.55 to 0.13) to moderate (AC1 0.56, 95% CI 0.30 to 0.83) between centres.
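The verbal labels used in the Results (poor, slight, moderate, substantial, almost perfect) appear consistent with the widely used Landis and Koch benchmark bands, although the abstract does not name the scale it applied. Under that assumption, a small Python helper (the name `agreement_label` is hypothetical) maps an AC1 value to its label:

```python
def agreement_label(ac1):
    """Map an agreement coefficient to a Landis and Koch style label.

    Assumption: the abstract's labels follow these bands; it does not say so.
    """
    if ac1 < 0:
        return "poor"
    if ac1 <= 0.20:
        return "slight"
    if ac1 <= 0.40:
        return "fair"
    if ac1 <= 0.60:
        return "moderate"
    if ac1 <= 0.80:
        return "substantial"
    return "almost perfect"


# Point estimates quoted in the Results, paired with the labels given there.
reported = {
    0.05: "slight",
    0.58: "moderate",
    0.74: "substantial",
    0.47: "moderate",
    0.96: "almost perfect",
    -0.21: "poor",
    0.56: "moderate",
}
labels_match = all(agreement_label(v) == label for v, label in reported.items())
```

Under these bands an AC1 of exactly 1.00 falls in the "almost perfect" range; the abstract describes that value as perfect agreement.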

Conclusion: Reviewers completed AMSTAR appraisals more quickly than AMSTAR 2 or ROBIS appraisals, and AMSTAR was the only tool for which inter-rater agreement was substantial or better on most items. The low inter-centre reliability, particularly for the overall AMSTAR 2 and ROBIS ratings, is concerning because it limits readers’ ability to interpret ratings applied by different review groups. Improved documentation may be needed to help reviewers interpret and apply each tool’s supporting guidance consistently.