gms | German Medical Science

G-I-N Conference 2012

Guidelines International Network

22.08 - 25.08.2012, Berlin

The Grading of Recommendations Assessment, Development and Evaluation Reliability Study (the GRADERS)

Meeting Abstract

  • R. Mustafa - Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada
  • N. Santesso - Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada
  • J. Brozek - Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada; Department of Medicine, McMaster University, Hamilton, Canada
  • E. Akl - Department of Medicine, State University of New York at Buffalo, Buffalo, United States; Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada
  • H. Schünemann - Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada; Department of Medicine, McMaster University, Hamilton, Canada

Guidelines International Network. G-I-N Conference 2012. Berlin, 22.-25.08.2012. Düsseldorf: German Medical Science GMS Publishing House; 2012. DocO20

DOI: 10.3205/12gin052, URN: urn:nbn:de:0183-12gin0525

Published: July 10, 2012

© 2012 Mustafa et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Text

Background: The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach has been widely adopted by guideline developers for summarizing, grading and presenting evidence.

Objectives:

  • Evaluate the inter-rater reliability of assessing quality of evidence (QoE) using the GRADE approach.
  • Evaluate the effect of various baseline characteristics on predicting reliability.
  • Evaluate the effect of assessing QoE in duplicate on the reliability.

Methods: In the first exercise, raters independently assessed the QoE of 4 outcomes from 4 systematic reviews. We then randomly paired raters and asked them to submit a consensus rating of the QoE. Investigators, data abstractors and data analysts were all blinded to the raters' identities. The primary statistical analysis of inter-rater reliability will be based on both crude agreement and chance-corrected agreement using weighted kappa statistics. We will adjust our analysis for level of training and experience with the GRADE approach.
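For ordered categories such as the four GRADE levels of evidence, a weighted kappa credits partial agreement between adjacent categories rather than treating every disagreement equally. The sketch below is not the study's analysis code; it is a minimal illustration of a linearly weighted kappa for two raters, with the four QoE labels written out for clarity:

```python
from collections import Counter

# The four ordered GRADE quality-of-evidence levels
LEVELS = ["very low", "low", "moderate", "high"]

def weighted_kappa(ratings_a, ratings_b, levels=LEVELS):
    """Linearly weighted kappa for two raters over ordered categories."""
    k = len(levels)
    idx = {lvl: i for i, lvl in enumerate(levels)}
    n = len(ratings_a)
    assert n == len(ratings_b) and n > 0

    # Linear agreement weights: identical ratings score 1,
    # adjacent categories score 2/3, opposite ends score 0.
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

    # Observed weighted agreement across paired ratings
    po = sum(w[idx[a]][idx[b]] for a, b in zip(ratings_a, ratings_b)) / n

    # Expected weighted agreement under chance, from the marginals
    ca, cb = Counter(ratings_a), Counter(ratings_b)
    pe = sum(
        w[i][j] * (ca[levels[i]] / n) * (cb[levels[j]] / n)
        for i in range(k)
        for j in range(k)
    )
    return (po - pe) / (1 - pe)
```

With this weighting, a rater pair that disagrees only between "low" and "moderate" retains most of its agreement credit, which is one reason weighted (rather than unweighted) kappa suits the close-call judgments discussed below.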

Results: 28 volunteer raters participated in this study. They had a range of background knowledge of systematic review methodology and the GRADE approach. Results of the analyses will be presented.

Discussion: It is challenging to account for close-call judgments when assigning the quality of evidence across four categories.

Implications for guideline developers: This study will help identify sources of poor reliability and confusion about the GRADE approach, and it will inform the future development of training materials. Additionally, it will inform decisions about the minimal training required for raters to use the GRADE approach reliably and about the need for duplicate assessment of QoE when using GRADE.