gms | German Medical Science

Information Retrieval Meeting (IRM 2022)

10.06. - 11.06.2022, Cologne

Automation of duplicate detection for systematic reviews

Meeting Abstract

  • Justin Michael Clark (corresponding author, presenting/speaker) - Institute for Evidence-Based Healthcare, Australia
  • Hannah Greenwood - Institute for Evidence-Based Healthcare, Australia
  • Connor Forbes - Institute for Evidence-Based Healthcare, Australia

Information Retrieval Meeting (IRM 2022). Cologne, 10.-11.06.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. Doc22irm16

doi: 10.3205/22irm16, urn:nbn:de:0183-22irm160

Published: June 8, 2022

© 2022 Clark et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction/Background: Systematic reviews (SRs) are considered the best way to answer a research question. However, they are resource intensive, taking on average five staff 67 weeks to complete, at an average cost of US$141,000. To overcome this resource burden, systematic review automation (SRA) tools have been developed to improve the speed of SR tasks without compromising quality. One time-consuming task is removing duplicate records from search results, which can take even experienced searchers hours to complete. We have designed an SRA tool, “the Deduplicator”, with the goal of greatly speeding up this process.

Methods: To evaluate the Deduplicator, we will compare manual deduplication with deduplication using the Deduplicator on the following outcomes: 1) time required to deduplicate; 2) number of duplicates missed; 3) number of non-duplicates removed. Two screeners will independently deduplicate 10 sets of search results. The first screener will do sets 1 to 5 manually, then sets 6 to 10 with the Deduplicator. The second screener will do the opposite, i.e., sets 1 to 5 with the Deduplicator, then sets 6 to 10 manually. If the results are promising, the evaluation will be expanded to a stronger study design that includes additional sets of search results and more participants.
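The crossover allocation and outcomes 2 and 3 described above can be illustrated with a minimal Python sketch. It is not the Deduplicator's code: the names `crossover_schedule`, `score_deduplication` and `DedupOutcome` are hypothetical, and the sketch assumes a gold-standard set of true duplicate record IDs is available for each set of search results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DedupOutcome:
    duplicates_missed: int       # true duplicates left in the results
    non_duplicates_removed: int  # unique records wrongly removed


def score_deduplication(true_duplicates: set[str], removed: set[str]) -> DedupOutcome:
    """Compare the records a screener removed against a gold standard (assumed to exist)."""
    return DedupOutcome(
        duplicates_missed=len(true_duplicates - removed),
        non_duplicates_removed=len(removed - true_duplicates),
    )


def crossover_schedule(n_sets: int = 10) -> dict[str, dict[int, str]]:
    """Screener 1: first half manual, second half tool; screener 2: the opposite."""
    half = n_sets // 2
    schedule: dict[str, dict[int, str]] = {"screener_1": {}, "screener_2": {}}
    for set_no in range(1, n_sets + 1):
        first_half = set_no <= half
        schedule["screener_1"][set_no] = "manual" if first_half else "deduplicator"
        schedule["screener_2"][set_no] = "deduplicator" if first_half else "manual"
    return schedule


if __name__ == "__main__":
    # Illustrative IDs only, not study data.
    outcome = score_deduplication({"r1", "r2", "r3"}, {"r1", "r2", "r9"})
    print(outcome)
    print(crossover_schedule()["screener_2"])
```

With this allocation, every set is deduplicated once manually and once with the tool, by different screeners.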

Results: The Deduplicator has been tested internally on a test set of search results from published SRs, 9835 references in total. This testing shows a combined accuracy of 99.04% (9741 of 9835 references correctly classified). There was also a substantial time saving: duplicate removal by an experienced person was reduced from one hour to 10 minutes.
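The reported figures can be checked with a short calculation. The sketch below uses only the numbers stated above; the helper names are hypothetical. The abstract does not say how the misclassified references split between missed duplicates and wrongly removed non-duplicates, so only the combined count is derived.

```python
def accuracy(correct: int, total: int) -> float:
    """Proportion of references classified correctly."""
    return correct / total


def relative_time_saving(before_min: float, after_min: float) -> float:
    """Fraction of the original time saved."""
    return (before_min - after_min) / before_min


if __name__ == "__main__":
    total, correct = 9835, 9741
    print(f"accuracy: {accuracy(correct, total):.2%}")     # 99.04%
    print(f"misclassified: {total - correct}")             # 94
    print(f"time saved: {relative_time_saving(60, 10):.0%}")  # 83%
```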

Conclusion: Early testing shows the Deduplicator increases the speed of duplicate detection, with no loss of quality. More robust results will be presented at the research conference.

Keywords: systematic reviews, automation, deduplication