gms | German Medical Science

Information Retrieval Meeting (IRM 2022)

10-11 June 2022, Cologne

Automation of duplicate detection for systematic reviews

Meeting Abstract


  • Justin Michael Clark (corresponding author, presenting speaker) - Institute for Evidence-Based Healthcare, Australia
  • Hannah Greenwood - Institute for Evidence-Based Healthcare, Australia
  • Connor Forbes - Institute for Evidence-Based Healthcare, Australia

Information Retrieval Meeting (IRM 2022). Cologne, 10.-11.06.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. Doc22irm16

doi: 10.3205/22irm16, urn:nbn:de:0183-22irm160

Published: 8 June 2022

© 2022 Clark et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction/Background: Systematic reviews (SRs) are considered the best way to answer a research question. However, they are resource intensive, requiring on average five staff and 67 weeks to complete, at an average cost of USD 141,000. To reduce this resource burden, systematic review automation (SRA) tools have been developed to speed up SR tasks without compromising quality. One time-consuming task is removing duplicate records from search results, which can take even experienced searchers hours to complete. We have designed an SRA tool, the Deduplicator, with the goal of greatly speeding up this process.

Methods: To evaluate the Deduplicator, we will compare deduplication done manually with deduplication done using the Deduplicator on the following outcomes: 1) time required to deduplicate; 2) number of duplicates missed; 3) number of non-duplicates removed. Two screeners will independently deduplicate 10 sets of search results. The first screener will do sets 1 to 5 manually, then sets 6 to 10 with the Deduplicator. The second screener will do the opposite, i.e., sets 1 to 5 with the Deduplicator, then sets 6 to 10 manually. If these results are promising, the evaluation will be expanded to a stronger study design that includes additional sets of search results and more participants.
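The two error outcomes can be made concrete with a minimal sketch. It assumes that every record has an identifier and that a reference set of true duplicates is available for each set of search results. The function name and the sample identifiers are hypothetical and are not taken from the Deduplicator.

```python
def deduplication_errors(true_duplicates, removed):
    """Count the two error outcomes for one set of search results.

    true_duplicates: ids of records that really are duplicates (reference standard, assumed available)
    removed: ids of records the screener or tool removed as duplicates
    """
    true_duplicates = set(true_duplicates)
    removed = set(removed)
    missed = true_duplicates - removed           # outcome 2: duplicates left in the results
    wrongly_removed = removed - true_duplicates  # outcome 3: non-duplicates removed
    return {
        "duplicates_missed": len(missed),
        "non_duplicates_removed": len(wrongly_removed),
    }


# Hypothetical example: "r9" is a duplicate that was missed, "r7" was removed in error.
example = deduplication_errors(
    true_duplicates={"r2", "r5", "r9"},
    removed={"r2", "r5", "r7"},
)
print(example)
```

Counting both error types separately matters here because a missed duplicate only costs screening time, whereas a wrongly removed record could drop a relevant study from the review.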

Results: The Deduplicator has been tested internally on a test set of search results from published SRs, comprising 9835 references in total. This testing shows a combined accuracy of 99.04% (9741 of 9835 references correctly classified). There was also a substantial time saving: the time an experienced person needed for duplicate removal was reduced from one hour to 10 minutes.
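The reported figures can be checked with simple arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative, and the time reduction assumes "one hour" means exactly 60 minutes.

```python
total_references = 9835
correctly_classified = 9741

# Combined accuracy: correctly classified references over all references.
accuracy = correctly_classified / total_references
misclassified = total_references - correctly_classified

# Relative time saving, assuming 60 minutes manually versus 10 minutes with the tool.
manual_minutes = 60
tool_minutes = 10
time_reduction = (manual_minutes - tool_minutes) / manual_minutes

print(f"accuracy: {accuracy:.2%}")
print(f"misclassified references: {misclassified}")
print(f"time reduction: {time_reduction:.0%}")
```

The abstract does not say how the misclassified references split between missed duplicates and wrongly removed non-duplicates, so the sketch reports only their total.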

Conclusion: Early testing shows that the Deduplicator increases the speed of duplicate detection with no loss of quality. More robust results will be presented at the research conference.

Keywords: systematic reviews, automation, deduplication