gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Can we use automated preprint screening to improve data visualization?

Meeting Abstract

  • Nico Riedel - QUEST – Center, Berlin Institute of Health, Berlin, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 167

doi: 10.3205/20gmds100, urn:nbn:de:0183-20gmds1002

Published: 26 February 2021

© 2021 Riedel.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Background: Preprints rest on the idea that authors will receive and act on feedback before publication, yet this assumption has not been systematically tested. Comments on preprints are rare, and the proportion of authors who receive feedback privately is unknown. Automated screening tools could fill this gap by identifying papers with common problems, and they also offer opportunities for interventions.

One common problem is the use of bar graphs to present continuous data. Many different datasets can produce the same bar graph, and the underlying data may support different conclusions than the summary statistics alone suggest.
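The point about summary statistics can be illustrated with a minimal sketch. The three samples below are hypothetical values invented for this illustration, not data from the study. They share the same mean and sample variance, so a bar graph with error bars would look identical for all three, although their medians and shapes differ.

```python
from statistics import mean, median, variance

# Hypothetical samples, invented for illustration (not data from the study).
samples = {
    "evenly_spread": [1, 3, 5, 7, 9],
    "high_outlier": [2, 3, 4, 6, 10],
    "low_outlier": [0, 4, 6, 7, 8],
}

# A bar graph with error bars conveys only a mean and a spread statistic.
summary = {name: (mean(v), variance(v)) for name, v in samples.items()}

# The median already reveals that the samples are distributed differently.
medians = {name: median(v) for name, v in samples.items()}

for name in samples:
    print(name, "mean/variance:", summary[name], "median:", medians[name])
```

A dot plot of the same values would make the outliers visible immediately, which is the motivation for the alternatives named in the Methods.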

Methods: We developed Barzooka, an automated screening tool that detects bar graphs and more informative alternatives (dot plots, box plots, violin plots, histograms) in scientific publications. The training dataset comprised 1,000-5,000 PDF pages for each graph type, extracted from 14,000 publications. This dataset was used to train a convolutional neural network to predict the graph types present on each page. The network was then integrated into a workflow that predicts which graph types are present in a PDF.
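The abstract does not describe how page-level predictions are combined into a document-level result. The following is a minimal sketch of one plausible aggregation step, assuming the network yields a score between 0 and 1 per graph type for each page. The function names, the 0.5 threshold and the example scores are assumptions made for illustration and are not taken from Barzooka.

```python
from typing import Dict, Iterable, Set

GRAPH_TYPES = ("bar", "dot", "box", "violin", "histogram")


def labels_from_scores(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Turn per-type scores for one page into a set of predicted graph types."""
    # The threshold is an assumed value, not one reported in the abstract.
    return {g for g in GRAPH_TYPES if scores.get(g, 0.0) >= threshold}


def pages_per_graph_type(
    page_scores: Iterable[Dict[str, float]], threshold: float = 0.5
) -> Dict[str, int]:
    """Count, for each graph type, the pages predicted to contain it."""
    counts = {g: 0 for g in GRAPH_TYPES}
    for scores in page_scores:
        for g in labels_from_scores(scores, threshold):
            counts[g] += 1
    return counts


# Hypothetical scores for a four-page PDF (one dict per page).
page_scores = [
    {"bar": 0.02, "box": 0.01},
    {"bar": 0.97, "dot": 0.10},
    {"bar": 0.88, "box": 0.91},
    {"histogram": 0.76},
]

counts = pages_per_graph_type(page_scores)
present = [g for g in GRAPH_TYPES if counts[g] > 0]
print(counts)
print(present)
```

Treating each graph type independently lets a single page carry several labels, which matches the description of predicting the graph types present on each page.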

We will use this tool to determine whether the use of bar graphs, dot plots, box plots, violin plots and histograms differs among disciplines. Given recent efforts to encourage authors to use more informative graphics, we will also determine whether the use of different graphs is changing over time.

Additionally, we will conduct randomized controlled trials to determine whether preprint screening improves reporting. Our collaborative network includes investigators who have developed different screening tools, and we will run parallel trials for each tool. Our tool will be used to identify new preprints that include bar graphs of continuous data. These preprints will be randomized to one of three groups: no intervention, a comment on bioRxiv, or an email to the corresponding author. The email or comment will describe the problems with bar graphs and provide information on how to replace them with more informative graphics.
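The abstract does not state which randomization scheme will be used. As a sketch only, the allocation to the three groups could be implemented with permuted blocks of size three, which keeps the groups balanced as new preprints arrive. The arm names, the seed and the preprint identifiers below are hypothetical.

```python
import random

# Arm names are hypothetical labels for the three groups described above.
ARMS = ("no_intervention", "biorxiv_comment", "author_email")


def block_randomize(preprint_ids, seed):
    """Assign each preprint to an arm using permuted blocks of size three."""
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    allocation = {}
    block = []
    for pid in preprint_ids:
        if not block:
            # Start a new block containing each arm once, in random order.
            block = list(ARMS)
            rng.shuffle(block)
        allocation[pid] = block.pop()
    return allocation


# Hypothetical identifiers for nine screened preprints.
ids = [f"preprint-{i:03d}" for i in range(1, 10)]
allocation = block_randomize(ids, seed=2020)

for pid, arm in allocation.items():
    print(pid, arm)
```

With nine preprints and blocks of three, each arm receives exactly three preprints. Simple (unblocked) randomization would also be valid but can leave the groups unequal in small samples.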

Results: Our automated screening tool identifies the different graph types with high accuracy (F1 scores 0.80-0.94 per category).
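For readers unfamiliar with the metric, a per-category F1 score can be computed from true and predicted page labels as 2TP / (2TP + FP + FN). The sketch below uses a small hypothetical set of labelled pages, not the evaluation data behind the scores reported above.

```python
def f1_per_category(true_labels, predicted_labels, categories):
    """Compute the F1 score for each category from per-page label sets."""
    scores = {}
    for c in categories:
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(1 for t, p in pairs if c in t and c in p)      # true positives
        fp = sum(1 for t, p in pairs if c not in t and c in p)  # false positives
        fn = sum(1 for t, p in pairs if c in t and c not in p)  # false negatives
        denom = 2 * tp + fp + fn
        # By convention here, a category with no positives at all scores 0.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical labels for four pages (sets of graph types per page).
true_labels = [{"bar"}, {"bar", "box"}, {"box"}, set()]
predicted_labels = [{"bar"}, {"bar"}, {"box"}, {"bar"}]

scores = f1_per_category(true_labels, predicted_labels, ("bar", "box"))
print(scores)
```

Reporting F1 per category, rather than a single overall accuracy, shows whether the classifier performs well on each graph type and is not dominated by the most frequent one.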

The trial will determine whether interventions increase the likelihood that authors will replace bar graphs with more informative graphics in the published article.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.