Can we use automated preprint screening to improve data visualization?
Published: February 26, 2021
Background: Preprints rest on the idea that authors will receive and implement feedback before publication, yet this assumption has not been systematically tested. Comments on preprints are rare, and the proportion of authors who receive feedback privately is unknown. Automated screening tools could help fill this gap by identifying papers with common problems. These tools also create opportunities to test interventions that deliver such feedback to authors.
One common problem is the use of bar graphs to present continuous data. Many different datasets can produce the same bar graph, and the full data may suggest different conclusions than the summary statistics alone.
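To make this concrete, the Python sketch below uses two small, made-up samples (illustrative values only, not study data) that share the same mean and standard deviation. A bar graph showing mean ± SD would therefore look identical for both, even though one sample is symmetric and the other is skewed, with medians of 5 and 3.

```python
import statistics

# Two made-up samples chosen for illustration only (not data from the study).
symmetric_with_outliers = [1, 5, 5, 5, 9]
skewed = [3, 3, 3, 7, 9]


def bar_summary(values):
    """Return what a typical bar graph shows: bar height (mean) and error bar (SD)."""
    return statistics.mean(values), statistics.stdev(values)


for name, data in [("symmetric_with_outliers", symmetric_with_outliers), ("skewed", skewed)]:
    mean, sd = bar_summary(data)
    # The median is not shown by a mean ± SD bar graph, but it differs between the samples.
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}, median={statistics.median(data)}")

# Expected output:
# symmetric_with_outliers: mean=5.00, sd=2.83, median=5
# skewed: mean=5.00, sd=2.83, median=3
```

A dot plot or box plot of the same values would reveal the difference immediately, which is the motivation for screening for bar graphs of continuous data.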
Methods: We developed Barzooka, an automated screening tool that detects bar graphs and more informative alternatives (dot plots, box plots, violin plots, histograms) in scientific publications. The training dataset included 1,000 to 5,000 PDF pages for each graph type, extracted from 14,000 publications. We used this dataset to train a convolutional neural network that predicts which graph types are present on each page. The network was then integrated into a workflow that predicts which graph types are present in a complete PDF.
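The step from page-level predictions to a PDF-level result can be sketched as follows. This is a minimal illustration, not Barzooka's actual code: it assumes the network returns one score per graph type for each page, that a fixed threshold of 0.5 marks a type as present on a page, and that a PDF contains a graph type if any page does. The names `page_labels` and `pdf_summary` are hypothetical, and the scores in the usage example are made up.

```python
from typing import Dict, List, Set

# Graph categories named in the abstract; the label set used by Barzooka may differ.
GRAPH_TYPES = ["bar", "dot", "box", "violin", "histogram"]


def page_labels(probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Multi-label decision for one page: keep every type whose score meets the assumed threshold."""
    return {g for g in GRAPH_TYPES if probs.get(g, 0.0) >= threshold}


def pdf_summary(page_probs: List[Dict[str, float]], threshold: float = 0.5) -> Dict[str, int]:
    """Count, for each graph type, how many pages of one PDF were predicted to contain it."""
    counts = {g: 0 for g in GRAPH_TYPES}
    for probs in page_probs:
        for g in page_labels(probs, threshold):
            counts[g] += 1
    return counts


# Usage with made-up scores for a three-page PDF.
pages = [
    {"bar": 0.91, "dot": 0.12},
    {"box": 0.77, "bar": 0.40},
    {"histogram": 0.05},
]
counts = pdf_summary(pages)
present = sorted(g for g, n in counts.items() if n > 0)
print(present)  # ['bar', 'box']
```

Keeping per-page counts, rather than a single yes/no flag, would also make it possible to report how many pages of a paper use each graph type.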
We will use this tool to determine whether the use of bar graphs, dot plots, box plots, violin plots, and histograms differs among disciplines. Given recent efforts to encourage authors to use more informative graphics, we will also examine whether the use of each graph type is changing over time.
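One simple way to express both comparisons is the share of papers in each group that contain a given graph type. The sketch below assumes a hypothetical record format of (discipline, publication year, set of detected graph types) per paper; the function `share_using` and the toy records are illustrative only and do not reflect the planned analysis or any results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record format: (discipline, publication year, graph types detected in the paper).
Record = Tuple[str, int, Set[str]]


def share_using(records: Iterable[Record], graph_type: str, by: str = "discipline") -> Dict[object, float]:
    """Fraction of papers in each group (discipline or year) containing at least one graph of graph_type."""
    if by not in ("discipline", "year"):
        raise ValueError("by must be 'discipline' or 'year'")
    totals: Dict[object, int] = defaultdict(int)
    hits: Dict[object, int] = defaultdict(int)
    for discipline, year, types in records:
        key = discipline if by == "discipline" else year
        totals[key] += 1
        if graph_type in types:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Toy records for illustration only.
records = [
    ("physiology", 2018, {"bar"}),
    ("physiology", 2020, {"bar", "dot"}),
    ("neuroscience", 2018, {"box"}),
    ("neuroscience", 2020, {"dot"}),
]
print(share_using(records, "bar"))             # {'physiology': 1.0, 'neuroscience': 0.0}
print(share_using(records, "dot", by="year"))  # {2018: 0.0, 2020: 1.0}
```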
Additionally, we will conduct randomized controlled trials to determine whether preprint screening improves reporting. Our collaborative network includes investigators who have developed different screening tools, and we will run a parallel trial for each tool. In our trial, Barzooka will be used to identify new preprints that include bar graphs of continuous data. These preprints will be randomized to one of three groups: no intervention, a comment posted on bioRxiv, or an email to the corresponding author. The comment or email will describe the problems with bar graphs and explain how to replace them with more informative graphics.
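The abstract does not state how preprints will be allocated to the three groups. As one common option, the sketch below uses permuted-block randomization, which keeps group sizes balanced as preprints accrue. The block size of 6, the seed, and the function name `block_randomize` are assumptions made for illustration, not details of the trial protocol.

```python
import random
from collections import Counter
from typing import Dict, List

# The three trial arms described in the abstract.
ARMS = ["no intervention", "bioRxiv comment", "email to corresponding author"]


def block_randomize(preprint_ids: List[str], block_size: int = 6, seed: int = 2021) -> Dict[str, str]:
    """Assign preprints to arms in shuffled blocks; each block contains every arm equally often."""
    if block_size % len(ARMS) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed (hypothetical) seed so the allocation is reproducible
    allocation: Dict[str, str] = {}
    block: List[str] = []
    for pid in preprint_ids:
        if not block:
            # Start a new block: a fresh list with each arm repeated, then shuffled.
            block = ARMS * (block_size // len(ARMS))
            rng.shuffle(block)
        allocation[pid] = block.pop()
    return allocation


# Usage: 12 hypothetical preprints fill exactly two blocks of 6.
ids = [f"preprint-{i}" for i in range(12)]
alloc = block_randomize(ids)
print(sorted(Counter(alloc.values()).values()))  # [4, 4, 4]
```

In a real trial, the allocation sequence would typically be generated in advance and concealed from the people screening preprints.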
Results: Our automated screening tool performs well at identifying the different graph types (F1 scores of 0.80 to 0.94 per category).
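The F1 score is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R), which is equivalent to 2TP / (2TP + FP + FN). The sketch below computes it for one category, assuming that each page counts as a single yes/no decision per graph type; the abstract does not say whether F1 was computed per page or per figure. The function name and the toy labels are hypothetical.

```python
from typing import List, Set


def f1_for_category(truth: List[Set[str]], predicted: List[Set[str]], category: str) -> float:
    """Per-category F1 over pages: 2*TP / (2*TP + FP + FN)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(truth, predicted):
        in_truth = category in true_labels
        in_pred = category in pred_labels
        if in_truth and in_pred:
            tp += 1  # correctly detected
        elif in_pred:
            fp += 1  # predicted but not actually present
        elif in_truth:
            fn += 1  # present but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy labels for five pages (illustrative only).
truth = [{"bar"}, {"bar", "dot"}, {"box"}, set(), {"bar"}]
pred = [{"bar"}, {"dot"}, {"box", "bar"}, set(), {"bar"}]
print(round(f1_for_category(truth, pred, "bar"), 3))  # 0.667
```

Here "bar" has TP = 2, FP = 1, and FN = 1, so precision and recall are both 2/3 and F1 is also 2/3.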
The trial will determine whether these interventions increase the likelihood that authors replace bar graphs with more informative graphics in the published version of their article.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.