Article
Towards FAIR Research Data: Automatic Extraction of Metadata from Statistical Analysis
Published: February 26, 2021
Text
Research Data Management (RDM) and FAIR (Findable, Accessible, Interoperable, and Reusable) research data are gaining importance in research projects. Funders and publishers impose growing requirements, and good RDM also has many advantages for the individual researcher. Although RDM saves time and scientific community resources in the long run, it is time-consuming and requires additional skills, tying up researchers' resources.
Metadata (data that describe the research data themselves) are a central aspect of RDM and FAIR research data, and they are often difficult to obtain. Yet, during statistical analysis, researchers already specify many of these metadata indirectly. We propose to extract this information automatically, yielding structured metadata with limited additional burden to the researcher.
We evaluate two strategies. First, we provide wrapper functions for standard statistical procedures. For example, one function produces the basic description of the dataset that commonly appears as “table one” in medical publications; in the background, it records information such as the names and labels of important variables, variable coding and units, and the primary research focus. Second, we provide a method for automatically extracting information from standardized statistical result outputs in R. We propose using Jupyter notebooks for this purpose and demonstrate, by way of example, the integration into an RDM system, i.e. the connection to data storage and documentation.
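The first strategy can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the tools described here target R, and it is not the authors' implementation. The names `Variable`, `table_one`, and `metadata_log` are hypothetical, as are the example data. The point is only that a wrapper can return the usual descriptive table while recording structured metadata as a side effect.

```python
import json
import statistics
from dataclasses import dataclass, field, asdict


@dataclass
class Variable:
    """Description of one dataset column, as the analyst supplies it (hypothetical)."""
    name: str
    label: str
    unit: str = ""
    coding: dict = field(default_factory=dict)  # e.g. {0: "no", 1: "yes"}


def table_one(rows, variables, outcome, metadata_log):
    """Hypothetical wrapper: return a descriptive table and, as a side effect,
    append structured metadata about the call to metadata_log."""
    table = {}
    for var in variables:
        values = [row[var.name] for row in rows if row.get(var.name) is not None]
        if var.coding:
            # Categorical variable: count observations per coded level.
            table[var.label] = {
                text: sum(1 for v in values if v == code)
                for code, text in var.coding.items()
            }
        else:
            # Numeric variable: mean, sample standard deviation, and count.
            table[var.label] = {
                "mean": statistics.mean(values),
                "sd": statistics.stdev(values),
                "n": len(values),
            }
    # Metadata captured "in the background" while the analyst works.
    metadata_log.append({
        "procedure": "table_one",
        "primary_outcome": outcome,
        "n_rows": len(rows),
        "variables": [asdict(v) for v in variables],
    })
    return table


# Invented example data, for illustration only.
rows = [
    {"age": 61, "smoker": 1},
    {"age": 54, "smoker": 0},
    {"age": 47, "smoker": 1},
]
variables = [
    Variable("age", "Age", unit="years"),
    Variable("smoker", "Current smoker", coding={0: "no", 1: "yes"}),
]
metadata_log = []
table = table_one(rows, variables, outcome="smoker", metadata_log=metadata_log)
print(json.dumps(metadata_log, indent=2))
```

The analyst calls a single function and gets the table, while the log accumulates variable names, labels, units, coding, and the stated outcome in a form that an RDM system could store alongside the data.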
In conclusion, we provide tools to automatically collect metadata during statistical analysis. This increases the quality of RDM with limited additional burden to the researcher. We plan to extend this framework to cover other steps of RDM.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.