gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Towards FAIR Research Data: Automatic Extraction of Metadata from Statistical Analysis

Meeting Abstract

  • Patric Tippmann - University Medical Center Freiburg, Freiburg, Germany
  • Jochen Knaus - University Medical Center Freiburg, Freiburg, Germany
  • Urs Alexander Fichtner - University Medical Center Freiburg, Freiburg, Germany
  • Boris A. Brühmann - University Medical Center Freiburg, Freiburg, Germany
  • Harald Binder - University Medical Center Freiburg, Freiburg, Germany
  • Martin Boeker - University Medical Center Freiburg, Freiburg, Germany
  • Daniela Zöller - University Medical Center Freiburg, Freiburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 334

doi: 10.3205/20gmds172, urn:nbn:de:0183-20gmds1721

Published: 26 February 2021

© 2021 Tippmann et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Research Data Management (RDM) and FAIR (Findable, Accessible, Interoperable, and Reusable) research data are gaining importance in research projects. Funders and publishers impose growing requirements, and good RDM also has many advantages for the individual researcher. Although RDM saves time and scientific community resources in the long run, it is time-consuming and requires additional skills, tying up researchers' resources.

A central aspect of RDM and FAIR research data is metadata (data that describes the research data itself), which is often difficult to obtain. Yet, during statistical analysis, researchers already specify much of this metadata indirectly. We propose to extract this information automatically to obtain structured metadata with limited additional burden to the researcher.

We evaluate two strategies. First, we provide wrapper functions for standard statistical procedures. For example, we provide a function that produces a basic description of the dataset, commonly used as “table one” in medical publications. While computing this table, the function records, in the background, information such as important variable names and labels, variable coding and units, and the primary research focus. Second, we provide a method for automatic extraction of information from standardized statistical result outputs in R. We propose to use Jupyter notebooks for this purpose and demonstrate, by way of example, the integration into an RDM system, i.e. the connection to data storage and documentation.
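The first strategy can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes tooling around R, whereas this example uses Python with only the standard library, and the names `table_one` and `VariableMetadata`, the argument layout, and the sample data are all hypothetical. It only shows the general idea of a wrapper that returns the usual descriptive summary and, as a side product, a structured metadata record per variable.

```python
import json
import statistics
from dataclasses import dataclass, field, asdict


@dataclass
class VariableMetadata:
    """Hypothetical metadata record for one variable of the dataset."""
    name: str
    label: str = ""
    unit: str = ""
    kind: str = ""  # "numeric" or "categorical"
    coding: dict = field(default_factory=dict)


def table_one(rows, labels=None, units=None, coding=None):
    """Describe a dataset and collect variable metadata on the way.

    rows is a list of dicts (one per observation). Labels, units and
    coding are optional dicts keyed by variable name. Returns the pair
    (summary, metadata).
    """
    labels, units, coding = labels or {}, units or {}, coding or {}
    summary, metadata = {}, []
    names = list(rows[0].keys()) if rows else []
    for name in names:
        values = [r[name] for r in rows if r.get(name) is not None]
        # A variable with an explicit coding is treated as categorical,
        # even if its raw values are integers.
        numeric = (
            name not in coding
            and bool(values)
            and all(isinstance(v, (int, float)) and not isinstance(v, bool)
                    for v in values)
        )
        if numeric:
            summary[name] = {
                "n": len(values),
                "mean": statistics.mean(values),
                "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
            }
        else:
            counts = {}
            for v in values:
                counts[v] = counts.get(v, 0) + 1
            summary[name] = {"n": len(values), "counts": counts}
        metadata.append(VariableMetadata(
            name=name,
            label=labels.get(name, ""),
            unit=units.get(name, ""),
            kind="numeric" if numeric else "categorical",
            coding=coding.get(name, {}),
        ))
    return summary, metadata


# Made-up example data, for illustration only.
rows = [
    {"age": 61, "sex": 1},
    {"age": 47, "sex": 2},
    {"age": 70, "sex": 1},
]
summary, metadata = table_one(
    rows,
    labels={"age": "Age at enrolment"},
    units={"age": "years"},
    coding={"sex": {1: "female", 2: "male"}},
)
# Serialize the collected metadata so it could be handed to an RDM system.
metadata_json = json.dumps([asdict(m) for m in metadata], indent=2)
```

The researcher calls one function and gets the descriptive table as usual. The metadata record is produced from the same arguments and can be stored alongside the analysis without a separate documentation step.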

In conclusion, we provide tools to automatically collect metadata during statistical analysis. This increases the quality of RDM with limited additional burden to the researcher. We plan to extend this framework to cover other steps of RDM.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.