gms | German Medical Science

63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

02. - 06.09.2018, Osnabrück

Wikidata as semantic representation platform of the scientific achievements of the biomedical Collaborative Research Centre 1002

Meeting Abstract


  • Markus Suhr - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Deutschland
  • Najko Jahn - Göttingen State and University Library, Göttingen, Deutschland
  • Daniel Mietchen - Data Science Institute, University of Virginia, Charlottesville, USA
  • Harald Kusch - University Medical Center Göttingen, Department of Medical Informatics, Göttingen, Deutschland

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 195

doi: 10.3205/18gmds173, urn:nbn:de:0183-18gmds1734

Published: August 27, 2018

© 2018 Suhr et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Text

Introduction: Digitization of research data holds great potential for citizen science, in terms of public participation in the generation of knowledge. In recent years, the popular Wikipedia ecosystem has increasingly been used as a platform for disseminating and publicly representing research activities. Wikidata is a collaborative knowledge base that serves Wikipedia and related projects of the Wikimedia Foundation. Sharing knowledge via Wikidata enables the public to review, edit, and enhance scientific findings. Here, we present a pilot data model and integration process for adding the publication record of the cardiological Collaborative Research Centre 1002 (CRC1002) to Wikidata, enabling publicly available cross-linking of publications with authors, funders, projects and topics.

State of the Art: Data in Wikidata can be queried semantically using SPARQL (https://query.wikidata.org). Representing scholarly communication in Wikidata is an ongoing effort that has produced a set of helpful tools for data entry, curation, querying and display [1]. The CRC1002 is an active research consortium with a dedicated “INF” sub-project that develops IT infrastructure for FAIR research data management. The publication output of the CRC1002 is available as a curated interactive list on the project's joint website (https://sfb1002.med.uni-goettingen.de) [2]. Information on research funding is usually given only as an unstandardized note in the acknowledgements section of journal articles. Metadata interoperability projects such as OpenAIRE aim to semantically harmonize and preserve these associations [3].
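A minimal sketch of such a semantic query, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its predefined `wd:`/`wdt:` prefixes. The Python below only builds the request URL for a query that lists items whose "sponsor" (P859) is the CRC1002 item (Q48693816); the helper names are hypothetical and sending the request is deliberately left out.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service (wd:/wdt: prefixes are predefined there).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def works_sponsored_by(sponsor_qid: str) -> str:
    """Build a SPARQL query for items whose 'sponsor' (P859) is sponsor_qid."""
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P859 wd:{sponsor_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def query_url(sparql: str) -> str:
    """Encode the query as a GET URL requesting SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # Q48693816 identifies the CRC1002 in Wikidata.
    print(query_url(works_sponsored_by("Q48693816")))
```

Fetching this URL (for example with `urllib.request`, ideally with a descriptive User-Agent header) should return the matching items in the SPARQL JSON results format.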

Concept: Wikidata stores data as triples of "item", "property" and "value", e.g. item "CRC1002" (unique Wikidata identifier Q48693816), property "instance of" (P31), value "Collaborative Research Center" (Q2300983). Scholarly publications are increasingly being indexed in Wikidata, where they can be annotated with the "sponsor" property (P859). Using existing Wikidata-associated tools in a semi-automated workflow, the publication records of projects can be added to Wikidata.
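The item/property/value model can be illustrated with a small Python data class. The `Statement` class and the `NAMES` label lookup are hypothetical illustrations, not part of any Wikidata library, and only item-valued statements are covered (Wikidata also supports strings, dates, quantities and other datatypes).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Statement:
    """A Wikidata statement reduced to item, property and value identifiers."""

    item: str   # Q-identifier of the subject item
    prop: str   # P-identifier of the property
    value: str  # Q-identifier of the value (only item-valued statements here)

    def label(self, names: Dict[str, str]) -> str:
        """Render the statement, showing a readable name next to each identifier."""
        return " | ".join(
            f"{names.get(ident, ident)} ({ident})"
            for ident in (self.item, self.prop, self.value)
        )


# Hypothetical lookup of English labels for the identifiers used in the text.
NAMES = {
    "Q48693816": "CRC1002",
    "P31": "instance of",
    "Q2300983": "Collaborative Research Center",
    "P859": "sponsor",
}

# The example from the text: CRC1002 is an instance of Collaborative Research Center.
crc_type = Statement("Q48693816", "P31", "Q2300983")

if __name__ == "__main__":
    print(crc_type.label(NAMES))
```

Annotating a publication with its funder has the same shape: the publication's item, the property P859, and the CRC1002 item Q48693816 as value.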

Implementation: We cross-referenced the list of CRC1002 publications with Wikidata items using the existing tool set. Articles not yet in Wikidata could be added through API calls. Finally, the “sponsor” property was appended to all items using the “QuickStatements” tool, which performs bulk operations based on tabular input. Authors of articles can be identified and cross-linked using their ORCID identifiers. The immediate outcome of this initial case study is that all 205 scientific articles currently reported by the CRC1002 are represented in Wikidata (https://tools.wmflabs.org/scholia/sponsor/Q48693816). An exemplary conference paper from the INF sub-project, fully interconnected with Wikidata properties, is available at https://wikidata.org/wiki/Q48775419.
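The cross-referencing and bulk-annotation steps could look roughly like the following sketch. It assumes that publications are matched by DOI (Wikidata property P356, where DOIs are conventionally stored in upper case) and that the result is pasted into QuickStatements using its tab-separated V1 syntax (item, property, value per line). The function names and example data are hypothetical; the abstract does not specify the authors' actual matching procedure.

```python
def normalise_doi(doi: str) -> str:
    """Strip common DOI prefixes and upper-case the remainder (assumed P356 convention)."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip().upper()


def reconcile(crc_dois, wikidata_index):
    """Split a publication list into items already in Wikidata and missing ones.

    wikidata_index maps normalised DOI -> QID, e.g. built from a SPARQL lookup on P356.
    """
    found, missing = {}, []
    for doi in crc_dois:
        key = normalise_doi(doi)
        if key in wikidata_index:
            found[key] = wikidata_index[key]
        else:
            missing.append(key)
    return found, missing


def quickstatements_v1(qids, sponsor_qid="Q48693816"):
    """Emit one tab-separated 'item<TAB>P859<TAB>sponsor' command per item."""
    return "\n".join(f"{qid}\tP859\t{sponsor_qid}" for qid in qids)


if __name__ == "__main__":
    # Hypothetical inputs: the DOIs and the QID below are placeholders, not real records.
    crc_list = ["https://doi.org/10.0000/example-a", "doi:10.0000/example-b"]
    index = {"10.0000/EXAMPLE-A": "Q1000001"}
    found, missing = reconcile(crc_list, index)
    print(quickstatements_v1(found.values()))
    print("not yet in Wikidata:", missing)
```

Keeping the "missing" list separate mirrors the workflow described above: those articles first need their own items before the sponsor statement can be added.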

Lessons Learned: The proposed workflow combines a set of metadata aggregation and editing tools in a way that could be automated via API calls with minimal programming effort. Because Wikidata is an openly editable, collaboratively evolving knowledge base, mapping real-world relations is challenging, e.g. identifying the most suitable properties and reconstructing related data models that already exist. Constraint and validation languages are possible solutions; they increase data quality at the cost of additional schema modelling [4]. Representing the scholarly output of research consortia in Wikidata is a useful prerequisite for generating Wikidata-based articles in Wikipedia, further increasing the public visibility of scientific knowledge.
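As a rough illustration of the kind of rule a constraint language such as ShEx [4] can express, the plain-Python check below flags publication items that lack an "instance of" (P31) statement or a "sponsor" (P859) statement pointing to the CRC1002. It is not ShEx, and the shape, the function name and the example claims are hypothetical.

```python
from typing import Dict, List, Optional

CRC1002 = "Q48693816"

# Hypothetical shape for a CRC1002 publication item: property -> required value
# (None means "any value, but at least one"). Not ShEx; a plain-Python stand-in.
PUBLICATION_SHAPE: Dict[str, Optional[str]] = {
    "P31": None,       # instance of: must be present
    "P859": CRC1002,   # sponsor: must include the CRC1002 item
}


def violations(claims: Dict[str, List[str]], shape: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable violations of `shape` for one item's claims."""
    problems = []
    for prop, required in shape.items():
        values = claims.get(prop, [])
        if not values:
            problems.append(f"missing {prop}")
        elif required is not None and required not in values:
            problems.append(f"{prop} lacks {required}")
    return problems


if __name__ == "__main__":
    # Hypothetical item typed as "scholarly article" (Q13442814) but without a sponsor.
    print(violations({"P31": ["Q13442814"]}, PUBLICATION_SHAPE))  # ['missing P859']
```

A real shape language adds cardinalities, datatypes and references between shapes; the point here is only that such rules can be checked mechanically before or after a bulk upload.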

This work was funded by the DFG as part of the Collaborative Research Centre 1002 on Modulatory Units in Heart Failure, subproject INF.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Nielsen FÅ, Mietchen D, Willighagen E. Scholia, Scientometrics and Wikidata. In: The Semantic Web: ESWC 2017 Satellite Events. Springer; 2017. (Lecture Notes in Computer Science; 10577). p. 237-59. DOI: 10.1007/978-3-319-70407-4_36
2.
Kusch H, Schmitt O, Marzec B, Nussbeck SY. Datenorganisation eines klinischen Sonderforschungsbereiches in einer integrierten, langfristig verfügbaren Forschungsdatenplattform. In: GMDS 2015. 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Krefeld, 06.-09.09.2015. Düsseldorf: German Medical Science GMS Publishing House; 2015. DocAbstr. 123. DOI: 10.3205/15gmds104
3.
Houssos N, Jörg B, Dvořák J, Príncipe P, Rodrigues E, Manghi P, et al. OpenAIRE Guidelines for CRIS Managers: Supporting Interoperability of Open Research Information through Established Standards. Procedia Computer Science. 2014;33:33-8. DOI: 10.1016/j.procs.2014.06.006
4.
Thornton K, Solbrig H, Stupp GS, Labra Gayo JE, Mietchen D, Prud’hommeaux E, et al. Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation. Zenodo; 2018. DOI: 10.5281/zenodo.1214521