gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26-30 September 2021, online

COVID-19 Image Open Repository meets GECCO – building an open and interoperable image dataset for COVID-19 research

Meeting Abstract

  • Katharina Cohrs - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Ranim Ashkar - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Maja Blazevic - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Philip Falkewitz - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Mahshid Ghasempour Moghaddam - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Laura Grebe - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Hossna Khoskam - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Souzan Murad - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Maryna Sermus - Hochschule Hannover (HsH) - University of Applied Sciences and Arts, Faculty III, Media, Information and Design, Hannover, Germany
  • Hans Laser - Hannover Medical School (MHH), Centre for Information Management (ZIMt), Hannover, Germany
  • Svetlana Gerbel - Hannover Medical School (MHH), Centre for Information Management (ZIMt), Hannover, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 151

doi: 10.3205/21gmds110, urn:nbn:de:0183-21gmds1102

Published: September 24, 2021

© 2021 Cohrs et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: The COVID-19 pandemic currently remains clinically unpredictable and has the potential to quickly overload healthcare infrastructure [1], [2].

Innovative artificial intelligence (AI) techniques can save medical staff time and provide diagnoses more cheaply and quickly than standard laboratory methods [3]. Medical imaging modalities such as CT and MRI are essential for AI-based automated COVID-19 diagnosis [3], and institutions donate data under open-data licences [3], [4]. In this context, a working group at Hannover Medical School has, since May 2020, continuously provided an anonymised image repository with extensive metadata such as admission, discharge, ICU, laboratory and patient master data, drawn from the COVID-19 DataMart of the Enterprise Clinical Research Data Warehouse (ECRDW) [5], [6], [7].

The objective of the project was to extend and annotate this open data image repository [7] with the items of the German Corona Consensus dataset (GECCO) [8], [9] to achieve syntactic and semantic interoperability as well as standardisation for research purposes.

Methods: First, a survey was conducted to obtain an overview of relevant items and standards for interoperability in COVID-19 research [3], [9], [10], [11], [12], [13], [14], [15].

The identified items and candidate classification systems from GECCO [16] were prioritised based on previous projects and a self-defined scale [10], [17], [18]. The scale was defined according to the best-practice principle, based on empirical values, and ranged from 5 ("highly relevant": substantial, general or specific relevance to COVID-19) to 1 ("not relevant").
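The prioritisation step can be pictured as a small scoring exercise. The following Python sketch is illustrative only: the item names and ratings are hypothetical, and the abstract does not state how ratings were recorded or combined. Only the 1-to-5 scale is taken from the text.

```python
from dataclasses import dataclass

# Scale endpoints as described in the text:
# 5 = "highly relevant", 1 = "not relevant".
MIN_SCORE, MAX_SCORE = 1, 5


@dataclass(frozen=True)
class RatedItem:
    name: str       # hypothetical item label
    relevance: int  # rating on the 1-to-5 scale

    def __post_init__(self):
        # Reject ratings outside the defined scale.
        if not MIN_SCORE <= self.relevance <= MAX_SCORE:
            raise ValueError(
                f"relevance must be between {MIN_SCORE} and {MAX_SCORE}"
            )


def prioritise(items):
    """Order items from most to least relevant; ties are broken by name."""
    return sorted(items, key=lambda item: (-item.relevance, item.name))


# Hypothetical ratings, for illustration only.
rated = [
    RatedItem("body temperature", 5),
    RatedItem("marital status", 1),
    RatedItem("C-reactive protein", 4),
]
ordered = [item.name for item in prioritise(rated)]
```

Sorting on the negated rating keeps the highest-rated items at the top of the list, which is the order in which they would be considered for inclusion.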

The difference between the COVID-19 DataMart image repository and GECCO [9], published a few months later, was reconciled using the ECRDW metadata repository (master data on laboratory analyses, persons, vital signs, etc.) and the prioritised list. Suggested mappings of clinical data were developed using RELMA and SNOMED-CT.

Results: The search revealed that none of the open-access medical imaging repositories examined contained international terminologies or nomenclatures (ICD, LOINC or SNOMED-CT). Other COVID-19-relevant dataset definitions on medical facts [9], [10] predominantly used international terminologies and nomenclatures such as ICD-10, SNOMED-CT, LOINC, HL7, UCUM and ATC/DDD. A project comparable to GECCO [10] was identified and used to prioritise items.

For items already available in the image repository, corresponding codes from GECCO (including ICD, LOINC, SNOMED-CT) were assigned.
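As a rough illustration of such a code assignment, the sketch below attaches a terminology code to each metadata field for which a mapping is known. The field names, the lookup table and the `annotate` helper are hypothetical. The two LOINC codes shown are public codes for body temperature and heart rate, used here only as examples and not necessarily the codes assigned in GECCO or in the repository.

```python
# Illustrative lookup table: field name -> (code system, code).
# The LOINC codes are real public codes, but whether GECCO uses
# exactly these for the corresponding items is an assumption.
CODE_TABLE = {
    "body temperature": ("LOINC", "8310-5"),
    "heart rate": ("LOINC", "8867-4"),
}


def annotate(fields):
    """Map each field name to its code, or to None if no mapping is known."""
    annotated = {}
    for field in fields:
        entry = CODE_TABLE.get(field)
        if entry is None:
            annotated[field] = None  # left for manual review
        else:
            system, code = entry
            annotated[field] = {"system": system, "code": code}
    return annotated


# Hypothetical metadata fields of an image repository.
result = annotate(["body temperature", "admission date"])
```

Fields without a known mapping are kept with a `None` value rather than dropped, so that they remain visible for later manual coding.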

Next, the difference between the data source (ECRDW COVID-19 DataMart) and GECCO was identified. Clinical data without assigned codes were proposed for inclusion in the COVID-19 DataMart in a structured report based on the prioritised list. Based on the available metadata, corresponding SNOMED-CT and LOINC codes were also suggested.
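Conceptually, identifying this difference amounts to comparing two sets of items. A minimal sketch, assuming both item lists are available as plain names (all names below are hypothetical and not taken from GECCO or the ECRDW):

```python
# Hypothetical item names, for illustration only.
gecco_items = {"age", "body temperature", "heart rate", "oxygen saturation"}
datamart_items = {"age", "body temperature", "admission date"}

# Items defined in the target dataset but absent from the data mart:
# candidates to be proposed for inclusion.
missing_in_datamart = sorted(gecco_items - datamart_items)

# Items present on both sides: candidates for code assignment.
shared_items = sorted(gecco_items & datamart_items)

# Items available locally without a counterpart in the target dataset.
local_only = sorted(datamart_items - gecco_items)
```

In practice the comparison would need to be made on harmonised identifiers or codes rather than free-text names, since the same clinical concept can be labelled differently in each source.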

Discussion: An approach for enriching a published open data repository was demonstrated. The German-specific GECCO dataset was used as a prototype for this purpose.

eCRFs from the Pa-COVID-19 study and the LEOSS dataset [9] are either not available online or only available to a limited extent and could therefore not be considered in the prioritisation.

The difference between the ECRDW COVID-19 data mart and GECCO should be further minimised with the aim that other research projects can also benefit from a centrally available annotated data mart. After the expansion of the data mart, it is planned to republish the COVID-19 image repository on GitHub.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Wiersinga WJ, Rhodes A, Cheng AC, Peacock SJ, Prescott HC. Pathophysiology, Transmission, Diagnosis, and Treatment of Coronavirus Disease 2019 (COVID-19): A Review. JAMA. 2020;324(8):782–793.
2.
Nadkarni GN. An ounce of public health for COVID-19? Sci Transl Med. 2020;12(541):eabb5675.
3.
Shuja J, Alanazi E, Alasmary W, Alashaikh A. COVID-19 open source data sets: a comprehensive survey. Appl Intell. 2021;51:1296–1325. DOI: 10.1007/s10489-020-01862-6
4.
European Institute for Biomedical Imaging Research. COVID-19 imaging datasets. [Accessed 28 April 2021]. Available from: https://www.eibir.org/covid-19-imaging-datasets/
5.
Gerbel S, Laser H, Schönfeld N, Rassmann T. The Hannover Medical School enterprise clinical research data warehouse: 5 years of experience. In: Auer S, Vidal ME, editors. Data Integration in the Life Sciences. 13th International Conference, DILS 2018. Hannover, Germany, November 20-21, 2018. Proceedings. Cham: Springer International Publishing; 2019. (Lecture Notes in Bioinformatics; 11371). p. 182–194. DOI: 10.1007/978-3-030-06016-9
6.
Medizinische Hochschule Hannover (MHH). Projekte aus Eigenmitteln der MHH. [Accessed 24 April 2021]. Available from: https://www.mhh.de/forschung/covid-19/mhh-projekte/eigenmittel
7.
Winther HB, Laser H, Gerbel S, Maschke SK, Hinrichs JB, Vogel-Claussen J, et al. COVID-19 Image Repository. 2020. DOI: 10.6084/m9.figshare.12275009.v1
8.
Nationales Forschungsnetzwerk der Universitätsmedizin zu Covid-19. [Accessed 28 April 2021]. Available from: https://www.netzwerk-universitaetsmedizin.de/
9.
Sass J, Bartschke A, Lehne M, Essenwanger A, Rinaldi E, Rudolph S, et al. The German Corona Consensus Dataset (GECCO): a standardized dataset for COVID-19 research in university medicine and beyond. BMC Med Inform Decis Mak. 2020;20:341. DOI: 10.1186/s12911-020-01374-w
10.
Pedrera-Jiménez M, García-Barrio N, Cruz-Rojo J, Terriza-Torres AI, López-Jiménez EA, Calvo-Boyero F, et al. Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models. J Biomed Inform. 2021;115:103697. DOI: 10.1016/j.jbi.2021.103697
11.
Task force italiana. Risorse dati su Covid-19. [Accessed 28 April 2021]. Available from: https://dati-covid.italia.it/.
12.
European Institute for Biomedical Imaging Research (EIBIR). COVID-19 imaging datasets. [Accessed 28 April 2021]. Available from: https://www.eibir.org/covid-19-imaging-datasets/
13.
Cohen JP, Morrison P, Dao L. COVID-19 Image Data Collection: Prospective Predictions Are the Future [Preprint]. arXiv. 2020. arXiv:2006.11988. Available from: https://arxiv.org/abs/2006.11988
14.
Cohen JP. covid-chestxray-dataset. Available from: https://github.com/ieee8023/covid-chestxray-dataset
15.
Tsai EB, Simpson S, Lungren MP, Hershman M, Roshkovan L, Colak E, et al. The RSNA International COVID-19 Open Radiology Database (RICORD). Radiology. 2021;299(1):E204–E213. DOI: 10.1148/radiol.2021203957
16.
The German Corona Consensus Dataset (GECCO): a standardized dataset for COVID-19 research in university medicine and beyond. 2020 [Accessed 28 April 2021]. Available from: https://art-decor.org/art-decor/decor-datasets--covid19f-?id=2.16.840.1.113883.3.1937.777.53.1.1&effectiveDate=2020-04-08T13%3A04%3A13&language=de-DE
17.
WHO tool for behavioural insights on COVID-19. [Accessed 28 April 2021]. Available from: https://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19/technical-guidance/who-tool-for-behavioural-insights-on-covid-19
18.
ISARIC. COVID-19 CRF. [Accessed 24 April 2021]. Available from: https://isaric.org/research/covid-19-clinical-research-resources/covid-19-crf/