Article
COVID-19 Image Open Repository meets GECCO – building an open and interoperable image dataset for COVID-19 research
Published: 24 September 2021
Text
Introduction: Currently the COVID-19 pandemic remains clinically unpredictable and shows potential to quickly overload healthcare infrastructure [1], [2].
Innovative artificial intelligence (AI) techniques can save medical staff time and provide diagnoses more cheaply and quickly than standard laboratory methods [3]. Medical imaging modalities such as CT and MRI are essential for automated, AI-based COVID-19 diagnosis [3]. Institutions donate data under open-data licences [3], [4]. In this context, a working group at Hannover Medical School has, since 05/2020, continuously provided an anonymised repository with extensive metadata, such as admission, discharge, ICU, laboratory and patient master data, drawn from the COVID-19 DataMart of the Enterprise Clinical Research Data Warehouse (ECRDW) [5], [6], [7].
The objective of the project was to extend and annotate this open data image repository [7] with the items of the German Corona Consensus dataset (GECCO) [8], [9] to achieve syntactic and semantic interoperability as well as standardisation for research purposes.
Methods: First, relevant items and standards for interoperability in COVID-19 research were surveyed to obtain an overview [3], [9], [10], [11], [12], [13], [14], [15].
The identified items and candidate classification systems from GECCO [16] were prioritised based on previous projects and a self-defined scale [10], [17], [18]. The scale followed best-practice principles based on empirical values and ranged from 5 ("highly relevant": substantial, general or specific relevance to COVID-19) to 1 ("not relevant").
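The prioritisation step can be illustrated with a minimal sketch. The item names, scores and the cut-off `threshold` below are hypothetical and not taken from the project; only the scale endpoints (5 = "highly relevant", 1 = "not relevant") come from the text.

```python
from dataclasses import dataclass

# Scale endpoints as described in the text; the source does not
# name the intermediate levels, so none are labelled here.
SCALE_MIN, SCALE_MAX = 1, 5


@dataclass(frozen=True)
class Item:
    name: str       # hypothetical item name
    relevance: int  # 1 = "not relevant" ... 5 = "highly relevant"

    def __post_init__(self):
        # Reject scores outside the 1..5 scale.
        if not SCALE_MIN <= self.relevance <= SCALE_MAX:
            raise ValueError(f"relevance must be {SCALE_MIN}..{SCALE_MAX}")


def prioritise(items, threshold=4):
    """Keep items scoring at least `threshold` (an assumed cut-off),
    ordered by descending relevance, ties broken by name."""
    kept = [item for item in items if item.relevance >= threshold]
    return sorted(kept, key=lambda item: (-item.relevance, item.name))


# Hypothetical candidate items with made-up scores.
candidates = [
    Item("oxygen_saturation", 5),
    Item("insurance_status", 1),
    Item("body_temperature", 4),
]
shortlist = prioritise(candidates)
```

Here `shortlist` contains the two items that meet the assumed cut-off, with the highest score first.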
The difference between the COVID-19 DataMart image repository and GECCO [9], published a few months later, was reconciled using the ECRDW metadata repository (master data on laboratory analyses, persons, vital signs, etc.) and the prioritised list. Suggested mappings of clinical data were developed using RELMA and SNOMED-CT.
Results: The search revealed that none of the open access repositories on medical imaging contained international terminologies or nomenclatures (ICD, LOINC or SNOMED-CT). Other COVID-19-relevant dataset definitions on medical facts [9], [10] predominantly used international terminologies and nomenclatures such as ICD-10, SNOMED-CT, LOINC, HL7, UCUM and ATC/DDD. A project comparable to GECCO [10] was identified and used to prioritise items.
For items already available in the image repository, corresponding codes from GECCO (including ICD, LOINC, SNOMED-CT) were assigned.
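As a rough illustration of this annotation step, the sketch below attaches a code system and code to each metadata item found in a lookup table. The item names are hypothetical, and the codes are widely published examples (LOINC 8310-5 body temperature, LOINC 8867-4 heart rate, ICD-10 U07.1 COVID-19), not the mappings actually assigned in the repository.

```python
# Illustrative lookup table: item name -> (code system, code).
# Hypothetical names; example codes only, not the project's mappings.
CODE_MAP = {
    "body_temperature": ("LOINC", "8310-5"),
    "heart_rate": ("LOINC", "8867-4"),
    "covid19_diagnosis": ("ICD-10", "U07.1"),
}


def annotate(columns, code_map=CODE_MAP):
    """Split columns into annotated items and items without a known code."""
    annotated = {}
    unmapped = []
    for column in columns:
        if column in code_map:
            system, code = code_map[column]
            annotated[column] = {"system": system, "code": code}
        else:
            unmapped.append(column)
    return annotated, unmapped


# Hypothetical repository columns; "free_text_note" has no entry in CODE_MAP.
annotated, unmapped = annotate(["heart_rate", "free_text_note"])
```

Returning the unmapped items separately keeps them visible, so they can be reviewed rather than silently dropped.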
Next, the difference between the data source (ECRDW COVID-19-DataMart) and GECCO was identified. Clinical data without assigned codes were proposed for inclusion in the COVID-19-DataMart in a structured report based on the prioritised list. Based on the available metadata, corresponding SNOMED-CT and LOINC codes were also suggested.
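The gap analysis described here amounts to a set difference ordered by priority. The following is a sketch under stated assumptions: the item names and scores are hypothetical, and treating unrated items as lowest priority (1) is a choice made for this example, not something stated in the source.

```python
def gap_report(gecco_items, datamart_items, priority):
    """List items defined in GECCO but absent from the data mart,
    highest priority first; unrated items default to 1 (assumption)."""
    missing = set(gecco_items) - set(datamart_items)
    return sorted(missing, key=lambda name: (-priority.get(name, 1), name))


# Hypothetical inputs for illustration only.
gecco = ["body_temperature", "heart_rate", "smoking_status", "pregnancy_status"]
datamart = ["body_temperature"]
scores = {"heart_rate": 5, "smoking_status": 3}

proposed = gap_report(gecco, datamart, scores)
```

Here `proposed` lists the three missing items in the order in which they would be suggested for inclusion.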
Discussion: An approach for enriching a published open data repository was demonstrated. The German-specific GECCO dataset was used as a prototype for this purpose.
eCRFs from the Pa-COVID-19 study and the LEOSS dataset [9] are either not available online or only available to a limited extent and could therefore not be considered in the prioritisation.
The difference between the ECRDW COVID-19 data mart and GECCO should be further minimised with the aim that other research projects can also benefit from a centrally available annotated data mart. After the expansion of the data mart, it is planned to republish the COVID-19 image repository on GitHub.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Wiersinga WJ, Rhodes A, Cheng AC, Peacock SJ, Prescott HC. Pathophysiology, Transmission, Diagnosis, and Treatment of Coronavirus Disease 2019 (COVID-19): A Review. JAMA. 2020;324(8):782–793.
2. Nadkarni GN. An ounce of public health for COVID-19? Sci Transl Med. 2020;12(541):eabb5675.
3. Shuja J, Alanazi E, Alasmary W, Alashaikh A. COVID-19 open source data sets: a comprehensive survey. Appl Intell. 2021;51:1296–1325. DOI: 10.1007/s10489-020-01862-6
4. European Institute for Biomedical Imaging Research. COVID-19 imaging datasets. [Accessed 28 April 2021]. Available from: https://www.eibir.org/covid-19-imaging-datasets/
5. Gerbel S, Laser H, Schönfeld N, Rassmann T. The Hannover Medical School Enterprise Clinical Research Data Warehouse: 5 years of experience. In: Auer S, Vidal ME, editors. Data Integration in the Life Sciences. 13th International Conference, DILS 2018. Hannover, Germany, November 20-21, 2018. Proceedings. Cham: Springer International Publishing; 2019. (Lecture Notes in Bioinformatics; 11371). p. 182–194. DOI: 10.1007/978-3-030-06016-9
6. Medizinische Hochschule Hannover (MHH). Projekte aus Eigenmitteln der MHH. [Accessed 24 April 2021]. Available from: https://www.mhh.de/forschung/covid-19/mhh-projekte/eigenmittel
7. Winther HB, Laser H, Gerbel S, Maschke SK, Hinrichs JB, Vogel-Claussen J, et al. COVID-19 Image Repository. 2020. DOI: 10.6084/m9.figshare.12275009.v1
8. Nationales Forschungsnetzwerk der Universitätsmedizin zu Covid-19. [Accessed 28 April 2021]. Available from: https://www.netzwerk-universitaetsmedizin.de/
9. Sass J, Bartschke A, Lehne M, Essenwanger A, Rinaldi E, Rudolph S, et al. The German Corona Consensus Dataset (GECCO): a standardized dataset for COVID-19 research in university medicine and beyond. BMC Med Inform Decis Mak. 2020;20:341. DOI: 10.1186/s12911-020-01374-w
10. Pedrera-Jiménez M, García-Barrio N, Cruz-Rojo J, Terriza-Torres AI, López-Jiménez EA, Calvo-Boyero F, et al. Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models. Journal of Biomedical Informatics. 2021;115:103697. DOI: 10.1016/j.jbi.2021.103697
11. Task force italiana. Risorse dati su Covid-19. [Accessed 28 April 2021]. Available from: https://dati-covid.italia.it/
12. European Institute for Biomedical Imaging Research (EIBIR). COVID-19 imaging datasets. [Accessed 28 April 2021]. Available from: https://www.eibir.org/covid-19-imaging-datasets/
13. Cohen JP, Morrison P, Dao L. COVID-19 Image Data Collection: Prospective Predictions Are the Future [Preprint]. arXiv. 2020. arXiv:2006.11988. Available from: https://arxiv.org/abs/2006.11988
14. Cohen JP. covid-chestxray-dataset. Available from: https://github.com/ieee8023/covid-chestxray-dataset
15. Tsai EB, Simpson S, Lungren MP, Hershman M, Roshkovan L, Colak E, et al. The RSNA International COVID-19 Open Radiology Database (RICORD). Radiology. 2021;299(1):E204–E213. DOI: 10.1148/radiol.2021203957
16. The German Corona Consensus Dataset (GECCO): a standardized dataset for COVID-19 research in university medicine and beyond. 2020 [Accessed 28 April 2021]. Available from: https://art-decor.org/art-decor/decor-datasets--covid19f-?id=2.16.840.1.113883.3.1937.777.53.1.1&effectiveDate=2020-04-08T13%3A04%3A13&language=de-DE
17. WHO tool for behavioural insights on COVID-19. [Accessed 28 April 2021]. Available from: https://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19/technical-guidance/who-tool-for-behavioural-insights-on-covid-19
18. ISARIC. COVID-19 CRF. [Accessed 24 April 2021]. Available from: https://isaric.org/research/covid-19-clinical-research-resources/covid-19-crf/