Article
omopRds: transfer of data models from OMOP to DataSHIELD/Opal
Published: September 6, 2019
Introduction: Distributed data analysis across university hospitals is greatly facilitated by two components: a common data model (CDM) with a shared vocabulary, and privacy-preserving data analysis. The former is implemented in the OMOP (Observational Medical Outcomes Partnership) CDM [1], and the latter can be achieved with the DataSHIELD platform [2], [3] on top of the Opal data warehouse (http://www.obiba.org/pages/products/opal). The German MIRACUM research consortium uses these two technologies together to achieve distributed privacy-preserving data analysis [4], [5], [6], [7].
Even when a uniform data model for a given clinical use case exists in the OMOP CDM, it still needs to be made available in the DataSHIELD/Opal infrastructure. The objective of this work is to describe the omopRds package, which enables the seamless transfer of data models from the OMOP CDM to DataSHIELD/Opal.
Implementation: R was chosen for the implementation, as it is an essential component of both OMOP/OHDSI (Observational Health Data Sciences and Informatics) and DataSHIELD. Furthermore, it allows one to integrate the current implementation directly into the existing OMOP/OHDSI user interface at a later stage.
The implementation described here transfers not only the data but also the associated concepts (e.g. data types) from the OMOP CDM. To that end, methods and data structures were implemented for (i) querying the OMOP database, (ii) transferring concepts from OMOP to DataSHIELD/Opal, and (iii) uploading the data to DataSHIELD/Opal. For (i), omopRds builds upon the OHDSI DatabaseConnector (https://github.com/OHDSI/DatabaseConnector) and SqlRender (https://github.com/OHDSI/SqlRender/) packages to query the OMOP database directly from R. For (iii), it uses the opalr package (https://cran.r-project.org/web/packages/opalr/index.html) to talk to the REST interface of the DataSHIELD/Opal server.
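Steps (i) and (iii) could be sketched roughly as follows. This is an illustration under stated assumptions, not code from omopRds: the helper names `fetch_persons` and `upload_csv`, the query, and all arguments are hypothetical. Only the namespaced calls to SqlRender, DatabaseConnector, and opalr refer to functions of those packages. The helpers are defined but not called, since calling them requires a live OMOP database and Opal server.

```r
# Sketch only: nothing here is taken from the omopRds source code.

# (i) Query template in OHDSI SQL; @cdm_schema is filled in by SqlRender.
person_sql <- paste(
  "SELECT person_id, gender_concept_id, year_of_birth",
  "FROM @cdm_schema.person;"
)

# Hypothetical helper: render, translate, and run the query on an OMOP database.
fetch_persons <- function(connection_details, dbms, cdm_schema) {
  sql <- SqlRender::render(person_sql, cdm_schema = cdm_schema)
  sql <- SqlRender::translate(sql, targetDialect = dbms)
  connection <- DatabaseConnector::connect(connection_details)
  on.exit(DatabaseConnector::disconnect(connection))
  DatabaseConnector::querySql(connection, sql)
}

# (iii) Hypothetical helper: write the result to CSV and upload it to the
# Opal file system through the REST interface.
upload_csv <- function(data, opal_url, username, password, destination) {
  csv_path <- file.path(tempdir(), "person.csv")
  utils::write.csv(data, csv_path, row.names = FALSE)
  opal <- opalr::opal.login(username = username, password = password,
                            url = opal_url)
  on.exit(opalr::opal.logout(opal))
  opalr::opal.file_upload(opal, source = csv_path, destination = destination)
  invisible(csv_path)
}
```

In such a setup, `connection_details` would come from `DatabaseConnector::createConnectionDetails()`, and `destination` would be a folder in the Opal file system. Both are site-specific and left open here.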
Well-defined OMOP domains and vocabularies (e.g. gender, ethnicity) could be handled automatically, whereas it was necessary to build more complex data types step by step. In particular, the functional programming paradigm of R was exploited to map OMOP domains to DataSHIELD/Opal variables.
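The functional style mentioned above can be illustrated with a small base R sketch. It assumes a toy concept lookup and a toy person table. The function `make_domain_mapper` and the list `mappers` are hypothetical names chosen for illustration and are not part of omopRds.

```r
# Toy lookup for the Gender domain (concept IDs and codes as used in OMOP).
gender_lookup <- data.frame(
  concept_id   = c(8507L, 8532L),
  concept_code = c("M", "F"),
  stringsAsFactors = FALSE
)

# Higher-order function: given a lookup table, return a function that
# converts a vector of concept IDs into a factor of concept codes.
make_domain_mapper <- function(lookup) {
  function(ids) {
    factor(lookup$concept_code[match(ids, lookup$concept_id)],
           levels = lookup$concept_code)
  }
}

# One mapper per column; further domains would be added to this list.
mappers <- list(gender_concept_id = make_domain_mapper(gender_lookup))

# Toy extract of an OMOP person table.
person <- data.frame(
  person_id         = 1:3,
  gender_concept_id = c(8532L, 8507L, 8532L)
)

# Apply each mapper to the column of the same name.
mapped <- person
mapped[names(mappers)] <- Map(function(f, column) f(column),
                              mappers, person[names(mappers)])
```

Keeping one mapper per domain in a named list means that simple domains can share a single generic constructor, while more complex data types can be given hand-written mapper functions in the same list.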
Discussion: Our omopRds package facilitates the transfer of data models from OMOP to DataSHIELD/Opal. A notable feature of omopRds is the ability to transfer data models at different levels of complexity. For example, in the OMOP "Gender" domain, the standard IDs (8507, 8532) are difficult to interpret without the associated codes ("M", "F") and/or names ("Male", "Female"). Unfortunately, the OMOP vocabulary does not provide a standard code for "diverse". Instead, the generic concept (ID = 0) can be annotated with the "Diverse" description. All of this information (domain, ID, code, name, and description) can be mapped to Opal variables, attributes, and categories using omopRds.
The omopRds package provides a holistic solution in R, either by integrating API calls directly into the package or by providing a wrapper function around the Opal command-line scripting tool (http://opaldoc.obiba.org/en/latest/python-user-guide/index.html), e.g. to import the uploaded DataSHIELD-compatible datasets into Opal data schemas.
Following the plan outlined in the MI-I core dataset [8], the initial application of omopRds is the transfer of the Base Module from OMOP to DataSHIELD/Opal.
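The Gender example can be written down as a small data dictionary. The sketch below uses base R only. The two data frames are loosely modelled on a variables/categories layout, and their column names are assumptions for illustration rather than the format omopRds or Opal actually uses.

```r
# Gender concepts as described in the text; concept 0 carries no standard
# code or name here and is annotated with a "Diverse" description instead.
gender_concepts <- data.frame(
  concept_id   = c(8507L, 8532L, 0L),
  concept_code = c("M", "F", NA),
  concept_name = c("Male", "Female", NA),
  description  = c(NA, NA, "Diverse"),
  stringsAsFactors = FALSE
)

# One row per variable; the OMOP domain is kept as an extra attribute column.
variables <- data.frame(
  name      = "gender",
  valueType = "integer",
  domain    = "Gender",
  stringsAsFactors = FALSE
)

# One row per category: the concept ID is the category name, and the label
# falls back to the description when no standard concept name is available.
categories <- data.frame(
  variable = "gender",
  name     = as.character(gender_concepts$concept_id),
  code     = gender_concepts$concept_code,
  label    = ifelse(is.na(gender_concepts$concept_name),
                    gender_concepts$description,
                    gender_concepts$concept_name),
  stringsAsFactors = FALSE
)
```

With this layout, the raw IDs stay in the data while codes and labels travel alongside them as metadata.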
Acknowledgements: We thank Raphael Scheible for his comment on citation formatting. MIRACUM is funded by the German Federal Ministry of Education and Research (BMBF) in the Medical Informatics Initiative (FKZ 01ZZ1606H).
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, Suchard MA, Park RW, Wong IC, Rijnbeek PR, Van Der Lei J. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Studies in health technology and informatics. 2015;216:574.
2. Wilson RC, Butters OW, Avraam D, Baker J, Tedds JA, Turner A, Murtagh M, Burton PR. DataSHIELD – new directions and dimensions. Data Science Journal. 2017 Apr 19;16(21):1-21. DOI: 10.5334/dsj-2017-021
3. Wolfson M, Wallace SE, Masca N, Rowe G, Sheehan NA, Ferretti V, LaFlamme P, Tobin MD, Macleod J, Little J, Fortier I. DataSHIELD: resolving a conflict in contemporary bioscience - performing a pooled analysis of individual-level data without sharing the data. International journal of epidemiology. 2010 Jul 14;39(5):1372-82. DOI: 10.1093/ije/dyq111
4. Lenz S, Zöller D, Hess M, Binder H. Architectures for distributed privacy-preserving deep learning. In: Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie, Hrsg. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 207. DOI: 10.3205/18gmds097
5. Gründner J. A queue-Poll Extension - standardised, monitored, indirect and secure DataSHIELD access to your data. 2018 DataSHIELD Workshop; 2018; Newcastle.
6. Maier C, Lang L, Storf H, Vormstein P, Bieber R, Bernarding J, Herrmann T, Haverkamp C, Horki P, Laufer J, Berger F. Towards implementation of OMOP in a German university hospital consortium. Applied clinical informatics. 2018 Jan;9(01):054-61. DOI: 10.1055/s-0037-1617452
7. Sedlmayr M, Prokosch HU. Datenaustausch in der Forschung via OMOP/OHDSI. 2018 [Accessed 16 July 2019]. Available from: https://www.miracum.org/wp-content/uploads/2018/06/OMOP_MIRACUM_eHealth.com_Juli.2018.pdf
8. Redaktionsgruppe Kerndatensatz. MI-I-Kerndatensatz. 2017 [Accessed 16 July 2019]. Available from: https://www.medizininformatik-initiative.de/sites/default/files/inline-files/MII_04_Kerndatensatz_1-0.pdf