66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

Non-Disclosive ROC Analysis and Calibration Using DataSHIELD

Meeting Abstract

  • Daniel Schalk - Ludwig-Maximilians-Universität München, Department of Statistics, München, Germany; DIFUTURE (Data Integration for Future Medicine), München, Germany
  • Ulrich Mansmann - Ludwig-Maximilians-Universität München, Institute for Medical Information Processing, Biometry, and Epidemiology (IBE), München, Germany; DIFUTURE (Data Integration for Future Medicine), München, Germany
  • Verena S. Hoffmann - Ludwig-Maximilians-Universität München, Institute for Medical Information Processing, Biometry, and Epidemiology (IBE), München, Germany; DIFUTURE (Data Integration for Future Medicine), München, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 165

doi: 10.3205/21gmds071, urn:nbn:de:0183-21gmds0717

Published: September 24, 2021

© 2021 Schalk et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Non-disclosive federated analysis performs statistical analyses without sharing patient data between sites. As a tool for distributed computing, we use DataSHIELD [1] and extend its functionality towards the evaluation of prediction models on distributed data [2]. The objective is to calculate the area under the ROC curve (AUC) and its confidence intervals and to assess model calibration on distributed data. This presentation focuses on the technical implementation of methods whose methodological details have already been presented elsewhere [3]. This work is part of the ProVal-MS study (https://difuture.de/, DRKS: 00014034) on patients with multiple sclerosis (MS), which aims to assess a newly developed treatment decision score with respect to the effect of an individual treatment on the patient's disease progression. The patients were treated at five different sites.

A conventional pooled analysis of distributed data requires merging the data across sites, which in turn requires patient consent and compliance with various data protection laws. To overcome these data protection issues, we developed R packages that enable distributed ROC analysis within the DataSHIELD framework. The analysis uses aggregated or noisy data instead of sensitive individual-level data.

The package ds.predict.base encodes arbitrary R objects and sends them from one server to a second server, where they are decoded and loaded into that server's R environment. This makes it possible to share not only simple R objects (vectors) but also data sets and models. The package also enables the calculation of predictions or scores from models and locally available data.
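
A rough Python analogue of this encode, transfer and decode round trip is sketched below. It is an illustration under assumptions, not the R implementation of ds.predict.base: we assume the object is serialized to bytes and then converted to a plain ASCII string that can travel between servers as ordinary text. The helpers encode_object, decode_object and predict_score as well as the toy model and its coefficients are hypothetical.

```python
import base64
import pickle


def encode_object(obj) -> str:
    """Hypothetical helper: serialize an arbitrary object into an ASCII string."""
    raw = pickle.dumps(obj)                        # object -> bytes
    return base64.b64encode(raw).decode("ascii")   # bytes -> transportable text


def decode_object(encoded: str):
    """Hypothetical helper: reverse encode_object on the receiving server."""
    raw = base64.b64decode(encoded.encode("ascii"))
    # pickle can execute code while loading: decode only payloads from trusted parties.
    return pickle.loads(raw)


def predict_score(model, row):
    """Hypothetical helper: linear predictor from a restored model and local data."""
    return model["intercept"] + sum(b * row[name] for name, b in model["coefficients"].items())


# Toy model: regression coefficients stored in a dict (made-up values).
model = {"intercept": -1.0, "coefficients": {"x1": 0.03, "x2": 0.4}}

payload = encode_object(model)        # sending side
restored = decode_object(payload)     # receiving server

local_row = {"x1": 40, "x2": 2.5}     # data that never leave the receiving server
print(predict_score(restored, local_row))
```

Because pickle executes code when loading, a scheme like this should only ever decode payloads that come from trusted parties.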

The second package, ds.calibration (https://github.com/difuture-lmu/ds.calibration), builds on the first and allows the calibration of prediction models to be assessed. It implements the Brier score as well as calibration curves for the respective scores. Here, we make use of the aggregation mechanism provided by DataSHIELD: the aggregated values received from the sites can be processed further locally to obtain global results.
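
The following Python sketch shows how such aggregates could be combined, assuming each site returns only sums and counts. The global Brier score is BS = (1/n) Σ (p_i − y_i)², so it suffices that each site reports its sum of squared errors and its number of patients; for a binned calibration curve, each site reports, per bin, the sum of predicted probabilities, the number of events and the number of patients. All function names, the ten equal-width bins and the toy data are hypothetical and need not match ds.calibration. Bins containing very few patients would be disclosive; a real implementation would have to suppress or merge them, which this sketch omits.

```python
from typing import List, Tuple

Bins = List[List[float]]  # per bin: [sum of predicted probabilities, events, patients]


def site_brier_aggregate(probs: List[float], labels: List[int]) -> Tuple[float, int]:
    """Site side: return only the sum of squared errors and the number of patients."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)), len(probs)


def global_brier(aggregates: List[Tuple[float, int]]) -> float:
    """Analyst side: pooled Brier score from the per-site sums and counts."""
    return sum(a[0] for a in aggregates) / sum(a[1] for a in aggregates)


def site_calibration_aggregate(probs: List[float], labels: List[int], n_bins: int = 10) -> Bins:
    """Site side: per-bin sums over equal-width probability bins."""
    bins = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[k][0] += p
        bins[k][1] += y
        bins[k][2] += 1
    return bins


def global_calibration_curve(site_bins: List[Bins]) -> List[Tuple[float, float]]:
    """Analyst side: (mean predicted probability, observed event rate) per non-empty bin."""
    curve = []
    for k in range(len(site_bins[0])):
        s_p, s_y, n = (sum(b[k][j] for b in site_bins) for j in range(3))
        if n > 0:
            curve.append((s_p / n, s_y / n))
    return curve


# Toy data for two sites: (predicted probabilities, observed outcomes)
site_a = ([0.1, 0.4, 0.8, 0.9], [0, 0, 1, 1])
site_b = ([0.2, 0.7, 0.3], [0, 1, 1])

brier = global_brier([site_brier_aggregate(*site_a), site_brier_aggregate(*site_b)])
curve = global_calibration_curve(
    [site_calibration_aggregate(*site_a), site_calibration_aggregate(*site_b)]
)
print(f"global Brier score: {brier:.3f}")
print(curve)
```

Since both quantities are sums over patients, the result equals what a pooled analysis of all individual records would give.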

Finally, ds.roc.glm (https://github.com/difuture-lmu/ds.roc.glm) conducts a distributed ROC analysis using the ROC-GLM algorithm introduced by Pepe [4]. The main challenges of the distributed ROC-GLM algorithm are calculating a global survivor function from individual scores and fitting a distributed probit regression. These challenges are addressed by aggregation techniques as well as by methods from differential privacy [5].
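
To make the two challenges more concrete, here is a simplified Python sketch of a distributed ROC-GLM. Following Pepe [4], the ROC curve is modelled as ROC(t) = Φ(γ0 + γ1·Φ⁻¹(t)) and fitted by a probit regression of the indicators U_it = 1{S(y_i) ≤ t} on Φ⁻¹(t), where S is the survivor function of the scores of negative cases, y_i is the score of a positive case and t runs over a grid of false positive rates; the AUC is then Φ(γ0 / √(1 + γ1²)). How the work is split is our assumption, not necessarily what ds.roc.glm does: sites release their negative-case scores only after adding Laplace noise (the Laplace mechanism from the differential privacy literature [5]), the analyst builds the global survivor function from these noisy scores, and the probit model is fitted by Fisher scoring in which each site returns only its score vector and Fisher information. The grid, the privacy parameter epsilon = 10 (chosen for illustration only), the toy data and all function names are hypothetical.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf, PHI.pdf, PHI.inv_cdf
FPR_GRID = [k / 20 for k in range(1, 20)]  # hypothetical grid of false positive rates t


def privatize(neg_scores, epsilon, rng, sensitivity=1.0):
    """Site side: Laplace mechanism, scale = sensitivity / epsilon, scores assumed in [0, 1]."""
    if epsilon is None:  # no noise; only for comparison
        return list(neg_scores)
    scale = sensitivity / epsilon
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale).
    return [s + rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for s in neg_scores]


def survivor(noisy_neg):
    """Analyst side: global empirical survivor function S(y) = share of scores > y."""
    pooled, n = sorted(noisy_neg), len(noisy_neg)
    return lambda y: (n - bisect_right(pooled, y)) / n


def site_design(pos_scores, surv):
    """Site side: rows (Phi^{-1}(t), U_it) with U_it = 1 if S(y_i) <= t, else 0."""
    return [(PHI.inv_cdf(t), int(surv(y) <= t)) for y in pos_scores for t in FPR_GRID]


def site_aggregates(rows, beta):
    """Site side: probit score vector and Fisher information; only these leave the site."""
    score, info = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for q, u in rows:
        x, eta = (1.0, q), beta[0] + beta[1] * q
        mu = min(max(PHI.cdf(eta), 1e-10), 1 - 1e-10)
        dens = PHI.pdf(eta)
        w = dens / (mu * (1 - mu))
        for j in range(2):
            score[j] += x[j] * (u - mu) * w
            for k in range(2):
                info[j][k] += x[j] * x[k] * dens * w
    return score, info


def fit_distributed_probit(site_rows, max_iter=50, tol=1e-10):
    """Analyst side: Fisher scoring on the per-site aggregates summed over sites."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        s_tot, i_tot = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
        for rows in site_rows:  # in practice, one aggregate call per site
            s, i = site_aggregates(rows, beta)
            for j in range(2):
                s_tot[j] += s[j]
                for k in range(2):
                    i_tot[j][k] += i[j][k]
        (a, b), (c, d) = i_tot
        det = a * d - b * c
        step = [(d * s_tot[0] - b * s_tot[1]) / det, (a * s_tot[1] - c * s_tot[0]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(v) for v in step) < tol:
            break
    return beta


def distributed_roc_glm(sites, epsilon, rng):
    """sites: list of (neg_scores, pos_scores); returns (gamma, AUC)."""
    noisy = [s for neg, _ in sites for s in privatize(neg, epsilon, rng)]
    surv = survivor(noisy)  # built from noisy negative-case scores only
    gamma = fit_distributed_probit([site_design(pos, surv) for _, pos in sites])
    # Binormal ROC(t) = Phi(g0 + g1 * Phi^{-1}(t)) has AUC = Phi(g0 / sqrt(1 + g1^2)).
    return gamma, PHI.cdf(gamma[0] / math.sqrt(1 + gamma[1] ** 2))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


# Toy data: five sites with 60 negative and 60 positive cases each (true AUC about 0.76).
rng = random.Random(2021)
sites = [([sigmoid(rng.gauss(0, 1)) for _ in range(60)],
          [sigmoid(rng.gauss(1, 1)) for _ in range(60)]) for _ in range(5)]

gamma_noisy, auc_noisy = distributed_roc_glm(sites, epsilon=10.0, rng=rng)
gamma_exact, auc_exact = distributed_roc_glm(sites, epsilon=None, rng=rng)
print(f"AUC with noisy survivor function: {auc_noisy:.3f}, without noise: {auc_exact:.3f}")
```

Splitting the rows of one site across several sites leaves the fitted coefficients unchanged up to rounding, because the score vector and the Fisher information are plain sums over patients; this additivity is what allows the probit regression to be fitted from aggregates alone.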

We especially reevaluated the role of the DataSHIELD administrator. The administrator registers the assign and aggregate methods at the central server and thereby defines what these functions are allowed to share. If these settings are not chosen carefully, sensitive data could be disclosed. It is therefore of utmost importance to assign an experienced scientist to this role. Nevertheless, using DataSHIELD to enable distributed analysis is a milestone in data-driven medicine and can unlock a wealth of medical information for research.
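
The administrator's gatekeeping role can be illustrated with a toy Python model. It does not reproduce DataSHIELD's actual configuration mechanism; we simply assume that the server executes only functions the administrator has registered, and that every aggregate is refused when it would be computed from fewer than a minimum number of records. ToyServer, DisclosureError, the method names and the threshold of 5 are hypothetical.

```python
from typing import Callable, Dict, List


class DisclosureError(Exception):
    """Raised when a request would violate the server's disclosure rules."""


class ToyServer:
    def __init__(self, data: List[float], min_count: int = 5):
        self._data = data             # individual-level data never leave the server
        self._min_count = min_count   # hypothetical disclosure threshold
        self._aggregate: Dict[str, Callable[[List[float]], float]] = {}

    def register_aggregate(self, name: str, fn: Callable[[List[float]], float]) -> None:
        """Administrator action: only registered functions can be called remotely."""
        self._aggregate[name] = fn

    def call(self, name: str, subset: Callable[[float], bool] = lambda x: True) -> float:
        """Analyst request: run a registered aggregate on a subset of the local data."""
        if name not in self._aggregate:
            raise DisclosureError(f"function '{name}' is not registered")
        selected = [x for x in self._data if subset(x)]
        if len(selected) < self._min_count:
            raise DisclosureError("too few records to return an aggregate")
        return self._aggregate[name](selected)


server = ToyServer([4.0, 5.5, 6.1, 3.2, 7.8, 5.0, 4.4])
server.register_aggregate("mean", lambda xs: sum(xs) / len(xs))
print(server.call("mean"))
```

The checks protect only as much as the registered functions allow: registering a function that returns its input unchanged would hand out the raw values, which is exactly the kind of misconfiguration the administrator has to prevent.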

Acknowledgement: DIFUTURE is funded as part of the Medical Informatics Initiative (MI-I) by the Bundesministerium für Bildung und Forschung (BMBF), grant 01ZZ1804C.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Gaye A, Marcon Y, Isaeva J, LaFlamme P, Turner A, Jones E, et al. DataSHIELD: taking the analysis to the data, not the data to the analysis. International Journal of Epidemiology. 2014;43(6):1929–1944. DOI: 10.1093/ije/dyu188
2. Buchka S, Schalk D, Zhang G, Hoffmann V. Validierung des DIFUTURE-Treatment Decision Scores für Patienten mit Multipler Sklerose auf Basis verteilter Daten – eine Machbarkeitsstudie mit DataSHIELD [Validation of the DIFUTURE Treatment Decision Score for patients with multiple sclerosis based on distributed data: a feasibility study with DataSHIELD]. In: 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 472. DOI: 10.3205/20gmds241
3. Schalk D, Buchka S, Mansmann U, Hoffmann V. Distributed Computation of the AUROC-GLM Confidence Intervals Using DataSHIELD. In: 67th Biometric Colloquium of the German Region of the International Biometric Society. 2021 [accessed 30.04.2021]. p. 114. Available from: https://www.biometrisches-kolloquium2021.de/wp-content/uploads/2021/03/BK2021_Book_of_Abstracts_updated.pdf
4. Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford University Press; 2003. p. 120-125.
5. Dwork C. Differential privacy: A survey of results. In: International Conference on Theory and Applications of Models of Computation TAMC 2008. Springer; 2008. (Lecture Notes in Computer Science; 4978). p. 1-19. DOI: 10.1007/978-3-540-79228-4_1