Multivariate regression modelling with global and cohort-specific effects in a federated setting with data protection constraints
Published: 24 September 2021
Multi-cohort studies are an important tool to study effects on a large sample size and to identify cohort-specific effects. Thus, researchers would like to share information between cohorts and research institutes. However, data protection constraints sometimes forbid the exchange of individual-level data between different research institutes. To circumvent this problem, only non-disclosive aggregated data is exchanged, which is often done manually and requires explicit permission before transfer. The framework DataSHIELD enables automatic exchange in iterative calls and thus facilitates the use of methods for performing more complex tasks such as federated optimisation.
We propose a federated method for multivariate regression models that aims to improve the model for a specific cohort of interest by including information from other cohorts, even in the presence of cohort-specific effects. This approach is based solely on non-disclosive aggregated data from different institutions and should be applicable in settings with high-dimensional data and complex correlation structures. At the same time, the amount of transferred data is kept small enough to allow manual confirmation of data protection compliance.
Our approach implements an iterative procedure that alternates between the cohort-specific model and a global model, where the global model uses data from the other cohorts in addition to data from the cohort of interest. The linear predictor of the global model acts as a covariate in the estimation of the cohort-specific model. Subsequently, the linear predictor of the updated cohort-specific model is included in the estimation of the global model. The procedure is repeated until the combined model converges with respect to the cohort-specific model estimates.
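The alternating scheme described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not taken from the abstract: both models are linear, the global model is fitted from per-cohort aggregates X'X and X'y, the cohort-specific model is a ridge regression in which only the cohort-specific coefficients are penalised, and the cohort-specific linear predictor enters the global fit as a fixed offset. All function names, the penalty `lam` and the simulated data are hypothetical.

```python
import numpy as np

def fit_global(cohorts, offsets):
    """Least-squares fit built only from per-cohort aggregates X'X and X'(y - offset)."""
    p = cohorts[0][0].shape[1]
    xtx, xty = np.zeros((p, p)), np.zeros(p)
    for (X, y), off in zip(cohorts, offsets):
        xtx += X.T @ X            # aggregate, no individual-level rows leave the site
        xty += X.T @ (y - off)
    return np.linalg.solve(xtx, xty)

def fit_cohort(X, y, eta_global, lam):
    """Ridge fit of y on [X, eta_global]; the global predictor itself is not penalised."""
    Z = np.column_stack([X, eta_global])
    penalty = lam * np.eye(Z.shape[1])
    penalty[-1, -1] = 0.0
    coef = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    return coef[:-1], coef[-1]

def alternate(cohorts, target=0, lam=10.0, tol=1e-6, max_iter=100):
    """Alternate global and cohort-specific fits until the cohort estimates stabilise."""
    X_t, y_t = cohorts[target]
    beta_c = np.zeros(X_t.shape[1])
    offsets = [np.zeros(len(y)) for _, y in cohorts]
    for it in range(1, max_iter + 1):
        beta_g = fit_global(cohorts, offsets)
        new_beta_c, gamma = fit_cohort(X_t, y_t, X_t @ beta_g, lam)
        change = np.max(np.abs(new_beta_c - beta_c))
        beta_c = new_beta_c
        offsets[target] = X_t @ beta_c   # cohort-specific predictor fed back as offset
        if change < tol:
            break
    return beta_g, beta_c, gamma, it

# Hypothetical simulated data: cohort 0 is the cohort of interest.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -1.0, 0.5])
shift = np.array([0.8, 0.0, 0.0])        # assumed cohort-specific effect
cohorts = []
for k, n in enumerate((40, 200, 200)):
    X = rng.normal(size=(n, 3))
    effect = beta_true + (shift if k == 0 else 0.0)
    cohorts.append((X, X @ effect + rng.normal(scale=0.5, size=n)))

beta_g, beta_c, gamma, n_iter = alternate(cohorts)
```

The stopping rule mirrors the criterion in the text by monitoring only the change in the cohort-specific estimates. Convergence is not guaranteed in general, hence the `max_iter` cap.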
In different simulation settings, we aim to show that our approach improves cohort-specific predictions by reducing overfitting and preserving the globally found effect structure. In a more complex simulation setting, we test our approach under more realistic conditions which allow further generalization of the results. Herein, three different roles of cohort-specific effects are studied – namely no cohort-specific effect, an independent cohort-specific effect and a confounding cohort-specific effect. As a consequence, the method can be evaluated for different assumptions regarding the underlying effect structure.
In general, all gradient-based methods can be adapted easily to a federated setting under data protection constraints. The method presented here can be used in this setting to obtain better predictions and can thus aid in understanding cohort-specific estimates.
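To illustrate why gradient-based methods carry over to a federated setting, the following minimal sketch runs gradient descent for a linear model where each site returns only its summed gradient and its sample size. It is not the DataSHIELD API and not the authors' implementation; the function names, step size and simulated sites are assumptions made for this example.

```python
import numpy as np

def local_gradient(X, y, beta):
    """Runs inside one site: returns only the summed gradient and the sample count."""
    residual = X @ beta - y
    return X.T @ residual, len(y)

def federated_gradient_descent(sites, lr=0.5, n_iter=200):
    """Server loop: combine per-site aggregates, then take one gradient step."""
    beta = np.zeros(sites[0][0].shape[1])
    for _ in range(n_iter):
        parts = [local_gradient(X, y, beta) for X, y in sites]
        grad = sum(g for g, _ in parts) / sum(n for _, n in parts)
        beta -= lr * grad
    return beta

# Hypothetical simulated sites sharing one effect structure.
rng = np.random.default_rng(1)
true_beta = np.array([1.0, -2.0, 0.5])
sites = []
for n in (80, 120, 100):
    X = rng.normal(size=(n, 3))
    sites.append((X, X @ true_beta + rng.normal(scale=0.1, size=n)))

beta_fed = federated_gradient_descent(sites)

# Reference fit on pooled data, which would not be available under data protection constraints.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
```

Because the squared-error gradient is a sum over observations, the per-site sums add up to the pooled gradient, so the federated iterates match those of gradient descent on the pooled data.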
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.