Causal discovery of gene regulation with incomplete data
Published: February 26, 2021
Background: Causal discovery algorithms aim to find causal relations from observational data and have become popular for investigating, for instance, the causal structure of genetic regulatory systems. However, most causal discovery methods require fully observed data. Our objective is to develop an approach to causal discovery that can be used on data with missing values.
Methods: We consider constraint-based causal discovery algorithms, which take as input a series of conditional independence tests and output a class of causal graphs. To combine this with multiple imputation of missing values, the required conditional independence tests are pooled using Rubin's rules [1]. We assess the robustness of our results with several sensitivity analyses, including a non-parametric bootstrap to quantify the variability of the estimated causal structures.
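The pooling step can be illustrated with a minimal sketch. It assumes Gaussian data, where a conditional independence test is commonly based on the Fisher z-transformed partial correlation, and it is not taken from the authors' implementation. The function name `pool_fisher_z` and its arguments are hypothetical. Each imputed data set contributes one partial correlation estimate; Rubin's rules then combine the within-imputation and between-imputation variance into a single test statistic with approximate degrees of freedom.

```python
import math
from statistics import mean, variance


def pool_fisher_z(partial_corrs, n, cond_set_size):
    """Pool Fisher z-transformed partial correlations with Rubin's rules.

    partial_corrs  -- one partial correlation estimate per imputed data set
    n              -- number of observations
    cond_set_size  -- number of variables in the conditioning set
    Returns (pooled test statistic, degrees of freedom).
    This is an illustrative sketch, not the micd implementation.
    """
    m = len(partial_corrs)
    if m < 2:
        raise ValueError("need at least two imputed data sets")

    # Fisher z-transform of each estimate
    z = [math.atanh(r) for r in partial_corrs]

    # Within-imputation variance: identical in every imputed data set
    within = 1.0 / (n - cond_set_size - 3)

    # Rubin's rules: pooled estimate, between-imputation variance, total variance
    q_bar = mean(z)
    between = variance(z)  # sample variance with divisor m - 1
    total = within + (1 + 1 / m) * between

    statistic = q_bar / math.sqrt(total)

    # Approximate degrees of freedom of the reference t-distribution
    if between == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return statistic, df
```

The pooled statistic would be compared with a t-distribution with `df` degrees of freedom. A constraint-based algorithm can then use the resulting p-value in place of the p-value from a complete-data test.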
We apply our method to investigate how the HMGA2 (high mobility group AT-hook 2) gene is incorporated into the protein 53 signaling pathway, which is thought to play an important role in head and neck squamous cell carcinoma (HNSCC).
Results: Our procedure for combining constraint-based search with multiple imputation is implemented as modifications of the PC-stable and FCI-stable algorithms [2]; it can be obtained at https://github.com/bips-hb/micd. The findings of our study are relatively stable and point to direct associations between HMGA2 and other relevant proteins, but do not provide clear support for the claim that HMGA2 itself plays a causal role as a key regulator gene.
Conclusion: The combination of constraint-based algorithms with multiple imputation presents an efficient and flexible approach to causal discovery with incomplete data. In our application to the protein 53 signaling pathway, the results do not suggest that HMGA2 would be useful as a therapeutic target in HNSCC.
The authors declare that they have no competing interests.
The authors declare that a positive ethics committee vote has been obtained.
References
1. Foraita R, Friemel J, Günther K, Behrens T, Bullerdiek J, Nimzyk R, Ahrens W, Didelez V. Causal discovery of gene regulation with incomplete data. Journal of the Royal Statistical Society: Series A. 2020. Accepted.
2. Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software. 2012;47:1–26.