gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Causal discovery of gene regulation with incomplete data

Meeting Abstract

  • Ronja Foraita - Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
  • Vanessa Didelez - Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany; Faculty of Mathematics and Computer Science, University Bremen, Bremen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 126

doi: 10.3205/20gmds287, urn:nbn:de:0183-20gmds2871

Published: February 26, 2021

© 2021 Foraita et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Background: Causal discovery algorithms aim to find causal relations from observational data and have become popular for investigating, for instance, the causal structure of genetic regulatory systems. However, most causal discovery methods require fully observed data. Our objective is to develop an approach for causal discovery that can be used on data with missing values.

Methods: We consider constraint-based causal discovery algorithms, which take as input a series of conditional independence tests and output a class of causal graphs. To combine this with multiple imputation of missing values, the results of the required conditional independence tests are pooled across the imputed data sets using Rubin's rules [1]. We assess the robustness of our results by a number of sensitivity analyses, including a non-parametric bootstrap to quantify the variability of the estimated causal structures.

We apply our method to investigate how the HMGA2 (high mobility group AT-hook 2) gene is incorporated into the protein 53 (p53) signaling pathway, which is thought to play an important role in head and neck squamous cell carcinoma (HNSCC).
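As an illustration of the pooling step described in the Methods, the following is a minimal Python sketch of how one conditional independence test might be pooled over m imputed data sets with Rubin's rules. It assumes a Gaussian test based on Fisher's z-transformed partial correlation; the abstract does not specify which tests are used. The function name `pool_fisher_z` and its arguments are hypothetical and are not taken from the micd package. To stay within the standard library, the p-value uses a normal reference distribution, whereas Rubin's rules would normally refer the statistic to a t distribution with the computed degrees of freedom.

```python
import math
from statistics import NormalDist, mean, variance


def pool_fisher_z(partial_corrs, n, k):
    """Pool partial correlations from m imputed data sets (illustrative sketch).

    partial_corrs: one estimated partial correlation per imputed data set
    n: sample size; k: number of variables in the conditioning set
    Returns (test statistic, Rubin's degrees of freedom, approximate p-value).
    """
    m = len(partial_corrs)
    if m < 2:
        raise ValueError("need at least two imputed data sets")

    z = [math.atanh(r) for r in partial_corrs]  # Fisher z-transform
    z_bar = mean(z)                             # pooled point estimate
    within = 1.0 / (n - k - 3)                  # variance of z, identical in each data set
    between = variance(z)                       # sample variance across imputations
    inflation = (1 + 1 / m) * between           # extra variance due to missing data
    total = within + inflation                  # Rubin's total variance

    stat = z_bar / math.sqrt(total)
    # Rubin's degrees of freedom; infinite if all imputations agree exactly
    df = math.inf if inflation == 0 else (m - 1) * (1 + within / inflation) ** 2
    # Assumption: normal approximation instead of a t reference distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, df, p_value


if __name__ == "__main__":
    # Hypothetical partial correlations from five imputed data sets
    print(pool_fisher_z([0.21, 0.18, 0.25, 0.19, 0.23], n=200, k=2))
```

In a constraint-based search, a pooled p-value of this kind would take the place of the complete-data p-value each time the algorithm decides whether to remove an edge.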

Results: Our procedure for combining constraint-based search with multiple imputation is implemented as modifications of the PC-stable and FCI-stable algorithms [2]; it can be obtained at https://github.com/bips-hb/micd. The findings of our study are relatively stable and point to direct associations between HMGA2 and other relevant proteins, but do not provide clear support for the claim that HMGA2 itself plays a causal role as a key regulator gene.

Conclusion: The combination of constraint-based algorithms with multiple imputation presents an efficient and flexible approach to causal discovery with incomplete data. In our application to the protein 53 signaling pathway, the results do not suggest that HMGA2 would be useful as a therapeutic target in HNSCC.

The authors declare that they have no competing interests.

The authors declare that a positive ethics committee vote has been obtained.


References

1. Foraita R, Friemel J, Günther K, Behrens T, Bullerdiek J, Nimzyk R, Ahrens W, Didelez V. Causal discovery of gene regulation with incomplete data. Journal of the Royal Statistical Society: Series A. 2020. Accepted.
2. Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software. 2012;47:1–26.