Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Leveraging machine learning to predict the functional effects of genetic variants in ion channels

Meeting Abstract

  • Pia Francesca Rissom - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  • Jordan Safer - Broad Institute of MIT and Harvard, Center for the Development of Therapeutics, Cambridge, United States
  • Paulo Yanez Sarmiento - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  • Andreas Brunklaus - School of Health and Wellbeing, University of Glasgow, Glasgow, United Kingdom; Paediatric Neurosciences Research Group, Royal Hospital for Children, Glasgow, United Kingdom
  • Damian Balaz - University of Tübingen, Tübingen, Germany
  • Roberta Castelli - Department of Biosciences, University of Milan, Milano, Italy
  • Connor W. Coley - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, United States; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, United States
  • Christel Depienne - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  • Alfred L. George - Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, United States
  • Andrew Glazer - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, United States
  • Erkin Kurganov - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
  • Carla Marini - Child Neurology and Psychiatric Unit, Pediatric Hospital G. Salesi, Azienda Ospedaliero Universitaria delle Marche, Ancona, Italy
  • Rikke Steensbjerre Møller - Department of Epilepsy Genetics and Personalized Medicine, Danish Epilepsy Centre, Dianalund, Denmark; Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
  • Anna Moroni - Department of Biosciences, University of Milan, Milano, Italy
  • Jen Q. Pan - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
  • Bina Santoro - Department of Neuroscience, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
  • Bernhard Y. Renard - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  • Sumaiya Iqbal - Broad Institute of MIT and Harvard, Center for the Development of Therapeutics, Cambridge, United States; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States; Cancer Data Sciences, Dana-Farber/Harvard Cancer Center, Boston, United States
  • Henrike Heyne - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 982

doi: 10.3205/24gmds140, urn:nbn:de:0183-24gmds1402

Published: September 6, 2024

© 2024 Rissom et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Voltage-gated ion channels are pivotal for rapid signal propagation within organisms and are targeted for treating diseases like epilepsy and cardiac arrhythmias. Understanding the molecular effect of disease-causing variants on these channels is crucial for effective therapy. For example, a channel blocker can be fatal for an individual with a loss-of-function variant but be beneficial for a patient with a gain-of-function variant. However, the impact of many genetic variants on channel function remains uncertain as experiments are resource-intensive. Machine learning offers a promising approach for supporting therapeutic management in this field [1], [2].

In this study, we explore machine learning approaches, including classical machine learning as well as pre-trained protein language models (PLMs), to predict the functional effects of missense variants caused by point mutations across a wide range of voltage-gated ion channels, including Nav, Cav, Kv, Kir, and HCN channels (corresponding to genes such as SCNxA, CACNA1x, KCNx, and HCNx).

We first characterize ion channel proteins by collecting a comprehensive dataset comprising more than 180 protein features for over 140 proteins from large public data sources such as the Genomics 2 Proteins Portal [3]. The dataset includes, for example, structural information, protein stability predictions, post-translational modification sites, and protein-protein interactions for all potential missense mutation positions. Furthermore, in collaboration with experimental and clinical experts, we assemble and curate a dataset on the functional effects of over 3,000 pathogenic missense variants affecting these proteins. This dataset is derived from both experimental and clinical data. The experimental data, primarily obtained from electrophysiological experiments, are sourced from published literature as well as unpublished studies from collaborators. The clinical data, which include clinically observed associations between genetic variants and phenotypes, are collected from high-confidence submissions to large databases such as ClinVar [4] and from single-gene registries from collaborators. The phenotype data serve as a proxy for the functional effects of variants in genes for which associations between gene, phenotype, and functional effect have been established in the literature.
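As a rough illustration of how phenotype data can act as a proxy for functional effect, the following minimal Python sketch maps (gene, phenotype) pairs to a loss-of-function (LoF) or gain-of-function (GoF) label. The three associations shown are well-documented textbook examples and are used here only for illustration; they are not the curated mapping used in this study, which is not listed in this abstract. The function names, record fields, and variant identifiers are hypothetical.

```python
from typing import Optional

# Illustrative (gene, phenotype) -> functional effect associations.
# Assumption: these are textbook examples chosen for the sketch,
# not the curated mapping assembled in the study.
PHENOTYPE_TO_EFFECT = {
    ("SCN1A", "Dravet syndrome"): "LoF",
    ("SCN5A", "Brugada syndrome"): "LoF",
    ("SCN5A", "Long QT syndrome 3"): "GoF",
}


def proxy_label(gene: str, phenotype: str) -> Optional[str]:
    """Return a proxy functional label, or None if no association is listed."""
    return PHENOTYPE_TO_EFFECT.get((gene, phenotype))


def label_variants(records):
    """Attach proxy labels and drop records without a listed association."""
    labeled = []
    for rec in records:
        effect = proxy_label(rec["gene"], rec["phenotype"])
        if effect is not None:
            labeled.append({**rec, "effect": effect, "source": "clinical_proxy"})
    return labeled


# Hypothetical clinical records; variant identifiers are placeholders.
records = [
    {"variant": "variant_A", "gene": "SCN5A", "phenotype": "Brugada syndrome"},
    {"variant": "variant_B", "gene": "SCN5A", "phenotype": "Long QT syndrome 3"},
    {"variant": "variant_C", "gene": "SCN5A", "phenotype": "Unspecified arrhythmia"},
]
labeled = label_variants(records)
```

In this sketch, records whose phenotype has no listed association are left unlabeled rather than guessed, which mirrors the restriction to genes with literature-established associations.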

We assess the performance of different machine learning approaches for distinguishing between loss- and gain-of-function mutations. These methods span a spectrum, ranging from those that incorporate data from various experiments and expert knowledge to those that rely primarily on data-driven prior information. Specifically, we train classical machine learning models on the collected protein features as well as on embeddings from ESM [5], a PLM pre-trained on large-scale amino acid sequence data. Here, we assess the performance for specific groups of variants with shared characteristics (e.g. those at protein-protein interaction sites). Additionally, we fine-tune PLMs directly. Finally, we validate our models on large-scale clinical and functional datasets.
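The group-wise assessment described above could be sketched as follows. This is a minimal, assumption-laden example: balanced accuracy is chosen here as the metric only for illustration (the abstract does not name the metrics used), the group names are hypothetical, and the predictions are toy values that could come from any classifier.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred, classes=("LoF", "GoF")):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in classes:
        n_cls = sum(1 for t in y_true if t == cls)
        if n_cls == 0:
            continue  # class absent in this group; skip its recall
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / n_cls)
    return sum(recalls) / len(recalls) if recalls else float("nan")


def per_group_scores(y_true, y_pred, groups):
    """Balanced accuracy computed separately for each variant group."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {g: balanced_accuracy(ts, ps) for g, (ts, ps) in by_group.items()}


# Toy labels, predictions, and group assignments (all hypothetical).
y_true = ["LoF", "LoF", "GoF", "GoF", "LoF", "GoF"]
y_pred = ["LoF", "GoF", "GoF", "GoF", "LoF", "LoF"]
groups = ["ppi_site", "ppi_site", "ppi_site", "ppi_site", "other", "other"]

scores = per_group_scores(y_true, y_pred, groups)
print(scores)  # {'ppi_site': 0.75, 'other': 0.5}
```

Reporting a score per group, rather than one pooled score, can reveal whether a model performs unevenly across variant classes such as those at protein-protein interaction sites.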

With our results, we aim to advance the understanding of ion channels and to provide an open-source tool that supports research on ion channel-related diseases, such as epilepsy or cardiac arrhythmias. By providing high-confidence predictions about the functional effects of genetic variants, this tool ultimately also aims to support personalized therapeutic management.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

The contribution has already been published: Parts of this work have been submitted as abstracts to the Mutational Scanning Symposium 2024 in Boston and the International Conference on Intelligent Systems in Molecular Biology 2024 in Montreal.


References

1. Heyne HO, Baez-Nieto D, Iqbal S, Palmer DS, Brunklaus A, May P, et al. Predicting functional effects of missense variants in voltage-gated sodium and calcium channels. Sci Transl Med. 2020 Aug 12;12(556):eaay6848. DOI: 10.1126/scitranslmed.aay6848
2. Boßelmann CM, Hedrich UBS, Müller P, Sonnenberg L, Parthasarathy S, Helbig I, Lerche H, Pfeifer N. Predicting the functional effects of voltage-gated potassium channel missense variants with multi-task learning. EBioMedicine. 2022 Jul;81:104115. DOI: 10.1016/j.ebiom.2022.104115
3. Kwon S, Safer J, Nguyen DT, Hoksza D, May P, Arbesfeld JA, Rubin AF, Campbell AJ, Burgin A, Iqbal S. Genomics 2 Proteins portal: A resource and discovery tool for linking genetic screening outputs to protein sequences and structures. bioRxiv [Preprint]. 2024 Jan 2:2024.01.02.573913. DOI: 10.1101/2024.01.02.573913
4. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014 Jan 1;42(1):D980-5. DOI: 10.1093/nar/gkt1113
5. Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick CL, Ma J, Fergus R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci U S A. 2021 Apr 13;118(15):e2016239118. DOI: 10.1073/pnas.2016239118