Leveraging machine learning to predict the functional effects of genetic variants in ion channels
Published: September 6, 2024
Text
Voltage-gated ion channels are pivotal for rapid signal propagation within organisms and are drug targets in diseases such as epilepsy and cardiac arrhythmias. Understanding the molecular effect of disease-causing variants on these channels is crucial for effective therapy. For example, a channel blocker can be fatal for an individual with a loss-of-function variant but beneficial for a patient with a gain-of-function variant. However, the impact of many genetic variants on channel function remains uncertain because functional experiments are resource-intensive. Machine learning offers a promising approach for supporting therapeutic management in this field [1], [2].
In this study, we explore the use of machine learning approaches, including classical machine learning as well as pre-trained Protein Language Models (PLMs), to predict the functional effects of missense variants caused by point mutations across a wide range of voltage-gated ion channels, including Nav, Cav, Kv, Kir, and HCN channels (corresponding to genes such as SCNxA, CACNA1x, KCNx, and HCNx).
We first characterize ion channel proteins by collecting a comprehensive dataset comprising more than 180 protein features for over 140 proteins from large public data sources such as the Genomics 2 Proteins Portal [3]. The dataset includes, for example, structural information, protein stability predictions, post-translational modification sites, and protein-protein interactions for all potential missense mutation positions. Furthermore, in collaboration with experimental and clinical experts, we assemble and curate a dataset on the functional effect of over 3,000 pathogenic missense variants affecting these proteins. This dataset is derived from both experimental and clinical data. The experimental data, primarily obtained from electrophysiological experiments, are sourced from published literature as well as unpublished studies from collaborators. The clinical data, which include clinically observed associations between genetic variants and phenotypes, are collected from high-confidence submissions to large databases such as ClinVar [4] and from single-gene registries maintained by collaborators. The phenotype data serve as a proxy for the functional effects of variants in genes for which associations between gene, phenotype, and functional effect have been established in the literature.
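The phenotype-as-proxy labeling described above can be pictured as a lookup from gene and phenotype to a functional effect. The following is only a minimal sketch: the table entries are illustrative examples of well-known associations chosen for this sketch, and the names `PROXY_TABLE`, `proxy_label`, and `label_variants` are hypothetical. The curated associations actually used in the study are not listed in this abstract.

```python
from typing import Optional

# Illustrative gene/phenotype -> functional-effect associations.
# These entries are assumptions for this sketch, not the study's curated table.
PROXY_TABLE = {
    ("SCN1A", "Dravet syndrome"): "LoF",
    ("SCN5A", "Brugada syndrome"): "LoF",
    ("SCN5A", "Long QT syndrome 3"): "GoF",
}


def proxy_label(gene: str, phenotype: str) -> Optional[str]:
    """Return the proxy label, or None when no association is listed."""
    return PROXY_TABLE.get((gene, phenotype))


def label_variants(records):
    """Attach proxy labels to clinical records and drop unlabeled ones.

    Each record is a dict with at least the keys 'gene' and 'phenotype'.
    """
    labeled = []
    for rec in records:
        effect = proxy_label(rec["gene"], rec["phenotype"])
        if effect is not None:
            labeled.append({**rec, "effect": effect})
    return labeled
```

Records whose gene and phenotype pair is not in the table are dropped rather than guessed, which mirrors the restriction to associations established in the literature.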
We assess the performance of different machine learning approaches for distinguishing between loss- and gain-of-function mutations. These methods span a spectrum, ranging from those that incorporate data from various experiments and expert knowledge to those that rely primarily on data-driven prior information. Specifically, we train classical machine learning models on the collected protein features as well as on embeddings from the ESM model [5], a PLM pre-trained on large-scale amino acid sequence data. Here, we also assess the performance for specific groups of variants with shared characteristics (e.g. variants at protein-protein interaction sites). Additionally, we fine-tune PLMs directly. Finally, we validate our models on large-scale clinical and functional datasets.
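A group-wise assessment of this kind could be sketched as follows. The abstract does not name an evaluation metric, so balanced accuracy is an assumption made here because loss- and gain-of-function classes may be imbalanced. The function names and group labels are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred, classes=("LoF", "GoF")):
    """Mean per-class recall; classes absent from y_true are skipped."""
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        if idx:
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls) if recalls else float("nan")


def per_group_scores(y_true, y_pred, groups):
    """Balanced accuracy computed separately for each variant group."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)  # true labels in this group
        by_group[g][1].append(p)  # predicted labels in this group
    return {g: balanced_accuracy(t, p) for g, (t, p) in by_group.items()}
```

Reporting one score per group makes it visible when a model that looks good overall performs poorly on a particular class of positions.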
With our results, we aim to advance the understanding of ion channels and to provide an open-source tool that supports research on ion channel-related diseases such as epilepsy and cardiac arrhythmias. By providing high-confidence predictions of the functional effect of genetic variants, this tool is ultimately also intended to support personalized therapeutic management.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
The contribution has already been published: Parts of this work have been submitted as abstracts to the Mutational Scanning Symposium 2024 in Boston and the International Conference on Intelligent Systems in Molecular Biology 2024 in Montreal.
References
1. Heyne HO, Baez-Nieto D, Iqbal S, Palmer DS, Brunklaus A, May P, et al. Predicting functional effects of missense variants in voltage-gated sodium and calcium channels. Sci Transl Med. 2020 Aug 12;12(556):eaay6848. DOI: 10.1126/scitranslmed.aay6848
2. Boßelmann CM, Hedrich UBS, Müller P, Sonnenberg L, Parthasarathy S, Helbig I, Lerche H, Pfeifer N. Predicting the functional effects of voltage-gated potassium channel missense variants with multi-task learning. EBioMedicine. 2022 Jul;81:104115. DOI: 10.1016/j.ebiom.2022.104115
3. Kwon S, Safer J, Nguyen DT, Hoksza D, May P, Arbesfeld JA, Rubin AF, Campbell AJ, Burgin A, Iqbal S. Genomics 2 Proteins portal: A resource and discovery tool for linking genetic screening outputs to protein sequences and structures. bioRxiv [Preprint]. 2024 Jan 2:2024.01.02.573913. DOI: 10.1101/2024.01.02.573913
4. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014 Jan 1;42(1):D980-5. DOI: 10.1093/nar/gkt1113
5. Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick CL, Ma J, Fergus R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci U S A. 2021 Apr 13;118(15):e2016239118. DOI: 10.1073/pnas.2016239118