Article
Comparison of classic polygenic scores with machine learning algorithms to predict hypertension
Published: September 6, 2024
Text
Hypertension is the leading risk factor for the development of cardiovascular disease, and because blood pressure is a routinely measured clinical parameter, blood pressure data are widely available. Based on the polygenic heritability shown for complex traits like hypertension, polygenic scores (PGS) are increasingly being used in preclinical and clinical research to stratify individuals according to their genetic susceptibility for targeted prevention, therapy, or prognosis. However, classic PGS use a simple sum of individual genotypes, each weighted by the association estimated in single-variant genome-wide association studies (GWAS). Thus, multivariable and non-linear effects are not taken into account. Since classical statistical methods reach their limits when a large number of independent variables is included, machine learning (ML) algorithms can be used as an alternative for score construction.
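To make the "simple weighted sum" concrete, the following is a minimal sketch of a classic PGS, assuming genotypes are coded as allele counts (0, 1, or 2) and that per-variant weights are already available from a GWAS. The function name `polygenic_score` and all numbers are illustrative assumptions, not values from the study.

```python
import numpy as np


def polygenic_score(genotypes, weights):
    """Classic PGS: sum over variants of (allele count * GWAS effect estimate).

    genotypes: array of shape (n_individuals, n_variants), coded 0/1/2
    weights:   array of shape (n_variants,), one effect estimate per variant
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return genotypes @ weights


# Illustrative toy data (hypothetical): 2 individuals, 3 variants.
genotypes = np.array([[0, 1, 2],
                      [2, 0, 1]])
weights = np.array([0.10, -0.05, 0.20])

scores = polygenic_score(genotypes, weights)
print(scores)
```

Because every variant enters the score additively with a fixed weight, interactions between variants and non-linear dosage effects cannot be represented, which is the limitation the ML approaches are meant to address.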
Machine learning algorithms have not yet been applied to construct polygenic scores to predict hypertension. Therefore, it is unclear whether more complex algorithms are better able to predict hypertension than classic scores. This study aims to evaluate different ML algorithms suitable for classification problems, such as random forest, LASSO, elastic net, and support vector classifier. For the benchmarking, data from the UK Biobank will be used, a biomedical database containing genetic and health information from half a million participants from the United Kingdom. Hypertension will be defined as taking blood pressure lowering medication, a diastolic blood pressure above 90 mmHg, or a systolic blood pressure above 140 mmHg at the initial assessment visit. The data set will be repeatedly and randomly split into training and test data sets. The training data set will be used to generate a simple weighted PGS for hypertension by performing a GWAS and to train the ML models. Hyperparameter tuning will be performed, as well as variable selection where applicable. Prediction performance of the resulting models will be compared on the independent test data set using the area under the receiver operating characteristic curve (AUC).
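The evaluation scheme described above can be sketched as follows. This is a simplified illustration on synthetic data, not the study's pipeline: the function names, sample sizes, and the 30% test fraction are assumptions, and the per-variant weights are a crude stand-in for GWAS estimates (difference in mean allele count between cases and controls in the training data). Any of the ML classifiers named above would slot into the same loop in place of the weighted score.

```python
import numpy as np


def is_hypertensive(sbp, dbp, on_bp_medication):
    """Case definition from the text: medication, DBP > 90 mmHg, or SBP > 140 mmHg."""
    sbp = np.asarray(sbp, dtype=float)
    dbp = np.asarray(dbp, dtype=float)
    med = np.asarray(on_bp_medication, dtype=bool)
    return med | (dbp > 90) | (sbp > 140)


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as 0.5.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one case and one control")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def repeated_split_auc(genotypes, y, n_repeats=5, test_fraction=0.3, seed=0):
    """Repeated random train/test splits; weights are fitted on training data only."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_fraction * n))
    aucs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        g_train, y_train = genotypes[train], y[train]
        # Stand-in for single-variant GWAS estimates (assumption, not the study's method).
        weights = g_train[y_train].mean(axis=0) - g_train[~y_train].mean(axis=0)
        # Evaluate the weighted sum on the held-out test split.
        aucs.append(auc(y[test], genotypes[test] @ weights))
    return aucs


# Synthetic cohort (hypothetical numbers, for illustration only).
rng = np.random.default_rng(42)
n, m = 400, 30
genotypes = rng.binomial(2, 0.3, size=(n, m))
liability = genotypes @ rng.normal(0.0, 1.0, size=m)
sbp = 130 + 3 * (liability - liability.mean()) + rng.normal(0, 10, n)
dbp = 80 + rng.normal(0, 8, n)
med = rng.random(n) < 0.1
y = is_hypertensive(sbp, dbp, med)

aucs = repeated_split_auc(genotypes, y)
print("mean test AUC over splits:", np.mean(aucs))
```

Fitting the weights inside the loop, on the training indices only, keeps the test split independent of score construction, which is what makes the AUC comparison between models fair.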
Results will be presented at the conference. The study results will provide better insight into whether compressed genetic information obtained by complex machine learning algorithms performs better than classic PGS in predicting hypertension, and which model performs best for classification. Additionally, the results will allow conclusions about the genetic structure of hypertension.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.