gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

PepFuse: network-based data integration of label-free proteomics data by mixed graphical modeling with peptide grouping

Meeting Abstract

  • Robin Kosch - Research Group Computational Biology, University of Hohenheim, Stuttgart, Germany
  • Katharina Limm - Chair and Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
  • Nadine Kurz - Research Group Computational Biology, University of Hohenheim, Stuttgart, Germany
  • Sahar Ghasemi - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
  • Helena Zacharias - Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Campus Kiel, Kiel, Germany; Institute of Clinical Molecular Biology, Kiel University and University Medical Center Schleswig-Holstein, Campus Kiel, Kiel, Germany
  • Annette M. Staiger - Department of Clinical Pathology, Robert-Bosch-Krankenhaus, Stuttgart, Germany; Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany; University of Tuebingen, Tuebingen, Germany
  • German Ott - Department of Clinical Pathology, Robert-Bosch-Krankenhaus, Stuttgart, Germany
  • Peter J. Oefner - Chair and Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
  • Michael Altenbuchinger - Research Group Computational Biology, University of Hohenheim, Stuttgart, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 176

doi: 10.3205/21gmds114, urn:nbn:de:0183-21gmds1145

Published: September 24, 2021

© 2021 Kosch et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Mass spectrometry (MS)-based proteomics data cover thousands of proteins and facilitate the study of a multitude of post-translational modifications (PTMs) such as ubiquitination, phosphorylation, and acetylation. PTMs are central to our current understanding of disease pathomechanisms and, ultimately, to guiding therapeutic intervention. In practice, proteins are fragmented into peptides, which can be detected and quantified in their modified and unmodified states. Analyses can be performed either at the level of single peptides or at the level of peptides aggregated to proteins.

Methods: Biological network inference methods provide a holistic view of biological mechanisms such as pathways and interactions. Among these methods, probabilistic graphical models are particularly well suited because they distinguish direct from indirect relationships. In recent years they have been extended to model continuous and categorical variables simultaneously, which makes them versatile tools for integrating categorical data, e.g., clinical or genotypic variables, with continuous, high-dimensional omics readouts, e.g., protein abundances. These so-called Mixed Graphical Models (MGMs), introduced by Lee and Hastie [1], can be applied directly to peptide or protein abundances measured across several samples. However, they either treat peptides individually and ignore their protein affiliation, or they operate on peptides aggregated to proteins (Figure 1), which hides important readouts such as regulatory effects of post-translational protein modifications. PepFuse accounts for peptide groups (i.e., protein affiliation) by combining the graphical lasso [2] with group lasso regularization terms [3]. The corresponding optimization problem is solved efficiently using proximal operators and Nesterov’s acceleration [4], [5].
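
The two optimization ingredients named above, a group lasso proximal operator and Nesterov-type acceleration, can be illustrated with a minimal sketch. This is not the PepFuse implementation: the smooth part of the objective is replaced by a toy quadratic, 0.5 * ||x - b||^2, and all names (group_soft_threshold, accelerated_proximal_gradient, b, groups, lam) are hypothetical. The momentum schedule is the common FISTA variant, chosen here as an assumption. Each index group plays the role of the peptides belonging to one protein.

```python
import math


def group_soft_threshold(v, groups, t):
    """Proximal operator of t * sum_g ||v_g||_2 (block soft-thresholding).

    Each group is shrunk towards zero as a whole; groups whose Euclidean
    norm is at most t are set to exactly zero.
    """
    out = list(v)
    for idx in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * v[i]
    return out


def accelerated_proximal_gradient(grad, prox, x0, step, n_iter=200):
    """Proximal gradient descent with Nesterov-type momentum (FISTA schedule).

    grad(y) returns the gradient of the smooth part at y;
    prox(v, step) applies the proximal operator of step * penalty.
    """
    x_prev = list(x0)
    y = list(x0)
    t_prev = 1.0
    for _ in range(n_iter):
        g = grad(y)
        # Gradient step at the extrapolated point, followed by the prox step.
        x = prox([yi - step * gi for yi, gi in zip(y, g)], step)
        # Momentum update.
        t = (1.0 + math.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        beta = (t_prev - 1.0) / t
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, t_prev = x, t
    return x_prev


# Toy problem (assumption): minimize 0.5 * ||x - b||^2 + lam * sum_g ||x_g||_2.
b = [3.0, 4.0, 0.3, 0.4]
groups = [[0, 1], [2, 3]]  # two hypothetical "proteins" with two "peptides" each
lam = 1.0

solution = accelerated_proximal_gradient(
    grad=lambda y: [yi - bi for yi, bi in zip(y, b)],
    prox=lambda v, step: group_soft_threshold(v, groups, step * lam),
    x0=[0.0] * len(b),
    step=0.5,  # any step <= 1 is valid here, since the gradient is 1-Lipschitz
)

# For this toy objective the minimizer is known in closed form.
closed_form = group_soft_threshold(b, groups, lam)
```

In this toy setting the first group is shrunk but retained, while the second group, whose norm lies below the penalty weight, is removed as a whole. That all-or-nothing behaviour per group is what distinguishes the group lasso penalty from an element-wise lasso penalty.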

Results: Here, we propose a framework to infer MGMs from proteomics data that inherently groups peptides according to their underlying protein. This approach can be applied to disentangle regulatory effects of protein modifications from the respective protein abundances. First, we demonstrate in simulation studies that PepFuse improves edge recovery compared to the standard graphical lasso and, in particular, that it makes it possible to disentangle the effects of PTMs and protein abundances. Second, we investigated a dataset of 360 diffuse large B-cell lymphomas (DLBCL) from prospective clinical trials of the German High-Grade Lymphoma Study Group (DSHNHL), measured using microLC-SWATH-MS [6]; this analysis further substantiated the simulation findings. Here, we also analyzed key proteins and protein modifications in the context of the cell of origin of DLBCL, stratified as activated B-cell-like (ABC) and germinal center B-cell-like (GCB) DLBCL, and in the context of clinical variables such as fever, night sweats, weight loss, and bone marrow involvement.

Discussion: Network inference methods for analyzing SWATH-MS data are rare. We introduce a method that considers proteins and peptides simultaneously and, in addition, integrates clinical data with proteomics data. This method facilitates the study of the impact of post-translational modifications on phenotypes and clinical variables.

The authors declare that they have no competing interests.

The authors declare that a positive ethics committee vote has been obtained.


References

1.
Lee JD, Hastie TJ. Learning the structure of mixed graphical models. Journal of Computational and Graphical Statistics. 2015; 24(1):230-253.
2.
Friedman J, Hastie TJ, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008; 9(3):432-441.
3.
Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006; 68(1):49-67.
4.
Parikh N, Boyd S. Proximal algorithms. Foundations and Trends in Optimization. 2014; 1(3):127-239.
5.
Nesterov Y. A method of solving a convex programming problem with convergence rate O(1/k^2). Sov Math Dokl. 1983; 27(2):372-376.
6.
Reinders J, Altenbuchinger M, Limm K, Schwarzfischer P, Scheidt T, Strasser L, Richter J, Szczepanowski M, Huber CG, Klapper W, Spang R, Oefner PJ. Platform independent protein-based cell-of-origin subtyping of diffuse large B-cell lymphoma in formalin-fixed paraffin-embedded tissue. Scientific Reports. 2020; 10(1):7876.