
GMDS 2015: 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

06.09. - 09.09.2015, Krefeld

Data Management and Processing in the time of Next Generation Sequencing and Precision Oncology

Meeting Abstract


  • Christian Lawerenz - Deutsches Krebsforschungszentrum, Heidelberg, Germany
  • Jürgen Eils - DKFZ, Heidelberg, Germany
  • Roland Eils - DKFZ, Heidelberg, Germany; Universität Heidelberg, Heidelberg, Germany

GMDS 2015. 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Krefeld, 06.-09.09.2015. Düsseldorf: German Medical Science GMS Publishing House; 2015. DocAbstr. 278

doi: 10.3205/15gmds013, urn:nbn:de:0183-15gmds0131

Published: 27 August 2015

© 2015 Lawerenz et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.



Introduction: Personalized oncology based on the newest technologies is becoming an increasingly data-intensive field. The major sequencing centers are expanding their capacities so that they can generate dozens of whole human genome sequences within a few days. Access to know-how and to resources for efficiently collecting, organizing and analyzing these highly complex sequencing data has become a critical success factor in applied human genetics and genomics research. Furthermore, fast calculation of genetic and genomic aberrations, which speeds up the prediction of a tumor's specific vulnerabilities, is of major importance. One goal of the eilslabs Data Management Group at the German Cancer Research Center (DKFZ) in Heidelberg is to make these critical resources available within a short timeframe to the groups at the Heidelberg Center for Personalized Oncology (DKFZ-HIPO) and the German Consortium for Translational Cancer Research (DKTK) that are involved in clinical interpretation, therapy prediction and oncological therapy optimization.

Materials and Methods: The Heidelberg research area offers long-established expertise in the analysis of next-generation sequencing (NGS) data. Through its contribution to all three major German projects within the International Cancer Genome Consortium (ICGC) and to the ICGC/The Cancer Genome Atlas (TCGA) Pancancer analysis of more than 2350 tumor/control genomes, eilslabs is substantially involved in extending existing analysis methods and developing new ones for small indels, somatic variants and copy number alterations, as well as new pancancer variant calling pipelines. The eilslabs group provides these and other established bioinformatics methods, tools and pipelines for the analysis of various kinds of NGS data, such as exome and whole-genome sequences, RNA sequences, and whole-genome bisulfite sequences. NGS for cancer-related genome projects generates data in the triple-digit terabyte range within a few weeks.

Results: A high-performance data management and processing infrastructure, in terms of both hardware and software, is a prerequisite for any large-scale sequencing project in the scientific and medical field. The best-practice software applied and the technical infrastructure have to be optimized to handle large resources within short time frames, so that a cancer patient's result data sets can be accessed quickly.

The Data Management Group has developed a software application, One Touch Pipeline (OTP), for processing next-generation sequencing (NGS) data produced by different sequencing centers. OTP supports, among other things, data transfer from temporary to final storage, data quality checks, alignment of reads, and variant calling. Several workflows are in production, among them the PanCancer Workflow, the gold standard for alignment of cancer cases. This workflow includes BWA-MEM alignment, merging, and quality assessment steps for single-lane BAM files, merged BAM files and individual chromosomes. The SNV pipeline of eilslabs, applied in PanCancer, generates somatic, germline, and high-confidence VCF files for subsequent analysis.
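The alignment, merging and quality assessment steps named above can be illustrated with a minimal sketch. It is not OTP code and not the PanCancer Workflow itself: the `Lane` class, the `alignment_commands` function, the file naming scheme and the choice of samtools for sorting, merging and statistics are assumptions made purely for illustration. The sketch only builds the command lines; it does not run them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lane:
    """One sequencing lane with paired-end FASTQ files (hypothetical layout)."""
    lane_id: str
    fastq_r1: str
    fastq_r2: str


def alignment_commands(sample: str, lanes: List[Lane], reference: str,
                       threads: int = 8) -> List[str]:
    """Build shell commands for align -> per-lane QC -> merge -> merged QC.

    Illustrative only: file names and the tool selection are assumptions.
    """
    commands: List[str] = []
    lane_bams: List[str] = []
    for lane in lanes:
        bam = f"{sample}_{lane.lane_id}.sorted.bam"
        # bwa expects a literal backslash-t between read group fields.
        read_group = f"@RG\\tID:{lane.lane_id}\\tSM:{sample}"
        # Align one lane with BWA-MEM and sort the output by coordinate.
        commands.append(
            f"bwa mem -t {threads} -R '{read_group}' {reference} "
            f"{lane.fastq_r1} {lane.fastq_r2} | samtools sort -o {bam} -"
        )
        # Quality assessment for the single-lane BAM.
        commands.append(f"samtools flagstat {bam}")
        lane_bams.append(bam)

    merged = f"{sample}.merged.bam"
    # Merge all lanes of the sample, then index the merged BAM.
    commands.append(f"samtools merge {merged} " + " ".join(lane_bams))
    commands.append(f"samtools index {merged}")
    # Quality assessment for the merged BAM and per chromosome.
    commands.append(f"samtools flagstat {merged}")
    commands.append(f"samtools idxstats {merged}")
    return commands


if __name__ == "__main__":
    demo_lanes = [
        Lane("L001", "tumor_L001_R1.fastq.gz", "tumor_L001_R2.fastq.gz"),
        Lane("L002", "tumor_L002_R1.fastq.gz", "tumor_L002_R2.fastq.gz"),
    ]
    for command in alignment_commands("tumor01", demo_lanes, "reference.fa"):
        print(command)
```

Returning the commands as strings rather than executing them keeps the sketch independent of any cluster scheduler; in a production framework each command would typically become a tracked job.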

In total, the sequences from more than 5,000 patients have been processed via OTP. An added value is its high scalability, not only with respect to the constantly growing number of reads per case but also to the number of NGS projects (currently 91). The highly structured organization of all crucial specifications in OTP sets it markedly apart from other frameworks. Project-specific specifications, sequence meta-information, QC values, and statistical performance values derived from the sequence processing steps are directly accessible at a glance via the GUI.
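As a rough illustration of what a structured organization of project specifications, sequence meta-information and QC values could look like, the following sketch models projects, samples and lanes as nested records. The class and field names (`LaneRecord`, `Project`, `sequencing_center`, `qc`) are hypothetical and do not reflect OTP's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LaneRecord:
    """Meta-information and QC values for one lane (hypothetical fields)."""
    lane_id: str
    sequencing_center: str
    qc: Dict[str, float] = field(default_factory=dict)


@dataclass
class Project:
    """A project holding the lanes of each sample (hypothetical structure)."""
    name: str
    lanes_by_sample: Dict[str, List[LaneRecord]] = field(default_factory=dict)

    def add_lane(self, sample: str, lane: LaneRecord) -> None:
        """Register a lane under the given sample."""
        self.lanes_by_sample.setdefault(sample, []).append(lane)

    def summary(self) -> Dict[str, int]:
        """Return the sample and lane counts, as an overview page might show."""
        return {
            "samples": len(self.lanes_by_sample),
            "lanes": sum(len(lanes) for lanes in self.lanes_by_sample.values()),
        }


if __name__ == "__main__":
    project = Project("demo_project")
    project.add_lane("tumor01", LaneRecord("L001", "center_a", {"mapped_fraction": 0.98}))
    project.add_lane("tumor01", LaneRecord("L002", "center_a", {"mapped_fraction": 0.97}))
    project.add_lane("control01", LaneRecord("L003", "center_b", {"mapped_fraction": 0.99}))
    print(project.summary())
```

Keeping QC values next to the lane they describe is one simple way to make per-project overviews cheap to compute; a real system would more likely back this with a database.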

To tackle the challenge of increasing read numbers (currently more than 28,000 lanes), OTP has been employed to process, store and analyze NGS data in a completely automated manner. A substantial achievement in this regard is that the framework can be operated by non-experts. Finally, the complex 10-petabyte NGS storage system and the high-performance computing clusters have been extended and optimized to process the sequence data sets. The DKFZ has established one of the largest sequencing facilities in Europe together with the required IT infrastructure. Complex storage systems, computing clusters and major programs are in place to process the massive NGS data of various projects. Since data production in the context of personalized medicine will continue to increase sharply, the flexibility and scalability of the central applications that process, store, and analyze the result sets cannot be overemphasized.

Discussion: New achievements in intelligent computing and software technologies have been realized in order to manage and process the immense amount of genomic data properly. The technical infrastructure and the frameworks developed for NGS, established in the context of the personalized oncology projects of the DKFZ and the National Center for Tumor Diseases (NCT), are in place for the next extension phase. The size of the data sets will grow by more than an order of magnitude in the coming years, enabled by new sequencing technologies (e.g., Illumina X10). New methods reflecting new knowledge in biomedical informatics, or new reference genomes, occasionally require the reprocessing of the sequence data sets of complete patient cohorts. This leads to enormous peak loads, which are handled to a large extent by our scalable frameworks and partially by significant extensions of the High Performance Clusters (HPC) and the data center in Heidelberg. However, these performance peaks additionally require extended concepts for external HPCs and cloud computing.

