TY - JOUR
T1 - Pipeline Olympics
T2 - continuable benchmarking of computational workflows for DNA methylation sequencing data against an experimental gold standard
AU - Lin, Yu-Yu
AU - Breuer, Kersten
AU - Weichenhan, Dieter
AU - Lafrenz, Pascal
AU - Sarnataro, Antonella
AU - Wilk, Agata
AU - Chepeleva, Maryna
AU - Mücke, Oliver
AU - Schönung, Maximilian
AU - Petermann, Franziska
AU - Kensche, Philip Reiner
AU - Weiser, Lena
AU - Thommen, Frank
AU - Giacomelli, Gideon
AU - Nordstroem, Karl
AU - Gonzalez-Avalos, Edahi
AU - Merkel, Angelika
AU - Kretzmer, Helene
AU - Fischer, Jonas
AU - Krämer, Stephen
AU - Iskar, Murat
AU - Wolf, Stephan
AU - Buchhalter, Ivo
AU - Esteller, Manel
AU - Lawerenz, Christian
AU - Twardziok, Sven
AU - Zapatka, Marc
AU - Hovestadt, Volker
AU - Schlesner, Matthias
AU - Schulz, Marcel H
AU - Hoffmann, Steve
AU - Gerhauser, Clarissa
AU - Walter, Jörn
AU - Hartmann, Mark
AU - Lipka, Daniel B
AU - Assenov, Yassen
AU - Bock, Christoph
AU - Plass, Christoph
AU - Toth, Reka
AU - Lutsik, Pavlo
N1 - Funding: This study was supported by grants from the German Ministry of Education and Science (BMBF) for the consortium BSmadeEZ (031L0162B to P.L., R.T., and Y.A. and 031L0162A to C.L.) and from Horizon Europe project EOSC4Cancer (101058427). P.L. was supported by the BMBF-funded German Network for Bioinformatics Infrastructure (de.NBI) within its partner project de.NBI-epi/Heidelberg (031L0101A). de.NBI also provides computational resources and hosts web services. P.L. and C.P. received funding from the German Cancer Aid (DKH) for the project CO-CLL (70113869). P.L. was furthermore supported by a BOFZAP starting grant (STG/22/024) from KU Leuven. D.B.L. received funding from the Wilhelm Sander-Stiftung (2022.010.1) and from the BMBF (HEROES-AYA consortium, subproject 3, 01KD2207A). This research has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 824110, EASI-Genomics (to M.Schle. and S.K.). Additional support to M.H.S. was provided by the Cardio-Pulmonary Institute (CPI) (EXC 2026, 390649896) and the German Center for Cardiovascular Research (DZHK) (81Z0200101). M.Schoe. was supported by the Joachim Herz Foundation (Add-on Fellowships for Interdisciplinary Life Science). Funding to pay the Open Access publication charges for this article was provided by Luxembourg Institute of Health (to R.T.) and KU Leuven BOFZAP starting grant (to P.L.).
© The Author(s) 2025. Published by Oxford University Press.
PY - 2025/10/14
Y1 - 2025/10/14
N2 - DNA methylation is a widely studied epigenetic mark and a powerful biomarker of cell type, age, environmental exposures, and disease. Whole-genome sequencing following selective conversion of unmethylated cytosines into thymines via bisulfite treatment or enzymatic methods remains the reference method for DNA methylation profiling genome-wide. While numerous software tools facilitate processing of DNA methylation sequencing reads, a comprehensive benchmarking study has been lacking. In this study, we systematically compared complete computational workflows for processing DNA methylation sequencing data using a dedicated benchmarking dataset generated with five whole-genome profiling protocols. As an evaluation reference, we employed accurate locus-specific measurements from our previous benchmark of targeted DNA methylation assays. Based on this experimental gold-standard assessment and multiple performance metrics, we identified workflows that consistently demonstrated superior performance and revealed major workflow development trends. To ensure the long-term utility of our benchmark, we implemented an interactive workflow execution and data presentation platform, adaptable to user-defined criteria and readily expandable to future software.
KW - DNA Methylation
KW - Workflow
KW - Software
KW - Benchmarking
KW - Humans
KW - Whole Genome Sequencing/methods
KW - Sequence Analysis, DNA/methods
KW - Computational Biology/methods
KW - Epigenesis, Genetic
KW - Epigenomics/methods
UR - https://pubmed.ncbi.nlm.nih.gov/41118575/
U2 - 10.1093/nar/gkaf970
DO - 10.1093/nar/gkaf970
M3 - Article
C2 - 41118575
SN - 0305-1048
VL - 53
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 19
ER -