TY - JOUR
T1 - Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software
AU - Decamps, Clémentine
AU - Privé, Florian
AU - Bacher, Raphael
AU - Jost, Daniel
AU - Waguet, Arthur
AU - Achard, Sophie
AU - Amblard, Elise
AU - Bergmann, Fabian
AU - Blum, Michael
AU - Blum, Yuna
AU - Bottaz-Bosson, Guillaume
AU - Broseus, Lucile
AU - Chuffart, Florent
AU - Devijver, Emilie
AU - Durif, Ghislain
AU - Feofanov, Vassili
AU - Gallopin, Melina
AU - Jedynak, Paulina
AU - Jonchere, Vincent
AU - Van De Geer, Ellen
AU - Jumentier, Basile
AU - Kaoma, Tony
AU - Markowski, Julia
AU - Melnykova, Anna
AU - Merlevede, Jane
AU - Nazarov, Petr
AU - Nguyen, Ngoc Ha
AU - Permiakova, Olga
AU - Rolland, Matthieu
AU - Spill, Yannick
AU - Houseman, Eugene Andres
AU - Lurie, Eugene
AU - Lutsik, Pavlo
AU - Milosavljevic, Aleksandar
AU - Scherer, Michael
AU - Blum, Michael G.B.
AU - Richard, Magali
N1 - Publisher Copyright: © 2020 The Author(s).
PY - 2020/1/13
Y1 - 2020/1/13
N2 - Background: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking. Results: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes that are correlated with confounder variables reduces the error of inference by 30-35%, and that selection of cell-type informative probes has a similar effect. We show that Cattell's rule based on the scree plot is a powerful tool to determine the number of cell types. Once the pre-processing steps are completed, the three deconvolution methods provide comparable results. We observe that the performance of all the algorithms improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research. Conclusion: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.
KW - Cell heterogeneity
KW - DNA methylation
KW - Deconvolution
KW - Epigenetics
KW - Matrix factorization
KW - R package/pipeline
UR - http://www.scopus.com/inward/record.url?scp=85077786039&partnerID=8YFLogxK
DO - 10.1186/s12859-019-3307-2
M3 - Article
C2 - 31931698
AN - SCOPUS:85077786039
SN - 1471-2105
VL - 21
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 16
ER -