Random forest-based modelling to detect biomarkers for prostate cancer progression

Reka Toth, Heiko Schiffmann, Claudia Hube-Magg, Franziska Büscheck, Doris Höflmayer, Sören Weidemann, Patrick Lebok, Christoph Fraune, Sarah Minner, Thorsten Schlomm, Guido Sauter, Christoph Plass, Yassen Assenov, Ronald Simon, Jan Meiners, Clarissa Gerhäuser*

*Corresponding author for this work

Research output: Contribution to journal › Article › Research › peer-review

48 Citations (Scopus)


Background: The clinical course of prostate cancer (PCa) is highly variable, demanding an individualized approach to therapy. Overtreatment of indolent PCa cases, which are unlikely to progress to aggressive stages, may be associated with severe side effects and considerable costs. These could be avoided by using robust prognostic markers to guide treatment decisions.

Results: We present a random forest-based classification model to predict aggressive behaviour of prostate cancer. DNA methylation changes between PCa cases with good or poor prognosis (discovery cohort with n = 70) were used as input. DNA was extracted from formalin-fixed tumour tissue, and genome-wide DNA methylation differences between both groups were assessed using Illumina HumanMethylation450 arrays. For the random forest-based modelling, the discovery cohort was randomly split into a training set (80%) and a test set (20%). Our methylation-based classifier demonstrated excellent performance in discriminating prognosis subgroups in the test set (Kaplan-Meier survival analyses with log-rank p value < 0.0001). The area under the receiver operating characteristic curve (AUC) for the sensitivity analysis was 95%. Using the ICGC cohort of early- and late-onset prostate cancer (n = 222) and the TCGA PRAD cohort (n = 477) for external validation, AUCs for sensitivity analyses were 77.1% and 68.7%, respectively. Cancer progression-related DNA hypomethylation was frequently located in 'partially methylated domains' (PMDs), large-scale genomic areas with progressive loss of DNA methylation linked to mitotic cell division. We selected several candidate genes with differential methylation in gene promoter regions for additional validation at the protein expression level by immunohistochemistry in more than 12,000 tissue micro-arrayed PCa cases. Loss of ZIC2 protein expression was associated with poor prognosis and correlated with significantly shorter time to biochemical recurrence. The prognostic value of ZIC2 proved to be independent of established clinicopathological variables including Gleason grade, tumour stage, nodal stage and prostate-specific antigen.

Conclusions: Our results highlight the prognostic relevance of methylation loss in PMD regions, as well as of several candidate genes not previously associated with PCa progression. Our robust and externally validated PCa classification model, either directly or via protein expression analyses of the identified top-ranked candidate genes, will support the clinical management of prostate cancer.
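The evaluation set-up named in the Results (a random 80%/20% split of the 70-case discovery cohort, and AUC as the performance measure) can be illustrated with a minimal sketch. This is not the authors' pipeline: the sample IDs, the seed, the toy labels and the toy scores are all invented for illustration, no random forest is trained here, and the helper names `split_cohort` and `roc_auc` are hypothetical. The AUC is computed in its pairwise form, that is, the fraction of (poor prognosis, good prognosis) pairs in which the poor-prognosis case receives the higher score.

```python
import random


def split_cohort(sample_ids, train_fraction=0.8, seed=0):
    """Randomly split sample IDs into a training set and a test set."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical sample IDs standing in for the 70-case discovery cohort.
train_ids, test_ids = split_cohort([f"case_{i:02d}" for i in range(70)])
print(len(train_ids), len(test_ids))  # 56 14

# Toy labels (1 = poor prognosis) and classifier scores, invented here.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 8 of the 9 pairs are ranked correctly
```

With an 80/20 split of 70 cases, only 14 cases remain in the test set, which is one reason external validation on larger independent cohorts is informative.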

Original language: English
Article number: 148
Journal: Clinical Epigenetics
Issue number: 1
Publication status: Published - 22 Oct 2019
Externally published: Yes

