Biomarker discovery studies for patient stratification using machine learning analysis of omics data: A scoping review

Enrico Glaab*, Armin Rauschenberger, Rita Banzi, Chiara Gerardi, Paula Garcia, Jacques Demotes

*Corresponding author for this work

Research output: Contribution to journal | Review article | peer-review

22 Citations (Scopus)


Objective: To review biomarker discovery studies using omics data for patient stratification that led to clinically validated FDA-cleared tests or laboratory developed tests, in order to identify common characteristics and derive recommendations for future biomarker projects.

Design: Scoping review.

Methods: We searched PubMed, EMBASE and Web of Science to obtain a comprehensive list of articles from the biomedical literature published between January 2000 and July 2021 that describe clinically validated biomarker signatures for patient stratification derived using statistical learning approaches. All documents were screened to retain only peer-reviewed research articles, review articles or opinion articles covering supervised and unsupervised machine learning applications for omics-based patient stratification. Two reviewers independently confirmed eligibility, and disagreements were resolved by consensus. We focused the final analysis on omics-based biomarkers that achieved the highest level of validation, that is, clinical approval of the developed molecular signature as a laboratory developed test or an FDA-approved test.

Results: Overall, 352 articles fulfilled the eligibility criteria. The analysis of validated biomarker signatures identified multiple common methodological and practical features that may explain the successful test development and guide future biomarker projects. These include study design choices to ensure sufficient statistical power for model building and external testing, suitable combinations of non-targeted and targeted measurement technologies, the integration of prior biological knowledge, strict filtering and inclusion/exclusion criteria, and the adequacy of statistical and machine learning methods for discovery and validation.

Conclusions: While most clinically validated biomarker models derived from omics data have been developed for personalised oncology, first applications for non-cancer diseases show the potential of multivariate omics biomarker design for other complex disorders. Distinctive characteristics of prior success stories, such as early filtering and robust discovery approaches, continuous improvements in assay design and experimental measurement technology, and rigorous multicohort validation approaches, enable the derivation of specific recommendations for future studies.

Original language: English
Article number: e053674
Journal: BMJ Open
Issue number: 12
Publication status: Published - 6 Dec 2021
Externally published: Yes


  • biomarkers
  • machine learning
  • omics
  • scoping review
  • stratification

