TY - JOUR
T1 - Independent component analysis for unraveling the complexity of cancer omics datasets
AU - Sompairac, Nicolas
AU - Nazarov, Petr V.
AU - Czerwinska, Urszula
AU - Cantini, Laura
AU - Biton, Anne
AU - Molkenov, Askhat
AU - Zhumadilov, Zhaxybay
AU - Barillot, Emmanuel
AU - Radvanyi, Francois
AU - Gorban, Alexander
AU - Kairov, Ulykbek
AU - Zinovyev, Andrei
N1 - Funding Information:
Funding: This work was partially supported by the grant research projects “Pan-cancer deconvolution of omics data using Independent Component Analysis” (IRN: AP05135430) and “Investigation of esophageal cancer tissue gene expression derived from Kazakhstan patients by next-generation sequencing technology” (IRN: AP05134722) of the Ministry of Education and Science of the Republic of Kazakhstan, by the Ministry of Science and Higher Education of the Russian Federation (Project No. 14.Y26.31.0022), the European Union’s Horizon 2020 program (grant No. 826121, iPC project), by the European IMI IMMUCAN project, and by Luxembourg National Research Fund (C17/BM/11664971/DEMICS).
Publisher Copyright:
© 2019 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2019/9
Y1 - 2019/9
N2 - Independent component analysis (ICA) is a matrix factorization approach where the signals captured by each individual matrix factor are optimized to become as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonance imaging (fMRI) and other types of biomedical data. In the last twenty years, ICA has become part of the standard machine learning toolbox, together with other matrix factorization methods such as principal component analysis (PCA) and non-negative matrix factorization (NMF). Here, we review a number of recent works where ICA was shown to be a useful tool for unraveling the complexity of cancer biology from the analysis of different types of omics data, mainly collected for tumoral samples. Such works highlight the use of ICA in dimensionality reduction, deconvolution, data pre-processing, meta-analysis, and other tasks applied to different data types (transcriptome, methylome, proteome, single-cell data). We particularly focus on the technical aspects of ICA application in omics studies, such as using different protocols, determining the optimal number of components, assessing and improving reproducibility of the ICA results, and comparing ICA with other popular matrix factorization techniques. We discuss the emerging ICA applications to the integrative analysis of multi-level omics datasets and introduce a conceptual view on ICA as a tool for defining functional subsystems of a complex biological system and their interactions under various conditions. Our review is accompanied by a Jupyter notebook which illustrates the discussed concepts and provides a practical tool for applying ICA to the analysis of cancer omics datasets.
AB - Independent component analysis (ICA) is a matrix factorization approach where the signals captured by each individual matrix factor are optimized to become as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonance imaging (fMRI) and other types of biomedical data. In the last twenty years, ICA has become part of the standard machine learning toolbox, together with other matrix factorization methods such as principal component analysis (PCA) and non-negative matrix factorization (NMF). Here, we review a number of recent works where ICA was shown to be a useful tool for unraveling the complexity of cancer biology from the analysis of different types of omics data, mainly collected for tumoral samples. Such works highlight the use of ICA in dimensionality reduction, deconvolution, data pre-processing, meta-analysis, and other tasks applied to different data types (transcriptome, methylome, proteome, single-cell data). We particularly focus on the technical aspects of ICA application in omics studies, such as using different protocols, determining the optimal number of components, assessing and improving reproducibility of the ICA results, and comparing ICA with other popular matrix factorization techniques. We discuss the emerging ICA applications to the integrative analysis of multi-level omics datasets and introduce a conceptual view on ICA as a tool for defining functional subsystems of a complex biological system and their interactions under various conditions. Our review is accompanied by a Jupyter notebook which illustrates the discussed concepts and provides a practical tool for applying ICA to the analysis of cancer omics datasets.
KW - Cancer
KW - Data analysis
KW - Data integration
KW - Dimension reduction
KW - Independent component analysis
KW - Omics data
UR - http://www.scopus.com/inward/record.url?scp=85071977884&partnerID=8YFLogxK
U2 - 10.3390/ijms20184414
DO - 10.3390/ijms20184414
M3 - Review article
C2 - 31500324
AN - SCOPUS:85071977884
SN - 1661-6596
VL - 20
JO - International Journal of Molecular Sciences
JF - International Journal of Molecular Sciences
IS - 18
M1 - 4414
ER -