The Need of Standardised Metadata to Encode Causal Relationships: Towards Safer Data-Driven Machine Learning Biological Solutions

Beatriz Garcia Santa Cruz*, Carlos Vega, Frank Hertel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

In this paper, we discuss the importance of considering causal relations in the development of machine learning solutions to prevent factors hampering the robustness and generalisation capacity of the models, such as induced biases. This issue often arises when the algorithm decision is affected by confounding factors. In this work, we argue that the integration of research assumptions as causal relationships can help identify potential confounders. Together with metadata information, it can enable meta-comparison of data acquisition pipelines. We call for standardised meta-information practices as a crucial step for proper machine learning solutions development, validation, and data sharing. Such practices include detailing the data acquisition process, aiming for automatic integration of causal relationships and actionable metadata.

Original language: English
Title of host publication: Computational Intelligence Methods for Bioinformatics and Biostatistics - 17th International Meeting, CIBB 2021, Revised Selected Papers
Editors: Davide Chicco, Angelo Facchiano, Erica Tavazzi, Enrico Longato, Martina Vettoretti, Anna Bernasconi, Simone Avesani, Paolo Cazzaniga
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 200-216
Number of pages: 17
ISBN (Print): 9783031208362
Publication status: Published - 2022
Externally published: Yes
Event: 17th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2021 - Virtual, Online
Duration: 15 Nov 2021 to 17 Nov 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13483 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2021
City: Virtual, Online
Period: 15/11/21 to 17/11/21

Keywords

  • Causality
  • Confounders
  • Machine learning
  • Metadata
  • Systems biology
