Multimodal Fusion for Vocal Biomarkers Using Vector Cross-Attention

Research output: Contribution to journal › Conference article › peer-review

Abstract

Vocal biomarkers are measurable characteristics of a person's voice that provide valuable insights into various aspects of their physiological and psychological state, or health status. Standardized voice tasks, such as reading, counting, or sustained vowel phonation, are common in vocal biomarker research, but semi-spontaneous tasks, where the person is instructed to talk about a particular topic, and spontaneous speech are also increasingly used. However, limited efforts have been made to combine multiple voice modalities. In this paper, we propose a simple yet efficient approach to fusing multiple standardized voice tasks based on vector cross-attention, showing improved predictive capacity for derived vocal biomarkers in comparison to single modalities. The multimodal approach is tested on the assessment of respiratory quality of life from reading and sustained vowel phonation recordings, outperforming single modalities by up to 4.2 percentage points in accuracy (a relative increase of 7%).
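
The abstract does not spell out how vector cross-attention is computed, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes each voice task has already been reduced to a single fixed-size embedding vector; the names `vector_cross_attention`, `fuse`, `reading_emb`, and `vowel_emb` are hypothetical, the embeddings are random placeholders, and the learned query/key/value projections that a trained model would normally include are omitted for brevity.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def vector_cross_attention(query_vec, context_vec):
    """Attend from one modality embedding to another (assumed formulation).

    Every element of query_vec scores every element of context_vec
    (an outer product), the scores are softmax-normalised per query
    element, and the resulting weights mix the context elements.
    """
    scale = math.sqrt(len(context_vec))
    attended = []
    for q in query_vec:
        weights = softmax([q * k / scale for k in context_vec])
        attended.append(sum(w * v for w, v in zip(weights, context_vec)))
    return attended


def fuse(reading_emb, vowel_emb):
    """Concatenate each embedding with its cross-attended counterpart."""
    reading_ctx = vector_cross_attention(reading_emb, vowel_emb)
    vowel_ctx = vector_cross_attention(vowel_emb, reading_emb)
    return reading_emb + reading_ctx + vowel_emb + vowel_ctx


# Placeholder embeddings standing in for the outputs of an audio encoder.
rng = random.Random(0)
reading_emb = [rng.gauss(0, 1) for _ in range(8)]
vowel_emb = [rng.gauss(0, 1) for _ in range(8)]

fused = fuse(reading_emb, vowel_emb)
print(len(fused))  # four 8-dimensional blocks concatenated
```

In a setup like this, the fused vector would be passed to a downstream classifier, for example one predicting a respiratory quality-of-life category. Because attention runs in both directions, each task's representation is conditioned on the other before classification.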

Original language: English
Pages (from-to): 1435-1439
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - Sept 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sept 2024 to 5 Sept 2024

Keywords

  • attention mechanism
  • multimodal fusion
  • vocal biomarker
