An evaluation of unsupervised acoustic model training for a dysarthric speech interface

Oliver Walter*, Vladimir Despotovic, Reinhold Haeb-Umbach, Jort F. Gemmeke, Bart Ons, Hugo Van Hamme

*Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-review

9 Citations (Scopus)

Abstract

In this paper, we investigate unsupervised acoustic model training approaches for dysarthric-speech recognition. Three kinds of models are considered: first, frame-based Gaussian posteriorgrams obtained from Vector Quantization (VQ); second, so-called Acoustic Unit Descriptors (AUDs), which are hidden Markov models of phone-like units trained in an unsupervised fashion; and third, posteriorgrams computed on the AUDs. Experiments were carried out on a database collected from a home automation task. It contains nine speakers, seven of whom are considered to produce dysarthric speech. All unsupervised modeling approaches delivered significantly better recognition rates than a speaker-independent phoneme recognition baseline, showing the suitability of unsupervised acoustic model training for dysarthric speech. While the AUD models led to the most compact representation of an utterance for the subsequent semantic inference stage, the posteriorgram-based representations resulted in higher recognition rates, with the Gaussian posteriorgram achieving the highest slot-filling F-score of 97.02%.
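The abstract names the Gaussian posteriorgram only in passing. As an illustration, and not the authors' implementation, a Gaussian posteriorgram can be sketched as follows. The sketch assumes that a codebook of K diagonal-covariance Gaussians with mixture weights has already been estimated, for instance by initialising the Gaussian means from VQ centroids. The function names `log_gaussian_diag` and `gaussian_posteriorgram` are hypothetical. Each output row is the posterior p(k | x_t) of codebook entry k given the feature frame x_t.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gaussian_posteriorgram(
    frames: Sequence[Vector],
    weights: Vector,
    means: Sequence[Vector],
    variances: Sequence[Vector],
) -> List[List[float]]:
    """Return one posterior vector p(k | x_t) over K codebook Gaussians per frame.

    Assumption (hypothetical setup): the codebook (weights, means, diagonal
    variances) was trained beforehand, e.g. seeded from VQ centroids.
    """
    posteriorgram = []
    for x in frames:
        # Unnormalised log posteriors: log w_k + log N(x; mu_k, Sigma_k)
        scores = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp normalisation keeps the computation numerically stable
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        posteriorgram.append([math.exp(s - log_norm) for s in scores])
    return posteriorgram
```

For two unit-variance, equally weighted Gaussians centred at -1 and +1, a frame at 0 receives a posterior of 0.5 for each entry, while a frame at 1 receives about 0.88 for the second entry. Each frame thus becomes a K-dimensional probability vector rather than a single hard VQ label. This soft assignment is one plausible reason why posteriorgram representations are less compact than AUD label sequences.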

Original language: English
Pages (from-to): 1013-1017
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2014
Externally published: Yes
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 14 Sep 2014 to 18 Sep 2014

Keywords

  • Acoustic unit descriptors
  • Dysarthric speech
  • Non-negative matrix factorization
  • Unsupervised learning
