TY - JOUR
T1 - Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings
T2 - Results from the Colive Voice Study
AU - Ayadi, Hanin
AU - Elbéji, Abir
AU - Despotovic, Vladimir
AU - Fagherazzi, Guy
N1 - Funding Sources
The Luxembourg Institute of Health funds the Colive Voice study and this research work.
Publisher Copyright:
© 2024 The Author(s). Published by S. Karger AG, Basel.
PY - 2024/8/28
Y1 - 2024/8/28
N2 - INTRODUCTION: The complex health, social, and economic consequences of tobacco smoking underscore the importance of incorporating reliable and scalable data collection on smoking status and habits into research across various disciplines. Given that smoking impacts voice production, we aimed to develop a gender and language-specific vocal biomarker of smoking status. METHODS: Leveraging data from the Colive Voice study, we used statistical analysis methods to quantify the effects of smoking on voice characteristics. Various voice feature extraction methods combined with machine learning algorithms were then used to produce a gender and language-specific (English and French) digital vocal biomarker to differentiate smokers from never-smokers. RESULTS: A total of 1,332 participants were included after propensity score matching (mean age = 43.6 [13.65], 64.41% are female, 56.68% are English speakers, 50% are smokers and 50% are never-smokers). We observed differences in voice features distribution: for women, the fundamental frequency F0, the formants F1, F2, and F3 frequencies and the harmonics-to-noise ratio were lower in smokers compared to never-smokers (p < 0.05) while for men no significant disparities were noted between the two groups. The accuracy and AUC of smoking status prediction reached 0.71 and 0.76, respectively, for the female participants, and 0.65 and 0.68, respectively, for the male participants. CONCLUSION: We have shown that voice features are impacted by smoking. We have developed a novel digital vocal biomarker that can be used in clinical and epidemiological research to assess smoking status in a rapid, scalable, and accurate manner using ecological audio recordings.
KW - Machine learning
KW - Public health
KW - Smoking
KW - Tobacco
KW - Vocal biomarkers
UR - http://www.scopus.com/inward/record.url?scp=85203080732&partnerID=8YFLogxK
U2 - 10.1159/000540327
DO - 10.1159/000540327
M3 - Article
C2 - 39473806
AN - SCOPUS:85203080732
SN - 2504-110X
VL - 8
SP - 159
EP - 170
JO - Digital Biomarkers
JF - Digital Biomarkers
IS - 1
ER -