TY - JOUR
T1 - Digital Vocal Biomarker of Smoking Status Using Ecological Audio Recordings
T2 - Results from the Colive Voice Study
AU - Ayadi, Hanin
AU - Elbéji, Abir
AU - Despotovic, Vladimir
AU - Fagherazzi, Guy
N1 - Funding Sources: The Luxembourg Institute of Health funds the Colive Voice study and this research work. Publisher Copyright: © 2024 The Author(s).
PY - 2024/8/28
Y1 - 2024/8/28
AB - Introduction: The complex health, social, and economic consequences of tobacco smoking underscore the importance of incorporating reliable and scalable data collection on smoking status and habits into research across various disciplines. Given that smoking impacts voice production, we aimed to develop a gender- and language-specific vocal biomarker of smoking status. Methods: Leveraging data from the Colive Voice study, we used statistical analysis methods to quantify the effects of smoking on voice characteristics. Various voice feature extraction methods combined with machine learning algorithms were then used to produce a gender- and language-specific (English and French) digital vocal biomarker to differentiate smokers from never-smokers. Results: A total of 1,332 participants were included after propensity score matching (mean age = 43.6 [13.65], 64.41% were female, 56.68% were English speakers, 50% were smokers and 50% were never-smokers). We observed differences in the distribution of voice features: for women, the fundamental frequency F0, the frequencies of the formants F1, F2, and F3, and the harmonics-to-noise ratio were lower in smokers than in never-smokers (p < 0.05), while for men no significant differences were noted between the two groups. The accuracy and AUC of smoking status prediction reached 0.71 and 0.76, respectively, for the female participants, and 0.65 and 0.68, respectively, for the male participants. Conclusion: We have shown that voice features are impacted by smoking. We have developed a novel digital vocal biomarker that can be used in clinical and epidemiological research to assess smoking status in a rapid, scalable, and accurate manner using ecological audio recordings. Plain Language Summary: The objective of this study was to develop a tool for determining the smoking status of a person from their voice. Using data from Colive Voice, an international digital health study led by the Luxembourg Institute of Health, we investigated the impact of smoking on voice characteristics using statistical methods. We then employed artificial intelligence algorithms to identify gender- and language-specific digital vocal biomarkers, which are combinations of voice features associated, in the context of this project, with smoking status. After analyzing data from 1,332 participants, we found differences in voice features between smokers and never-smokers, particularly among women. For example, the pitch and certain frequencies were lower in female smokers than in never-smokers. We were able to differentiate between smokers and never-smokers with 71% accuracy for women and 65% for men. This research demonstrates that smoking affects the voice and that it is possible to predict smoking status using audio recorded in real-life settings. This tool could be valuable in clinical and research settings for studying smoking habits in a rapid and scalable manner.
KW - Machine learning
KW - Public health
KW - Smoking
KW - Tobacco
KW - Vocal biomarkers
UR - http://www.scopus.com/inward/record.url?scp=85203080732&partnerID=8YFLogxK
U2 - 10.1159/000540327
DO - 10.1159/000540327
M3 - Article
AN - SCOPUS:85203080732
SN - 2504-110X
SP - 159
EP - 170
JO - Digital Biomarkers
JF - Digital Biomarkers
ER -