Use of artificial intelligence methods for the analysis of real-world and social media data in digital epidemiology

Research output: Thesis (Doctoral Thesis)


Introduction
Among digital data sources, social media have emerged as a significant source of health-related information, offering access to patient perspectives, outcomes and experiences. This rapid, large-scale access to patients’ emotions and concerns represents a unique opportunity to improve patient-centered research and care. This thesis focuses on people living with diabetes, who form a dynamic online community, with the aim of better describing and understanding the burden of diabetes. As part of the World Diabetes Distress Study, it explores the potential of social media data, analyzed with artificial intelligence methods, for health research and digital epidemiology in chronic diseases, going beyond the historical use of online data to monitor infectious disease epidemics. The overall aim is to demonstrate how social media data can capture key insights from health-related discussions and help shape and enhance healthcare strategies.

Methods
We first conducted a scoping review to identify the different uses of social media for health research purposes. Second, we conducted a global analysis of diabetes-related tweets to identify the critical determinants of diabetes burden and the differences in how diabetes is perceived worldwide. Third, we developed the concept of a virtual digital cohort study (VDCS) and designed a specialized tool to standardize and analyze social media data in the manner of a typical cohort study.

Results
We showed that social media platforms can be used for health research, for tasks ranging from recruitment to the dissemination of information and data collection. The rich information shared by communities of people with diabetes can complement traditional, questionnaire-based epidemiology. This project led to the analysis of 54 million diabetes-related tweets collected between 2017 and 2021, thereby enhancing our understanding of the diabetes burden worldwide. An open-source Python package, ALTRUIST, was created to standardize and simplify setting up a VDCS using social media data. It guides researchers through the stages of data collection, pre-processing, and analysis, simulating a traditional cohort study.

Discussion/Conclusion
This research highlights the potential of social media data in health research and digital epidemiology. Social media data can give valuable, unbiased, unrestricted and unfiltered insight into patients' daily lives and experiences, complementary to traditional approaches. The ALTRUIST package was designed to standardize the analysis of such data as a cohort and to help the research community develop social-media-based research projects. Significant ethical, methodological and technical challenges remain to be addressed as the field develops. Standardization of methodologies is therefore necessary to increase the impact of results and to gain the trust of healthcare professionals. This work can be considered a first step towards a cohesive, standardized field that boosts patient-centered care and global health strategies.
Original language: English
Awarding Institution
  • University of Luxembourg
Supervisor
  • Fagherazzi, Guy
Award date: 5 Dec 2023
Publication status: Published - 5 Dec 2023

