Abstract
Epidemiological cohort studies play a crucial role in identifying risk factors for various outcomes among participants. These studies are often time-consuming and costly due to recruitment and long-term follow-up. Social media (SM) data has emerged as a valuable complementary source for digital epidemiology and health research, as online communities of patients regularly share information about their illnesses. Unlike traditional clinical questionnaires, SM offers unstructured but insightful information about patients' disease burden. Yet, there is limited guidance on analyzing SM data as a prospective cohort. We presented the concept of virtual digital cohort studies (VDCS) as an approach to replicate cohort studies using SM data. In this paper, we introduce ALTRUIST, an open-source Python package enabling standardized generation of VDCS on SM. ALTRUIST facilitates data collection, preprocessing, and analysis steps that mimic a traditional cohort study. We provide a practical use case focusing on diabetes to illustrate the methodology. By leveraging SM data, which offers large-scale and cost-effective information on users' health, we demonstrate the potential of VDCS as an essential tool for specific research questions. ALTRUIST is customizable and can be applied to data from various online communities of patients, complementing traditional epidemiological methods and promoting minimally disruptive health research.
Original language | English |
---|---|
Pages (from-to) | 568-575 |
Number of pages | 7 |
Journal | IEEE Transactions on Big Data |
Volume | 10 |
Issue number | 4 |
Publication status | Published - 5 Feb 2024 |
Keywords
- Blogs
- Cohort
- Computational modeling
- Digital Health
- Diseases
- Natural Language Processing
- Python
- Recruitment
- Social Media
- Social networking (online)
- Sociology
- Statistics