ALTRUIST: a Python package to emulate a Virtual Digital Cohort Study using social media data

Charline Bour, Abir Elbeji, Luigi De Giovanni, Adrian Ahne, Guy Fagherazzi

Research output: Contribution to journal › Article › Research › peer-review


Epidemiological cohort studies play a crucial role in identifying risk factors for various outcomes among participants. These studies are often time-consuming and costly due to recruitment and long-term follow-up. Social media (SM) data has emerged as a valuable complementary source for digital epidemiology and health research, as online communities of patients regularly share information about their illnesses. Unlike traditional clinical questionnaires, SM offers unstructured but insightful information about patients' disease burden. Yet, there is limited guidance on analyzing SM data as a prospective cohort. We presented the concept of virtual digital cohort studies (VDCS) as an approach to replicate cohort studies using SM data. In this paper, we introduce ALTRUIST, an open-source Python package enabling standardized generation of VDCS on SM. ALTRUIST facilitates data collection, preprocessing, and analysis steps that mimic a traditional cohort study. We provide a practical use case focusing on diabetes to illustrate the methodology. By leveraging SM data, which offers large-scale and cost-effective information on users' health, we demonstrate the potential of VDCS as an essential tool for specific research questions. ALTRUIST is customizable and can be applied to data from various online communities of patients, complementing traditional epidemiological methods and promoting minimally disruptive health research.

Original language: English
Pages (from-to): 1-7
Number of pages: 7
Journal: IEEE Transactions on Big Data
Publication status: Accepted/In press - 5 Feb 2024


Keywords

  • Blogs
  • Cohort
  • Computational modeling
  • Digital Health
  • Diseases
  • Natural Language Processing
  • Python
  • Recruitment
  • Social Media
  • Social networking (online)
  • Sociology
  • Statistics

