TY - JOUR
T1 - On Evaluating Adversarial Robustness of Chest X-ray Classification
T2 - 2023 Workshop on Artificial Intelligence Safety, SafeAI 2023
AU - Ghamizi, Salah
AU - Cordy, Maxime
AU - Papadakis, Mike
AU - Le Traon, Yves
N1 - Publisher Copyright: © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
PY - 2023
Y1 - 2023
AB - Vulnerability to adversarial attacks is a well-known weakness of Deep Neural Networks. While most studies focus on natural images with standardized benchmarks like ImageNet and CIFAR, little research has considered real-world applications, in particular in the medical domain. Our research shows that, contrary to previous claims, the robustness of chest X-ray classification is much harder to evaluate and leads to very different assessments depending on the dataset, the architecture, and the robustness metric. We argue that previous studies did not take into account the peculiarities of medical diagnosis, such as the co-occurrence of diseases, the disagreement of labellers (domain experts), the threat model of the attacks, and the risk implications of each successful attack. In this paper, we discuss the methodological foundations, review the pitfalls and best practices, and suggest new methodological considerations for evaluating the robustness of chest X-ray classification models. Our evaluation of three datasets, seven models, and 18 diseases is the largest evaluation of the robustness of chest X-ray classification models.
KW - Adversarial
KW - Chest X-ray
KW - CheXpert
KW - CXR
KW - Evasion
KW - NIH
KW - PadChest
KW - Radiograph
KW - Robustness
UR - http://www.scopus.com/inward/record.url?scp=85159378589&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85159378589
SN - 1613-0073
VL - 3381
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 13 February 2023 through 14 February 2023
ER -