TY - JOUR
T1 - A Functionally-Grounded Benchmark Framework for XAI Methods
T2 - Insights and Foundations from a Systematic Literature Review
AU - Canha, Dulce
AU - Kubler, Sylvain
AU - Främling, Kary
AU - Fagherazzi, Guy
N1 - Funding:
Dulce Canha is supported by the Luxembourg National Research Fund under grant number PRIDE21/16749720.
Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/7/14
Y1 - 2025/7/14
AB - Artificial Intelligence (AI) is transforming industries, offering new opportunities to manage and enhance innovation. However, these advancements bring significant challenges for scientists and businesses, one of the most critical being the “trustworthiness” of AI systems. A key requirement for trustworthiness is transparency, which is closely linked to explicability. Consequently, the exponential growth of eXplainable AI (XAI) has led to the development of numerous methods and metrics for explainability. Nevertheless, this growth has not produced standardized, formal definitions of fundamental XAI properties (e.g., what do soundness, completeness, and faithfulness of an explanation entail? How is the stability of an XAI method defined?). This lack of consensus makes it difficult for XAI practitioners to establish a shared foundation, thereby impeding the effective benchmarking of XAI methods. This survey article addresses these challenges with two primary objectives. First, it systematically reviews and categorizes XAI properties, distinguishing between human-centered properties (assessed through empirical studies involving explainees) and functionally grounded ones (quantitative metrics independent of explainees). Second, it extends this analysis by introducing a hierarchically structured, functionally grounded benchmark framework for XAI methods that provides formal definitions of XAI properties. The framework’s practicality is demonstrated by applying it to two widely used methods: LIME and SHAP.
KW - Artificial intelligence
KW - eXplainable AI (XAI)
KW - interpretability
KW - machine learning
KW - responsible AI
KW - transparency
KW - trustworthiness
UR - https://www.scopus.com/pages/publications/105011390896
U2 - 10.1145/3737445
DO - 10.1145/3737445
M3 - Article
AN - SCOPUS:105011390896
SN - 0360-0300
VL - 57
JO - ACM Computing Surveys
JF - ACM Computing Surveys
IS - 12
M1 - ART320
ER -