Explainable AI Agents for Autonomous System Evaluation: Integrating SHAP-Based Decision Attribution with Hierarchical Planning in Large Language Models
Keywords:
Explainable AI, SHAP, hierarchical planning, large language models, autonomous system evaluation, decision attribution, socio-technical governance
Abstract
The increasing deployment of autonomous systems in critical socio-technical domains such as healthcare, transportation, and energy infrastructure demands robust evaluation frameworks that are both transparent and adaptive. Large language models offer powerful reasoning capabilities but suffer from opacity and a lack of structured decision attribution, hindering their use in high-stakes evaluation tasks. This paper proposes a novel architecture for explainable AI agents that integrate SHAP-based decision attribution with hierarchical planning to enable autonomous system evaluation. The framework comprises three layers: a hierarchical planner that decomposes evaluation objectives into subgoals, a large language model that executes reasoning and generates explanations, and a SHAP attribution module that quantifies the contribution of each input feature to the agent’s decisions. By combining the structural clarity of hierarchical planning with the interpretability of SHAP values, the system provides both high-level strategic oversight and granular feature-level transparency. The paper examines structural trade-offs between explanation fidelity and computational efficiency, discusses deployment considerations across multi-agent environments, and analyzes governance implications including auditability, fairness, and regulatory compliance. Case illustrations from autonomous vehicle safety assessment and clinical decision support demonstrate the framework’s viability. Forward-looking perspectives address sustainability, robustness against adversarial inputs, and policy integration. The proposed approach advances the state of the art by unifying attribution methods with planning formalisms, offering a path toward trustworthy autonomous evaluation agents.
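As a rough illustration of how the planning and attribution layers described above might fit together, the following Python sketch decomposes one evaluation objective into two subgoals and computes exact Shapley values for each subgoal's input features. Everything named here is hypothetical and not taken from the paper: the `PLAN` structure, the feature names, and the two scoring functions, which stand in for the language model layer. The sketch enumerates every feature subset, so its cost grows exponentially with the number of features; a practical system would presumably rely on a sampling-based or learned approximation instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for value_fn, a function of a set of present features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical stand-ins for the reasoning layer: each maps the set of
# active risk factors to a risk score for one subgoal.
def perception_score(present):
    score = 0.1  # baseline risk
    if "low_visibility" in present:
        score += 0.2
    if "sensor_fault" in present:
        score += 0.1
    return score


def dynamics_score(present):
    score = 0.0
    if "speed" in present:
        score += 0.3
    if "wet_road" in present:
        score += 0.1
    if "speed" in present and "wet_road" in present:
        score += 0.2  # interaction term, split between both features
    return score


# Hypothetical two-level plan: objective -> subgoals -> (features, scorer).
PLAN = {
    "objective": "assess_vehicle_safety",
    "subgoals": {
        "perception_check": (["low_visibility", "sensor_fault"], perception_score),
        "dynamics_check": (["speed", "wet_road"], dynamics_score),
    },
}


def attribute_plan(plan):
    """Return per-subgoal Shapley attributions for every subgoal in the plan."""
    return {
        name: shapley_values(scorer, features)
        for name, (features, scorer) in plan["subgoals"].items()
    }


if __name__ == "__main__":
    for subgoal, phi in attribute_plan(PLAN).items():
        print(subgoal, {k: round(v, 3) for k, v in phi.items()})
```

Within each subgoal the attributions sum to the difference between the score with all features present and the score with none, which is the efficiency property that makes Shapley values usable as an audit trail at each level of the plan.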
License
Copyright (c) 2026 Journal of Advanced Artificial Intelligence Research

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.