Explainable Benchmarking of LLM API Services: Predictive Quality Assessment, Failure Attribution, and Robustness Analysis Under Adversarial Conditions

Authors

  • Milos Carpenter, Department of Computer Science, Binghamton University, Binghamton, NY, USA
  • Leisheng Cui, Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL, USA
  • Madhav Balhotra, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA

Keywords:

explainable AI, LLM API services, benchmarking, adversarial robustness, failure attribution, predictive quality assessment, socio-technical systems, governance

Abstract

The rapid proliferation of Large Language Model (LLM) application programming interfaces (APIs) as foundational components in socio-technical systems has created an urgent need for rigorous, interpretable, and operationally relevant benchmarking frameworks. Traditional performance evaluations, which emphasize aggregate metrics such as accuracy and latency, fail to capture the nuanced failure modes, predictive degradations, and adversarial vulnerabilities that characterize real-world LLM API deployments. This paper proposes an explainable benchmarking paradigm that integrates predictive quality assessment, systematic failure attribution, and adversarial robustness analysis into a unified evaluation architecture. We ground our approach in structural trade-offs between model fidelity, computational cost, and interpretability, drawing on causal inference and post-hoc explanation methods to attribute API failures to specific model components, input perturbations, or infrastructure bottlenecks. A layered governance framework is introduced to balance transparency, fairness, and sustainability in LLM API provisioning. Through cross-domain analysis of deployment scenarios in healthcare, finance, and content moderation, we illustrate how explainable benchmarking can inform policy decisions, improve system resilience, and foster accountability among API providers. The paper concludes with forward-looking recommendations for integrating predictive quality monitors, failure attribution dashboards, and adversarial testing protocols into continuous deployment pipelines, thereby advancing the reliability and trustworthiness of LLM-based services.

Published

2026-05-27

How to Cite

Explainable Benchmarking of LLM API Services: Predictive Quality Assessment, Failure Attribution, and Robustness Analysis Under Adversarial Conditions. (2026). Journal of Advanced Artificial Intelligence Research, 5(1). https://www.jaair.org/index.php/home/article/view/25