Explainable Multimodal AI for Real-Time Industrial Fault Diagnosis in Edge Environments
Keywords:
explainable AI, multimodal learning, edge computing, industrial fault diagnosis, real-time systems, socio-technical infrastructure, system governance
Abstract
Industrial fault diagnosis has traditionally relied on single-modality sensor data and centralized processing pipelines that struggle to meet the latency and interpretability demands of modern manufacturing environments. The convergence of multimodal sensing, edge computing, and explainable artificial intelligence offers a transformative approach to real-time fault detection and root-cause analysis. This paper presents a comprehensive system-level examination of explainable multimodal AI architectures deployed on edge devices for industrial fault diagnosis. We analyze the structural trade-offs among model complexity, inference latency, explanation fidelity, and resource constraints inherent in edge environments. The discussion extends to governance frameworks for model updates, sustainability of continuous learning on resource-limited hardware, and fairness considerations when diagnostic decisions affect human operators and production workflows. Through cross-domain comparisons with autonomous driving and healthcare monitoring, we highlight transferable design principles. The paper further addresses policy implications regarding liability, auditability, and regulatory compliance in high-stakes industrial settings. We argue that the successful adoption of such systems depends not only on technical performance but also on the alignment of explanation methods with operator cognitive models and organizational decision processes. A forward-looking perspective outlines research frontiers in neuro-symbolic reasoning, federated learning for cross-factory knowledge sharing, and adaptive explanation generation that balances detail with actionable insight. This work aims to provide a foundational reference for researchers and practitioners developing trustworthy AI for industrial edge applications.
License
Copyright (c) 2026 Journal of Advanced Artificial Intelligence Research

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.