Explainable Multimodal AI for Real-Time Industrial Fault Diagnosis in Edge Environments

Authors

  • Sunil Chatterjee, School of Information Technology, University of Cincinnati, Cincinnati, OH, USA
  • Claudio B. Fernandez, Department of Computer Science, University of New Hampshire, Durham, NH, USA
  • Thomas Aay, Department of Computer Science, University of Central Florida, Orlando, FL, USA
  • Kunran Peng, Department of Computer Science, Colorado State University, Fort Collins, CO, USA

Keywords:

explainable AI, multimodal learning, edge computing, industrial fault diagnosis, real-time systems, socio-technical infrastructure, system governance

Abstract

Industrial fault diagnosis has traditionally relied on single-modality sensor data and centralized processing pipelines that struggle to meet the latency and interpretability demands of modern manufacturing environments. The convergence of multimodal sensing, edge computing, and explainable artificial intelligence offers a transformative approach to real-time fault detection and root-cause analysis. This paper presents a comprehensive system-level examination of explainable multimodal AI architectures deployed on edge devices for industrial fault diagnosis. We analyze the structural trade-offs among model complexity, inference latency, explanation fidelity, and resource constraints inherent in edge environments. The discussion extends to governance frameworks for model updates, sustainability of continuous learning on resource-limited hardware, and fairness considerations when diagnostic decisions affect human operators and production workflows. Through cross-domain comparisons with autonomous driving and healthcare monitoring, we highlight transferable design principles. The paper further addresses policy implications regarding liability, auditability, and regulatory compliance in high-stakes industrial settings. We argue that the successful adoption of such systems depends not only on technical performance but also on the alignment of explanation methods with operator cognitive models and organizational decision processes. A forward-looking perspective outlines research frontiers in neuro-symbolic reasoning, federated learning for cross-factory knowledge sharing, and adaptive explanation generation that balances detail with actionable insight. This work aims to provide a foundational reference for researchers and practitioners developing trustworthy AI for industrial edge applications.

Published

2026-03-15

How to Cite

Explainable Multimodal AI for Real-Time Industrial Fault Diagnosis in Edge Environments. (2026). Journal of Advanced Artificial Intelligence Research, 5(1). https://www.jaair.org/index.php/home/article/view/9