Hierarchical Multi-Scale Attention Networks for Integrating Hyperspectral, LiDAR, and Camera Data in Smart Cities
Keywords:
smart cities, multi-modal data fusion, hyperspectral imaging, LiDAR, attention mechanisms, hierarchical networks, urban infrastructure, fairness, governance, sustainability
Abstract
The convergence of hyperspectral imaging, Light Detection and Ranging (LiDAR), and conventional camera data creates new opportunities for smart city applications, including environmental monitoring, infrastructure assessment, and urban planning. However, integrating these heterogeneous modalities remains challenging because they differ in spatial resolution, spectral dimensionality, and geometric representation. This paper proposes a hierarchical multi-scale attention network that systematically fuses hyperspectral, LiDAR, and camera data at multiple levels of abstraction. The framework employs a dedicated encoder for each modality, followed by cross-attention modules that align features across scales and sensor domains. A hierarchical aggregation mechanism then integrates local, regional, and global contextual cues, enabling robust extraction of urban features. Beyond technical design, the paper critically examines three structural trade-offs: between computational efficiency and model capacity, between centralised cloud processing and distributed edge deployment, and between interpretability and predictive accuracy. It also addresses governance and policy considerations, including data ownership, privacy preservation, and equitable sensor coverage across socioeconomically diverse urban zones, and analyses sustainability aspects such as energy consumption during inference and sensor life-cycle management. The proposed architecture is situated within the broader context of large-scale socio-technical systems, where robustness, fairness, and long-term maintainability matter as much as algorithmic performance. Case illustrations from recent fusion benchmarks and real-world pilot deployments highlight the practical challenges and opportunities. By foregrounding system-level reasoning, this paper provides a comprehensive framework for designing trustworthy and scalable multi-modal sensing infrastructures in smart cities.
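The fusion pipeline summarised in the abstract (per-modality encoders, cross-attention alignment, then local, regional, and global aggregation) can be illustrated with a minimal NumPy sketch. The abstract does not specify these details, so the following are assumptions rather than the paper's method: the three modalities have already been encoded and co-registered onto a shared H × W token grid with a common feature dimension d; camera tokens act as queries that attend to hyperspectral and LiDAR tokens; regional context is a non-overlapping average pool; and a per-token softmax gate weights the three scales. All function and parameter names (`cross_attention`, `regional_pool`, `hierarchical_fusion`, `params`) are hypothetical, and the random weights stand in for trained ones.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: queries come from one modality,
    keys and values from another. Shapes: (n_q, d), (n_kv, d) -> (n_q, d_v)."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def regional_pool(tokens, grid_hw, window):
    """Average-pool an (H*W, d) token grid over non-overlapping
    window x window blocks and broadcast each block mean back to its tokens.
    (Assumption: 'regional' context is modelled as simple block averaging.)"""
    h, w = grid_hw
    d = tokens.shape[-1]
    grid = tokens.reshape(h, w, d)
    out = np.empty_like(grid)
    for i in range(0, h, window):
        for j in range(0, w, window):
            block = grid[i:i + window, j:j + window]
            out[i:i + window, j:j + window] = block.mean(axis=(0, 1))
    return out.reshape(h * w, d)

def hierarchical_fusion(cam, hsi, lidar, grid_hw, params, window=2):
    """Hypothetical fusion step; assumes inputs are already encoded,
    co-registered, and share feature dimension d."""
    # 1. Cross-modal alignment: camera tokens query the other two modalities.
    from_hsi = cross_attention(cam, hsi, *params["hsi"])
    from_lidar = cross_attention(cam, lidar, *params["lidar"])
    local = cam + from_hsi + from_lidar  # residual sum gives the per-token (local) cue
    # 2. Multi-scale context: local, regional (block mean), global (grid mean).
    regional = regional_pool(local, grid_hw, window)
    global_ = np.broadcast_to(local.mean(axis=0), local.shape)
    scales = np.stack([local, regional, global_])  # (3, n, d)
    # 3. Per-token gate over the three scales (softmax across the scale axis).
    gate = softmax(scales @ params["gate"], axis=0)  # (3, n)
    fused = (gate[..., None] * scales).sum(axis=0)  # (n, d)
    return fused, gate

# Usage with random stand-in features and weights (not trained values).
rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
n = h * w
cam, hsi, lidar = (rng.standard_normal((n, d)) for _ in range(3))
params = {
    "hsi": [rng.standard_normal((d, d)) * 0.1 for _ in range(3)],
    "lidar": [rng.standard_normal((d, d)) * 0.1 for _ in range(3)],
    "gate": rng.standard_normal(d) * 0.1,
}
fused, gate = hierarchical_fusion(cam, hsi, lidar, (h, w), params)
print(fused.shape, gate.shape)  # (16, 8) (3, 16)
```

Returning the gate alongside the fused features is one way to expose which contextual scale dominates at each location, which connects to the interpretability and accuracy trade-off the abstract raises. In a trained model the projection and gate weights would be learned rather than sampled.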
License
Copyright (c) 2026 Journal of Advanced Artificial Intelligence Research

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.