Hierarchical Multi-Scale Attention Networks for Integrating Hyperspectral, LiDAR, and Camera Data in Smart Cities

Otis Taylor; Rowan L. Lopez; Keguo Gu; Tobias Griksson

Authors

Otis Taylor, Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
Rowan L. Lopez, Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL, USA.
Keguo Gu, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA.
Tobias Griksson, Department of Computer Science, University of Central Florida, Orlando, FL, USA.

Keywords:

smart cities, multi-modal data fusion, hyperspectral imaging, LiDAR, attention mechanisms, hierarchical networks, urban infrastructure, fairness, governance, sustainability

Abstract

The convergence of hyperspectral imaging, Light Detection and Ranging (LiDAR), and conventional camera data presents unprecedented opportunities for smart city applications, including environmental monitoring, infrastructure assessment, and urban planning. However, integrating these heterogeneous modalities remains challenging due to disparities in spatial resolution, spectral dimensionality, and geometric representation. This paper proposes a hierarchical multi-scale attention network architecture that systematically fuses hyperspectral, LiDAR, and camera data at multiple abstraction levels. The framework employs dedicated encoders for each modality, followed by cross-attention modules that align features across scales and sensor domains. A hierarchical aggregation mechanism then integrates local, regional, and global contextual cues, enabling robust urban feature extraction. Beyond technical design, the paper critically examines structural trade-offs between computational efficiency and model capacity, between centralised cloud processing and distributed edge deployment, and between interpretability and predictive accuracy. Governance and policy considerations are addressed, including data ownership, privacy preservation, and equitable sensor coverage across socioeconomically diverse urban zones. Sustainability aspects such as energy consumption during inference and sensor life-cycle management are analysed. The proposed architecture is positioned within the broader landscape of large-scale socio-technical systems, where robustness, fairness, and long-term maintainability are as important as algorithmic performance. Case illustrations from recent fusion benchmarks and real-world pilot deployments underscore the practical challenges and opportunities. By foregrounding system-level reasoning, this paper provides a comprehensive framework for designing trustworthy and scalable multi-modal sensing infrastructures in smart cities.
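
The fusion pattern described in the abstract (per-modality features, cross-attention between sensor domains, then aggregation across scales) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no equations or layer definitions, so every function name here (`cross_attention`, `mean_pool`, `hierarchical_fuse`) is hypothetical, the toy token values are made up, and plain mean pooling stands in for the learned encoders and the hierarchical aggregation mechanism. Only standard scaled dot-product attention is assumed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over
    the tokens of other modalities (keys/values)."""
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

def mean_pool(tokens, window):
    """Average consecutive groups of `window` tokens to get a coarser scale.
    Assumption: stands in for a learned downsampling stage."""
    pooled = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled

def hierarchical_fuse(hsi, lidar, camera, windows=(1, 2, 4)):
    """Fuse at local, regional, and global scales (one window size each)
    and concatenate one summary vector per scale."""
    summary = []
    for window in windows:
        q = mean_pool(hsi, window)
        # Hyperspectral queries attend over LiDAR and camera tokens jointly.
        kv = mean_pool(lidar, window) + mean_pool(camera, window)
        fused = cross_attention(q, kv, kv)
        # Collapse the fused tokens at this scale into a single vector.
        summary.extend(mean_pool(fused, len(fused))[0])
    return summary

# Toy inputs: 4 tokens per modality, 2 features each (illustrative values).
hsi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
lidar = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.4], [0.7, 0.3]]
camera = [[0.6, 0.6], [0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]

fused_vector = hierarchical_fuse(hsi, lidar, camera)
print(len(fused_vector))  # 3 scales x 2 features
```

In this sketch the hyperspectral tokens act as queries and the other two modalities supply keys and values. A real system would likely use learned projections, multiple heads, and attention in more than one direction; those choices are left open here because the abstract does not specify them.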

