Self-Supervised Hyperspectral-LiDAR Pretraining for Large-Scale Remote Sensing Foundation Models

Brent R. Butler

Authors

Brent R. Butler, School of Information Technology, University of Cincinnati, Cincinnati, OH, USA

Keywords:

Self-supervised learning, foundation models, hyperspectral imaging, LiDAR, remote sensing, multi-modal fusion, large-scale pretraining, data governance, robustness, fairness

Abstract

The fusion of hyperspectral imaging and light detection and ranging (LiDAR) data has become a cornerstone for high-fidelity Earth observation, yet the development of large-scale foundation models that jointly represent these modalities remains an open systems challenge. This paper examines the architectural, infrastructural, and governance dimensions of self-supervised pretraining for hyperspectral-LiDAR remote sensing models. Current approaches in single-modality self-supervised learning, such as contrastive and masked autoencoding methods, provide a foundation for multi-modal pretraining, but they face significant hurdles when applied to hyperspectral-LiDAR data due to differences in spatial resolution, spectral continuity, and point cloud sparsity. We analyze the structural trade-offs involved in designing a unified pretraining framework, including modality alignment strategies, band ordering effects, and the computational demands of processing high-dimensional spectral channels alongside geometric LiDAR features. A system-level perspective is adopted to discuss infrastructure requirements for large-scale pretraining, including data acquisition pipelines, normalization protocols, and distributed training architectures. Robustness and fairness issues arising from geographic biases and sensor variability are examined, along with policy implications for open data repositories and model governance. The paper argues that self-supervised pretraining offers a sustainable path toward reducing manual annotation effort, but its deployment in operational remote sensing systems must account for domain shifts, calibration drift, and ethical considerations. Through cross-domain comparisons with natural image foundation models, we identify key gaps and propose a research agenda for building truly reciprocal hyperspectral-LiDAR foundation models. 
The conclusions emphasize that progress hinges on community-wide coordination of benchmark datasets, standardized evaluation protocols, and transparent reporting of pretraining data composition.
