Research outputs 2022 to 2026

Towards robust multimodal land use classification: A convolutional embedded transformer

Muhammad Zia Ur Rehman, Edith Cowan University
Syed Mohammed Shamsul Islam, Edith Cowan University
Anwaar Ulhaq
David Blake, Edith Cowan University
Naeem Janjua

Author Identifier (ORCID)

Muhammad Zia Ur Rehman: https://orcid.org/0000-0001-9531-1941

Syed Mohammed Shamsul Islam: https://orcid.org/0000-0002-3200-2903

David Blake: https://orcid.org/0000-0003-3747-2960

Abstract

Multisource remote sensing data has gained significant attention in land use classification. However, effectively extracting both local and global features from various modalities and fusing them to leverage their complementary information remains a substantial challenge. In this paper, we address this by exploring the use of transformers for simultaneous local and global feature extraction while enabling cross-modality learning to improve the integration of complementary information from HSI and LiDAR data modalities. We propose a spatial feature enhancer module (SFEM) that efficiently captures features across spectral bands while preserving spatial integrity for downstream learning tasks. Building on this, we introduce a cross-modal convolutional transformer, which extracts both local and global features using a multi-scale convolutional embedded encoder (MSCE). The convolutional layers embedded in the encoder facilitate the blending of local and global features. Additionally, cross-modal learning is incorporated to effectively capture complementary information from HSI and LiDAR modalities. Evaluation on the Trento dataset highlights the effectiveness of the proposed approach, achieving an average accuracy of 99.04% and surpassing comparable methods.
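The cross-modal learning step mentioned in the abstract can be illustrated with a generic cross-attention sketch. This is not the authors' implementation: the record does not describe the internals of the SFEM or MSCE modules, so the function name `cross_attention`, the token count, the embedding size, and the random projection weights below are all assumptions chosen for illustration. The sketch only shows the general idea of letting tokens from one modality (HSI) attend to tokens from another (LiDAR).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (hypothetical helper).

    Queries come from one modality, keys and values from the other,
    so each query token is rewritten as a weighted mix of context tokens.
    """
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
hsi_tokens = rng.normal(size=(n_tokens, dim))    # stand-in for HSI patch embeddings
lidar_tokens = rng.normal(size=(n_tokens, dim))  # stand-in for LiDAR patch embeddings
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

# HSI tokens query the LiDAR tokens for complementary information.
hsi_enriched, attn = cross_attention(hsi_tokens, lidar_tokens, w_q, w_k, w_v)
```

In a full model the projection weights would be learned and the roles could be swapped so that LiDAR tokens also attend to HSI tokens; whether the paper does either is not stated in this record.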

Document Type

Conference Proceeding

Date of Publication

1-1-2025

Volume

3

Publication Title

Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

Publisher

Science and Technology Publications

School

School of Science

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Comments

Rehman, M. Z. U., Islam, S. M. S., UlHaq, A., Blake, D., & Janjua, N. (2025). Towards robust multimodal land use classification: A convolutional embedded transformer. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP (pp. 143-153). SciTePress. https://doi.org/10.5220/0013191300003912

First Page

143

Last Page

153

Included in

Artificial Intelligence and Robotics Commons

Link to publisher version (DOI)

10.5220/0013191300003912
