Research outputs 2022 to 2026

SFAN: Selective Filter and Alignment Network for cross-modal retrieval

Yongle Huang
Zedong Liu
Shijie Sun
Ningning Cui
Jianxin Li, Edith Cowan University

Author Identifier (ORCID)

Jianxin Li: https://orcid.org/0000-0002-9059-330X

Abstract

Effectively bridging the gap between the visual and textual modalities has consistently been a key challenge in cross-modal retrieval. Fine-grained matching approaches improve performance by precisely aligning salient region features in the visual modality with word embeddings in the textual modality. However, effectively and efficiently filtering out irrelevant features (e.g., irrelevant background regions and non-meaningful prepositions) in multimodal data remains a significant challenge. Furthermore, capturing key cross-modal relationships while minimizing interference from misalignment is crucial for effective cross-modal retrieval. In this work, we propose a novel approach, the selective filter and alignment network (SFAN), to tackle these challenges. First, we propose modality-specific selective filter modules (SFMs) that selectively and implicitly filter out redundant information within each modality. We then propose a selective alignment module (SAM) based on state-space models (SSMs) to selectively capture key correspondences and reduce the disturbance of irrelevant associations. Finally, we use a fusion operation to combine the embeddings from the SFMs and the SAM into the final embeddings used for similarity computation. Extensive experiments on the Flickr30k, MS-COCO, and MSR-VTT datasets show that SFAN can effectively learn robust patterns and outperforms state-of-the-art (SOTA) cross-modal retrieval methods by a wide margin.
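For readers unfamiliar with fine-grained matching, the sketch below illustrates the general idea the abstract builds on: scoring an image against a caption by matching each word embedding to its best region feature, discarding weak matches, and fusing that local score with a global score. It is a minimal sketch under stated assumptions, not SFAN itself: the embeddings are hand-written vectors rather than encoder outputs, the similarity threshold is a hypothetical stand-in for the learned selective filter modules, there is no state-space-model alignment, and every function name and the fusion weight `alpha` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors into one global embedding."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fine_grained_score(regions: List[Vector], words: List[Vector],
                       threshold: float = 0.0) -> float:
    """Word-to-region matching: each word keeps its best-matching region."""
    best = [max(cosine(w, r) for r in regions) for w in words]
    # Hypothetical filter: ignore words whose best match is below the threshold.
    kept = [s for s in best if s >= threshold]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


def fused_score(regions: List[Vector], words: List[Vector],
                alpha: float = 0.5, threshold: float = 0.0) -> float:
    """Hypothetical fusion: weighted sum of global and fine-grained similarity."""
    global_sim = cosine(mean_pool(regions), mean_pool(words))
    local_sim = fine_grained_score(regions, words, threshold)
    return alpha * global_sim + (1.0 - alpha) * local_sim


def rank_images(images: List[List[Vector]], words: List[Vector],
                alpha: float = 0.5, threshold: float = 0.0) -> List[int]:
    """Return image indices sorted from most to least similar to the caption."""
    scores = [fused_score(r, words, alpha, threshold) for r in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    image_a = [[1.0, 0.0], [0.0, 1.0]]    # one region aligns with the word
    image_b = [[-1.0, 0.0], [0.0, -1.0]]  # no region aligns with the word
    caption = [[1.0, 0.0]]                # a single word embedding
    print(rank_images([image_a, image_b], caption))  # prints [0, 1]
```

In this toy case `image_a` contains a region identical to the word embedding, so its fine-grained score is 1.0 and it ranks first. Summing a global and a local score is only one common way to combine coarse and fine-grained evidence; the fusion operation actually used in SFAN is described in the full paper.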

Document Type

Journal Article

Date of Publication

1-1-2025

Volume

36

Issue

10

Publication Title

IEEE Transactions on Neural Networks and Learning Systems

Publisher

IEEE

School

School of Business and Law

Funders

National Key Research and Development Program of China (2023YFB4301800) / Natural Science Foundation of Shaanxi Province General Program Project (2025JC-YBMS-673) / New Generation Information Technology Innovation Project (2023IT080) / Basic Scientific Research Funds of Central Universities (300102404101)

Comments

Huang, Y., Liu, Z., Sun, S., Cui, N., & Li, J. (2025). SFAN: Selective Filter and Alignment Network for cross-modal retrieval. IEEE Transactions on Neural Networks and Learning Systems, 36(10), 18792-18804. https://doi.org/10.1109/TNNLS.2025.3577292

Copyright

subscription content

First Page

18792

Last Page

18804

Link to publisher version (DOI)

10.1109/TNNLS.2025.3577292
