Document Type
Journal Article
Publication Title
Neurocomputing
Volume
637
Publisher
Elsevier
School
School of Science
Publication Unique Identifier
10.1016/j.neucom.2025.130077
RAS ID
78501
Abstract
Sign languages, which use the visual-manual modality to convey meaning, are the primary languages of the deaf community and of hearing individuals who are unable to speak. In recent years, the number of sign language videos available on video streaming and social media platforms has grown explosively. Given the size of these corpora, sign language users often face significant challenges in finding the information they need. We therefore propose a novel deep learning architecture, the Graph Traverse Reference Network (GTRN), which allows visual signing queries to retrieve relevant sign language videos (documents) from a large corpus. GTRN introduces a traverse graph that provides coarse-to-fine reference information hierarchically, from frame-level to body-part-level observations. A reference-based attention mechanism is devised to obtain the embedding of the visual input at each level, which allows the computation to be allocated and processed at different locations across local devices and central servers. A contrastive learning strategy optimizes GTRN to learn a joint latent space in which queries and documents are aligned by meaning. Moreover, GTRN can leverage existing general-purpose visual representation foundation models, whose resulting embeddings serve as the frame-level reference of GTRN. To the best of our knowledge, this is one of the first studies on using visual signing queries to retrieve sign language videos in a real-world setting, and comprehensive experiments demonstrate the effectiveness of the proposed method.
DOI
10.1016/j.neucom.2025.130077
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Comments
Hu, K., He, F., Schembri, A., & Wang, Z. (2025). Graph traverse reference network for sign language corpus retrieval in the wild. Neurocomputing, 637, 130077. https://doi.org/10.1016/j.neucom.2025.130077