PTSR: A unified Patch Tokenization, Selection and Representation framework for efficient micro-expression recognition

Author Identifier (ORCID)

Kun Hu: https://orcid.org/0000-0002-6891-8059

Abstract

Micro-expression recognition is the challenging task of identifying hidden emotions, as micro-expressions have brief durations and involve small-scale facial muscle movements. Although deep learning-based methods, especially transformer-based methods, have achieved impressive performance in this task, they exhibit high computational complexity and struggle to learn effective representations from the typically small-scale micro-expression datasets, owing to the excess of tokens in multi-head self-attention. Moreover, most existing methods do not differentiate the importance of local features, which matters especially in micro-expression recognition, where changes are subtle. Therefore, we propose a novel unified Patch Tokenization, Selection and Representation framework (PTSR) with a vision Transformer for micro-expression recognition. Specifically, PTSR first presents a dual norm shifted patch tokenization (DNSPT) module to learn spatial relations between neighboring pixels of the face region, implemented through a spatial transformation and a dual norm projection. Then, we employ a local-global attention module (LAM) to extract local-global image features, incorporating a dynamic token selection module (DTSM) to select important patches/tokens, thereby capturing more discriminative representations for the input clip. Extensive experiments are conducted on four widely used public datasets (CASME II, SAMM, SMIC, and CAS(ME)3), and the experimental results indicate that our method achieves clear performance improvements over state-of-the-art methods, including an 8.37% improvement in UF1 on the CAS(ME)3 dataset and a 3.1% improvement in UAR on the SMIC dataset.
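
The abstract reports results in terms of UF1 and UAR but this record does not define them. The following is a minimal sketch, assuming the definitions commonly used in micro-expression benchmarks: UF1 as the unweighted (macro) average of per-class F1 scores and UAR as the unweighted average of per-class recall. The function name `uf1_uar` and the toy labels are hypothetical and are not taken from the paper; the paper's exact evaluation protocol (for example, how predictions are pooled across cross-validation folds) may differ.

```python
from collections import Counter


def uf1_uar(y_true, y_pred):
    """Return (UF1, UAR) over the classes that occur in y_true.

    Assumed definitions (not taken from the paper):
      UF1 = mean over classes of 2*TP / (2*TP + FP + FN)
      UAR = mean over classes of TP / (TP + FN)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true))
    if not classes:
        raise ValueError("y_true must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    f1_scores, recalls = [], []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
        # Support is non-zero because every class comes from y_true.
        recalls.append(tp[c] / (tp[c] + fn[c]))

    n = len(classes)
    return sum(f1_scores) / n, sum(recalls) / n


# Toy example with three hypothetical emotion classes.
y_true = ["neg", "neg", "neg", "pos", "sur"]
y_pred = ["neg", "neg", "pos", "pos", "neg"]
uf1, uar = uf1_uar(y_true, y_pred)
print(f"UF1={uf1:.4f} UAR={uar:.4f}")  # prints "UF1=0.4444 UAR=0.5556"
```

Because every class contributes equally to both averages, a model that ignores a rare class is penalized more heavily than it would be under plain accuracy, which is presumably why these metrics are favored for small, imbalanced micro-expression datasets.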

Document Type

Conference Proceeding

Date of Publication

6-30-2025

Publisher

Association for Computing Machinery

School

School of Science

Funders

Fundamental Research Funds for the Central Universities (D5000250044, D5000250060) / National Natural Science Foundation of China (62201460, 62302093) / Basic Research Programs of Taicang (TC2023JC22) / Jiangsu Province Natural Science Fund (BK20230833) / Big Data Computing Center of Southeast University / Edith Cowan University

Comments

Fu, L., Wang, J., Jin, Q., Zhu, Y., Wang, H., Li, Y., Wu, X., & Hu, K. (2025). PTSR: A unified Patch Tokenization, Selection and Representation framework for efficient micro-expression recognition. In Proceedings of the 2025 International Conference on Multimedia Retrieval (pp. 312-320). Association for Computing Machinery. https://doi.org/10.1145/3731715.3733415

Copyright

subscription content
