Author Identifier (ORCID)

Liang Qu: https://orcid.org/0000-0002-2755-7592

Jianxin Li: https://orcid.org/0000-0002-9059-330X

Abstract

Recommender systems are widely deployed on online platforms such as shopping and social media sites. They typically rely on large embedding tables that map users and items to dense vectors of uniform size. As the number of users and items grows, this design incurs significant memory consumption and computational inefficiency. The challenge is particularly pronounced in settings such as federated learning, where model parameters are updated locally on edge devices with limited computational resources before being transmitted to a central server for aggregation. Numerous approaches have been proposed to address this issue, among which embedding pruning has emerged as a compelling solution. Compared with parameter-sharing and variable-size embedding techniques, embedding pruning methods offer lower training costs and exploit sparse embeddings for improved efficiency. Notably, pruning methods based on the Dynamic Sparse Training (DST) paradigm maintain a consistent sparsity level throughout training and provide a controllable memory budget, establishing them as state-of-the-art lightweight embedding solutions for resource-constrained environments. However, embedding pruning methods are not without limitations. First, although sparse embeddings are used in forward passes, dense gradients are still computed in backward passes, introducing inefficiency. Second, DST's weight-exploration mechanism tends to prioritize users or items from the most recent batch, reactivating pruned parameters that do not necessarily improve overall performance. In this work, we introduce SparseRec, a lightweight embedding method designed to overcome these obstacles. SparseRec accumulates gradients to better identify inactive parameters that, when reactivated, contribute more meaningfully to model performance. In addition, SparseRec avoids dense gradient computation during backpropagation by selectively sampling key vectors; gradients are calculated only for parameters in this subset, ensuring sparsity throughout both forward and backward passes. Experiments on three benchmark datasets show that SparseRec achieves performance gains of up to 11.79% across three base recommenders and multiple density configurations, highlighting its effectiveness for memory-constrained recommendation systems.
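The sparse-backward idea described in the abstract, computing gradients only for a sampled subset of embedding rows rather than for the whole table, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions; the function and variable names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def sparse_embedding_update(table, batch_rows, grads, lr=0.1, sample_k=2, rng=None):
    """Update only a sampled subset of the embedding rows touched in this
    batch, so the backward pass stays sparse (illustrative sketch only)."""
    rng = rng or np.random.default_rng(0)
    # Sample a subset of the batch's rows; only these receive gradient updates.
    k = min(sample_k, len(batch_rows))
    chosen = rng.choice(batch_rows, size=k, replace=False)
    for r in chosen:
        table[r] -= lr * grads[r]  # SGD step on the sampled rows only
    return chosen

# Toy example: a 6-row embedding table, with a batch touching rows {0, 2, 5}.
table = np.zeros((6, 4))
grads = np.ones((6, 4))
updated = sparse_embedding_update(table, [0, 2, 5], grads, sample_k=2)
# Rows outside the sampled subset are never written, so their memory
# traffic and gradient computation are skipped entirely.
```

The design point this illustrates is that the update cost scales with the sampled subset size rather than with the full table, which is what keeps both passes sparse.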

Document Type

Journal Article

Date of Publication

1-1-2026

Publication Title

Data Science and Engineering

Publisher

Springer

School

School of Business and Law

Funders

Australian Research Council

Grant Number

ARC Numbers: FT210100624, DE230101033, DP2401011081, DP240101814, LP230200892, LP240200546

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Comments

Qu, Y., Qu, L., Chen, T., Zhao, X., Li, J., & Yin, H. (2026). Sparse gradient training for recommender systems. Data Science and Engineering. Advance online publication. https://doi.org/10.1007/s41019-025-00327-5


Link to publisher version (DOI)

10.1007/s41019-025-00327-5