Game-based LLM inference task offloading for edge computing system
Abstract
Large Language Models (LLMs), with their powerful capabilities, are fundamentally transforming society. Cloud-based LLM deployment has drawbacks, including latency, lack of offline functionality, and high long-term costs. Edge computing-based LLM solutions address these issues by offloading inference tasks to edge servers. This paper formulates and optimizes an LLM inference framework that jointly accounts for inference time and predictive quality. Specifically, to address selfish users' natural tendency to maximize their own utility, we develop a Game-theoretic Offloading Algorithm (GOALIT) to optimize LLM inference offloading. The approach enables distributed users to iteratively adjust their strategies and converge to a Nash equilibrium. Compared to the optimal tree-based search (OT-GAH), the proposed approach yields a 27% increase in token throughput and shortens inference latency by 20%, while maintaining lower perplexity under dynamic system loads. These findings confirm the effectiveness of our approach in resource-constrained edge environments.
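The iterative strategy adjustment described in the abstract can be sketched as best-response dynamics in a simple offloading congestion game. This is an illustrative sketch only, not the paper's GOALIT algorithm: the cost model, constants (`LOCAL_COST`, the linear `edge_cost`), and function names are assumptions for demonstration.

```python
# Hypothetical best-response dynamics for an LLM inference offloading game.
# Each user chooses: 0 = run inference locally, 1 = offload to the edge server.
# Edge latency grows with the number of offloaders (congestion effect).

def edge_cost(n_offloaders):
    # Illustrative linear congestion model for edge inference latency.
    return 2.0 + 1.5 * n_offloaders

LOCAL_COST = 8.0  # assumed fixed cost of local inference per user


def best_response_dynamics(n_users=6, max_rounds=100):
    strategy = [0] * n_users  # start with every user computing locally
    for _ in range(max_rounds):
        changed = False
        for i in range(n_users):
            others = sum(strategy) - strategy[i]
            # If user i offloads, congestion includes user i itself.
            offload_cost = edge_cost(others + 1)
            best = 1 if offload_cost < LOCAL_COST else 0
            if best != strategy[i]:
                strategy[i] = best
                changed = True
        if not changed:
            # No user can unilaterally improve: a Nash equilibrium.
            return strategy
    return strategy
```

With these assumed costs, the dynamics settle at three offloaders: each offloader pays 6.5 (below the local cost of 8), while any additional user offloading would raise edge latency to 8 and gain nothing, so no one deviates.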
Keywords
Edge computing, game theory, LLM inference task offloading, quantization
Document Type
Journal Article
Date of Publication
1-1-2026
Volume
10
Publication Title
IEEE Transactions on Green Communications and Networking
Publisher
IEEE
School
School of Engineering
Funders
National Natural Science Foundation of China (62202060) / Guangxi Natural Science Foundation of China (2023GXNSFAA026270) / Young Elite Scientists Sponsorship Program by Beijing Association for Science and Technology (BYESS2023311)
Copyright
Subscription content
First Page
2490
Last Page
2502
Comments
Hou, S., Gan, M., Ni, W., Zhai, Z., & Liu, X. (2026). Game-based LLM inference task offloading for edge computing system. IEEE Transactions on Green Communications and Networking, 10, 2490–2502. https://doi.org/10.1109/TGCN.2026.3676823