Research outputs 2022 to 2026

A hierarchical reinforcement learning method based on decision frequency and internal reward mechanism

Author Identifier (ORCID)

Jianxin Li: https://orcid.org/0000-0002-9059-330X

Abstract

Wargames require participants to reason and make decisions in real time as a complex battlefield environment changes, which poses severe challenges because of large decision spaces and rapidly shifting battlefield situations. In recent years, many reinforcement learning algorithms have been applied to wargames to simulate confrontational situations. However, existing methods have not yet provided satisfactory solutions for wargames and still have limitations. In this context, we propose a hierarchical decision reinforcement learning algorithm based on decision frequency and an internal reward mechanism. The algorithm divides decisions into three layers, decomposes complex tasks to reduce decision complexity, and compresses the action space to accelerate convergence. In addition, decision frequency and an internal reward mechanism are introduced to improve decision stability and guide learning. To verify the algorithm's performance, experiments are designed for wargame scenarios. The experimental results show that the hierarchical structure enables the agent to learn strategies effectively and accelerates convergence.

Keywords

Decision frequency, internal reward mechanism, reinforcement learning, wargames

Document Type

Conference Proceeding

Date of Publication

1-1-2026

Volume

16115 LNCS

Publication Title

Lecture Notes in Computer Science

Publisher

Springer

School

School of Business and Law

Funders

China University of Geosciences (2023080)

Comments

Wang, C., Li, M., Wang, Y., Zhang, X., Huang, X., Li, B., Li, J., & Chen, Y. (2026). A hierarchical reinforcement learning method based on decision frequency and internal reward mechanism. Lecture Notes in Computer Science, 16115, 130–145. https://doi.org/10.1007/978-981-95-5719-6_9

Copyright

subscription content

First Page

130

Last Page

145

Link to publisher version (DOI)

10.1007/978-981-95-5719-6_9
