Author Identifier (ORCID)
Muhammad Ikram: https://orcid.org/0000-0001-6559-9067
Asma Aziz: https://orcid.org/0000-0003-3538-0536
Daryoush Habibi: https://orcid.org/0000-0002-7662-6830
Abstract
The increasing penetration of inverter-based resources (IBRs) in power grids necessitates novel control techniques to deliver advanced ancillary services such as fast frequency response (FFR), which is crucial for ensuring power system stability. Existing schemes, including droop control, virtual synchronous generator (VSG), and hybrid approaches, require full observability and centralized control, which are unsuitable for utility-scale hybrid power plants (HPPs) with distributed grid-forming inverters (GFMIs). In addition, they lack adaptive control capabilities in response to real-time disturbances in line with the IEEE 2800-2022 standard. To address these challenges, this paper proposes a novel multi-agent deep reinforcement learning (MADRL) framework for utility-scale IBR-dominated HPP environments. The proposed framework employs a model-free actor-critic architecture within a centralized training and decentralized execution (CTDE) mechanism, incorporating non-linear dynamics through gated recurrent unit (GRU) and long short-term memory (LSTM) agents. Adaptive control of droop gains, virtual inertia, damping coefficients, and FFR-sensitive control parameters is demonstrated using four MADRL algorithms (MAA2C, IPPO, MAPPO, and the novel FreqMAPPO) to assess agent-level and system-level frequency stability. Extensive simulations in the modified IEEE 118-bus Grid2Op dynamic environment with 80% IBR penetration are conducted to observe the frequency response following grid disturbances. The proposed framework enhances frequency stability by optimizing GFMI control dynamics to minimize frequency deviation, reduce the rate of change of frequency (RoCoF), and enable optimal FFR contribution. In the evaluated disturbance suite, FreqMAPPO provides superior performance, achieving a 96.41% success rate and outperforming MAPPO (89.62%), IPPO (78.23%), and MAA2C (65.66%) on FFR metrics. Specifically, FreqMAPPO achieves the lowest frequency deviation (0.041 Hz), the smallest RoCoF magnitude (−0.479 Hz/s), and the highest FFR capacity (3.41 MW), compared to MAPPO (0.078 Hz, −0.781 Hz/s, and 3.17 MW), IPPO (0.132 Hz, −0.876 Hz/s, and 2.89 MW), and MAA2C (0.164 Hz, −0.857 Hz/s, and 2.76 MW). A sensitivity analysis further validates the robustness, adaptability, and scalability of the proposed framework, showing consistent performance for 10% to 20% variations in FFR parameters and highlighting its resilience in real-time frequency stabilization and optimal power dispatch under dynamic disturbance scenarios.
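The two frequency metrics reported above, frequency deviation and RoCoF, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `frequency_metrics`, the 50 Hz nominal frequency, the finite-difference RoCoF estimate, and the sample trace are all assumptions made for illustration, since the abstract does not state how the metrics are computed.

```python
from typing import Sequence, Tuple


def frequency_metrics(
    freq_hz: Sequence[float],
    dt_s: float,
    nominal_hz: float = 50.0,  # assumed nominal frequency; not stated in the abstract
) -> Tuple[float, float]:
    """Return (max absolute frequency deviation in Hz, largest-magnitude RoCoF in Hz/s).

    RoCoF is estimated here by a simple finite difference between
    consecutive samples; the sign of the extreme value is preserved.
    """
    if len(freq_hz) < 2:
        raise ValueError("need at least two frequency samples")
    if dt_s <= 0:
        raise ValueError("sampling interval must be positive")

    # Largest excursion from the nominal frequency over the trace.
    max_deviation = max(abs(f - nominal_hz) for f in freq_hz)

    # Finite-difference rate of change between consecutive samples.
    rocof = [(b - a) / dt_s for a, b in zip(freq_hz, freq_hz[1:])]
    worst_rocof = max(rocof, key=abs)

    return max_deviation, worst_rocof


# Hypothetical post-disturbance trace sampled every 0.1 s.
trace = [50.0, 49.95, 49.92, 49.93]
deviation, worst = frequency_metrics(trace, dt_s=0.1)
```

Under these assumptions a negative RoCoF indicates falling frequency, which is consistent with the negative values quoted in the abstract; a smaller magnitude means a gentler frequency decline after the disturbance.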
Keywords
fast frequency response, grid-forming inverters, hybrid power plants, inverter-based resources, multi-agent deep reinforcement learning, proximal policy optimization
Document Type
Journal Article
Date of Publication
6-1-2026
Volume
46
Publication Title
Sustainable Energy, Grids and Networks
Publisher
Elsevier
School
School of Engineering
RAS ID
94349
Funding Information
The first author would like to thank Edith Cowan University (ECU) for the award of an ECU Higher Degree by Research (ECU-HDR) scholarship to support this research project.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Comments
Ikram, M., Aziz, A., & Habibi, D. (2026). A novel multi-agent deep reinforcement learning framework for fast frequency response in inverter-based hybrid power plants. Sustainable Energy, Grids and Networks, 46, 102282. https://doi.org/10.1016/j.segan.2026.102282