Research outputs 2022 to 2026

ℵ-IPOMDP: Mitigating deception in a cognitive hierarchy with off-policy counterfactual anomaly detection

Author Identifier (ORCID)

Joseph M. Barnby: https://orcid.org/0000-0001-6002-1362

Abstract

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper recursive capabilities. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework called ℵ-IPOMDP, which augments the Bayesian inference of model-based RL agents with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize that they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results demonstrate the ℵ-mechanism’s effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.

Keywords

Belief revision and update, multiagent systems, reasoning about actions and change, reinforcement learning

Document Type

Journal Article

Date of Publication

1-1-2026

Volume

85

Publication Title

Journal of Artificial Intelligence Research

Publisher

AI Access Foundation

School

Centre for Artificial Intelligence and Machine Learning (CAIML)

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Comments

Alon, N., Barnby, J. M., Sarkadi, S., Schulz, L., Rosenschein, J. S., & Dayan, P. (2026). ℵ-IPOMDP: Mitigating deception in a cognitive hierarchy with off-policy counterfactual anomaly detection. Journal of Artificial Intelligence Research, 85. https://doi.org/10.1613/jair.1.19204

Included in

Artificial Intelligence and Robotics Commons

Link to publisher version (DOI)

10.1613/jair.1.19204
