bibtype J - Journal Article
ARLID 0648523
utime 20260417130701.6
mtime 20260409235959.9
SCOPUS 105033597476
WOS 001722680500001
DOI 10.1080/21642583.2026.2646376
title (primary) (eng) Robust sequential decision-making in adversarial environments
specification
page_count 21 pp.
media_type E
serial
ARLID cav_un_epca*0630899
ISSN 2164-2583
title Systems Science & Control Engineering
volume_id 14
keyword dynamic programming
keyword adversarial machine learning
keyword multi-agent reinforcement learning
keyword robust reinforcement learning
keyword Bayesian reinforcement learning
author (primary)
ARLID cav_un_auth*0491463
name1 Ružejnikov
name2 Jurij
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
country CZ
share 80
garant K
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
author
ARLID cav_un_auth*0101092
name1 Guy
name2 Tatiana Valentine
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
share 20
garant S
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
source
url https://library.utia.cas.cz/separaty/2026/AS/ruzejnikov-0648523.pdf
source
url https://www.tandfonline.com/doi/full/10.1080/21642583.2026.2646376
cas_special
project
project_id 101168272
agency EC
country XE
ARLID cav_un_auth*0492513
project
project_id CA24136
agency EC
country XE
ARLID cav_un_auth*0504278
project
project_id 2025A1013
agency ČZU
country CZ
ARLID cav_un_auth*0504279
abstract (eng) Reinforcement learning (RL) agents often fail in adversarial environments, where the Markov Decision Process (MDP) assumption of a stationary environment is violated. While model-free solutions for this setting exist, planning-based counterparts remain less explored. This paper introduces offline and online value iteration algorithms within the Threatened Markov Decision Process (TMDP) framework, in which the RL agent maintains and updates a Bayesian belief over the adversary’s policy. The belief is integrated into a modified Bellman optimality equation to compute robust policies. We evaluate our framework on the stochastic adversarial multi-agent Coin Game. Our primary finding is that the model-based agent outperforms the TMDP version of model-free Q-learning by a significant margin, confirming that the benefits of model-based planning extend from MDP to TMDP. Furthermore, the proposed framework maintains a performance advantage over Q-learning baselines even when the system’s transition function is unknown. The RL agent also demonstrates robustness to direct adversarial interactions. This work validates TMDP value iteration as an effective, planning-based approach for decision-making against adaptive adversaries.
result_subspec WOS
RIV BC
FORD0 10000
FORD1 10100
FORD2 10103
reportyear 2027
num_of_auth 2
inst_support RVO:67985556
permalink https://hdl.handle.net/11104/0378170
cooperation
ARLID cav_un_auth*0478849
name Provozně ekonomická fakulta, Česká zemědělská univerzita v Praze
institution PEF CZU
country CZ
confidential S
article_num 2646376
mrcbC91 A
mrcbT16-e AUTOMATION&CONTROLSYSTEMS
mrcbT16-f 4.5
mrcbT16-g 0.9
mrcbT16-h 4.3
mrcbT16-i 0.00285
mrcbT16-j 1.004
mrcbT16-k 2102
mrcbT16-q 38
mrcbT16-s 0.783
mrcbT16-y 39.98
mrcbT16-x 5.14
mrcbT16-3 938
mrcbT16-4 Q1
mrcbT16-5 4.200
mrcbT16-6 116
mrcbT16-7 Q2
mrcbT16-C 74.7
mrcbT16-M 0.65
mrcbT16-N Q2
mrcbT16-P 74.7
arlyear 2026
mrcbU14 105033597476 SCOPUS
mrcbU24 PUBMED
mrcbU34 001722680500001 WOS
mrcbU63 cav_un_epca*0630899 Systems Science & Control Engineering 14 1 2026 2164-2583 2164-2583
mrcbU88 Robust Sequential Decision-Making in Adversarial Environments: Codebase https://hdl.handle.net/11104/0376313