| bibtype |
J - Journal Article |
| ARLID |
0648523 |
| utime |
20260417130701.6 |
| mtime |
20260409235959.9 |
| SCOPUS |
105033597476 |
| WOS |
001722680500001 |
| DOI |
10.1080/21642583.2026.2646376 |
| title
(primary) (eng) |
Robust sequential decision-making in adversarial environments |
| specification |
| page_count |
21 pp. |
| media_type |
E |
|
| serial |
| ARLID |
cav_un_epca*0630899 |
| ISSN |
2164-2583 |
| title
|
Systems Science & Control Engineering |
| volume_id |
14 |
|
| keyword |
dynamic programming |
| keyword |
adversarial machine learning |
| keyword |
multi-agent reinforcement learning |
| keyword |
robust reinforcement learning |
| keyword |
Bayesian reinforcement learning |
| author
(primary) |
| ARLID |
cav_un_auth*0491463 |
| name1 |
Ružejnikov |
| name2 |
Jurij |
| institution |
UTIA-B |
| full_dept (cz) |
Adaptivní systémy |
| full_dept (eng) |
Department of Adaptive Systems |
| department (cz) |
AS |
| department (eng) |
AS |
| country |
CZ |
| share |
80 |
| garant |
K |
| fullinstit |
Ústav teorie informace a automatizace AV ČR, v. v. i. |
|
| author
|
| ARLID |
cav_un_auth*0101092 |
| name1 |
Guy |
| name2 |
Tatiana Valentine |
| institution |
UTIA-B |
| full_dept (cz) |
Adaptivní systémy |
| full_dept (eng) |
Department of Adaptive Systems |
| department (cz) |
AS |
| department (eng) |
AS |
| share |
20 |
| garant |
S |
| fullinstit |
Ústav teorie informace a automatizace AV ČR, v. v. i. |
|
| source |
|
| cas_special |
| project |
| project_id |
101168272 |
| agency |
EC |
| country |
XE |
| ARLID |
cav_un_auth*0492513 |
|
| project |
| project_id |
CA24136 |
| agency |
EC |
| country |
XE |
| ARLID |
cav_un_auth*0504278 |
|
| project |
| project_id |
2025A1013 |
| agency |
ČZU |
| country |
CZ |
| ARLID |
cav_un_auth*0504279 |
|
| abstract
(eng) |
Reinforcement learning (RL) agents often fail in adversarial environments where the Markov Decision Process (MDP) assumption of a stationary environment is violated. While model-free solutions for this setting exist, planning-based counterparts remain less explored. This paper introduces offline and online value iteration algorithms within the Threatened Markov Decision Process (TMDP) framework, in which the RL agent maintains and updates a Bayesian belief over the adversary’s policy. The belief is integrated into a modified Bellman optimality equation to compute robust policies. We evaluate our framework with the stochastic adversarial multi-agent Coin Game. Our primary finding is that the model-based agent outperforms the TMDP version of model-free Q-learning by a significant margin, confirming that the benefits of model-based planning extend from MDP to TMDP. Furthermore, the proposed framework maintains a performance advantage over Q-learning baselines even when the system’s transition function is unknown. The RL agent also demonstrates robustness to direct adversarial interactions. This work validates TMDP value iteration as an effective, planning-based approach for decision-making against adaptive adversaries. |
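The belief-averaged Bellman backup described in the abstract can be sketched as follows. This is an illustrative toy, not the paper's codebase: the state/action sizes, the random dynamics, and the Dirichlet prior over the adversary's actions are all assumptions made for demonstration.

```python
import numpy as np

# Sketch of TMDP value iteration: the agent holds a Dirichlet belief over the
# adversary's action distribution and averages the Bellman backup over it.
n_states, n_actions, n_adv_actions = 4, 2, 2
rng = np.random.default_rng(0)

# Toy transition tensor P[s, a, b, s'] and reward R[s, a, b],
# where b indexes the adversary's action.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, n_adv_actions))
R = rng.uniform(-1.0, 1.0, size=(n_states, n_actions, n_adv_actions))

# Bayesian belief: Dirichlet pseudo-counts -> posterior mean of the
# adversary's policy (here state-independent for simplicity).
alpha = np.ones(n_adv_actions)   # uniform prior
alpha[0] += 5                    # pretend adversary action 0 was observed 5 times
belief = alpha / alpha.sum()     # posterior mean over adversary actions

gamma, V = 0.9, np.zeros(n_states)
for _ in range(500):
    # Modified Bellman backup:
    # Q[s, a] = E_{b ~ belief}[ R(s,a,b) + gamma * sum_s' P(s'|s,a,b) V(s') ]
    Q = np.einsum('b,sab->sa', belief, R) \
        + gamma * np.einsum('b,sabt,t->sa', belief, P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # converged
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy, robust w.r.t. the current belief
print(policy)
```

In the online variant described by the abstract, the pseudo-counts `alpha` would be updated after each observed adversary action and the backup repeated, so the policy adapts as evidence about the adversary accumulates.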
| result_subspec |
WOS |
| RIV |
BC |
| FORD0 |
10000 |
| FORD1 |
10100 |
| FORD2 |
10103 |
| reportyear |
2027 |
| num_of_auth |
2 |
| inst_support |
RVO:67985556 |
| permalink |
https://hdl.handle.net/11104/0378170 |
| cooperation |
| ARLID |
cav_un_auth*0478849 |
| name |
Provozně ekonomická fakulta, Česká zemědělská univerzita v Praze |
| institution |
PEF CZU |
| country |
CZ |
|
| confidential |
S |
| article_num |
2646376 |
| mrcbC91 |
A |
| mrcbT16-e |
AUTOMATION&CONTROLSYSTEMS |
| mrcbT16-f |
4.5 |
| mrcbT16-g |
0.9 |
| mrcbT16-h |
4.3 |
| mrcbT16-i |
0.00285 |
| mrcbT16-j |
1.004 |
| mrcbT16-k |
2102 |
| mrcbT16-q |
38 |
| mrcbT16-s |
0.783 |
| mrcbT16-y |
39.98 |
| mrcbT16-x |
5.14 |
| mrcbT16-3 |
938 |
| mrcbT16-4 |
Q1 |
| mrcbT16-5 |
4.200 |
| mrcbT16-6 |
116 |
| mrcbT16-7 |
Q2 |
| mrcbT16-C |
74.7 |
| mrcbT16-M |
0.65 |
| mrcbT16-N |
Q2 |
| mrcbT16-P |
74.7 |
| arlyear |
2026 |
| mrcbU14 |
105033597476 SCOPUS |
| mrcbU24 |
PUBMED |
| mrcbU34 |
001722680500001 WOS |
| mrcbU63 |
cav_un_epca*0630899 Systems Science & Control Engineering 14 1 2026 2164-2583 2164-2583 |
| mrcbU88 |
Robust Sequential Decision-Making in Adversarial Environments: Codebase https://hdl.handle.net/11104/0376313 |
|