bibtype J - Journal Article
ARLID 0648523
utime 20260417130701.6
mtime 20260409235959.9
SCOPUS 105033597476
WOS 001722680500001
DOI 10.1080/21642583.2026.2646376
title (primary) (eng) Robust sequential decision-making in adversarial environments
specification
page_count 21 pp.
media_type E
serial
ARLID cav_un_epca*0630899
ISSN 2164-2583
title Systems Science & Control Engineering
volume_id 14
keyword dynamic programming
keyword adversarial machine learning
keyword multi-agent reinforcement learning
keyword robust reinforcement learning
keyword Bayesian reinforcement learning
author (primary)
ARLID cav_un_auth*0491463
name1 Ružejnikov
name2 Jurij
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
country CZ
share 80
garant K
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
author
ARLID cav_un_auth*0101092
name1 Guy
name2 Tatiana Valentine
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
share 20
garant S
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
source
url https://library.utia.cas.cz/separaty/2026/AS/ruzejnikov-0648523.pdf
source
url https://www.tandfonline.com/doi/full/10.1080/21642583.2026.2646376
cas_special
project
project_id 101168272
agency EC
country XE
ARLID cav_un_auth*0492513
project
project_id CA24136
agency EC
country XE
ARLID cav_un_auth*0504278
project
project_id 2025A1013
agency ČZU
country CZ
ARLID cav_un_auth*0504279
abstract (eng) Reinforcement learning (RL) agents often fail in adversarial environments, where the Markov Decision Process (MDP) assumption of a stationary environment is violated. While model-free solutions for this setting exist, planning-based counterparts remain less explored. This paper introduces offline and online value iteration algorithms within the Threatened Markov Decision Process (TMDP) framework, in which the RL agent maintains and updates a Bayesian belief over the adversary’s policy. The belief is integrated into a modified Bellman optimality equation to compute robust policies. We evaluate our framework on the stochastic adversarial multi-agent Coin Game. Our primary finding is that the model-based agent outperforms the TMDP version of model-free Q-learning by a significant margin, confirming that the benefits of model-based planning extend from MDP to TMDP. Furthermore, the proposed framework maintains a performance advantage over Q-learning baselines even when the system’s transition function is unknown. The RL agent also demonstrates robustness to direct adversarial interactions. This work validates TMDP value iteration as an effective, planning-based approach for decision-making against adaptive adversaries.
result_subspec WOS
RIV BC
FORD0 10000
FORD1 10100
FORD2 10103
reportyear 2027
num_of_auth 2
inst_support RVO:67985556
permalink https://hdl.handle.net/11104/0378170
cooperation
ARLID cav_un_auth*0478849
name Provozně ekonomická fakulta, Česká zemědělská univerzita v Praze
institution PEF CZU
country CZ
confidential S
article_num 2646376
mrcbC91 A
mrcbT16-e AUTOMATION&CONTROLSYSTEMS
mrcbT16-f 4.5
mrcbT16-g 0.9
mrcbT16-h 4.3
mrcbT16-i 0.00285
mrcbT16-j 1.004
mrcbT16-k 2102
mrcbT16-q 38
mrcbT16-s 0.783
mrcbT16-y 39.98
mrcbT16-x 5.14
mrcbT16-3 938
mrcbT16-4 Q1
mrcbT16-5 4.200
mrcbT16-6 116
mrcbT16-7 Q2
mrcbT16-C 74.7
mrcbT16-M 0.65
mrcbT16-N Q2
mrcbT16-P 74.7
arlyear 2026
mrcbU14 105033597476 SCOPUS
mrcbU24 PUBMED
mrcbU34 001722680500001 WOS
mrcbU63 cav_un_epca*0630899 Systems Science & Control Engineering 14 1 2026 2164-2583 2164-2583
mrcbU88 Robust Sequential Decision-Making in Adversarial Environments: Codebase https://hdl.handle.net/11104/0376313