<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="style/detail_T.xsl"?>
<bibitem type="J"> <ARLID>0648523</ARLID> <utime>20260417130701.6</utime><mtime>20260409235959.9</mtime> <SCOPUS>105033597476</SCOPUS> <WOS>001722680500001</WOS> <DOI>10.1080/21642583.2026.2646376</DOI> <title language="eng" primary="1">Robust sequential decision-making in adversarial environments</title> <specification> <page_count>21 s.</page_count> <media_type>E</media_type> </specification> <serial><ARLID>cav_un_epca*0630899</ARLID><ISSN>2164-2583</ISSN><title>Systems Science &amp; Control Engineering</title><part_num/><part_title/><volume_id>14</volume_id><volume/></serial> <keyword>dynamic programming</keyword> <keyword>adversarial machine learning</keyword> <keyword>multi-agent reinforcement learning</keyword> <keyword>robust reinforcement learning</keyword> <keyword>Bayesian reinforcement learning</keyword> <author primary="1"> <ARLID>cav_un_auth*0491463</ARLID> <name1>Ružejnikov</name1> <name2>Jurij</name2> <institution>UTIA-B</institution> <full_dept language="cz">Adaptivní systémy</full_dept> <full_dept language="eng">Department of Adaptive Systems</full_dept> <department language="cz">AS</department> <department language="eng">AS</department> <country>CZ</country> <share>80</share> <garant>K</garant> <fullinstit>Ústav teorie informace a automatizace AV ČR, v. v. i.</fullinstit> </author> <author primary="0"> <ARLID>cav_un_auth*0101092</ARLID> <name1>Guy</name1> <name2>Tatiana Valentine</name2> <institution>UTIA-B</institution> <full_dept language="cz">Adaptivní systémy</full_dept> <full_dept language="eng">Department of Adaptive Systems</full_dept> <department language="cz">AS</department> <department language="eng">AS</department> <share>20</share> <garant>S</garant> <fullinstit>Ústav teorie informace a automatizace AV ČR, v. v. i.</fullinstit> </author> <source> <url>https://library.utia.cas.cz/separaty/2026/AS/ruzejnikov-0648523.pdf</url> </source> <source> <url>https://www.tandfonline.com/doi/full/10.1080/21642583.2026.2646376</url> </source> <cas_special> <project> <project_id>101168272</project_id> <agency>EC</agency> <country>XE</country> <ARLID>cav_un_auth*0492513</ARLID> </project> <project> <project_id>CA24136</project_id> <agency>EC</agency> <country>XE</country> <ARLID>cav_un_auth*0504278</ARLID> </project> <project> <project_id>2025A1013</project_id> <agency>ČZU</agency> <country>CZ</country> <ARLID>cav_un_auth*0504279</ARLID> </project> <abstract language="eng" primary="1">Reinforcement learning (RL) agents often fail in adversarial environments, where the Markov Decision Process (MDP) assumption of a stationary environment is violated. While model-free solutions for this setting exist, planning-based counterparts remain less explored. This paper introduces offline and online value iteration algorithms within the Threatened Markov Decision Process (TMDP) framework, in which the RL agent maintains and updates a Bayesian belief over the adversary’s policy. The belief is integrated into a modified Bellman optimality equation to compute robust policies. We evaluate our framework on the stochastic adversarial multi-agent Coin Game. Our primary finding is that the model-based agent outperforms the TMDP version of model-free Q-learning by a significant margin, confirming that the benefits of model-based planning extend from MDPs to TMDPs. Furthermore, the proposed framework maintains a performance advantage over Q-learning baselines even when the system’s transition function is unknown. The RL agent also demonstrates robustness to direct adversarial interactions.
This work validates TMDP value iteration as an effective, planning-based approach for decision-making against adaptive adversaries.</abstract>     <result_subspec>WOS</result_subspec> <RIV>BC</RIV> <FORD0>10000</FORD0> <FORD1>10100</FORD1> <FORD2>10103</FORD2>    <reportyear>2027</reportyear>      <num_of_auth>2</num_of_auth>  <inst_support> RVO:67985556 </inst_support>  <permalink>https://hdl.handle.net/11104/0378170</permalink>  <cooperation> <ARLID>cav_un_auth*0478849</ARLID> <name>Provozně ekonomická fakulta, Česká zemědělská univerzita v Praze</name> <institution>PEF CZU</institution> <country>CZ</country> </cooperation>  <confidential>S</confidential>    <article_num> 2646376 </article_num> <unknown tag="mrcbC91"> A </unknown>         <unknown tag="mrcbT16-e">AUTOMATION&amp;CONTROLSYSTEMS</unknown> <unknown tag="mrcbT16-f">4.5</unknown> <unknown tag="mrcbT16-g">0.9</unknown> <unknown tag="mrcbT16-h">4.3</unknown> <unknown tag="mrcbT16-i">0.00285</unknown> <unknown tag="mrcbT16-j">1.004</unknown> <unknown tag="mrcbT16-k">2102</unknown> <unknown tag="mrcbT16-q">38</unknown> <unknown tag="mrcbT16-s">0.783</unknown> <unknown tag="mrcbT16-y">39.98</unknown> <unknown tag="mrcbT16-x">5.14</unknown> <unknown tag="mrcbT16-3">938</unknown> <unknown tag="mrcbT16-4">Q1</unknown> <unknown tag="mrcbT16-5">4.200</unknown> <unknown tag="mrcbT16-6">116</unknown> <unknown tag="mrcbT16-7">Q2</unknown> <unknown tag="mrcbT16-C">74.7</unknown> <unknown tag="mrcbT16-M">0.65</unknown> <unknown tag="mrcbT16-N">Q2</unknown> <unknown tag="mrcbT16-P">74.7</unknown> <arlyear>2026</arlyear>       <unknown tag="mrcbU14"> 105033597476 SCOPUS </unknown> <unknown tag="mrcbU24"> PUBMED </unknown> <unknown tag="mrcbU34"> 001722680500001 WOS </unknown> <unknown tag="mrcbU63"> cav_un_epca*0630899 Systems Science &amp; Control Engineering 14 1 2026 2164-2583 2164-2583 </unknown> <unknown tag="mrcbU88"> Robust Sequential Decision-Making in Adversarial Environments: Codebase 
https://hdl.handle.net/11104/0376313 </unknown> </cas_special> </bibitem>