bibtype J - Journal Article
ARLID 0599786
utime 20241024072750.9
mtime 20241023235959.9
DOI 10.1016/j.ins.2024.121578
title (primary) (eng) Discounted fully probabilistic design of decision rules
page_count 12 pp.
media_type P
ARLID cav_un_epca*0256752
ISSN 0020-0255
title Information Sciences
volume_id 690
name Elsevier
keyword Design principles
keyword Kullback-Leibler's divergence
keyword Probabilistic techniques
keyword Discounting
keyword Closed loop
author (primary)
ARLID cav_un_auth*0101124
name1 Kárný
name2 Miroslav
institution UTIA-B
department AS
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
garant K
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
ARLID cav_un_auth*0471751
name1 Molnárová
name2 Soňa
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
country CZ
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
project_id CA21169
agency EU-COST
country XE
ARLID cav_un_auth*0452289
abstract (eng) Axiomatic fully probabilistic design (FPD) of optimal decision rules strictly extends the decision-making (DM) theory represented by Markov decision processes (MDP). This means that any MDP task can be approximated by an explicitly found FPD task, whereas many FPD tasks have no MDP equivalent. MDP and FPD model the closed loop (the coupling of an agent and its environment) via a joint probability density (pd) of the involved random variables, collectively referred to as the behaviour. Unlike MDP, FPD quantifies the agent’s aims and constraints by an ideal pd. The ideal pd is high on desired behaviours, small on undesired behaviours and zero on forbidden ones. FPD selects the optimal decision rules as the minimiser of the Kullback-Leibler divergence of the closed-loop-modelling pd to its ideal twin. The choice of this proximity measure follows from the FPD axiomatics. MDP minimises the expected total loss, which is usually the sum of discounted partial losses. Discounting reflects the decreasing importance of future losses. It also diminishes the influence of errors caused by the imperfection of the employed environment model, by roughly expressed aims, and by approximate learning and design of decision rules. The established FPD currently cannot account for these important features. The paper elaborates the missing discounted version of FPD. This non-trivial filling of the gap employs an extension of dynamic programming that is of independent interest.
result_subspec WOS
FORD0 20000
FORD1 20200
FORD2 20204
reportyear 2025
num_of_auth 2
inst_support RVO:67985556
confidential S
article_num 121578
mrcbC91 C
mrcbT16-j 1.333
mrcbT16-s 2.285
mrcbT16-D Q1
mrcbT16-E Q1
arlyear 2024
mrcbU14 SCOPUS
mrcbU24 PUBMED
mrcbU34 WOS
mrcbU63 cav_un_epca*0256752 Information Sciences 690 1 2024 0020-0255 1872-6291 Elsevier