bibtype C - Conference Paper (international conference)
ARLID 0503817
utime 20241106135724.9
mtime 20190408235959.9
SCOPUS 85064837601
DOI 10.5220/0007587208570864
title (primary) (eng) Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies
specification
page_count 8 pp.
media_type P
serial
ARLID cav_un_epca*0503816
ISBN 978-989-758-350-6
title Proceedings of the 11th International Conference on Agents and Artificial Intelligence
part_num 2
page_num 857-864
publisher
place Setúbal
name SciTePress
year 2019
editor
name1 Rocha
name2 A.
editor
name1 Steels
name2 L.
editor
name1 van den Herik
name2 J.
keyword exploitation
keyword exploration
keyword adaptive systems
keyword Bayesian estimation
keyword fully probabilistic design
keyword Markov decision process
author (primary)
ARLID cav_un_auth*0101124
name1 Kárný
name2 Miroslav
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
author
ARLID cav_un_auth*0333671
name1 Hůla
name2 František
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept Department of Adaptive Systems
department (cz) AS
department AS
country CZ
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
source
url http://library.utia.cas.cz/separaty/2019/AS/hula-0503817.pdf
cas_special
project
ARLID cav_un_auth*0331019
project_id GA16-09848S
agency GA ČR
country CZ
project
ARLID cav_un_auth*0362986
project_id GA18-15970S
agency GA ČR
country CZ
abstract (eng) Adaptive decision making learns an environment model that serves the design of a decision policy. The policy-generated actions influence both the acquired reward and the future knowledge. The optimal policy properly balances exploitation with exploration. The inherent curse of dimensionality of decision making under incomplete knowledge prevents the realisation of the optimal design. This has stimulated repeated attempts to reach this balance at least approximately. Usually, either (a) the exploitative reward is enriched by a term reflecting the exploration quality and a feasible approximate certainty-equivalent design is made, or (b) explorative random noise is added to the purely exploitative actions. This paper avoids the inauspicious option (a) and improves (b) by employing the non-standard fully probabilistic design (FPD) of decision policies, which naturally generates random actions. Monte Carlo experiments confirm the achieved quality. The quality stems from methodological contributions, which include: (i) an improved relation between FPD and standard Markov decision processes; (ii) a design of an adaptive tuning of an FPD parameter. The latter is also suited to tuning the temperature in both simulated annealing and the Boltzmann machine.
action
ARLID cav_un_auth*0374223
name International Conference on Agents and Artificial Intelligence
dates 20190219
mrcbC20-s 20190221
place Praha
country CZ
RIV BC
FORD0 10000
FORD1 10200
FORD2 10201
reportyear 2020
num_of_auth 2
mrcbC52 4 A sml 4as 20241106135724.9
presentation_type PO
inst_support RVO:67985556
permalink http://hdl.handle.net/11104/0295689
confidential S
contract
name Content to publish and copyright transfer
date 20190117
arlyear 2019
mrcbTft \nFiles in the repository: hula-0503817-copyright.pdf
mrcbU14 85064837601 SCOPUS
mrcbU24 PUBMED
mrcbU34 WOS
mrcbU63 cav_un_epca*0503816 Proceedings of the 11th International Conference on Agents and Artificial Intelligence 2 SciTePress 2019 Setúbal 857 864 978-989-758-350-6
mrcbU67 340 Rocha A.
mrcbU67 340 Steels L.
mrcbU67 340 van den Herik J.