bibtype |
C - Conference Paper (international conference) |
ARLID |
0503817 |
utime |
20241106135724.9 |
mtime |
20190408235959.9 |
SCOPUS |
85064837601 |
DOI |
10.5220/0007587208570864 |
title
(primary) (eng) |
Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies |
specification |
page_count |
8 s. |
media_type |
P |
|
serial |
ARLID |
cav_un_epca*0503816 |
ISBN |
978-989-758-350-6 |
title
|
Proceedings of the 11th International Conference on Agents and Artificial Intelligence |
part_num |
2 |
page_num |
857-864 |
publisher |
place |
Setúbal |
name |
SciTePress |
year |
2019 |
|
editor |
|
editor |
|
editor |
name1 |
van den Herik |
name2 |
J. |
|
|
keyword |
exploitation |
keyword |
exploration |
keyword |
adaptive systems |
keyword |
Bayesian estimation |
keyword |
fully probabilistic design |
keyword |
Markov decision process |
author
(primary) |
ARLID |
cav_un_auth*0101124 |
name1 |
Kárný |
name2 |
Miroslav |
institution |
UTIA-B |
full_dept (cz) |
Adaptivní systémy |
full_dept (eng) |
Department of Adaptive Systems |
department (cz) |
AS |
department (eng) |
AS |
full_dept |
Department of Adaptive Systems |
fullinstit |
Ústav teorie informace a automatizace AV ČR, v. v. i. |
|
author
|
ARLID |
cav_un_auth*0333671 |
name1 |
Hůla |
name2 |
František |
institution |
UTIA-B |
full_dept (cz) |
Adaptivní systémy |
full_dept |
Department of Adaptive Systems |
department (cz) |
AS |
department |
AS |
full_dept |
Department of Adaptive Systems |
country |
CZ |
fullinstit |
Ústav teorie informace a automatizace AV ČR, v. v. i. |
|
source |
|
cas_special |
project |
ARLID |
cav_un_auth*0331019 |
project_id |
GA16-09848S |
agency |
GA ČR |
country |
CZ |
|
project |
ARLID |
cav_un_auth*0362986 |
project_id |
GA18-15970S |
agency |
GA ČR |
country |
CZ |
|
abstract
(eng) |
Adaptive decision making learns an environment model that serves the design of a decision policy. The actions generated by the policy influence both the acquired reward and the future knowledge. The optimal policy properly balances exploitation with exploration. The curse of dimensionality inherent in decision making under incomplete knowledge prevents the realisation of the optimal design. This has stimulated repeated attempts to reach this balance at least approximately. Usually, either (a) the exploitative reward is enriched by a term reflecting the exploration quality and a feasible approximate certainty-equivalent design is made, or (b) explorative random noise is added to the purely exploitative actions. This paper avoids the inauspicious (a) and improves (b) by employing the non-standard fully probabilistic design (FPD) of decision policies, which naturally generates random actions. Monte Carlo experiments confirm the quality it achieves. The quality stems from methodological contributions, which include (i) an improved relation between FPD and standard Markov decision processes and (ii) the design of an adaptive tuning of an FPD parameter. The latter is also suitable for tuning the temperature in both simulated annealing and the Boltzmann machine. |
action |
ARLID |
cav_un_auth*0374223 |
name |
International Conference on Agents and Artificial Intelligence |
dates |
20190219 |
mrcbC20-s |
20190221 |
place |
Praha |
country |
CZ |
|
RIV |
BC |
FORD0 |
10000 |
FORD1 |
10200 |
FORD2 |
10201 |
reportyear |
2020 |
num_of_auth |
2 |
mrcbC52 |
4 A sml 4as 20241106135724.9 |
presentation_type |
PO |
inst_support |
RVO:67985556 |
permalink |
http://hdl.handle.net/11104/0295689 |
confidential |
S |
contract |
name |
Content to publish and copyright transfer |
date |
20190117 |
|
arlyear |
2019 |
mrcbTft |
Files in the repository: hula-0503817-copyright.pdf |
mrcbU14 |
85064837601 SCOPUS |
mrcbU24 |
PUBMED |
mrcbU34 |
WOS |
mrcbU63 |
cav_un_epca*0503816 Proceedings of the 11th International Conference on Agents and Artificial Intelligence 2 SciTePress 2019 Setúbal 857 864 978-989-758-350-6 |
mrcbU67 |
340 Rocha A. |
mrcbU67 |
340 Steels L. |
mrcbU67 |
340 van den Herik J. |
|