UTIA - Library

bibtype

C - Conference Paper (international conference)

ARLID

0503817

utime

20241106135724.9

mtime

20190408235959.9

SCOPUS

85064837601

DOI

10.5220/0007587208570864

title (primary) (eng)

Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies

specification

page_count	8 s.
media_type	P

serial

ARLID

cav_un_epca*0503816

ISBN

978-989-758-350-6

title

Proceedings of the 11th International Conference on Agents and Artificial Intelligence

part_num

page_num

857-864

publisher

place	Setúbal
name	SciTePress
year	2019

editor

name1	Rocha
name2	A.

editor

name1	Steels
name2	L.

editor

name1	van den Herik
name2	J.

keyword

exploitation

keyword

exploration

keyword

adaptive systems

keyword

Bayesian estimation

keyword

fully probabilistic design

keyword

Markov decision process

author (primary)

ARLID	cav_un_auth*0101124
name1	Kárný
name2	Miroslav
institution	UTIA-B
full_dept (cz)	Adaptivní systémy
full_dept (eng)	Department of Adaptive Systems
department (cz)	AS
department (eng)	AS
full_dept	Department of Adaptive Systems
fullinstit	Ústav teorie informace a automatizace AV ČR, v. v. i.

author

ARLID	cav_un_auth*0333671
name1	Hůla
name2	František
institution	UTIA-B
full_dept (cz)	Adaptivní systémy
full_dept (eng)	Department of Adaptive Systems
department (cz)	AS
department (eng)	AS
full_dept	Department of Adaptive Systems
country	CZ
fullinstit	Ústav teorie informace a automatizace AV ČR, v. v. i.

source

url	http://library.utia.cas.cz/separaty/2019/AS/hula-0503817.pdf

cas_special

project

ARLID	cav_un_auth*0331019
project_id	GA16-09848S
agency	GA ČR
country	CZ

project

ARLID	cav_un_auth*0362986
project_id	GA18-15970S
agency	GA ČR
country	CZ

abstract (eng)

Adaptive decision making learns an environment model that serves the design of a decision policy. The policy-generated actions influence both the acquired reward and the future knowledge. The optimal policy properly balances exploitation with exploration. The inherent curse of dimensionality in decision making under incomplete knowledge prevents the realisation of the optimal design. This has stimulated repeated attempts to reach this balance at least approximately. Usually, either: (a) the exploitative reward is enriched by a part reflecting the exploration quality and a feasible approximate certainty-equivalent design is made, or (b) an explorative random noise is added to the purely exploitative actions. This paper avoids the inauspicious (a) and improves (b) by employing the non-standard fully probabilistic design (FPD) of decision policies, which naturally generates random actions. Monte Carlo experiments confirm its achieved quality. The quality stems from methodological contributions, which include: (i) an improvement of the relation between FPD and standard Markov decision processes, and (ii) a design of an adaptive tuning of an FPD parameter. The latter is also suitable for tuning the temperature in both simulated annealing and the Boltzmann machine.

action

ARLID	cav_un_auth*0374223
name	International Conference on Agents and Artificial Intelligence
dates	20190219
mrcbC20-s	20190221
place	Praha
country	CZ

RIV

FORD0

10000

FORD1

10200

FORD2

10201

reportyear

2020

num_of_auth

mrcbC52

4 A sml 4as 20241106135724.9

presentation_type

inst_support

RVO:67985556

permalink

http://hdl.handle.net/11104/0295689

confidential

contract

name	Consent to publish and copyright transfer
date	20190117

arlyear

2019

mrcbTft

Files in repository: hula-0503817-copyright.pdf

mrcbU14

85064837601 SCOPUS

mrcbU24

PUBMED

mrcbU34

WOS

mrcbU63

cav_un_epca*0503816 Proceedings of the 11th International Conference on Agents and Artificial Intelligence 2 SciTePress 2019 Setúbal 857 864 978-989-758-350-6

mrcbU67

340 Rocha A.

mrcbU67

340 Steels L.

mrcbU67

340 van den Herik J.