Balancing Exploitation and Exploration via Fully Probabilistic Design of Decision Policies

Authors: Kárný, Miroslav; Hůla, František
Affiliation: Department of Adaptive Systems, Ústav teorie informace a automatizace AV ČR, v. v. i. (CZ)
Publication: Praha: ÚTIA AV ČR, v.v.i, 2018. 13 pp. Research Report 2376.
Keywords: Exploitation; Exploration; Bayesian estimation; Adaptive systems; Fully probabilistic design; Kullback-Leibler divergence; Decision policy; Markov decision process
Grants: GA16-09848S (GA ČR); GA18-15970S (GA ČR)
Institutional support: RVO:67985556
Full text: http://library.utia.cas.cz/separaty/2018/AS/karny-0495875.pdf
Handle: http://hdl.handle.net/11104/0288947
Files in repository: 0495875.pdf

Abstract: Adaptive decision making learns an environment model serving a design of a decision policy. The policy-generated actions influence both the acquired reward and the future knowledge. The optimal policy properly balances exploitation with exploration. The inherent dimensionality curse of decision making under incomplete knowledge prevents the realisation of the optimal design.