bibtype C - Conference Paper (international conference)
ARLID 0604532
utime 20250224135743.9
mtime 20250120235959.9
title (primary) (eng) Policy Learning via Fully Probabilistic Design
specification
page_count 1 p.
media_type P
serial
ARLID cav_un_epca*0604531
title DYNALIFE WG1-WG2 Interaction Meeting Data driven evidence: theoretical models and complex biological data
page_num 52-52
publisher
place Brussels
name The European Cooperation in Science and Technology (COST)
year 2024
keyword Fully probabilistic design
keyword imitation learning
keyword Kullback-Leibler divergence
keyword learning from demonstration
keyword optimal policy
author (primary)
ARLID cav_un_auth*0355639
name1 Fakhimi Derakhshan
name2 Siavash
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
full_dept Department of Adaptive Systems
country IR
share 70
garant A
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
author
ARLID cav_un_auth*0101092
name1 Guy
name2 Tatiana Valentine
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept Department of Adaptive Systems
department (cz) AS
department AS
full_dept Department of Adaptive Systems
share 30
garant S
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
source
url https://library.utia.cas.cz/separaty/2025/AS/guy-0604532.pdf
cas_special
project
project_id CA21169
agency EU-COST
country XE
ARLID cav_un_auth*0452289
abstract (eng) Applying the formalism of fully probabilistic design (FPD), we propose a new general data-driven approach for finding a stochastic policy from demonstrations. The approach infers a policy directly from data, without interaction with the expert and without using any reinforcement signal. The expert's actions generally need not be optimal. The proposed approach learns an optimal policy by minimising the Kullback-Leibler divergence between the probabilistic description of the actual agent-environment behaviour and the distribution describing the targeted behaviour of the optimised closed loop. We demonstrate our approach on simulated examples and show that the learned policy i) converges to the optimised policy obtained by FPD, and ii) achieves better performance than the optimal FPD policy whenever mismodelling is present.
action
ARLID cav_un_auth*0481157
name DYNALIFE Interaction Meeting Data driven evidence: theoretical models and complex biological data
dates 20240605
country GR
mrcbC20-s 20240607
place Thessaloniki
RIV IN
FORD0 10000
FORD1 10200
FORD2 10201
reportyear 2025
num_of_auth 2
presentation_type PR
inst_support RVO:67985556
permalink https://hdl.handle.net/11104/0363792
confidential S
arlyear 2024
mrcbU14 SCOPUS
mrcbU24 PUBMED
mrcbU34 WOS
mrcbU63 cav_un_epca*0604531 DYNALIFE WG1-WG2 Interaction Meeting Data driven evidence: theoretical models and complex biological data The European Cooperation in Science and Technology (COST) 2024 Brussels 52 52