bibtype | C - Conference Paper (international conference)
ARLID | 0604532
utime | 20250224135743.9
mtime | 20250120235959.9
title (primary) (eng) | Policy Learning via Fully Probabilistic Design
specification
    page_count | 1 p.
    media_type | P
serial
    ARLID | cav_un_epca*0604531
    title | DYNALIFE WG1-WG2 Interaction Meeting Data driven evidence: theoretical models and complex biological data
    page_num | 52-52
    publisher
        place | Brusel
        name | The European Cooperation in Science and Technology (COST)
    year | 2024
keyword | Fully probabilistic design
keyword | imitation learning
keyword | Kullback-Leibler divergence
keyword | learning from demonstration
keyword | optimal policy
author (primary)
    ARLID | cav_un_auth*0355639
    name1 | Fakhimi Derakhshan
    name2 | Siavash
    institution | UTIA-B
    full_dept (cz) | Adaptivní systémy
    full_dept (eng) | Department of Adaptive Systems
    department (cz) | AS
    department (eng) | AS
    country | IR
    share | 70
    garant | A
    fullinstit | Ústav teorie informace a automatizace AV ČR, v. v. i.
author
    ARLID | cav_un_auth*0101092
    name1 | Guy
    name2 | Tatiana Valentine
    institution | UTIA-B
    full_dept (cz) | Adaptivní systémy
    full_dept (eng) | Department of Adaptive Systems
    department (cz) | AS
    department (eng) | AS
    share | 30
    garant | S
    fullinstit | Ústav teorie informace a automatizace AV ČR, v. v. i.
source | cas_special
project
    project_id | CA21169
    agency | EU-COST
    country | XE
    ARLID | cav_un_auth*0452289
abstract (eng) | Applying the formalism of fully probabilistic design (FPD), we propose a new, general, data-driven approach to learning a stochastic policy from demonstrations. The approach infers a policy directly from data, without interaction with the expert or any reinforcement signal. The expert's actions generally need not be optimal. The proposed approach learns an optimal policy by minimising the Kullback-Leibler divergence between the probabilistic description of the actual agent-environment behaviour and the distribution describing the targeted behaviour of the optimised closed loop. We demonstrate the approach on simulated examples and show that the learned policy: i) converges to the optimised policy obtained by FPD; ii) achieves better performance than the optimal FPD policy whenever mismodelling is present.
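note (illustrative) | A minimal numerical sketch of the generic FPD backward recursion referred to in the abstract, on a toy discrete system. The models p, p_ideal and pi_ideal below are hypothetical placeholders; the sketch shows only the standard FPD solution (the KL-optimal policy is the ideal policy reweighted by exp(-omega)), not the paper's data-driven learning procedure.

    # Minimal FPD sketch on a toy discrete system. All models below
    # (p, p_ideal, pi_ideal) are illustrative assumptions, not from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    S, A, T = 3, 2, 10                      # states, actions, horizon

    # Actual environment model p(s'|s,a) and ideal (target) closed-loop models.
    p = rng.random((S, A, S)); p /= p.sum(axis=2, keepdims=True)
    p_ideal = rng.random((S, A, S)); p_ideal /= p_ideal.sum(axis=2, keepdims=True)
    pi_ideal = np.full((S, A), 1.0 / A)     # ideal policy: uniform over actions

    # Backward FPD recursion: the KL-optimal randomised policy is
    #   pi*(a|s) = pi_ideal(a|s) * exp(-omega(s,a)) / gamma(s), where
    #   omega(s,a) = sum_{s'} p(s'|s,a) [ln(p/p_ideal) - ln gamma_next(s')].
    gamma = np.ones(S)
    for t in reversed(range(T)):
        omega = np.einsum('ijk,ijk->ij',
                          p, np.log(p / p_ideal) - np.log(gamma)[None, None, :])
        unnorm = pi_ideal * np.exp(-omega)
        gamma = unnorm.sum(axis=1)          # normaliser, reused at step t-1
        pi_opt = unnorm / gamma[:, None]    # optimal policy at time t

    print(pi_opt)                           # rows sum to 1 over actions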
action
    ARLID | cav_un_auth*0481157
    name | DYNALIFE Interaction Meeting Data driven evidence: theoretical models and complex biological data
    dates | 20240605
    country | GR
    mrcbC20-s | 20240607
    place | Thessaloniki
RIV | IN
FORD0 | 10000
FORD1 | 10200
FORD2 | 10201
reportyear | 2025
num_of_auth | 2
presentation_type | PR
inst_support | RVO:67985556
permalink | https://hdl.handle.net/11104/0363792
confidential | S
arlyear | 2024
mrcbU14 | SCOPUS
mrcbU24 | PUBMED
mrcbU34 | WOS
mrcbU63 | cav_un_epca*0604531 DYNALIFE WG1-WG2 Interaction Meeting Data driven evidence: theoretical models and complex biological data The European Cooperation in Science and Technology (COST) 2024 Brusel 52 52