bibtype C - Conference Paper (international conference)
ARLID 0534000
utime 20240103224647.4
mtime 20201105235959.9
SCOPUS 85098853951
DOI 10.1109/SMC42975.2020.9283093
title (primary) (eng) Similarity-based transfer learning of decision policies
specification
page_count 8 pages
media_type E
serial
ARLID cav_un_epca*0534238
ISBN 978-1-7281-8527-9
ISSN 1062-922X
title Proceedings of the IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS 2020
page_num 37-44
publisher
place Piscataway
name IEEE
year 2020
keyword probabilistic model
keyword fully probabilistic design
keyword transfer learning
keyword closed-loop behavior
keyword Bayesian estimation
keyword sequential decision making
author (primary)
ARLID cav_un_auth*0398726
name1 Zugarová
name2 Eliška
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
country CZ
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
author
ARLID cav_un_auth*0101092
name1 Guy
name2 Tatiana Valentine
institution UTIA-B
full_dept (cz) Adaptivní systémy
full_dept (eng) Department of Adaptive Systems
department (cz) AS
department (eng) AS
share 50
garant A
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
source
url http://library.utia.cas.cz/separaty/2020/AS/guy-0534000.pdf
cas_special
project
project_id LTC18075
agency GA MŠk
country CZ
ARLID cav_un_auth*0372050
abstract (eng) We consider the problem of learning a decision policy from available past experience. Using the Fully Probabilistic Design (FPD) formalism, we propose a new general approach for finding a stochastic policy from past data. The proposed approach assigns a degree of similarity to each of the past closed-loop behaviors. The degree of similarity expresses how close the current decision-making task is to a past task. It is then used by Bayesian estimation to learn an approximately optimal policy that comprises the best past experience. The approach learns the decision policy directly from the data, without interacting with any supervisor/expert or using any reinforcement signal. The past experience may involve a decision objective different from the current one. Moreover, the past decision policy need not be optimal with respect to the past objective. We demonstrate our approach on simulated examples and show that the learned policy outperforms the optimal FPD policy whenever mismodeling is present.
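abstract_note (eng) A minimal illustrative sketch of the similarity-weighted Bayesian policy estimation outlined in the abstract, not the authors' code. It assumes discrete states and actions, measures similarity as an exponentiated negative Kullback-Leibler divergence between empirical closed-loop distributions, and uses a Dirichlet-style weighted count update in place of the paper's exact FPD formulation.

import numpy as np

def empirical_dist(behavior, n_states, n_actions, eps=1e-6):
    # Empirical joint distribution over (state, action) pairs of one closed-loop behavior.
    counts = np.full((n_states, n_actions), eps)
    for s, a in behavior:
        counts[s, a] += 1.0
    return counts / counts.sum()

def similarity(current, past, n_states, n_actions):
    # Degree of similarity in (0, 1]: exp(-KL(current || past)) between empirical distributions.
    p = empirical_dist(current, n_states, n_actions)
    q = empirical_dist(past, n_states, n_actions)
    kl = float(np.sum(p * (np.log(p) - np.log(q))))
    return float(np.exp(-kl))

def learn_policy(current, past_behaviors, n_states, n_actions, prior=1.0):
    # Similarity-weighted Bayesian (Dirichlet) estimate of the stochastic policy pi(a | s):
    # data from past tasks contribute pseudo-counts scaled by how similar the task is.
    counts = np.full((n_states, n_actions), prior)
    for behavior in past_behaviors:
        w = similarity(current, behavior, n_states, n_actions)
        for s, a in behavior:
            counts[s, a] += w
    return counts / counts.sum(axis=1, keepdims=True)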
action
ARLID cav_un_auth*0398727
name IEEE International Conference on Systems, Man and Cybernetics 2020
dates 20201011
mrcbC20-s 20201014
place Toronto
country CA
RIV BB
FORD0 10000
FORD1 10100
FORD2 10103
reportyear 2021
num_of_auth 2
presentation_type PR
inst_support RVO:67985556
permalink http://hdl.handle.net/11104/0312464
confidential S
arlyear 2020
mrcbU14 85098853951 SCOPUS
mrcbU24 PUBMED
mrcbU34 WOS
mrcbU63 cav_un_epca*0534238 Proceedings of the IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS 2020 978-1-7281-8527-9 1062-922X 37 44 Piscataway IEEE 2020