N-to-1 Knowledge Transfer in Reinforcement Learning via Adaptive Q-Function Selection

Authors: Marko Ruman, Tatiana V. Guy
Affiliation: Department of Adaptive Systems, Ústav teorie informace a automatizace AV ČR, v. v. i.
Journal: IEEE Access, Vol. 14, No. 1 (2026), pp. 45964-45976 (13 pp.)
Publisher: Institute of Electrical and Electronics Engineers
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2026.3676711
Keywords: reinforcement learning; transfer learning; multi-source knowledge transfer; adaptive policy selection; Q-function selection
Grants: 101168272 (EC), CA24136 (EC)
Full text: https://library.utia.cas.cz/separaty/2026/AS/guy-0649859.pdf
Publisher page: https://ieeexplore.ieee.org/document/11450364/
Handle: https://hdl.handle.net/11104/0378906
Indexed in: WOS (001731023500044), SCOPUS (105034452028)

Abstract

This paper tackles multi-source (N-to-1) knowledge transfer in reinforcement learning (RL), where an agent must adaptively solve a new task by leveraging a library of pre-learned skills. We consider a setting where these skills are represented as multiple Q-functions and corresponding environment models, without any explicit labels to guide the selection. To address this scenario, we introduce a theoretically grounded method that dynamically selects the most suitable Q-function at each learning stage. Instead of relying on noisy, short-term signals, our approach makes a farsighted choice by simulating the long-term performance of each skill while simultaneously evaluating the trustworthiness of its underlying world model. This allows the agent to intelligently transition from relying on transferred knowledge to using its own newly acquired policy.

The proposed method is evaluated in three scenarios: 1) selection among multiple Q-functions to solve a fixed RL task, 2) adaptation in a dynamically changing environment, and 3) switching from a partially learned Q-function to a newly learned one. In all cases, our method accelerates learning and demonstrates robust adaptation, confirming its effectiveness for scalable multi-source transfer in RL.
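
The selection idea described in the abstract (score each stored skill by its simulated long-term performance together with how far its model can be trusted) can be illustrated with a minimal tabular sketch. This is not the authors' algorithm: the additive score, the log-likelihood trust measure, the `trust_weight` parameter, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def model_log_likelihood(P, transitions, eps=1e-12):
    """Mean log-probability that model P[s, a, s'] assigns to observed (s, a, s') triples."""
    return float(np.mean([np.log(P[s, a, s2] + eps) for s, a, s2 in transitions]))

def simulated_return(Q, P, R, start, horizon=50, gamma=0.95, n_rollouts=20, rng=None):
    """Monte Carlo estimate of the discounted return of Q's greedy policy under model (P, R)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_states = P.shape[0]
    total = 0.0
    for _ in range(n_rollouts):
        s, discount = start, 1.0
        for _ in range(horizon):
            a = int(np.argmax(Q[s]))                  # greedy action of this skill
            total += discount * R[s, a]
            s = int(rng.choice(n_states, p=P[s, a]))  # sample next state from the model
            discount *= gamma
    return total / n_rollouts

def select_q_function(library, transitions, start, trust_weight=1.0):
    """Return the index of the best-scoring (Q, P, R) entry and all scores.

    Score = simulated return + trust_weight * model log-likelihood (an assumed
    combination, chosen only to keep the sketch short).
    """
    scores = [
        simulated_return(Q, P, R, start) + trust_weight * model_log_likelihood(P, transitions)
        for Q, P, R in library
    ]
    return int(np.argmax(scores)), scores

# Hypothetical 2-state, 2-action task: action a leads to state a; action 1 pays reward 1.
R = np.array([[0.0, 1.0], [0.0, 1.0]])
Q = np.array([[0.0, 1.0], [0.0, 1.0]])                # both skills prefer action 1
P_good = np.zeros((2, 2, 2))
P_bad = np.zeros((2, 2, 2))
for s in range(2):
    for a in range(2):
        P_good[s, a, a] = 1.0                         # matches the observed dynamics
        P_bad[s, a, 1 - a] = 1.0                      # contradicts the observed dynamics

observed = [(0, 1, 1), (1, 1, 1), (1, 0, 0)]          # (s, a, s') seen in the new task
best, scores = select_q_function([(Q, P_good, R), (Q, P_bad, R)], observed, start=0)
```

Both skills promise the same simulated return here, so the choice is decided by the trust term alone: the model that contradicts the observed transitions is penalised and the first entry is selected.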