Second Order Optimality in Transient and Discounted Markov Decision Chains

Author: Sladký, Karel (Department of Econometrics, UTIA-B, Ústav teorie informace a automatizace AV ČR, v. v. i.)
Source: Proceedings of the 33rd International Conference Mathematical Methods in Economics MME 2015, pp. 731-736 (6 pages). Plzeň: University of West Bohemia, Plzeň, 2015. ISBN 978-80-261-0539-8
Conference: Mathematical Methods in Economics 2015 /33./, Cheb, CZ, 09.09.2015-11.09.2015
Grants: GA13-14445S (GA ČR), GA15-10331S (GA ČR)
Institutional support: RVO:67985556
Keywords: dynamic programming; discounted and transient Markov reward chains; reward-variance optimality
WOS: 000387898900125
Full text (PDF file): http://library.utia.cas.cz/separaty/2015/E/sladky-0448938.pdf
Permanent link: http://hdl.handle.net/11104/0250633

Abstract: The article is devoted to second order optimality in Markov decision processes. Attention is primarily focused on the reward variance for discounted models and undiscounted transient models (i.e. models where the spectral radius of the transition probability matrix is less than unity). Under the second order optimality criteria, within the class of policies maximizing (or minimizing) the total expected discounted reward (or the undiscounted reward for the transient model), we choose the policy that minimizes the total variance. Explicit formulae for calculating the variances for transient and discounted models are reported, along with sketches of algorithmic procedures for finding second order optimal policies.
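
The selection rule described in the abstract can be illustrated with a small numerical sketch. It is not the paper's own algorithm or notation: it assumes transition-dependent rewards r_ij, uses the standard first and second moment recursions for the total discounted reward, and compares policies at a single start state only. The names `discounted_mean_and_variance` and `second_order_best` are hypothetical. Setting `beta = 1` with a substochastic matrix of spectral radius below one would cover the transient case under the same assumptions.

```python
import numpy as np

def discounted_mean_and_variance(P, R, beta):
    """Mean and variance of the total discounted reward for each start state.

    P: (n x n) transition matrix, R: (n x n) transition rewards r_ij,
    beta: discount factor. Assumes I - beta*P and I - beta^2*P are invertible.
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    I = np.eye(P.shape[0])
    r1 = (P * R).sum(axis=1)        # expected one-step reward
    r2 = (P * R**2).sum(axis=1)     # expected squared one-step reward
    # First moment: v = r1 + beta * P v
    v = np.linalg.solve(I - beta * P, r1)
    # Second moment: s_i = sum_j p_ij (r_ij^2 + 2 beta r_ij v_j + beta^2 s_j)
    s = np.linalg.solve(I - beta**2 * P, r2 + 2 * beta * (P * R) @ v)
    return v, s - v**2

def second_order_best(policies, beta, start=0, tol=1e-9):
    """Index of the policy with maximal mean at `start`, ties broken by variance.

    policies: list of (P, R) pairs, one per stationary policy (illustrative only;
    enumerating policies is feasible just for very small models).
    """
    stats = [discounted_mean_and_variance(P, R, beta) for P, R in policies]
    best_mean = max(v[start] for v, _ in stats)
    candidates = [k for k, (v, _) in enumerate(stats) if v[start] >= best_mean - tol]
    return min(candidates, key=lambda k: stats[k][1][start])

# Two policies with the same expected discounted reward from state 0:
# policy A moves to state 1 with reward 1; policy B gambles between rewards 0 and 2.
P_a = [[0, 1, 0], [0, 1, 0], [0, 0, 1]]
R_a = [[0, 1, 0], [0, 1, 0], [0, 0, 1]]
P_b = [[0, 0.5, 0.5], [0, 1, 0], [0, 0, 1]]
R_b = [[0, 0, 2], [0, 1, 0], [0, 0, 1]]
best = second_order_best([(P_b, R_b), (P_a, R_a)], beta=0.5)
```

In this toy example both policies have the same mean at state 0, so the first order criterion cannot separate them, and the variance decides in favour of the deterministic policy A.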