<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="style/detail_T.xsl"?>
<bibitem type="J">   <ARLID>0485146</ARLID> <utime>20240903170638.9</utime><mtime>20180119235959.9</mtime>   <SCOPUS>85040739483</SCOPUS> <WOS>000424732300008</WOS>  <DOI>10.14736/kyb-2017-6-1086</DOI>           <title language="eng" primary="1">Second Order Optimality in Markov Decision Chains</title>  <specification> <page_count>14 s.</page_count> <media_type>P</media_type> </specification>   <serial><ARLID>cav_un_epca*0297163</ARLID><ISSN>0023-5954</ISSN><title>Kybernetika</title><part_num/><part_title/><volume_id>53</volume_id><volume>6 (2017)</volume><page_num>1086-1099</page_num><publisher><place/><name>Ústav teorie informace a automatizace AV ČR, v. v. i.</name><year/></publisher></serial>    <keyword>Markov decision chains</keyword>   <keyword>second order optimality</keyword>   <keyword>optimality conditions for transient, discounted and average models</keyword>   <keyword>policy and value iterations</keyword>    <author primary="1"> <ARLID>cav_un_auth*0101196</ARLID> <full_dept language="cz">Ekonometrie</full_dept> <full_dept language="eng">Department of Econometrics</full_dept> <department language="cz">E</department> <department language="eng">E</department> <full_dept>Department of Econometrics</full_dept>  <share>100%</share> <name1>Sladký</name1> <name2>Karel</name2> <institution>UTIA-B</institution> <garant>K</garant> <fullinstit>Ústav teorie informace a automatizace AV ČR, v. v. i.</fullinstit> </author>   <source> <url>http://library.utia.cas.cz/separaty/2017/E/sladky-0485146.pdf</url> </source>        <cas_special> <project> <ARLID>cav_un_auth*0321097</ARLID> <project_id>GA15-10331S</project_id> <agency>GA ČR</agency> </project>  <abstract language="eng" primary="1">The article is devoted to Markov reward chains in a discrete-time setting with finite state spaces. 
Unfortunately, the usual optimization criteria examined in the literature on Markov decision chains, such as total discounted reward, total reward up to reaching some specific state (the so-called first passage models), or mean (average) reward optimality, may be quite insufficient to characterize the problem from the point of view of a decision maker. To this end it may be preferable, if not necessary, to select more sophisticated criteria that also reflect the variability-risk features of the problem. Perhaps the best-known approaches stem from the classical work of Markowitz on mean-variance selection rules, i.e. we optimize the weighted sum of the average or total reward and its variance. The article presents explicit formulae for calculating the variances for transient and discounted models (where the value of the discount factor depends on the current state and the action taken) over finite and infinite time horizons. The same results are presented for long-run average nondiscounted models, where finding stationary policies minimizing the average variance in the class of policies with a given long-run average reward is discussed.</abstract>     <RIV>BB</RIV> <FORD0>10000</FORD0> <FORD1>10100</FORD1> <FORD2>10103</FORD2>    <reportyear>2018</reportyear>      <num_of_auth>1</num_of_auth>  <inst_support> RVO:67985556 </inst_support>  <permalink>http://hdl.handle.net/11104/0280354</permalink>   <confidential>S</confidential>  <unknown tag="mrcbC86"> 3+4 Article|Proceedings Paper Computer Science Cybernetics  </unknown>         <unknown tag="mrcbT16-e">COMPUTERSCIENCE.CYBERNETICS</unknown> <unknown tag="mrcbT16-f">0.596</unknown> <unknown tag="mrcbT16-g">0.048</unknown> <unknown tag="mrcbT16-h">12.4</unknown> <unknown tag="mrcbT16-i">0.00096</unknown> <unknown tag="mrcbT16-j">0.224</unknown> 
<unknown tag="mrcbT16-k">808</unknown> <unknown tag="mrcbT16-s">0.321</unknown> <unknown tag="mrcbT16-5">0.513</unknown> <unknown tag="mrcbT16-6">63</unknown> <unknown tag="mrcbT16-7">Q4</unknown> <unknown tag="mrcbT16-B">18.907</unknown> <unknown tag="mrcbT16-C">11.4</unknown> <unknown tag="mrcbT16-D">Q4</unknown> <unknown tag="mrcbT16-E">Q3</unknown> <unknown tag="mrcbT16-M">0.2</unknown> <unknown tag="mrcbT16-N">Q4</unknown> <unknown tag="mrcbT16-P">11.364</unknown> <arlyear>2017</arlyear>       <unknown tag="mrcbU14"> 85040739483 SCOPUS </unknown> <unknown tag="mrcbU24"> PUBMED </unknown> <unknown tag="mrcbU34"> 000424732300008 WOS </unknown> <unknown tag="mrcbU63"> cav_un_epca*0297163 Kybernetika 0023-5954 Roč. 53 č. 6 2017 1086 1099 Ústav teorie informace a automatizace AV ČR, v. v. i. </unknown> </cas_special> </bibitem>