Finding optimal memoryless policies of POMDPs under the expected average reward criterion
Li, Yanjie (2011)
A basic formula for performance gradient estimation of semi-Markov decision processes
Li, Yanjie (2013)