Similar Search Results

Infinite-horizon policy-gradient estimation

Baxter, J.; Bartlett, P. L. - 2001

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP,...

Persistent link: https://www.econbiz.de/10009438377