Showing 1 - 1 of 1
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP,...
Persistent link: https://www.econbiz.de/10009438377