A Reinforcement Learning Method of Solving Markov Decision Processes : An Adaptive Exploration Model Based on Temporal Difference Error

Backward recursion for solving Markov decision processes (MDPs) requires the optimal expected payoff of the subprocess following any state at any time, yet this payoff cannot be known during the actual decision process. To overcome this contradiction, this paper proposes a model called Temporal difference Error-based Adaptive Exploration (TEAE) for solving MDPs. The model is based on the reinforcement learning method for MDP (RLMDP) and addresses the limitations of traditional MDP solving methods. TEAE dynamically adjusts the exploration probability based on the agent’s performance. It uses a deep convolutional neural network to minimize the temporal difference error between the dual networks at each subsequent time step, thereby approximating the optimal expected payoff function of the subprocess after a given state and time. Furthermore, TEAE is integrated into the DQN-PER and DDQN-PER methods [1], yielding the DQN-PER-TEAE and DDQN-PER-TEAE variants, respectively. To assess the effectiveness of TEAE, it is evaluated on multiple metrics against other RLMDP methods. The simulation results demonstrate the superior efficiency of the TEAE method compared to the existing MDP solving methods.
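
The abstract names two ingredients: a temporal difference (TD) error computed between dual networks, and an exploration probability that adapts to it. The sketch below illustrates both under stated assumptions. The TD errors use the standard DQN and Double DQN targets, which are well known; the function `adaptive_epsilon` and its parameters `eps_min`, `eps_max`, and `scale` are hypothetical and are not taken from the paper, whose actual update rule is not given in this record.

```python
import numpy as np


def dqn_td_error(q_online, q_target_next, action, reward, gamma, done):
    """TD error with the standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    bootstrap = 0.0 if done else gamma * float(np.max(q_target_next))
    return reward + bootstrap - float(q_online[action])


def ddqn_td_error(q_online, q_online_next, q_target_next,
                  action, reward, gamma, done):
    """TD error with the Double DQN target.

    The online network selects the next action and the target network
    evaluates it.
    """
    if done:
        bootstrap = 0.0
    else:
        best_next = int(np.argmax(q_online_next))
        bootstrap = gamma * float(q_target_next[best_next])
    return reward + bootstrap - float(q_online[action])


def adaptive_epsilon(td_error, eps_min=0.05, eps_max=1.0, scale=1.0):
    """Hypothetical rule (an assumption, not the paper's formula).

    Maps |TD error| to an exploration probability in [eps_min, eps_max):
    a large error suggests poor value estimates, so explore more.
    """
    return eps_min + (eps_max - eps_min) * (1.0 - np.exp(-abs(td_error) / scale))


# Toy Q-value vectors for one transition (two actions).
q_s = np.array([1.0, 2.0])          # online network at state s
q_next_online = np.array([4.0, 0.0])  # online network at state s'
q_next_target = np.array([0.5, 3.0])  # target network at state s'

delta_dqn = dqn_td_error(q_s, q_next_target, action=0,
                         reward=1.0, gamma=0.9, done=False)
delta_ddqn = ddqn_td_error(q_s, q_next_online, q_next_target, action=0,
                           reward=1.0, gamma=0.9, done=False)
eps = adaptive_epsilon(delta_dqn)
```

In this toy transition the two targets disagree because the online network prefers a different next action than the one the target network rates highest, which is the overestimation effect Double DQN is designed to dampen.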


Year of publication:	[2023]
Authors:	Wang, Xianjia ; Yang, Zhipeng ; Chen, Guici ; Liu, Yanli
Publisher:	[S.l.] : SSRN
Subject:	Markov-Kette | Markov chain | Theorie | Theory | Entscheidung | Decision

Extent:	1 Online-Ressource (17 p)
Type of publication:	Book / Working Paper
Language:	English
Other identifiers:	10.2139/ssrn.4531608 [DOI]
Source:	ECONIS - Online Catalogue of the ZBW

Persistent link: https://www.econbiz.de/10014359679