Solving the Pole Balancing Problem with POMDP
Abstract — Partially observable Markov decision processes (POMDPs) have been widely applied in areas such as robotic navigation, machine maintenance, marketing, and medical diagnosis [1]. Their exact solution, however, is inefficient in both space and time. This paper studies Smooth Partially Observable Value Approximation (SPOVA) [2], which approximates belief values with a differentiable function and uses gradient descent to update them. This POMDP approximation algorithm is applied to the pole balancing problem with regulation. Simulation results show that the regulated approach can estimate state transition probabilities while simultaneously improving its policy.

Keywords – POMDP; SPOVA; Pole Balancing.

Introduction

The Markov Decision Process (MDP) has proven to be a useful framework for solving a variety of problems in areas such as robotics, economics, and manufacturing. Unfortunately, many real-world problems cannot be modeled as MDPs, especially when problem states are only partially observable. Partially observable Markov decision processes (POMDPs) extend the MDP framework to include states that are not fully observable. With this extension we can model more practical problems, but the solution methods that exist for MDPs are no longer applicable. The computational complexity of POMDP algorithms is much higher than that of MDP algorithms. This complexity stems from uncertainty about the actual state, which induces a probability distribution (a belief) over the states. POMDP algorithms therefore operate on probability distributions, while MDP algorithms work on a finite number of discrete states. This difference transforms an optimization problem over a discrete space into one defined ...

...... middle of paper ......

... total reward discounted over an infinite horizon. The expected reward for policy π started from an initial belief b_0 is defined as

J^\pi(b_0) = E\left[ \sum_{t=0}^{\infty} \beta^t r(s_t, a_t) \mid b_0, \pi \right]    (3)

where 0 ≤ β < 1 is the discount factor.
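To make the objective in (3) concrete, the following sketch estimates J^π(b_0) for a small discrete POMDP by Monte Carlo rollouts: the hidden initial state is sampled from the belief b_0, a fixed policy chooses actions from the current belief, the belief is updated by a Bayes filter after each observation, and rewards are accumulated with discount β over a truncated horizon. The transition model, observation model, reward function, and policy here are illustrative placeholders, not the paper's pole balancing model or the SPOVA policy.

```python
import numpy as np

# Illustrative POMDP with a handful of discrete states, actions, and observations.
# T[a, s, s'] : transition probabilities, Z[a, s', o] : observation probabilities,
# R[s, a]     : immediate reward. These are placeholders, not the pole model.
rng = np.random.default_rng(0)
n_states, n_actions, n_obs = 4, 2, 3
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
Z = rng.dirichlet(np.ones(n_obs), size=(n_actions, n_states))
R = rng.normal(size=(n_states, n_actions))

def belief_update(b, a, o):
    """Bayes filter: b'(s') is proportional to Z[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    b_next = Z[a, :, o] * (b @ T[a])
    return b_next / b_next.sum()

def policy(b):
    """Placeholder policy: greedy w.r.t. the one-step expected reward E_b[R(s, a)]."""
    return int(np.argmax(b @ R))

def estimate_return(b0, beta=0.95, horizon=200, n_rollouts=1000):
    """Monte Carlo estimate of J^pi(b0) = E[ sum_t beta^t r(s_t, a_t) | b0, pi ]."""
    total = 0.0
    for _ in range(n_rollouts):
        s = rng.choice(n_states, p=b0)      # sample the hidden initial state from b0
        b = b0.copy()
        discount, ret = 1.0, 0.0
        for _ in range(horizon):            # truncate the infinite horizon
            a = policy(b)
            ret += discount * R[s, a]
            s_next = rng.choice(n_states, p=T[a, s])
            o = rng.choice(n_obs, p=Z[a, s_next])
            b = belief_update(b, a, o)      # the agent only ever tracks the belief
            s, discount = s_next, discount * beta
        total += ret
    return total / n_rollouts

b0 = np.full(n_states, 1.0 / n_states)      # uniform initial belief
print(f"Estimated J^pi(b0) ~ {estimate_return(b0):.3f}")
```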
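The abstract describes SPOVA as replacing the exact belief-value function with a differentiable approximation that is improved by gradient descent. The sketch below illustrates that idea under the assumption of a smooth-max form V(b) ≈ (Σ_i (α_i·b)^k)^(1/k) over a set of α-vectors, with a single gradient step toward a Bellman-style target; the vectors, the target value, and the step size are made-up illustrations rather than the paper's exact formulation.

```python
import numpy as np

def spova_value(alphas, b, k=4.0):
    """Smooth approximation of max_i alpha_i . b, assuming the form
    V(b) = (sum_i (alpha_i . b)^k)^(1/k); requires alpha_i . b >= 0."""
    dots = alphas @ b                       # (n_vectors,)
    return (dots ** k).sum() ** (1.0 / k)

def spova_gradient(alphas, b, k=4.0):
    """Gradient of V(b) with respect to each alpha vector, shape (n_vectors, n_states)."""
    dots = alphas @ b
    v = (dots ** k).sum() ** (1.0 / k)
    # dV/d(alpha_i) = v^(1-k) * (alpha_i . b)^(k-1) * b
    return (v ** (1.0 - k)) * (dots ** (k - 1.0))[:, None] * b[None, :]

def gradient_step(alphas, b, target, k=4.0, lr=0.1):
    """Move V(b) toward a (hypothetical) Bellman target by one gradient step
    on the squared error 0.5 * (V(b) - target)^2."""
    err = spova_value(alphas, b, k) - target
    return alphas - lr * err * spova_gradient(alphas, b, k)

# Tiny illustration: 3 states, 2 alpha-vectors with nonnegative values.
alphas = np.array([[1.0, 0.5, 0.2],
                   [0.2, 0.8, 1.0]])
b = np.array([0.6, 0.3, 0.1])               # a belief over the 3 states
print("V(b) before:", spova_value(alphas, b))
alphas = gradient_step(alphas, b, target=1.2)   # the target is a made-up number
print("V(b) after: ", spova_value(alphas, b))
```

As k grows, the smooth maximum approaches the exact piecewise-linear maximum over the α-vectors, which is what makes the approximation differentiable everywhere and suitable for gradient-based updates.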