CMU-CS-05-127
Computer Science Department
School of Computer Science, Carnegie Mellon University
H. Brendan McMahan, Geoffrey J. Gordon
May 2005
CMU-CS-05-127.ps
CMU-CS-05-127.pdf
Keywords: Planning under uncertainty, Markov Decision Processes,
Prioritized Sweeping, Dijkstra's Algorithm, Gaussian Elimination,
Value Iteration, Improved Prioritized Sweeping (IPS), Prioritized
Policy Iteration (PPI), Gauss-Dijkstra Elimination (GDE)
We study the problem of computing the optimal value function for a
Markov decision process with positive costs. Computing this function
quickly and accurately is a basic step in many schemes for deciding
how to act in stochastic environments. There are efficient algorithms
which compute value functions for special types of MDPs: for
deterministic MDPs with S states and A actions, Dijkstra's
algorithm runs in time O(AS log S). And, in single-action MDPs
(Markov chains), standard linear-algebraic algorithms find the value
function in time O(S^3), or faster by taking
advantage of sparsity or good conditioning. Algorithms for solving
general MDPs can take much longer: we are not aware of any speed
guarantees better than those for comparably-sized linear programs.
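The Markov-chain special case comes from the fact that with a single action the Bellman equations are linear: V = c + PV on the transient states, so V = (I - P)^{-1} c. A minimal sketch (not from the report; it assumes NumPy and an absorbing, zero-cost goal state that soaks up any transition mass missing from the rows of P):

```python
import numpy as np

def chain_values(P, c):
    """Value function of a single-action MDP (absorbing Markov chain).

    P: S x S transition matrix among the transient states only (rows may
       sum to less than 1; the missing mass is assumed to go to a
       zero-cost absorbing goal -- our assumption, for illustration).
    c: length-S vector of expected one-step costs.

    Solves the linear Bellman equations V = c + P V directly,
    i.e. (I - P) V = c, in O(S^3) time via a dense solve.
    """
    S = len(c)
    return np.linalg.solve(np.eye(S) - P, c)
```

For example, a two-state chain where state 0 costs 1 and moves to state 1 with probability 0.5 (else to the goal), and state 1 costs 1 and always reaches the goal, gives V = (1.5, 1.0). Sparse or well-conditioned P admits faster iterative solvers, as the abstract notes.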
We present a family of algorithms which reduce to Dijkstra's algorithm
when applied to deterministic MDPs, and to standard techniques for
solving linear equations when applied to Markov chains. More
importantly, we demonstrate experimentally that these algorithms
perform well when applied to MDPs which "almost" have the required
special structure.
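In the deterministic special case the optimal value function is just the shortest-path cost to the goal, which is why Dijkstra's algorithm applies. A minimal sketch of that baseline (our own illustration, assuming a single zero-cost goal state; not the report's IPS/PPI/GDE algorithms): run Dijkstra backward from the goal over reversed edges, where each deterministic action (s, cost, s') is an edge.

```python
import heapq

def deterministic_mdp_values(num_states, transitions, goal):
    """V[s] = minimal total cost to reach `goal` from s in a
    deterministic MDP, via backward Dijkstra in O(AS log S).

    transitions: iterable of (s, cost, s_next) triples, one per
    state-action pair (deterministic outcome). `goal` is assumed
    absorbing with zero cost (our assumption, for illustration).
    """
    # Reverse adjacency lists, so we can search backward from the goal.
    rev = [[] for _ in range(num_states)]
    for s, cost, s_next in transitions:
        rev[s_next].append((cost, s))

    V = [float('inf')] * num_states
    V[goal] = 0.0
    heap = [(0.0, goal)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > V[s]:
            continue  # stale queue entry; a shorter path was found
        for cost, prev in rev[s]:
            # Bellman backup for a deterministic action:
            # V[prev] <- min(V[prev], cost + V[s])
            if d + cost < V[prev]:
                V[prev] = d + cost
                heapq.heappush(heap, (V[prev], prev))
    return V
```

For instance, with states {0, 1, 2}, goal 2, and actions (0, 1.0, 1), (1, 1.0, 2), (0, 3.0, 2), the optimal values are V = [2.0, 1.0, 0.0]: state 0 prefers the two-step route through state 1.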
24 pages