From: Keith Noah Snavely (snavely_at_cs.washington.edu)
Date: Wed Nov 19 2003 - 01:09:26 PST
"Symbolic Heuristic Search for Factored Markov Decision Processes" by
Feng and Hansen describes an algorithm that combines symbolic
representation of states in an MDP with heuristic search to solve
MDPs. The authors show experimentally that, for a certain class of
problems, the algorithm dramatically reduces the number of states that
must be enumerated to find an optimal policy.
The main contribution of this paper is the symbolic LAO* search
algorithm, which combines two threads of research: state abstraction
(accomplished using ADDs as in SPUDD) and methods for enumerating only
a subset of the states (i.e. only those reachable from the start state
by an optimal policy). In particular, I found the idea of applying
heuristic search to MDPs in a systematic way appealing, especially
since the authors found a neat way to reuse the existing search
algorithm LAO*, which can handle the cyclic behavior a policy may
generate. The basic idea is that, starting from an empty policy and
some initial state, an optimal policy is constructed incrementally by
applying an admissible heuristic to the states reachable from the
initial state under the current best policy. Other algorithms such as
value iteration, by contrast, construct policies that encompass all
states.
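To make that loop concrete, here is a minimal sketch of a flat
(non-symbolic) LAO*-style solver in Python. Everything in it is my own
illustration, not code from the paper: the callbacks actions,
transitions, cost, heuristic, and is_goal are hypothetical stand-ins
for the MDP's components, and for simplicity the sketch expands every
fringe state at once and re-runs value iteration over all expanded
states, where full LAO* would update only the ancestors of newly
expanded states. The paper's symbolic version would further represent
the value function and policy as ADDs.

    # A hedged sketch, not the paper's implementation. Assumes a
    # cost-based MDP given by hypothetical callbacks:
    #   actions(s)        -> iterable of actions available in state s
    #   transitions(s, a) -> iterable of (successor, probability) pairs
    #   cost(s, a)        -> nonnegative one-step cost
    #   heuristic(s)      -> admissible (non-overestimating) cost-to-go
    #   is_goal(s)        -> True for absorbing goal states

    def lao_star(s0, actions, transitions, cost, heuristic, is_goal,
                 eps=1e-6):
        V = {s0: heuristic(s0)}  # value estimates, seeded by the heuristic
        policy = {}              # current best (partial) policy
        expanded = set()         # states whose successors were generated

        def backup(s):
            # One Bellman backup: minimize expected cost over actions.
            best_a, best_q = None, float("inf")
            for a in actions(s):
                q = cost(s, a) + sum(p * V.setdefault(t, heuristic(t))
                                     for t, p in transitions(s, a))
                if q < best_q:
                    best_a, best_q = a, q
            return best_q, best_a

        def fringe():
            # Unexpanded non-goal states reachable from s0 under the
            # current policy; 'seen' keeps cyclic policies from looping.
            seen, stack, tips = {s0}, [s0], []
            while stack:
                s = stack.pop()
                if is_goal(s):
                    continue
                if s not in expanded:
                    tips.append(s)
                    continue
                for t, _ in transitions(s, policy[s]):
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
            return tips

        while True:
            tips = fringe()
            if not tips:
                return V, policy   # best solution graph is complete
            expanded.update(tips)  # expand the fringe of the best graph
            while True:            # dynamic programming over expanded states
                delta = 0.0
                for s in expanded:
                    v, a = backup(s)
                    delta = max(delta, abs(v - V.get(s, 0.0)))
                    V[s], policy[s] = v, a
                if delta < eps:
                    break

The property the review highlights is visible in the sketch: only
states reached from s0 under the current best policy are ever touched,
whereas plain value iteration would back up every state in the MDP.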
The experimental results seem quite promising, and the analysis of the
numbers is good. However, it would be interesting to see results for a
wider variety of MDPs, especially since it is not clear to me what a
typical MDP looks like. For instance, are most MDPs like the factory
problems, in which most states are irrelevant, or like the "a"
problems, in which many states can be visited? Since there is no
characterization of the "a" problems, they seem more artificial than
the factory examples, but it is impossible to say.
This paper left me wondering whether there are other ways to reduce
the number of states that must be considered in solving an MDP. I also
wonder how well the techniques presented in this paper and the last
carry over to partially observable MDPs, where the (belief) state
space is not only huge but continuous.