Review: "Symbolic Heuristic Search for Factored Markov Decision Processes"

From: Keith Noah Snavely (snavely_at_cs.washington.edu)
Date: Wed Nov 19 2003 - 01:09:26 PST

  • Next message: Karthik Gopalratnam: "LAO* Review"

    "Symbolic Heuristic Search for Factored Markov Decision Processes" by
    Feng and Hansen describes an algorithm that combines symbolic
    representation of states in an MDP with heuristic search to solve
    MDPs. The authors show through experiment that in a certain class of
    problems the algorithm dramatically reduces the nuumber of states that
    must be enumerated to find an optimal policy.

    The main contribution of this paper is the symbolic LAO* search
    algorithm, which combines two threads of research: state abstraction
    (accomplished using ADDs as in SPUDD) and methods for enumerating only
    a subset of the states (i.e. only those reachable from the start state
    by an optimal policy). In particular, I found the idea of applying
    search to solving MDPs in a systematic way appealing, especially the
    fact that the authors found a neat way to use the existing search
    algorithm LAO*, which handles the fact that a policy can generate
    cyclic behavior. The basic idea is that starting with an empty policy
    and some initial state, an optimal policy is constructed incrementally
    by applying an admissable heuristic to states reachable from the
    initial state by the current best policy. For other algorithms such
    as value iteration, on the other hand, policies are constructed to
    encompass all states.

    The experimental results shown seem quite promising, and the analysis
    of the numbers is good. However, it would be interesting to see the
    results of applying the algorithm to a variety of other MDPs,
    especially since the characteristics of a general MDP are not really
    clear in my own head. For instance, are most MDPs like the factory
    problems, in which most states are irrelevant, or like the "a"
    problems, in which many states can be visited? Since there is no
    characterization of the "a" problems, it seems that they are more
    artificial than the factory examples, but it is impossible to say.

    This paper left me wondering whether there are other ways to reduce
    the number of states that must be considered in solving an MDP. Also,
    I wonder how well the techniques presented in this paper and the last
    carry over to partially-observable MDPs, where the state space is not
    only huge but continuous.


  • Next message: Karthik Gopalratnam: "LAO* Review"

    This archive was generated by hypermail 2.1.6 : Wed Nov 19 2003 - 01:09:27 PST