Symbolic Heuristic Search for Factored Markov Decision Processes

From: Daniel J. Klein (djklein_at_aa.washington.edu)
Date: Wed Nov 19 2003 - 10:08:05 PST

  • Next message: Lucas Kreger-Stickles: "Review: Symbolic Huerisitic Search for Factored MDPs"

    Title: Symbolic Heuristic Search for Factored Markov Decision Processes
    Authors: Zhengzhu Feng, Eric A. Hansen

    Summary: The authors present an enhancement to SUPDD that solves MDPs efficiently using masked ADDs and a heuristic.]

    The Two Most Important Ideas:

    The most important ideas presented in this paper are the use of a heuristic and reachability analysis to focus the search of SPUDD-like MDP value iteration. Reachability analysis is accomplished using a simple masking process on decision diagrams. The mask is a mathematical projection that limits the range of all possible actions from a particular state to reachable states. States that are not reachable are in the null space of the masking projection operator.

    The authors are very smart and use the projection iteratively on the path of an optimal partial policy. As opposed to considering each state individually, the authors associate each action with the set of states for which the action is appropriate under the current partial policy. Essentially, they partition the space (again, in a mathematical sense) allowing the reachability analysis to be performed for an entire set of actions at once.

    Another main idea presented in the paper is the use of an admissible heuristic on dynamic programming to further guide the search. The authors employ symbolic LAO*, an extension of classical AO* that works on cycles - a necessary requirement to solve MDPs iteratively. LAO* involves a two step process of partial solution expansion followed by dynamic programming. The process is terminated when the iterative solution converges (non-uniformly). LAO* is referred to as 'symbolic' because it manipulates sets of states as opposed to individual states. The authors suggest a number of admissible heuristics for the dynamic programming step of LAO* and show how the use of these heuristics, used in combination with masking, results in faster convergence times than simple SPUDD.

    One Flaw:

    This is a well written paper. Each decision made by the authors is explained clearly and supported by pervious work and mathematical foundations of operator theory. I did, however, notice one possible flaw with their solution. The LAO* process does not converge uniformly, meaning that the solution time is a function of the initial states. In order to get meaningful results, the algorithm is averaged over ~10 random starting points. However!! The authors never say a worst case! The average number they present could be misleading...they should have presented either a standard deviation or the best and worst convergence times for the random initial conditions.

    Future Research Questions:

    The authors have made significant advances in MPD value iteration by incorporating some basic mathematical operator theory. As one research question, I wonder if there might be more operator theory or system theory that could be incorporated to yield an even faster convergence. One open research question posed by the authors at the end of the conclusion deals with the integration of multiple strategies. I believe this is an excellent idea and a good future research question because incorporating other ideas can only further reduce the solution time (provided the extra computation isn't a huge burden).


  • Next message: Lucas Kreger-Stickles: "Review: Symbolic Huerisitic Search for Factored MDPs"

    This archive was generated by hypermail 2.1.6 : Wed Nov 19 2003 - 10:09:55 PST