SPUDD

From: Daniel J. Klein (djklein_at_u.washington.edu)
Date: Mon Nov 17 2003 - 10:41:40 PST

  • Next message: Lucas Kreger-Stickles: "SPUDD Review"

    Title: SPUDD: Stochastic Planning using Decision Diagrams
    Authors: Jesse Hoey, Robert St-Aubin, Alan Hu, and Craig Boutilier

    Summary:

    The authors present a MDP value iteration algorithm derived from
    structured policy iteration (SPI) that uses algebraic decision diagrams
    (ADDs) to represent value functions/policies.

    Two Most Important Ideas:

    The main focus of the paper is on the ADD representation of MDPs. ADDs
    are a generalization of ordered binary decision diagrams (BDD) that
    allow real valued functions to be evaluated from a tree-like structure.
    ADDs take advantage of naturally occurring repetition to create a more
    compact isomorphic representation. This smaller representation allows
    for fewer MDP expected value computations and therefore increased
    processing speed.

    A second main idea presented in the paper is optimization of the ADD
    representation. The authors directly say that the basic SPUDD algorithm
    "suffers from certain practical difficulties" that can be resolved by
    optimization. Although more questions than answers are presented in the
    paper about optimization, the authors still have a good point in that
    there are possible methods that could be used to optimize the use of
    ADDs for MDP value iteration. Two optimization techniques presented are
    dynamic variable ordering and pre-computation of the joint probability
    distribution. The authors go further by presenting a "tuning knob" that
    allows for adjustment between space and time constraints.

    Largest Flaws:

    Hmm...where to begin.
    1. The example used to illustrate the ADD representation (the bolt
    thing) is used throughout the paper, but not described until the
    conclusion. Without first reading the paper they referenced, I had no
    clue what PL, ADR, BO, etc represented. This clouded my understanding
    of the ADD representation of MDPs.

    2. Why is the ADD representation based on and only compared to SPI when
    the authors clearly state in the conclusion that dynamic programming
    algorithms, such as modified policy iteration, are generally considered
    to converge more quickly than value iteration in practice.

    Open research questions:

    The authors optimization section reads like an open research section.
    One idea they mention is dynamic variable ordering. It is well known
    that choosing a good variable ordering heuristic can save a significant
    amount of processing time. As another research question, I would like
    to see the "turning knob" investigated further and compared to variable
    elimination algorithms.

    Dan's overall paper-o-meter rating: 6.5/10


  • Next message: Lucas Kreger-Stickles: "SPUDD Review"

    This archive was generated by hypermail 2.1.6 : Mon Nov 17 2003 - 10:42:40 PST