SPUDD

From: MAUSAM (mausam_at_cs.washington.edu)
Date: Fri May 02 2003 - 10:32:59 PDT

  • Next message: Parag: "SPUDD review"

    SPUDD : Stochastic Planning using Decision Diagrams
    Hoey, St-Aubin, Hu & Boutilier

    This paper presents a symbolic algorithm to do value iteration over MDPs
    with full observability in an infinite horizon discounted reward model.

    Structured policy iteration using decision trees had already been
    developed. However, the authors realised that there is a lot of
    duplication in such trees, and an acyclic structure like decision diagram
    could provide good speed up. In this paper, the authors motivated this
    idea and provided experimental results to back them up.

    ADDs can represent any function from set of states to an algebraic number.
    The authors used this fact, to represent the reward model, the current
    value function, etc in terms of ADDs and defined one whole iteration of
    value iteration in terms of operations on these ADDs. Thus, the explicit
    state space was never enumerated and there were speedups exploiting the
    structure of the problem.

    The experimental section was a well thought after section. The authors
    understood that due to different implementation behaviours, it will be
    unrealistic to compare SPUDD with SPI in running times. So, the authors
    compared the two algorithms on the basis of the size of graphs generated.
    They showed massive reductions in the number of nodes encountered in
    SPUDD.

    They also provided the best case and worst case analysis of SPUDD to show
    that the algorithm may not perform better always. However, their worst
    case graph (where time should be exponential in the number of state
    variables) looks linear. It is probably a small portion of a big
    exponential curve. However, I would have liked it being exponential.

    It would have been nicer had they did some experiments in the space time
    trade-offs in the subset-splitting optimisation that they talked about.
    Though, they themselves assign this for future work.

    Also, most of their experiments are in the factory domain. Since SPUDD is
    so much dependent on the problems involving some inherent structure, it
    would have been nicer to take some other real world problems and shown
    similar results, comprehensively establishing that most real world
    problems have structure which can be utilized in SPUDD.

    One major drawback of this study is the absence of a formal model of
    "structure in the problem". The paper shows that practically there is
    speed up and there is a pathological case. However, one could go further,
    and formalise the notion of structure in problems, and then find some real
    complexity results from there. Of course, this seems like a non-trivial
    task to me.

    One obvious future work is to try policy iteration in place of value
    iteration since it is known to converge faster. Even the authors mention
    this in their list of future works. Since policy can also be represented
    as ADDs, I think that it should be doable in such a representation.
    However, no future papers from authors or others suggest this, so may be
    there is a catch or no one has tried this as yet.

    A further question is can we handle larger problems by approximating the
    ADDs, something similar to the way approximating value trees is done.

    Another question to ask is in what other domains can we apply similar
    ideas. For example, can we utilize this to do better in the POMDP domains,
    or in domains where there are continuous state variables (using ADDs
    with continuous variable internal nodes having multiple children for
    different intervals) etc.


  • Next message: Parag: "SPUDD review"

    This archive was generated by hypermail 2.1.6 : Fri May 02 2003 - 10:33:00 PDT