Review: "Acting Optimally in Partially Observable Stochastic Domains"

From: Keith Noah Snavely (snavely_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 01:36:54 PST

  • Next message: Alan L. Liu: "Acting Optimally in Partially Observable Stochastic Domains"

    "Acting Optimally in Paritally Observable Stochastic Domains", by
    Cassandra et al. describes a new algorithm to find approximate
    solutions to partially-observable MDPs.

    One major contribution of this paper is the Witness algorithm. Like
    other algorithms, Witness approximates the value function of a POMDP
    as a piecewise-linear convex function, which can be represented by a
    set of vectors, and creates an incrementally better approximation by
    adding vectors to the set. The improvement of Witness over these
    other algorithms seems to be in determining the vector (if any) that
    should be added to the set. This problem is formulated as a linear
    program, which produces approximate solutions quickly. The final set
    of vectors partitions the belief space and a policy graph is created
    using these partitions as nodes (I wasn't sure if this is an original
    contribution, but it is a neat idea anyway, and seems similar in a way
    to the use of ADDs in SPUDD). The authors include some examples that
    explain the intuition behind why the structure of a problem often
    results in the belief space forming a relatively small number of
    partitions, demonstrating the usefulness of this approach.
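    The two ideas above can be sketched concretely. The following is a
    minimal illustration, not the paper's implementation, assuming a tiny
    two-state POMDP with made-up vectors: the value function is the max
    over a set of vectors (hence piecewise-linear and convex), and the
    Witness question is whether any belief exists at which a candidate
    vector beats every vector in the current set. The paper answers that
    question exactly with a linear program; the grid search here is only
    an illustrative stand-in.

    ```python
    # Hedged sketch: 2-state POMDP, hypothetical numbers throughout.

    def value(b, vectors):
        """Evaluate the piecewise-linear convex value function at belief b."""
        return max(sum(a * p for a, p in zip(alpha, b)) for alpha in vectors)

    def witness_point(candidate, vectors, steps=100):
        """Return a belief where `candidate` improves on the set, or None.

        Illustrative grid search over the 2-state belief simplex; the
        paper's Witness algorithm solves this exactly as a linear program.
        """
        best_b, best_gap = None, 0.0
        for i in range(steps + 1):
            b = (i / steps, 1 - i / steps)   # a belief over two states
            gap = sum(c * p for c, p in zip(candidate, b)) - value(b, vectors)
            if gap > best_gap:
                best_b, best_gap = b, gap
        return best_b

    vectors = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical vector set
    print(value([0.5, 0.5], vectors))          # 0.5
    print(witness_point([0.6, 0.6], vectors))  # (0.5, 0.5): worth adding
    print(witness_point([0.4, 0.4], vectors))  # None: dominated everywhere
    ```

    The vector that attains the max at a given belief also identifies
    which region of the belief-space partition that belief falls in,
    which is what makes the policy-graph construction possible.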

    The major weakness of this paper is the sketchiness of some of the
    details, both of this work and of the work it builds upon. The paper
    claims that in theory and in practice the algorithm is faster than
    its predecessors, but the authors do not show why. For instance, it
    would have been nice to see a comparison with Cheng's linear support
    algorithm, which is mentioned. The paper also mentions a heuristic
    for creating the policy graph from the final set of vectors, but does
    not expand on it. Another complaint that is more or less invariant over
    the MDP papers we have read is with the treatment of experiments; the
    problems that the algorithm was run on were not characterized in any
    sense, and the results of the experiments were spotty. This probably
    points to a lack of a good set of standard benchmarks for these
    problems, for which the authors cannot really be blamed. From the
    results given, however, it is evident that even with the improvements
    made in this work it is not yet practical to solve large POMDPs.

    I think that the most interesting future work in this area lies in
    either a) improving the representation of the value function or
    otherwise refining the algorithm to make it faster, or b)
    approximating solutions to POMDPs intelligently (without too much
    loss in quality). Another really good thing would be to build a
    good test suite of representative POMDPs for testing and comparing
    different algorithms.


    This archive was generated by hypermail 2.1.6 : Mon Nov 24 2003 - 01:36:55 PST