POMDP Paper Review

From: Julie Letchner (letchner_at_cs.washington.edu)
Date: Sun Nov 23 2003 - 18:59:19 PST

  • Next message: Raphael Hoffmann: ""Acting Optimally in Partially Observable Stochastic Domains""

             This paper presents a value-iteration "witness" algorithm over an
    MDP of continuous belief states that encapsulate an underlying POMDP
    defined over finite world states.
             One fundamental contribution of this paper is the use of belief
    states as a mechanism for formulating POMDPs as fully observable (though
    continuous) MDPs. This representation fully encapsulates the POMDP problem
    in that an optimal solution to the belief state problem will be an optimal
    solution to the underlying POMDP. This is important, since even the best
    algorithms can produce solutions only as good as the problem formulation
    they work upon.
             The major contribution that forms the focus of this paper is the
    "witness" algorithm that copes with the continuous-spaced belief state by
    iteratively refining a partitioning of this space into regions that share
    the same optimal value and optimal choice of action. I like this algorithm
    for its cleanliness and relative simplicity, but--no thanks to the
    authors--I have little knowledge of other belief-state algorithms to use as
    a basis for comparison.
    On a related vein, evaluation of the witness algorithm is nonexistent
    here. The authors state that its advantage is that it does not require
    exponential time to run if the number of vectors required to partition the
    belief space is not exponential--meaning that the running time is
    proportional to the number of vectors required. It is not clear to me how
    this property is not met by the existing POMDP algorithms mentioned (but
    not named or explained) by the authors. Despite the fact that it's clear
    the authors have a good idea, an explicit high-level comparison with other
    algorithms or a concrete comparison of their respective running times on
    various problems would go a long way in improving the credibility of their
    paper.
             Another evaluation I would have liked to see is
    time-to-convergence analysis for the witness algorithm and/or heuristics
    for choosing the value of delta that determines termination. The
    practicality of this algorithm will depend heavily on the convergence
    characteristics. How frequently are these POMDP problems solvable with a
    non-exponential number of vectors?
             Areas for future work are consideration of the questions above, as
    well as concrete evaluation of the witness algorithm on non-trivial
    problems and with respect to existing techniques.

    ~*~*~*~*~*~*~*~*~*~*~
    Julie Letchner
    (206) 890-8733
    University of Washington
    Computer Science & Engineering
    letchner_at_cs.washington.edu


  • Next message: Raphael Hoffmann: ""Acting Optimally in Partially Observable Stochastic Domains""

    This archive was generated by hypermail 2.1.6 : Sun Nov 23 2003 - 18:59:45 PST