POMDP Review

From: Karthik Gopalratnam (karthikg_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 02:39:04 PST

  • Next message: Lillie Kittredge: "paper 6"

    "Acting Optimally in Partially Observable Stochastic Domains": Cassandra,
    Kaelbling & Littman

    Reviewer: Karthik Gopalratnam

            This paper presents the formalism of using POMDP's for Planning in
    Stochastic, Partially Observable domains. It draws on previous results from
    the Operations Research world and presents a computationally more efficient
    algorithm to compute the optimal policy.
            The fundamental contribution of this paper is the use of belief
    states to represent the partially observable world model, and the use of MDP
    methods - specifically Value iteration, to search within this continuous
    belief space. This space is approximated to arbitrary accuracy by the
    Witness algorithm, which represents the expected discounted value of a
    belief state at a given time step by a subset of all states, given the
    approximation at the previous time step. The witness algorithm is based on
    linear programming, and for reasonable approximations, the authors claim, is
    guaranteed to be non-exponential if the number of approximation vectors used
    to partition is not exponential.
            The witness algorithm, is presented in only the barest detail, and
    the true import of its contribution is not appreciable from the authors'
    description. (No doubt the technical report that they refer to gives more
    details.) Also, the claim that the algorithm is not exponential if the
    number of vectors used to partition the belief space is not exponential, is
    a little hard to see given the sketchiness of the conference paper. A brief
    discussion of why this is so, with some comparison with other POMDP
    algorithms would have been more informative.
            The natural direction for future research (most likely already
    explored) as suggested by the authors themselves, to extend this algorithm
    to policy iteration, and then apply this to larger and more general planning
    problems.


  • Next message: Lillie Kittredge: "paper 6"

    This archive was generated by hypermail 2.1.6 : Mon Nov 24 2003 - 02:38:22 PST