POMDP Review

From: Bhushan Mandhani (bhushan_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 10:07:28 PST

    Title: Acting Optimally in Partially Observable Stochastic Domains

    Authors: A. Cassandra, L. Kaelbling and M. Littman

    Summary: Solving POMDPs is hard because it is equivalent to solving an
    MDP over the space of belief states, which is continuous and
    high-dimensional (the probability simplex over the underlying states).
    The paper combines value iteration with a linear-programming-based
    "Witness Algorithm" to solve POMDPs exactly.
     
    Main Ideas:

    1. The optimal value function over belief states can be represented by
    a finite set V of |S|-dimensional vectors: given V, the optimal value
    of any belief state is simply the maximum of its dot products with the
    elements of V. The main idea in the paper is to compute V efficiently
    by adding elements to it only when required. The witness algorithm
    returns a belief state b (a "witness") at which a new vector must be
    added to V, or reports that no such belief exists. Since the running
    time is polynomial in the size of V, if V does not grow exponentially,
    the running time is not exponential either.
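
    As a concrete illustration of both pieces (my sketch, not the authors'
    code): evaluating the piecewise-linear value function is a one-liner,
    and a linear program in the spirit of the paper's witness check
    searches for a belief at which a candidate vector u improves on every
    vector already in V. All function and variable names are illustrative.

        import numpy as np
        from scipy.optimize import linprog

        def value(b, V):
            # Value of belief b: max over dot products with vectors in V.
            return max(float(np.dot(alpha, b)) for alpha in V)

        def find_witness(u, V):
            # Maximize delta subject to b.u >= b.v + delta for every v
            # in V, with b constrained to the probability simplex. A
            # positive optimal delta means b witnesses that u is needed.
            n = len(u)
            c = np.zeros(n + 1)
            c[-1] = -1.0  # linprog minimizes, so minimize -delta
            A_ub = np.array([np.append(v - u, 1.0) for v in V])
            b_ub = np.zeros(len(V))
            A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                          bounds=[(0, 1)] * n + [(None, None)])
            b, delta = res.x[:n], res.x[-1]
            return b if delta > 1e-9 else None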

    2. The vectors in V partition the belief space into regions, each of
    which has a single optimal action. This allows the construction of
    policy graphs (see the sketch below).
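
    Since each vector in V is generated by some action, reading off the
    optimal action at a belief is just an argmax over V. A minimal sketch,
    assuming actions is a hypothetical list paired with V by index:

        import numpy as np

        def best_action(b, V, actions):
            # The optimal action at belief b is the action tag of the
            # vector achieving the maximum dot product with b.
            i = int(np.argmax([np.dot(alpha, b) for alpha in V]))
            return actions[i]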

    Flaws:

    I don't see any technical flaws in the paper. However, its
    contribution is narrow: all it offers is a technique to speed up the
    computation of the vector set V, and the existence of such a V was
    already known. Also, the test problems used in the experiments were
    small.

    Future Work:

    1. Performing policy iteration instead of value iteration.

    2. Using approximate methods to speed up the algorithm further,
    instead of computing the value function exactly as the paper does now.

