Acting Optimally in Partially Observable Stochastic Domains

From: Alan L. Liu (aliu_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 02:38:01 PST

    Paper reviewed: Acting Optimally in Partially Observable Stochastic Domains
    Authors: Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman

    Summary:
    This paper gives an overview of MDPs, POMDPs, and the authors' Witness
    algorithm for approximating the optimal value function.

    Big ideas:
    A POMDP can be recast as an MDP over belief states rather than world
    states: given an action and the resulting observation, the belief
    update is deterministic, so the belief state captures everything the
    agent knows about the underlying state.
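
    To make the belief-MDP idea concrete, here is a minimal sketch of the
    Bayes-filter belief update in Python. The array layout (T[a, s, s']
    for transitions, O[a, s', o] for observations) is an assumption made
    for illustration, not notation from the paper:

        import numpy as np

        def belief_update(b, a, o, T, O):
            # Predict: push belief b through the transition model for
            # action a, then correct: weight each successor state by the
            # probability of the observation o actually received.
            b_next = O[a, :, o] * (b @ T[a])
            total = b_next.sum()
            if total == 0.0:
                raise ValueError("observation has zero probability")
            return b_next / total  # renormalize to a distribution

        # Tiny 2-state, 1-action, 2-observation example (made-up numbers).
        T = np.array([[[0.9, 0.1],
                       [0.2, 0.8]]])   # T[a, s, s'] = P(s' | s, a)
        O = np.array([[[0.7, 0.3],
                       [0.1, 0.9]]])   # O[a, s', o] = P(o | s', a)
        b = np.array([0.5, 0.5])
        print(belief_update(b, a=0, o=1, T=T, O=O))  # -> [0.289, 0.711]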

    The Witness algorithm they propose, used with value iteration, can
    generate policies arbitrarily close to optimal. It finds a "witness":
    a belief state demonstrating that the current approximate value
    function differs from the value function implied by the previous
    iteration's values.
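
    The witness test in the paper is exact (a linear program over the
    belief simplex); the sketch below is only a rough illustration of the
    idea, using randomly sampled beliefs to look for a point where one set
    of alpha-vectors undervalues another. The names and the sampling
    scheme are assumptions for illustration, not the authors' method:

        import numpy as np

        def value(b, alphas):
            # Piecewise-linear convex value function: V(b) = max_a a . b
            return max(float(alpha @ b) for alpha in alphas)

        def find_witness(approx, exact, n_states, n_samples=10000, tol=1e-9):
            # Sample beliefs from the simplex; return one where the
            # approximate vector set undervalues the exact set, or None.
            rng = np.random.default_rng(0)
            for _ in range(n_samples):
                b = rng.dirichlet(np.ones(n_states))
                if value(b, exact) > value(b, approx) + tol:
                    return b
            return None

        approx = [np.array([1.0, 0.0])]   # representation missing a vector
        exact = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
        print(find_witness(approx, exact, n_states=2))  # belief near state 2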

    Flaw:
    The paper mentions that the Witness algorithm is guaranteed to run in
    non-exponential time when the number of vectors needed to represent
    the value function is not exponential. However, there is no discussion
    of which kinds of problems require an exponential representation and
    which do not, or whether there are useful problems that do.

    Open research questions:
    The authors mention trying this on real problems. It would be
    interesting to see how large a POMDP this algorithm can handle while
    still producing a good approximate policy in a reasonable amount of
    time.

    One recurring theme is that it would be helpful to have a standard
    corpus of benchmark problems (in this case, a set of POMDPs) for
    comparing algorithms.

