Acting Optimally in Partially Observable Stochastic Domains

From: Danny Wyatt (danny_at_cs.washington.edu)
Date: Sun Nov 23 2003 - 22:21:56 PST

    Acting Optimally in Partially Observable Stochastic Domains
    Anthony R. Cassandra, Leslie Pack Kaelbling, Michael L. Littman

    Summary
The authors describe how a POMDP over an environment state space can
be recast as an MDP over the space of belief states. They present a
form of value iteration that partitions the continuous belief space
into regions of belief states that share the same optimal action, and
they show how these partitions can be represented as a policy graph.
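
    To make the recasting concrete, here is a minimal sketch of the
    standard belief update (my own array shapes and names, not code
    from the paper). The next belief depends only on the current
    belief, the action, and the observation, which is exactly why the
    process over beliefs is a fully observable MDP:

        import numpy as np

        def belief_update(b, a, o, T, Z):
            # b : (S,)      current belief over hidden states
            # T : (A, S, S) transitions, T[a, s, s2] = P(s2 | s, a)
            # Z : (A, S, O) observations, Z[a, s2, o] = P(o | s2, a)
            predicted = T[a].T @ b           # predict: sum_s T[a,s,s2] * b[s]
            unnorm = Z[a, :, o] * predicted  # correct: weight by P(o | s2, a)
            return unnorm / unnorm.sum()     # renormalize to a distribution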

    Important Ideas
Searching through belief space returns the problem to full
observability, since the agent always knows its own belief exactly.
Defining rewards over belief states also has the benefit of folding
environment-based rewards and information-gathering rewards into a
single objective, so the same policy can weigh the two together.
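
    Continuing the sketch above (the reward matrix R is my own
    assumption), the immediate reward of a belief state is just the
    expected environment reward; the value of gathering information
    then falls out of the same objective, through the better-informed
    beliefs that sensing actions lead to:

        def belief_reward(b, a, R):
            # R : (A, S) environment reward R[a, s] for doing a in s
            return b @ R[a]  # expectation under the current belief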
Approximating the optimal value function with a finite set of vectors
partitions belief space into regions that share an optimal action,
and often a region transitions to the same next region after that
action. This allows the entire policy to be represented as a small
finite automaton whose transitions are driven by observations alone,
with no need to maintain explicit state probabilities at execution
time.
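
    Concretely (again my own sketch, not the paper's code): the
    maximizing vector at a belief identifies both its value and its
    region's optimal action, and executing the resulting policy graph
    needs only the observations, never a belief update. The "env"
    interface below is hypothetical:

        def best_node(b, alphas):
            # V(b) = max_i (b . alphas[i]); the maximizing index also
            # names the region of belief space that b falls into
            return max(range(len(alphas)), key=lambda i: b @ alphas[i])

        def run_policy(graph, node, env, steps):
            # graph[node] = {"action": a, "next": {observation: node2}}
            for _ in range(steps):
                a = graph[node]["action"]
                o = env.step(a)                # returns an observation
                node = graph[node]["next"][o]  # transition on o alone
            return node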

    Flaws
I couldn't follow all of the summarized details of how their witness
algorithm fits into value iteration, but that did not seem to be the
main point of the paper.

    Open Questions
    Personally, I would like to hear more about any work towards some kind
    of "unified theory of MDPs" that considers all of the different
    techniques we've learned about so far. (Of course, the authors of this
    paper can't be blamed for not predicting the later papers we've read.)

