Acting Optimally in Partially Observable Stochastic Domains - Cassandra et al.

From: Lucas Kreger-Stickles (lucasks_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 10:49:34 PST

    The authors present an overview of the problem of constructing optimal
    policies for partially observable stochastic domains and introduce the
    Witness algorithm, which they claim is more efficient than other
    algorithms that attempt to tackle the same problem.

    One of the major ideas of the paper is that one cannot simply take the
    set of observations and have that be the set of states. The authors
    point out that since two very different states could appear the same to
    the agent, a policy defined over such observation-states could perform
    arbitrarily poorly. Instead, the authors introduce the idea of a
    'belief state', a vector giving the probability that the agent is in
    each of the potential states. This allows them to introduce a state
    estimator to their system, which continually updates the state the
    agent believes it is in, based on which states it may have been in
    before, which action it took, and what it observed. This seems like a
    very important idea to me in that it better represents all the
    information the agent has about what state it is in, and I presume
    it will therefore perform better.
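
    To make the state-estimator idea concrete, here is a minimal sketch
    of the Bayes-rule belief update in Python (the function name and the
    dictionary-based representation of T and O are my own illustration,
    not the paper's notation or code):

        # Minimal sketch of a POMDP belief update, assuming:
        #   T[s][a][s2] = P(s2 | s, a)   (transition model)
        #   O[s2][a][o] = P(o | s2, a)   (observation model)
        # Names and data layout are illustrative, not from the paper.

        def update_belief(belief, action, observation, T, O):
            """State estimator: the new belief b'(s') is proportional
            to O(s', a, o) * sum_s T(s, a, s') * b(s)."""
            states = list(belief.keys())
            new_belief = {}
            for s2 in states:
                prior = sum(T[s][action][s2] * belief[s] for s in states)
                new_belief[s2] = O[s2][action][observation] * prior
            total = sum(new_belief.values())  # this is P(o | b, a)
            if total == 0:
                raise ValueError("observation impossible under current belief")
            return {s: p / total for s, p in new_belief.items()}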

    The other major idea of their paper is their Witness algorithm, which
    is guaranteed not to take exponential time so long as the number of
    vectors it must consider is not itself exponential. The authors also
    contend that such a guarantee is unique to their algorithm.
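
    For context on what those vectors are: the value function over belief
    space is piecewise linear and convex, represented as a finite set of
    'alpha vectors', and the value of a belief is the maximum of their dot
    products with it. A hedged sketch of that evaluation (my own
    illustration, not the paper's code):

        # Evaluate a value function represented by a set of alpha
        # vectors; beliefs and alpha vectors are dicts keyed by state.
        def value(belief, alpha_vectors):
            """V(b) = max over alpha of sum_s alpha[s] * b(s)."""
            return max(
                sum(alpha[s] * p for s, p in belief.items())
                for alpha in alpha_vectors
            )

    The Witness algorithm's running-time guarantee is stated in terms of
    how many such vectors it must consider when constructing the next
    value function.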

    Overall I thought that the paper was quite readable and concise. While
    normally I would critique the paper for testing the algorithm on such
    small problems, the paper presents its results as preliminary and
    indicates that this research is exploratory.

    In terms of future work, it occurs to me that in order to solve
    stochastic planning problems with any of the methods we have seen so
    far, we must be able to define the probabilities for each of the states
    and state transitions. It would be interesting to construct agents that
    attempted to accomplish goals but which were confronted with unknown
    probabilities for state transitions, and which could explore to
    discover those probabilities, continually refining their policies based
    on their evolving understanding of the state transition and reward
    functions.
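
    As a rough illustration of what that exploration might involve, one
    could keep empirical counts of observed transitions and use them as
    running estimates of T (entirely my own sketch, not from the paper):

        from collections import defaultdict

        # Sketch: estimate unknown transition probabilities from
        # experience counts; a planner could replan as the estimates
        # improve. Illustrative only.
        class TransitionModelEstimator:
            def __init__(self):
                self.counts = defaultdict(lambda: defaultdict(int))

            def record(self, state, action, next_state):
                self.counts[(state, action)][next_state] += 1

            def estimate(self, state, action, next_state):
                total = sum(self.counts[(state, action)].values())
                if total == 0:
                    return None  # no data yet for this (state, action)
                return self.counts[(state, action)][next_state] / total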

