From: Lucas Kreger-Stickles (lucasks_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 10:49:34 PST
The authors present an overview of the problem of constructing optimal
policies for partially observable stochastic domains and introduce the
Witness algorithm, which they claim is more efficient than other
algorithms that attempt to tackle the same problem.
One of the major ideas of the paper is that one cannot simply take the
set of observations and treat it as the set of states. The authors
point out that since two very different states could appear identical to
the agent, a policy defined over observations alone could perform
arbitrarily poorly. Instead, the authors introduce the idea of a
'belief state': a vector giving the probability that the agent is in
each of the potential underlying states. This allows them to add a
state estimator to their system, which continually updates the agent's
belief about its current state based on its previous belief, the action
it took, and the observation it received. This seems like a very
important idea to me in that it better represents all the information
the agent has about what state it is in, and I presume a policy over
belief states will therefore perform better.
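To make the update concrete, here is a minimal sketch (my own
illustration, not the authors' code) of how such a state estimator
could compute the new belief from the old belief, the action taken, and
the observation received, assuming the model's transition and
observation probabilities are given as nested dictionaries T and O:

    def update_belief(b, a, o, T, O, states):
        """New belief after taking action a and then observing o.
        b:  dict mapping state -> current probability
        T:  T[s][a][s2] = P(s2 | s, a), the transition model
        O:  O[s2][a][o] = P(o | s2, a), the observation model
        """
        new_b = {}
        for s2 in states:
            # P(o | s2, a) * sum over s of P(s2 | s, a) * b(s)
            new_b[s2] = O[s2][a][o] * sum(T[s][a][s2] * b[s] for s in states)
        total = sum(new_b.values())   # normalizer: P(o | b, a)
        return {s2: p / total for s2, p in new_b.items()}

    # Tiny two-state example: a sensor that is right 80% of the time.
    states = ['left', 'right']
    T = {'left':  {'stay': {'left': 1.0, 'right': 0.0}},
         'right': {'stay': {'left': 0.0, 'right': 1.0}}}
    O = {'left':  {'stay': {'see_left': 0.8, 'see_right': 0.2}},
         'right': {'stay': {'see_left': 0.2, 'see_right': 0.8}}}
    b = {'left': 0.5, 'right': 0.5}
    b = update_belief(b, 'stay', 'see_left', T, O, states)
    # b is now {'left': 0.8, 'right': 0.2}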
The other major idea of the paper is the Witness algorithm, which is
guaranteed not to take exponential time so long as the number of
vectors needed to represent the value function is not itself
exponential. The authors also contend that such a guarantee is unique
to their algorithm.
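As I understand it, the vectors in question are the linear 'alpha
vectors' whose upper surface forms the piecewise-linear convex value
function over belief states. A rough illustration (again my own sketch,
not code from the paper) of why their count is the natural size
parameter: evaluating the value function means taking a max over all of
them.

    def value(belief, alpha_vectors):
        """Value of a belief state under a value function represented
        as a set of alpha vectors (each a dict: state -> coefficient).
        The cost of this max scales with the number of vectors."""
        return max(sum(alpha[s] * belief[s] for s in belief)
                   for alpha in alpha_vectors)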
Overall I thought that the paper was quite readable and concise. While
normally I would critique the paper for testing the algorithm on such
small problems, the paper presents the results as preliminary and
indicates that this research is exploratory.
In terms of future work, it occurs to me that in order to solve
stochastic planning problems with any of the methods we have seen so
far, we must be able to define the probabilities for each of the states
and state transitions. It would be interesting to construct agents that
attempted to accomplish goals while confronted with unknown
state-transition probabilities, and which could explore to discover
those probabilities, continually refining their policies based on their
evolving understanding of the state-transition and reward functions; a
rough sketch of what the estimation piece might look like follows.
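One simple way to realize the estimation piece (a minimal sketch of my
own, with hypothetical names, not something proposed in the paper)
would be to keep maximum-likelihood counts of the transitions the agent
actually experiences:

    from collections import defaultdict

    class TransitionEstimator:
        """Estimates P(s2 | s, a) from observed transitions."""
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))

        def record(self, s, a, s2):
            # Called once per experienced transition (s, a) -> s2.
            self.counts[(s, a)][s2] += 1

        def prob(self, s, a, s2):
            total = sum(self.counts[(s, a)].values())
            if total == 0:
                return None   # no data yet; fall back to some prior
            return self.counts[(s, a)][s2] / total

The agent could periodically re-solve (or incrementally update) its
policy against the current estimates, so that planning improves as the
model does.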