From: Lillie Kittredge (kittredl_at_u.washington.edu)
Date: Mon Nov 24 2003 - 09:49:57 PST
/Acting Optimally in Partially Observable Stochastic Domains/ by
Anthony R. Cassandra, Leslie Pack Kaelbling and Michael L. Littman
The authors present a new algorithm for constructing optimal policies in
partially observable MDPs.
The first main idea is that a partially observable MDP can be recast
as a regular, fully observable MDP whose state space is the set of
belief states - probability distributions over the underlying states.
The problem with this is that the belief space is continuous, which
leads to their second main idea, the Witness algorithm, for dealing
with that space. The algorithm starts with a coarse-grained view of
the space and iteratively refines it, approximating the optimal policy.
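To make the belief-MDP idea concrete, here is a minimal sketch (my own
illustration, not code from the paper) of the Bayes update that turns a
POMDP into an MDP over beliefs; the model layout and indexing are my
assumptions:

    def belief_update(b, a, o, T, O):
        """One step of the belief-state MDP: given belief b, action a,
        and observation o, return the posterior belief b', where
        b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b[s].

        b : list, b[s] = P(current hidden state is s)
        T : transition model, T[a][s][s2] = P(s2 | s, a)
        O : observation model, O[a][s2][o] = P(o | s2, a)
        """
        n = len(b)
        b_next = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
                  for s2 in range(n)]
        total = sum(b_next)          # = P(o | b, a); assumed positive here
        return [x / total for x in b_next]

The point is that b' is a deterministic function of (b, a, o), so the
beliefs themselves form the (continuous) state space of an ordinary MDP.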
I'm not entirely clear what the original contribution of this paper is,
as the Witness algorithm is credited to "Cassandra, Kaelbling & Littman,
1994", which seems to be the authors themselves. Is this just a rehash
of another of their own papers? That would explain the flaw everybody
else is pointing out: that the algorithm is poorly explained. Other
than that same flaw, I have no problem with this paper. I rather like
the example with the tigers.
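For anyone who hasn't read the paper: a tiger is behind one of two
doors, and the "listen" action reports its location correctly only with
some probability (0.85, if I remember the paper's numbers right - treat
that as my assumption). A quick sketch of how listening sharpens the
belief:

    ACC = 0.85   # P(hear the tiger on the correct side); my recollection

    def listen_update(b_left, heard_left):
        """Update P(tiger is behind the left door) after one listen."""
        like_left = ACC if heard_left else 1.0 - ACC    # P(obs | tiger-left)
        like_right = 1.0 - ACC if heard_left else ACC   # P(obs | tiger-right)
        post = like_left * b_left
        return post / (post + like_right * (1.0 - b_left))

    b = 0.5                          # uniform prior over the two doors
    for _ in range(2):               # hear "left" twice in a row
        b = listen_update(b, True)
    print(round(b, 3))               # 0.97: belief has moved well off-center

This is just the belief-MDP update from above, specialized to two states.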
Future research, as pointed out by the authors, will be to work on
policy iteration rather than value iteration. They also mention using
this to solve real-world problems - I'd be interested to see an actual
physical example. Give me a robot using this algorithm, and then I'll
be impressed.