From: Alan L. Liu (aliu_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 02:38:01 PST
Paper reviewed: Acting Optimally in Partially Observable Stochastic Domains
Authors: Anthony Cassandra, Leslie Kaelbling, and Michael Littman
Summary:
This paper gives an overview of MDPs, POMDPs, and the authors' Witness
algorithm for approximating the optimal value function.
Big ideas:
POMDPs are MDPs over belief states (probability distributions over the
underlying states) instead of regular states: given an action and the
resulting observation, Bayes' rule deterministically maps one belief
state to the next (see the sketch below).
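As a rough illustration (not from the paper), here is a minimal Python
sketch of that Bayes-rule belief update. The array conventions are my
own assumptions: T[a][s, s2] = P(s2 | s, a) and O[a][s2, o] = P(o | s2, a).

import numpy as np

def belief_update(b, a, o, T, O):
    # New belief over s2: P(o | s2, a) * sum_s P(s2 | s, a) * b(s),
    # renormalized so it sums to one.
    b_next = O[a][:, o] * (T[a].T @ b)
    return b_next / b_next.sum()

# Tiny two-state example (the numbers are arbitrary):
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]])}
O = {0: np.array([[0.7, 0.3], [0.4, 0.6]])}
print(belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O))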
The Witness algorithm they propose, when used with value iteration,
can generate policies arbitrarily close to optimal. At each iteration
it looks for a "witness": a belief state demonstrating that the current
approximate value function differs from the optimal value function
given the values approximated in the last iteration (illustrated in
the sketch below).
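To make the witness idea concrete, here is a hedged Python sketch. It
assumes the standard representation of a POMDP value function as a
finite set of "alpha" vectors, with V(b) = max over alpha of alpha . b;
a witness is then a belief where a candidate vector set provably beats
the current set. The brute-force search over sampled beliefs is purely
illustrative; the actual algorithm locates witness points exactly using
linear programs.

import numpy as np

def value(b, alphas):
    # Piecewise-linear convex value function: best alpha vector at b.
    return max(float(a @ b) for a in alphas)

def find_witness(candidate_alphas, current_alphas, sample_beliefs):
    # Return a belief where the candidate set achieves a strictly
    # higher value than the current set, or None if none is found.
    for b in sample_beliefs:
        if value(b, candidate_alphas) > value(b, current_alphas) + 1e-9:
            return b
    return None

# Example: the extra vector (0.6, 0.6) wins at any belief whose
# components are both under 0.6, so a witness should turn up.
rng = np.random.default_rng(0)
beliefs = rng.dirichlet(np.ones(2), size=100)
current = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
candidate = current + [np.array([0.6, 0.6])]
print(find_witness(candidate, current, beliefs))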
Flaw:
The paper mentions that the Witness algorithm is guaranteed to run in
non-exponential time when the number of belief states involved is not
exponential. However, there is no mention of what kinds of problems
require exponentially many belief states and what kinds don't, or
whether there are useful problems that do require an exponential
number of them.
Open research questions:
The authors mention trying this on real problems. It would be
interesting to see how large a POMDP can get while this algorithm
still produces a good approximate policy within a reasonable amount
of time.
One recurring theme is that it would be nice to have a corpus of
problems to use for comparing algorithms (in this case, a set of POMDPs).