From: Xu Miao (xm_at_u.washington.edu)
Date: Mon Nov 24 2003 - 10:50:05 PST
Title: Acting Optimally in Partially Observable Stochastic Domains
Authors: Anthony R. Cassandra, Leslie Pack Kaelbling and Michael L. Littman
Summary: This paper describes the POMDP model and a new algorithm, the Witness
algorithm, for approximately finding the optimal policy.
Ideas:
	1. The paper embeds partial observability into the MDP model and
uses a convex piecewise-linear function to represent the value function over
belief states. It then develops the Witness algorithm to find an approximate
solution that can be made arbitrarily close to the optimal solution of value
iteration.
	2. After value iteration, the authors use a policy graph to
represent the policy, which is generated from the partition of the belief
space defined by the solution of the value function (see the sketch after
this list).
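To make the two ideas above concrete, here is a minimal sketch (my own
illustration, not the authors' code) of a belief-state update, a
piecewise-linear convex value function stored as a set of alpha-vectors, and
action selection from the maximizing vector, which is what the policy-graph
partition of the belief space corresponds to. The tiger-style model, the
numbers, and the alpha-vectors below are all made up purely for illustration.

    import numpy as np

    # Hypothetical 2-state, 2-observation model; probabilities are illustrative.
    T = {  # T[a][s, s'] = P(s' | s, a)
        'listen':    np.array([[1.0, 0.0], [0.0, 1.0]]),
        'open-left': np.array([[0.5, 0.5], [0.5, 0.5]]),  # problem resets
    }
    O = {  # O[a][s', o] = P(o | s', a)
        'listen':    np.array([[0.85, 0.15], [0.15, 0.85]]),
        'open-left': np.array([[0.5, 0.5], [0.5, 0.5]]),
    }

    def belief_update(b, a, o):
        """Bayes update of the belief after taking action a and observing o."""
        b_next = O[a][:, o] * (T[a].T @ b)   # P(o|s',a) * sum_s P(s'|s,a) b(s)
        return b_next / b_next.sum()

    # A value function is a finite set of (alpha-vector, action) pairs;
    # these vectors are invented, not computed by the Witness algorithm.
    alphas = [
        (np.array([10.0, -5.0]), 'open-left'),
        (np.array([-5.0, 10.0]), 'open-right'),
        (np.array([1.0, 1.0]),   'listen'),
    ]

    def value(b):
        """V(b) = max_alpha alpha . b -- piecewise linear and convex in b."""
        return max(alpha @ b for alpha, _ in alphas)

    def policy(b):
        """The region of belief space where an alpha-vector is maximal
        determines the action, which is how the policy graph partitions
        the belief simplex."""
        best = max(alphas, key=lambda av: av[0] @ b)
        return best[1]

    b = np.array([0.5, 0.5])            # uniform initial belief
    b = belief_update(b, 'listen', 0)   # observation 0 = "heard it on the left"
    print(value(b), policy(b))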
Flaws:
	1. The idea of partitioning the belief space is very impressive, but
the paper would be more complete if the authors gave a brief proof and a
detailed algorithm for constructing the partition.
	2. The results section is too short and not convincing: there is too
little description of what kinds of problems were solved, and no comparison
with other methods.
Open research:
	1. Add a policy iteration algorithm. Although the policy graph is
simple and effective on some small problems, it could become very large on
bigger problems, so adding a policy iteration algorithm might handle such
problems more effectively.
	2. Find a way to compute approximately optimal policies by searching
only part of the belief space, for example with the methods of Dean et al.
mentioned by the authors, or with heuristic search such as LAO*.