From: Patrick Haluptzok (patrickh_at_windows.microsoft.com)
Date: Sun Nov 23 2003 - 14:18:20 PST
Paper Review #6 - by Patrick Haluptzok - CSE 573 Fall 2003
Paper title/Author:
Acting Optimally in Partially Observable Stochastic Domains
By Anthony Cassandra, Leslie Kaelbling, and Michael Littman
One-line summary:
The paper describes the "Witness Algorithm," a more efficient procedure for finding a near-optimal policy for a POMDP.
The (two) most important ideas in the paper, and why:
They recast the POMDP as a continuous-state MDP over belief states and then apply value iteration to solve that continuous MDP. They explain why a simpler approach is incorrect: taking the action with the highest expected value, computed from the fully observable MDP's value function weighted by the belief-state distribution, amounts to solving the problem as if everything became observable after one step. This insight motivates their approach: the belief-state MDP captures the uncertainty that persists after each action, and so yields a different, more nearly optimal policy.
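To make the flaw concrete, here is a minimal sketch (not from the paper; the two-state problem, Q-values, and function name are made up for illustration) of the heuristic the authors criticize, which weights the fully observable MDP's Q-values by the belief:

```python
import numpy as np

# Hypothetical 2-state, 2-action fragment: Q(s, a) values taken
# from the fully observable MDP (rows: states, cols: actions).
Q_mdp = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

def qmdp_action(belief):
    """The flawed heuristic: weight each state's Q-values by the
    belief and pick the best action. This implicitly assumes the
    state becomes fully observable after one step, so it can never
    prefer an information-gathering action."""
    return int(np.argmax(belief @ Q_mdp))

print(qmdp_action(np.array([0.9, 0.1])))  # confident belief -> action 0
print(qmdp_action(np.array([0.5, 0.5])))  # uniform belief: actions tie
```

Under the uniform belief both actions score 0.5 and the heuristic is indifferent, whereas a true POMDP solution over belief states could assign higher value to an action that sharpens the belief.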
Because the value function is convex, they represent it with a set of vectors: the value of a belief state is the maximum dot product of the belief with any vector in the set. They exploit the fact that belief space is partitioned into regions whose points all share the same optimal action. So, by determining the best action for a specific vector in the representation, any belief state whose maximal dot product is with that vector lies in the same partition and, once the algorithm converges, has the same optimal action. They avoid the exponential growth in the number of vectors needed to represent the value function by tolerating a small approximation error delta, which allows larger problems to be solved.
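The vector representation can be sketched in a few lines of Python (the vectors, action names, and numbers below are invented for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical vectors for a 2-state problem; each is paired with
# the action that is optimal in its region of belief space.
alphas = np.array([[1.0, 0.0],   # best when state 0 is likely
                   [0.0, 1.0],   # best when state 1 is likely
                   [0.6, 0.6]])  # best under uncertainty
actions = ["left", "right", "listen"]

def value_and_action(belief):
    """Value function as the upper surface of the vectors:
    V(b) = max_i (b . alpha_i). The maximizing vector also
    identifies the optimal action for b's partition."""
    scores = alphas @ belief
    best = int(np.argmax(scores))
    return float(scores[best]), actions[best]

v, a = value_and_action(np.array([0.5, 0.5]))
# dot products are 0.5, 0.5, 0.6, so the uncertain belief
# falls in the "listen" partition
```

Taking the max over dot products is what makes the represented value function piecewise linear and convex, and it is the number of such vectors that the delta-approximation keeps from exploding.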
The one or two largest flaws in the paper:
I found the explanation of how the vectors approximating the value function are built up tough to follow; I suspect the reader is expected to consult the cited papers for the details.
The transformation of a value function's vector set into a policy graph was largely left as an exercise for the reader.
Identify two important, open research questions on the topic, and why they matter:
The problems they tried sounded very small. How well would this scale up to larger problems? The real world is full of them.
In a problem where you don't have a model and are building up the state-transition model as you go, will this converge? Many real-world problems start without a complete model.