From: Bhushan Mandhani (bhushan_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 10:07:28 PST
Title: Acting Optimally in Partially Observable Stochastic Domains
Authors: A. Cassandra, L. Kaelbling and M. Littman
Summary: Solving POMDPs is difficult because it amounts to solving an MDP
over the space of belief states, which is continuous and high-dimensional.
The paper combines value iteration with a linear programming based
"Witness Algorithm" to solve a POMDP.
Main Ideas:
1. The optimal value function over belief states can be represented by a
set V of |S|-dimensional vectors: given V, the optimal value of any belief
state is the maximum of its dot products with the elements of V. The main
idea in the paper is to compute V efficiently by adding elements to it
only when required. The witness algorithm returns a belief state b at
which the current V is suboptimal and a vector must be added; if no such b
exists, it fails and V is complete. The point is that if the size of V is
not exponential, the running time won't be exponential either. (A concrete
sketch of this representation follows after this list.)
2. The vectors in V partition the belief space into regions, each with a
common optimal action. This allows the construction of policy graphs.
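Here is a minimal sketch of how V is used, under my own assumed
representation in which each vector of V is a row of an array tagged
with an action (all names and numbers below are made up for
illustration):

    import numpy as np

    def value_and_action(b, alphas, actions):
        # alphas: |V| x |S| array, one vector of V per row
        # actions: the action associated with each vector
        dots = alphas @ b          # dot product of b with each vector in V
        i = int(np.argmax(dots))   # the maximizing vector picks b's region
        return dots[i], actions[i]

    # Tiny example: 2 states, 3 vectors in V
    alphas = np.array([[1.0, 0.0],
                       [0.5, 0.5],
                       [0.0, 1.0]])
    actions = ["a0", "a1", "a2"]
    b = np.array([0.3, 0.7])
    v, act = value_and_action(b, alphas, actions)  # v = 0.7, act = "a2"

Beliefs that share the same maximizing vector form one region with a
common optimal action, which is what makes the policy-graph
construction in idea 2 possible.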
Flaws:
I don't see any technical flaws in the paper. However, the paper doesn't
say much: all it presents is a technique to speed up the computation of
the vector set V, and the existence of such a V was known beforehand.
Also, the test problems used were small.
Future Work:
1. Performing policy iteration instead of value iteration.
2. Using approximate methods to further speed up the algorithm, instead
of computing the value function exactly as they do now.