From: Karthik Gopalratnam (karthikg_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 02:39:04 PST
"Acting Optimally in Partially Observable Stochastic Domains": Cassandra,
Kaelbling & Littman
Reviewer: Karthik Gopalratnam
This paper presents the formalism of using POMDPs for planning in
stochastic, partially observable domains. It draws on previous results from
the operations research literature and presents a computationally more
efficient algorithm for computing the optimal policy.
The fundamental contribution of this paper is the use of belief
states to represent the partially observable world model, and the use of MDP
methods - specifically value iteration - to search within this continuous
belief space. The value function over this space is approximated to
arbitrary accuracy by the witness algorithm, which represents the expected
discounted value of a belief state at a given time step by a finite set of
vectors over the states, computed from the approximation at the previous
time step. The witness algorithm is based on linear programming and, the
authors claim, is guaranteed to run in less-than-exponential time for
reasonable approximations, so long as the number of vectors used to
partition the belief space is not itself exponential.
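To make the two central ideas concrete, here is a minimal sketch in Python.
The tiny two-state problem and all the names in it are my own illustration,
not an example from the paper: the belief update is the standard Bayesian
filter underlying the belief-state construction, and the value function over
beliefs is stored as a finite set of vectors whose best dot product with a
belief gives that belief's value.

    import numpy as np

    # Belief update: after taking action a and observing o, the new
    # belief is b'(s') proportional to O(o|s',a) * sum_s T(s'|s,a) * b(s).
    def belief_update(b, T, O, a, o):
        b_next = O[a][:, o] * (T[a].T @ b)
        return b_next / b_next.sum()

    # The value function over beliefs is piecewise linear and convex:
    # it is represented by a finite set of |S|-dimensional vectors, and
    # the value of a belief is the maximal dot product over that set.
    def value(b, vectors):
        return max(np.dot(alpha, b) for alpha in vectors)

    # Hypothetical two-state, two-action, two-observation problem.
    T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # T[a][s, s']
         1: np.array([[0.5, 0.5], [0.5, 0.5]])}
    O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),   # O[a][s', o]
         1: np.array([[0.5, 0.5], [0.5, 0.5]])}

    b = belief_update(np.array([0.5, 0.5]), T, O, a=0, o=1)
    vectors = [np.array([1.0, 0.0]), np.array([0.2, 0.9])]
    print(b, value(b, vectors))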
The witness algorithm is presented in only the barest detail, and
the true import of its contribution is hard to appreciate from the authors'
description. (No doubt the technical report that they refer to gives more
details.) Also, the claim that the algorithm avoids exponential running time
whenever the number of vectors used to partition the belief space is not
exponential is hard to verify given the sketchiness of the conference paper.
A brief discussion of why this holds, with some comparison to other POMDP
algorithms, would have been more informative.
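For concreteness, the linear program at the core of the witness test can at
least be reconstructed from the general description. The following sketch is
my own reading of the standard formulation, not the paper's exact algorithm:
given a candidate vector alpha and the current set Gamma, search for a belief
at which alpha beats every vector in Gamma by a positive margin; if one
exists, the current approximation is wrong at that point and must be
extended. The problem data at the bottom is made up for illustration.

    import numpy as np
    from scipy.optimize import linprog

    def find_witness(alpha, Gamma):
        n = len(alpha)
        # Variables: x = (b_1, ..., b_n, delta); maximize delta.
        c = np.zeros(n + 1)
        c[-1] = -1.0                       # linprog minimizes, so negate
        # For each alpha' in Gamma: (alpha' - alpha).b + delta <= 0.
        A_ub = np.array([np.append(ap - alpha, 1.0) for ap in Gamma])
        b_ub = np.zeros(len(Gamma))
        # b must be a probability distribution: sum(b) = 1, b >= 0.
        A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
        b_eq = [1.0]
        bounds = [(0, 1)] * n + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        delta = res.x[-1]
        # A strictly positive margin means we found a witness point.
        return (res.x[:n], delta) if delta > 1e-9 else (None, delta)

    # alpha improves on Gamma around b = (0.5, 0.5), so a witness exists.
    alpha = np.array([0.5, 0.8])
    Gamma = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    print(find_witness(alpha, Gamma))

Each witness test is a single LP of polynomial size, so the overall running
time is presumably governed by how many such tests are run, which is where
the dependence on the number of vectors would come from.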
The natural direction for future research (most likely already
explored), as the authors themselves suggest, is to extend this algorithm
to policy iteration, and then to apply it to larger and more general
planning problems.