From: Lincoln Ritter (lritter_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 00:02:32 PST
Acting Optimally in Partially Observable Stochastic Domains
Cassandra, Kaelbling, Littman
Reviewed by Lincoln Ritter
Summary:
The authors reformulate MDPs in terms of belief and describe the
"witness" algorithm for value iteration for optimal behavior in
partially observable environments.
The use of piecewise linearization of continuous valued functions
seems to be a key insight in this paper. This technique allow the
authors to aim for approximations rather than exact solutions as in
previous work. As presented, it seems that this lead to a large part
of the gains in computational efficiency presented in this paper.
The concept of the policy graph struck me as particularly interesting,
especially in light of our recent reding into symbolic manipulation of
state spaces. Here, the graph nodes represent aggregations of belief
states. This leads to simple, uniform treatment of various types of
actions within the policy. but it seems that the real lesson to be
drawn, having seen this strategy be so effective in multiply areas, is
that symbolic manipulation is and representation is an exciting and
useful technique.
I have two major criticisms:
The results of this paper are not presented in any sort of
quantitative way. Even the qualitative treatment is barely present.
Barely any formal treatment of the algorithms is given. How is
someone reading this paper supposed to actually know how important the
ideas presented here are in a practical sense?
Further, it was not clear to me reading this paper what work was
actually being newly presented here. Perhaps i need to spend more
time with the references, but at each turn in this paper where a new
idea was being introduced, i was confused if the idea was being built
off those referenced or was actually their work.
I think that an interesting avenue for research would be to try to
apply some of the search techniques reviewed in last weeks paper to
POMDPs. Whether any speedups would be apreciable given the amount of
computation that must be devoted to computing conditional
probabilities and solving linear programs regardless is debatable, but
clearly it is an approach that may be fruitful.
This archive was generated by hypermail 2.1.6 : Mon Nov 24 2003 - 00:02:33 PST