From: Nan Li (annli_at_cs.washington.edu)
Date: Tue Apr 22 2003 - 04:41:17 PDT
Planning with Incomplete Information as Heuristic Search in Belief Space /
B. Bonet, H. Geffner
This paper makes explicit the formulation of conformant planning and
contigent planning as heuristic search in belief space, provides
practial search algorithms for different cases, and tests them over a
number of domains.
The paper first analyses the mathematical models underlying three
different planning tasks: classical planning, conformant planning, and
contigent planning. It turns out all the three cases can be defined in a
general formulation, a heuristic search problem in belief space. The idea
behind might not be novel, but the authors efforts surely provide a neat
result. Note that the concept "belief space", which is extended from
"state space" in classical planning. It can be understood as a structured
set of state spaces. This extension makes it possible to describe actions
as transformations of states, even in non-deterministic and probabilistic
domains.
Then for each case, the paper suggests practical search algorithms to
implement the heuristic search. In the most difficulty case, contingent
planning, the authors suggest an approximat algorithm, for the sake of
time and memory costs. The basic of real-time dynamic programming is, it's
hopefully that using greedy policy possibly reduce search cost, when the
heuristic function is close enough to the optimal cost function, which is
improved iteratively by dynamic programming. Intuitively, compared with
the optimal heuristic AND/OR graph search algorithms, the RTDP mighte be
more effecient, since it updates ( and hopefully improves )its heuristic
function; but it doesn't maintain a real cost for explored path, it can
not be theoretically optimal. Though the authors claim that within certain
limitations the algorithm will eventually terminate and approach the
optimal solution.
One thing I expect yet can not find the answer is, when the authors
compare the empirical results of applying heuristic function with those
of merely bf-search, they say that there is no notable difference, but
they don't explain it. Is it because of the choice of heuristic function,
or it's inherited in the RTDP algorithm? It would have been an interesting
discussion.
One open problem comes from a limitation of the RTDP. Only when the belief
space is finite is the algorithm guaranteed to succeed. Currently, for
infinite belief space domains, the solution is to choose a suitable
heuristic function and focus the updates only on "relevant" states. The
meaning of "relevent" here is unclear, and may not be able to apply in
some domains.
This archive was generated by hypermail 2.1.6 : Tue Apr 22 2003 - 04:41:19 PDT