From: Karthik Gopalratnam (karthikg_at_cs.washington.edu)
Date: Mon Nov 17 2003 - 02:04:29 PST
PAPER REVIEW #4: "SPUDD: Stochastic Planning using Decision Diagrams" - Hoey
et. al.
This paper presents a different perspective on solving MDP's without
having to explicitly enumerate the entire state-space, by introducing a
dynamic abstraction method using ADD's.
The key premise of the paper is that ADD's induce a method of
representing value functions and policies by means of a compact decision
graph, that can then be used to learn the optimal policy. ADD's can make use
of the structure inherent in the representation of the actions as a DBN to
represent the functions that denote the conditional distributions defined
over the actions. Solving the MDP is done subsequently by implementing value
iteration over the ADD representations for the value functions and the
CPT's, which have implicitly encoded the structure inherent to the problem.
The authors present the VI algorithm over the ADD's and also present certain
optimizations to leverage time-space tradeoffs in implementing the basic
SPUDD algorithm.
The patently glaring drawback of this method is the fact that ADD's
can only represent boolean valued functions. Like the authors point out,
though multi-valued functions can be represented as multiple boolean
variables, changing the representation seldom preserves the original
structure that this method was capable of utilizing. Therefore, the blow-up
in the number of variables that such an abstraction is bound to entail,
would imply that for general purpose planning problems with multi-valued
functions, this method might actually perform poorly.
Though the authors' tests on the FACTORY data present compelling
evidence that this algorithm is capable of performing very well relative to
SPI on at least one domain, it remains to be seen, especially in the light
of the worst-case analysis graphs, how the algorithm compares to other
methods, on a broader set of planning problems.
It would be interesting to see, given the significantly faster
machines that are available today, how the time-space tradeoffs suggested by
the authors in their optimizations affect the run-time of the algorithm, and
whether larger problems may be feasibly solved using this method. Also, it
would be very useful to address the drawback of the algorithm in its not
being able to represent multi-valued functions, in a more efficient manner.
The effect of implementing dynamic variable reorderings on the effectiveness
of this method would also be a useful study.
This archive was generated by hypermail 2.1.6 : Mon Nov 17 2003 - 02:03:50 PST