From: Keith Noah Snavely (snavely_at_cs.washington.edu)
Date: Mon Nov 17 2003 - 01:46:48 PST
"SPUDD: Stochastic Planning using Decision Diagrams" by Hoey, et
al. describes the use of algebraic decision diagrams (ADDs) in a new
value iteration algorithm for solving Markov decision processes. ADDs
are a generalization of binary decision diagrams and can often
compactly represent real-valued functions over boolean variables
(especially for functions with nice structure).
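To make the idea concrete, here is a minimal sketch of how an ADD could be represented; the names and the hash-consing scheme are my own illustration, not the authors' implementation (which builds on existing ADD packages):

```python
# Illustrative ADD sketch (not the paper's implementation): a node
# tests a boolean variable, and each leaf holds a real value.  A
# "unique table" makes structurally identical subgraphs one shared
# object, and a test whose two branches coincide is dropped.  These
# two rules are what let ADDs stay small on structured functions.

_unique = {}

def _mk(key):
    # Hash-cons: return the one shared copy of this node.
    return _unique.setdefault(key, key)

def leaf(value):
    return _mk(('leaf', value))

def node(var, lo, hi):
    if lo is hi:            # redundant test: both branches are equal
        return lo
    return _mk(('node', var, lo, hi))

def evaluate(dd, assignment):
    # Walk from the root to a leaf under a truth assignment.
    while dd[0] == 'node':
        _, var, lo, hi = dd
        dd = hi if assignment[var] else lo
    return dd[1]

# A function that ignores x collapses to a single test on y:
f = node('x', node('y', leaf(0.0), leaf(5.0)),
               node('y', leaf(0.0), leaf(5.0)))
assert f is node('y', leaf(0.0), leaf(5.0))
```

The collapse in the last lines is the point: a table over n boolean variables always has 2^n entries, while the diagram's size tracks how much structure the function actually has.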
The authors claim, and show experimentally, that using ADDs to
represent action and value diagrams makes it possible to solve MDPs
with very large state spaces, whereas with other representations (such
as tables) the space requirements are prohibitive. They also point out that
since ADDs have been widely used in other domains, using ADDs to solve
MDPs takes advantage of existing efficient implementations of ADD
operations. The SPUDD algorithm, described in Section 4, shows how to
use ADDs for this purpose and is the main contribution of the paper.
In addition, a couple of optimizations to SPUDD are described.
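The backup SPUDD performs is the standard Bellman backup of value iteration; the novelty is carrying it out with ADD operations rather than over a table. For reference, here is a minimal tabular version (the helper names and the toy MDP are my own, not from the paper):

```python
# Minimal tabular value iteration, the baseline SPUDD improves on.
# SPUDD does the same Bellman backup but stores V, R, and the
# transition model as ADDs, so states that share a value are
# represented once.  (Names and the toy MDP below are illustrative.)

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """P[s][a] is a list of (prob, next_state); R[s][a] is a reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Toy two-state MDP: staying in s1 pays 1 per step.
states, actions = ['s0', 's1'], ['stay', 'go']
P = {'s0': {'stay': [(1.0, 's0')], 'go': [(1.0, 's1')]},
     's1': {'stay': [(1.0, 's1')], 'go': [(1.0, 's0')]}}
R = {'s0': {'stay': 0.0, 'go': 0.0},
     's1': {'stay': 1.0, 'go': 0.0}}
V = value_iteration(states, actions, P, R)
# V['s1'] converges to 1/(1-0.9) = 10, and V['s0'] to 0.9 * 10 = 9.
```

In the tabular version the inner loop touches every state explicitly; replacing the table with an ADD lets one backup update whole blocks of states that share a value.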
I think that the authors did a good job of discussing the strengths
and weaknesses of ADDs in a general sense, pointing out that they are
not always the smallest function representations, though more
experimental results on real MDPs (besides the Factory family) would
have shown this more convincingly. Also, I wasn't sure whether ADDs
could be generalized to represent multi-valued variables, although the
authors sounded semi-confident that they could be.
Another thing that I didn't get was how SPI (structured policy
iteration) works, and in particular how it represents functions, so
the comparisons between SPUDD and SPI were a little confusing,
especially since they left out any discussion of time performance.
The paper mentions several possibilities for future work. The first
is to apply existing techniques for making ADDs more efficient -- such
as dynamic variable reordering -- to solving MDPs, and to figure out
what kinds of variable reordering schemes work well for these
problems. Another is to apply ADDs to policy iteration rather than
value iteration, which might be easier to compare with SPI. Though
this doesn't really apply to the current discussion, I think it would
be neat to talk about good ways to represent functions of variables
with infinite or continuous domains, in which case the state space for
an MDP would also be infinite.