From: Daniel J. Klein (djklein_at_u.washington.edu)
Date: Mon Nov 17 2003 - 10:41:40 PST
Title: SPUDD: Stochastic Planning using Decision Diagrams
Authors: Jesse Hoey, Robert St-Aubin, Alan Hu, and Craig Boutilier
Summary:
The authors present an MDP value iteration algorithm, derived from
structured policy iteration (SPI), that uses algebraic decision diagrams
(ADDs) to represent value functions and policies.
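For reference, here is a minimal sketch (mine, not the authors' code) of
the flat value iteration that SPUDD reformulates over ADDs. The function
name and the arrays P and R are hypothetical placeholders: P[a][s][s'] is
a transition probability, R[s] a reward, and gamma the discount factor.

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-6):
        n_actions, n_states, _ = P.shape
        V = np.zeros(n_states)
        while True:
            # Bellman backup: Q(a,s) = R(s) + gamma * sum_s' P(s'|s,a) V(s')
            Q = R + gamma * P.dot(V)      # shape (n_actions, n_states)
            V_new = Q.max(axis=0)         # greedy max over actions
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new

    # Tiny two-state, two-action example (made-up numbers):
    P = np.array([[[0.9, 0.1], [0.0, 1.0]],
                  [[0.2, 0.8], [0.5, 0.5]]])
    R = np.array([0.0, 1.0])
    print(value_iteration(P, R))

The flat algorithm enumerates every state on every backup; SPUDD's point
is to perform the same backups over a structured ADD representation
instead.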
Two Most Important Ideas:
The main focus of the paper is the ADD representation of MDPs. ADDs
are a generalization of ordered binary decision diagrams (BDDs) that
represent real-valued functions of Boolean variables as a rooted,
directed acyclic graph. ADDs take advantage of naturally occurring
repetition, storing isomorphic subgraphs only once, to create a more
compact representation. This smaller representation requires fewer
expected value computations per MDP backup and therefore increases
processing speed.
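To make the sharing idea concrete, here is a toy sketch of my own (not
the paper's implementation) of the standard "unique table" trick used by
decision diagram packages: isomorphic subgraphs are stored exactly once,
so a value function with repeated values needs fewer nodes than its full
tree. The class and variable names are hypothetical.

    class ADD:
        _unique = {}  # canonical node store: key -> node

        def __init__(self, key):
            self.key = key

        @classmethod
        def leaf(cls, value):
            # Identical leaf values map to one shared node.
            return cls._unique.setdefault(('leaf', value), cls(('leaf', value)))

        @classmethod
        def node(cls, var, low, high):
            if low is high:  # redundant test: both branches identical
                return low
            key = (var, id(low), id(high))
            return cls._unique.setdefault(key, cls(key))

    # Value function over x1, x2 where the value is 5.0 whenever x1 = 1:
    five = ADD.leaf(5.0)
    v = ADD.node('x1', ADD.node('x2', ADD.leaf(1.0), five), five)
    print(len(ADD._unique))  # 4 nodes, versus 7 for the full binary tree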
A second main idea presented in the paper is optimization of the ADD
representation. The authors say directly that the basic SPUDD algorithm
"suffers from certain practical difficulties" that can be resolved by
optimization. Although the paper raises more questions than it answers
about optimization, the authors still have a good point: there are
plausible methods for optimizing the use of ADDs in MDP value
iteration. Two optimization techniques presented are dynamic variable
ordering (sketched below) and pre-computation of the joint probability
distribution. The authors go further by presenting a "tuning knob" that
allows for adjustment between space and time constraints.
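On the variable ordering point, a small experiment of my own (the
function and names are made up, not from the paper) shows why ordering
matters: the same function yields reduced diagrams of different sizes
under different orderings, and for larger functions of this family the
gap is known to grow exponentially.

    def f(x1, y1, x2, y2):
        return (x1 and y1) or (x2 and y2)

    def diagram_size(order):
        seen = set()  # distinct nodes in the reduced diagram

        def build(assignment):
            depth = len(assignment)
            if depth == len(order):
                vals = dict(zip(order, assignment))
                node = ('leaf', f(vals['x1'], vals['y1'], vals['x2'], vals['y2']))
            else:
                low = build(assignment + (0,))
                high = build(assignment + (1,))
                # Reduction: drop redundant tests, share equal subgraphs.
                node = low if low == high else (order[depth], low, high)
            seen.add(node)
            return node

        build(())
        return len(seen)

    print(diagram_size(['x1', 'y1', 'x2', 'y2']))  # interleaved: 6 nodes
    print(diagram_size(['x1', 'x2', 'y1', 'y2']))  # separated: 8 nodes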
Largest Flaws:
Hmm...where to begin.
1. The example used to illustrate the ADD representation (the bolt
example) is used throughout the paper but not described until the
conclusion. Without first reading the paper they referenced, I had no
clue what PL, ADR, BO, etc. represented. This clouded my understanding
of the ADD representation of MDPs.
2. Why is the ADD representation based on, and only compared to, SPI
when the authors clearly state in the conclusion that dynamic
programming algorithms such as modified policy iteration are generally
considered to converge more quickly than value iteration in practice?
Open research questions:
The authors' optimization section reads like an open research section.
One idea they mention is dynamic variable ordering. It is well known
that choosing a good variable ordering heuristic can save a significant
amount of processing time. As another research question, I would like
to see the "tuning knob" investigated further and compared to variable
elimination algorithms.
Dan's overall paper-o-meter rating: 6.5/10