From: MAUSAM (mausam_at_cs.washington.edu)
Date: Fri May 02 2003 - 10:32:59 PDT
SPUDD : Stochastic Planning using Decision Diagrams
Hoey, St-Aubin, Hu & Boutilier
This paper presents a symbolic algorithm for performing value iteration over fully
observable MDPs under an infinite-horizon, discounted-reward model. Structured
policy iteration using decision trees had already been developed; however, the
authors realised that such trees contain a lot of duplication, and that a more
compact acyclic structure like a decision diagram could provide a good speedup.
In this paper, the authors motivate this idea and provide experimental results
to back it up.
ADDs can represent any function from the set of states to the reals. The authors
use this fact to represent the reward model, the current value function, etc. as
ADDs, and they define one whole sweep of value iteration in terms of operations
on these ADDs. Thus the explicit state space is never enumerated, and the
structure of the problem is exploited for speedups.
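To make this concrete for myself, here is a minimal Python sketch of the idea. This
is my own toy code, not the authors' implementation or the SPUDD system: the value
function, the reward and the per-variable transition probabilities all live in tiny
hash-consed decision diagrams, and one Bellman backup is expressed entirely as
operations on those diagrams. The two-variable problem, the variable names and all
the helper functions are assumptions made up purely for illustration.

import operator

# Hash-consed node store so that identical sub-diagrams are shared.
_unique = {}

def leaf(v):
    # Round the key so numerically identical leaves merge into one object.
    key = ('leaf', round(v, 10))
    return _unique.setdefault(key, ('leaf', v))

def node(var, lo, hi):
    if lo is hi:                              # reduction rule: drop redundant tests
        return lo
    key = ('node', var, id(lo), id(hi))
    return _unique.setdefault(key, ('node', var, lo, hi))

def apply_op(f, g, op, order):
    """Combine two ADDs pointwise with op, respecting a fixed variable order.
    (No memoisation; fine for a toy this small.)"""
    if f[0] == 'leaf' and g[0] == 'leaf':
        return leaf(op(f[1], g[1]))
    fi = order.index(f[1]) if f[0] == 'node' else len(order)
    gi = order.index(g[1]) if g[0] == 'node' else len(order)
    var = order[min(fi, gi)]
    flo, fhi = (f[2], f[3]) if f[0] == 'node' and f[1] == var else (f, f)
    glo, ghi = (g[2], g[3]) if g[0] == 'node' and g[1] == var else (g, g)
    return node(var, apply_op(flo, glo, op, order), apply_op(fhi, ghi, op, order))

def restrict(f, var, val):
    """Fix a boolean variable to 0 or 1 inside the diagram."""
    if f[0] == 'leaf':
        return f
    if f[1] == var:
        return f[3] if val else f[2]
    return node(f[1], restrict(f[2], var, val), restrict(f[3], var, val))

def rename(f, mapping):
    """Rename variables (used to 'prime' the value function)."""
    if f[0] == 'leaf':
        return f
    return node(mapping.get(f[1], f[1]), rename(f[2], mapping), rename(f[3], mapping))

GAMMA = 0.9
ORDER = ['Xp', 'Yp', 'X', 'Y']        # primed (next-state) variables first
PRIME = {'X': 'Xp', 'Y': 'Yp'}

def regress(V, cpts):
    """Sum out each primed variable of V against P(var'=1 | current state)."""
    Q = rename(V, PRIME)
    for var, p_true in cpts.items():
        p_false = apply_op(leaf(1.0), p_true, operator.sub, ORDER)
        hi = apply_op(restrict(Q, var, 1), p_true, operator.mul, ORDER)
        lo = apply_op(restrict(Q, var, 0), p_false, operator.mul, ORDER)
        Q = apply_op(hi, lo, operator.add, ORDER)
    return Q

def bellman_backup(V, reward, actions):
    """One sweep of value iteration: max over actions of R + gamma * E[V']."""
    best = None
    for cpts in actions.values():
        q = apply_op(reward,
                     apply_op(leaf(GAMMA), regress(V, cpts), operator.mul, ORDER),
                     operator.add, ORDER)
        best = q if best is None else apply_op(best, q, max, ORDER)
    return best

# Toy two-variable problem, made up purely for illustration.
reward = node('X', leaf(0.0), leaf(10.0))            # reward 10 when X holds

actions = {
    'noop': {'Xp': node('X', leaf(0.0), leaf(1.0)),  # X and Y just persist
             'Yp': node('Y', leaf(0.0), leaf(1.0))},
    'fix':  {'Xp': node('Y', leaf(0.3), leaf(0.9)),  # making X true needs Y
             'Yp': leaf(1.0)},                       # and always sets Y
}

V = leaf(0.0)
for _ in range(100):
    V = bellman_backup(V, reward, actions)

The point of the exercise is that no step above ever touches the 2^n flat state
table; everything operates on the (hopefully much smaller) diagrams.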
The experimental section was well thought out. The authors recognised that,
because of differing implementation behaviours, comparing SPUDD with SPI on
running time alone would be unrealistic, so they compared the two algorithms on
the basis of the size of the graphs generated. They showed massive reductions in
the number of nodes generated by SPUDD.
They also provide best-case and worst-case analyses of SPUDD to show that the
algorithm does not always perform better. However, their worst-case graph (where
time should be exponential in the number of state variables) looks linear; it is
probably just a small portion of a larger exponential curve. Still, I would have
liked to see the exponential behaviour actually show up in the plot.
It would have been nicer had they done some experiments on the space-time
trade-offs of the subset-splitting optimisation they describe, though they
themselves leave this for future work.
Also, most of their experiments are in the factory domain. Since SPUDD depends so
heavily on the problem having some inherent structure, it would have been nicer to
take other real-world problems and show similar results, more comprehensively
establishing that most real-world problems have structure that SPUDD can exploit.
One major drawback of this study is the absence of a formal model of "structure in
the problem". The paper shows empirically that there is a speedup, and that a
pathological case exists. However, one could go further, formalise the notion of
structure in a problem, and then derive real complexity results from it. Of
course, this seems like a non-trivial task to me.
One obvious piece of future work is to try policy iteration in place of value
iteration, since it is known to converge faster. The authors themselves mention
this in their list of future work. Since a policy can also be represented as an
ADD, I think it should be doable in this representation (a rough sketch of the
evaluation step follows below). However, no later papers from the authors or
others pursue this, so maybe there is a catch, or no one has tried it yet.
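To convince myself that at least the policy-evaluation half is straightforward,
here is a rough sketch reusing the toy helpers above (again my own illustration,
not anything from the paper). The policy is encoded as one 0/1 indicator ADD per
action marking the states where that action is chosen, which is just one possible
encoding; evaluation is then repeated regression masked by those indicators, and
the improvement step would take a pointwise argmax over the resulting Q diagrams.

def evaluate_policy(indicators, reward, actions, sweeps=50):
    """Approximate V^pi by repeated backups restricted to the policy's choices.
    `indicators` maps each action to a 0/1 ADD marking where pi selects it."""
    V = leaf(0.0)
    for _ in range(sweeps):
        new_V = leaf(0.0)
        for name, cpts in actions.items():
            q = apply_op(reward,
                         apply_op(leaf(GAMMA), regress(V, cpts), operator.mul, ORDER),
                         operator.add, ORDER)
            masked = apply_op(indicators[name], q, operator.mul, ORDER)
            new_V = apply_op(new_V, masked, operator.add, ORDER)
        V = new_V
    return V

# e.g. a policy that plays 'fix' while X is false and 'noop' once X is true:
pi = {'fix':  node('X', leaf(1.0), leaf(0.0)),
      'noop': node('X', leaf(0.0), leaf(1.0))}
V_pi = evaluate_policy(pi, reward, actions)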
A further question is whether we can handle larger problems by approximating the
ADDs, similar to the way value trees are approximated (one possibility is sketched
after this paragraph). Another question is in what other domains we can apply
similar ideas. For example, can we use this to do better on POMDPs, or in domains
with continuous state variables (using ADDs whose internal nodes test a continuous
variable and have multiple children for different intervals)?
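On the approximation question, one simple scheme (purely my speculation, sketched
with the toy helpers above) is to merge leaves whose values are close after every
backup, trading value-function precision for diagram size:

def approximate(f, eps):
    """Coarsen an ADD by snapping every leaf to a grid of width eps; leaves
    that land on the same grid point merge and the diagram shrinks."""
    if f[0] == 'leaf':
        return leaf(round(f[1] / eps) * eps)
    return node(f[1], approximate(f[2], eps), approximate(f[3], eps))

# e.g. after each sweep: V = approximate(bellman_backup(V, reward, actions), 0.5)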