From: Raphael Hoffmann (raphaelh_at_u.washington.edu)
Date: Sun Nov 16 2003 - 22:59:27 PST
"SPUDD: Stochastic Planning Using Decision Diagrams"
Summary:
Due to the need of complete state enumeration, solving large MDPs by the
value iteration algorithm is often infeasible. The authors of this paper
propose a different (reduced) representation of value functions and show
that some large MDP problems can be computed efficiently.
Pros:
The idea of finding a technique to efficiently solve large MDPs is a very
important one, since applications arise in many contexts and current
algorithms often prohibit practical usage. Using ADDs seems to be a
promising solution. The space complexity can be reduced substantially and
many problems that could not be solved previously are computable now.
The authors also invested a lot of effort in further optimizing their
algorithm and the results in a series of tests look very promising.
Certainly, not for all MDP problems can states be easily represented by
merely boolean variables. The authors therefore addressed the possibility of
transforming multivalued variables into several boolean variables. This
feature makes ADD also usable when a few variables are multivalued.
Cons:
However, the way the authors cope with multivalued variables appears to be
rather simple and obviously has some flaws: The state space could grow
quickly - depending on the number of multivalued variables and the size of
their domains. The effects haven't been examined thoroughly.
At the beginning of Chapter 4, the authors explain briefly why the ADD
representation can be so efficient: it exploits the regularities int the
actions and reward networks. It would be great if the efficiency gain -
based on a measure of the regularities in the data - could be assessed more
precisely.
Future work:
In my opinion, improvements can be made in how to deal with multivalued
variables. The authors indicated that it is possible to allow ADDs to work
directly with multivalued variables. Further research could evaluate the
performance issues and compare the results with the performance of doing the
simple boolean conversion of the multivalued variables. It could be possible
that in some cases the expansion of the state space still has advantages
over the direct usage of multivalued variables and vice versa.
The authors suggest that the performance could be significantly improved by
implementing dynamic variable reordering at runtime. To make this possible,
algorithms that allow dynamic variable reordering will have to be developed
and the performance increases analysed.
As mentioned above, it would be great to calculate the efficiency gain based
on a measure of the regularities in the data. Perhaps it would then be
possible to make a priori estimates of the runtime. This could be a very
helpful information in practical applications.
This archive was generated by hypermail 2.1.6 : Sun Nov 16 2003 - 22:59:34 PST