From: Julie Letchner (letchner_at_cs.washington.edu)
Date: Tue Nov 18 2003 - 16:52:32 PST
The authors introduce an MDP solver that uses pruning in the symbolic state
space of ADDs to eliminate states not reachable from the initial state. The
policy search is made more efficient by eliminating from it states that can
never be reached.
None of the techniques in this paper (A*/LAO* search, dynamic programming,
ADD symbolic representation of MDPs, heuristic generation) are new; the
contribution here is in combining them to create an algorithm more efficient
than its components. The results show that this combination is indeed
successful: factoring MDPs into ADDs generates a reasonably-sized symbolic
state space, the grouped structure of which allows pruning to be effective
and its guiding heuristics efficiently computable.
The combining-old-tools approach taken in this paper possibly explains why
the evaluation section has length equal to that of the algorithms'
explanation. The experiments are well-reported; not only do they show
positive results, but the accompanying explanations do a very nice job at
high-level credit assignment. The authors' explanations of what features of
a problem (for example, fringe size and # states evaluated) correlate with
what types of reachability/size/timing results will prove far more valuable
than strictly numeric results in suggesting and guiding future work. This
speaks well to the authors' choice of evaluation metrics, though one
evaluation I missed was an evaluation of the effects of heuristic accuracy
on the reachability/size/timing results.
Just as with the SPUDD paper, my major gripe is that the paper does nothing
to convince me that the a* and f* problems bear any resemblance to
real-world MDP problems. In particular, my intuition (though it may be
wrong) tells me that most symbolic states in well-constructed real problems
are reachable, and are furthermore the "image" of a state set will expand
into the entire reachable set of states in a very small number of
iterations-in which case the value of reachability pruning may be
negligible.
I think a good direction for future work would be the development of formal
characterizations of the space of MDPs. Such a characterization would
provide a set of characteristics by which various MDPs could be described,
which would allow the performance of these various algorithms on various
types of MDPs to be evaluated and compared, providing more concrete
directions for future work than has been given by the two papers we've read
on this subject.
This archive was generated by hypermail 2.1.6 : Tue Nov 18 2003 - 16:51:56 PST