Probagraphplan

From: Stefan B. Sigurdsson (stebbi_at_cs.washington.edu)
Date: Tue Apr 29 2003 - 12:53:19 PDT


Avrim L. Blum and John C. Langford
Probabilistic planning in the Graphplan framework

Summary
 - Explores extensions of Graphplan from classical planning to probabilistic domains with non-deterministic actions and full observability of the current state. The authors' goal is to determine to what extent Graphplan's speed can be preserved in this less constrained setting.

Key ideas
 - Two approaches are presented. TGraphplan is a simple extension of Graphplan that adds probability-weighted edges and uses the backward-chaining search to derive a fuzzy (rather than binary) evaluation of paths through the plan graph. The outcome is a "trajectory" with optimal probability of success (or merely sufficient probability, depending on how the planning horizon is handled). The algorithm is comparatively fast (almost as fast as Graphplan) because it finds a single trajectory instead of computing a full policy. It seems there may not be many more simple ways to extend Graphplan to non-deterministic actions beyond the TGraphplan approach - but I haven't read the Smith/Weld and Weld et al. probabilistic Graphplan papers yet.
 - PGraphplan finds an optimal/complete policy but runs significantly slower than TGraphplan. It is based on dynamic programming and uses the plan graph for pruning: it checks whether nodes are relevant to achieving any goal (neededness) and how easily a node can be exploited in reaching a goal (value propagation). The discussion of forward- vs. backward-chaining is interesting; it seems intuitively plausible that forward chaining is better suited to policy computation, but I can't quite pin down why, and I wasn't completely satisfied with the paper's discussion.
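To make the TGraphplan idea above concrete, here is a minimal sketch (my own illustration, not code from the paper) of the "fuzzy" evaluation: instead of Graphplan's binary plan check, a candidate trajectory is scored by the product of its per-step success probabilities, assuming independent action outcomes. The action names and probabilities are hypothetical.

```python
# Sketch: scoring a trajectory by probability of success,
# the fuzzy analogue of Graphplan's binary plan evaluation.
# Assumes each step's success is independent of the others.

def trajectory_success_prob(trajectory):
    """Probability that every step of the trajectory succeeds.

    trajectory: list of (action_name, p_success) pairs.
    """
    prob = 1.0
    for action, p_success in trajectory:
        prob *= p_success
    return prob

# Hypothetical three-step plan:
plan = [("move-a-b", 0.9), ("pickup-block", 0.8), ("stack", 0.95)]
print(trajectory_success_prob(plan))  # 0.9 * 0.8 * 0.95 = 0.684
```

A backward-chaining search can then prefer the extension with the highest running product, stopping either at an optimal trajectory or at the first one whose probability clears a threshold, depending on how the horizon is handled.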
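For the PGraphplan side, the dynamic-programming core can be sketched as a finite-horizon value backup over states, where each action has a distribution over successor states; the paper's plan-graph pruning (neededness, value propagation) would restrict which states and actions this backup ever touches. The states, actions, and probabilities below are hypothetical, chosen only to illustrate one backup step.

```python
# Sketch: one dynamic-programming backup, as used (in spirit) by
# policy-computing planners. Not the paper's code; the pruning via
# the plan graph is omitted here.

def backup(values, actions):
    """One DP step: the value of a state is the best expected value
    over its actions.

    values:  dict state -> probability of eventually reaching the goal.
    actions: dict state -> {action_name: [(prob, next_state), ...]}.
    """
    new_values = {}
    for state, acts in actions.items():
        new_values[state] = max(
            sum(p * values.get(s2, 0.0) for p, s2 in outcomes)
            for outcomes in acts.values()
        )
    return new_values

# Hypothetical example: from s0, "safe" reaches the goal g with
# prob 0.7 (else stays recoverable), while "risky" reaches g with
# prob 0.9 but dead-ends with prob 0.1.
values = {"g": 1.0}
actions = {"s0": {"safe": [(0.7, "g"), (0.3, "s0_fail")],
                  "risky": [(0.9, "g"), (0.1, "dead")]}}
print(backup(values, actions))  # {'s0': 0.9}
```

Iterating this backward from the horizon yields a policy (best action per state per level), which is why a full forward/DP sweep costs more than TGraphplan's single-trajectory search.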

Questions
 - It's a bit unfortunate that Graphplan's mutexes have to be relaxed to work in TGraphplan - the weakened notion of exclusive operators seems likely to cause false negatives during the search.
 - An upside of the experiments section is that it presents comparisons with other planners; a downside is that the problems/domains are very simple, so the experiments aren't very conclusive.

Directions
 - Experimenting with different handling of mutexes in TGraphplan.
 - Further relaxing the domain restrictions, e.g. extending Graphplan to partial observability, cost/rewards, etc.



This archive was generated by hypermail 2.1.6 : Tue Apr 29 2003 - 12:54:05 PDT