From: Alexander Yates (ayates_at_cs.washington.edu)
Date: Tue Apr 29 2003 - 00:28:40 PDT
Review of "Probabilistic Planning in the Graphplan Framework," by Avrim L.
Blum and John C. Langford.
This paper proposes two new algorithms for solving finite-horizon MDPs, both
based on the Graphplan algorithm for classical planning. PGraphplan finds an
optimal policy but is much slower than Graphplan; TGraphplan finds optimal
trajectories and runs nearly as fast as Graphplan, but it does not output a
complete policy.
PGraphplan searches for an optimal policy using a forward-chaining search
combined with a top-down dynamic-programming procedure. The search uses the
planning-graph structure, but because it chains forward it cannot exploit
mutual-exclusion relations. Instead, it relies on an admissible heuristic
estimate of the earliest time the goals can be reached from a state, and it
prunes any state whose estimate exceeds the time remaining. The search also
performs a kind of reachability analysis to prune nodes that cannot reach the
goal. TGraphplan runs essentially the same way as Graphplan, except that the
planning graph is labeled with transition probabilities, and it outputs the
likelihood of the optimal trajectory instead of simple success or failure.
The authors see it as a module for other procedures that do greedy contingent
planning, or as a subroutine for finding an optimal policy.
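The pruning idea above can be sketched in a few lines. This is my own toy
illustration, not the paper's implementation: a one-dimensional MDP where the
agent tries to reach a goal cell within a fixed horizon, with memoized
top-down dynamic programming standing in for PGraphplan's procedure. The
state space, actions, and heuristic are all made up for the example; the
only point is the pruning rule, which cuts off any state whose admissible
time-to-goal estimate exceeds the steps remaining.

```python
from functools import lru_cache

GOAL = 4      # goal cell (assumption: a toy line-world, not from the paper)
HORIZON = 6   # total time steps available

# Toy probabilistic actions: (probability, cells moved right).
# "careful" advances 1 cell w.p. 0.9; "fast" advances 2 cells w.p. 0.5.
ACTIONS = {
    "careful": [(0.9, 1), (0.1, 0)],
    "fast":    [(0.5, 2), (0.5, 0)],
}

def h(s):
    # Admissible heuristic: fewest conceivable steps to the goal,
    # since no action can ever advance more than 2 cells per step.
    return max(0, (GOAL - s + 1) // 2)

@lru_cache(maxsize=None)
def value(s, t):
    """Max probability of reaching GOAL from cell s with t steps left."""
    if s >= GOAL:
        return 1.0
    if h(s) > t:   # PGraphplan-style pruning: goal unreachable in time left
        return 0.0
    # Top-down DP: best action = max over actions of expected successor value.
    return max(
        sum(p * value(min(s + d, GOAL), t - 1) for p, d in outcomes)
        for outcomes in ACTIONS.values()
    )

print(value(0, HORIZON))
```

Because the heuristic is admissible, the pruned states provably contribute
zero probability, so the cutoff never sacrifices optimality, only search
effort.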
It seems like TGraphplan could easily get an agent into trouble, since it
ignores what happens when transitions stray from the optimal trajectory.
For example, an agent using TGraphplan to cross a chasm might decide to take
a rickety old bridge because it is the quickest route, whereas the optimal
policy would hike up to a sturdier bridge a mile away to avoid the risk of
dying. Contingent planning wouldn't really help such an agent either. I
grant this is fairly obvious, since I'm just saying that greedy algorithms
have pitfalls, but I would still dispute their claim that the "important
time for TGraphplan is the time to find the optimal trajectory rather than
the time to find a complete policy."
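To make the chasm worry concrete, here is a small sketch with invented
numbers (the routes, step counts, and probabilities are mine, not the
paper's). A Graphplan-style search stops at the first level where the goal
is reachable, so it would report the one-step rickety trajectory even though
the longer detour succeeds far more often:

```python
# Two routes across the chasm (illustrative numbers only):
#   rickety: 1 step, each step succeeds w.p. 0.5 (failure = falling).
#   detour:  3 steps, each step succeeds w.p. 0.99.
routes = {
    "rickety": {"steps": 1, "p_step": 0.5},
    "detour":  {"steps": 3, "p_step": 0.99},
}

def trajectory_prob(r):
    # Probability that every step of the trajectory succeeds.
    return r["p_step"] ** r["steps"]

# Quickest route: what a level-by-level Graphplan-style search returns first.
quickest = min(routes, key=lambda k: routes[k]["steps"])
# Most reliable route: what an agent that values survival should prefer.
safest = max(routes, key=lambda k: trajectory_prob(routes[k]))

print(quickest, trajectory_prob(routes[quickest]))   # rickety, 0.5
print(safest, trajectory_prob(routes[safest]))       # detour, ~0.970
```

Under these made-up numbers the quickest trajectory succeeds only half the
time, which is exactly the gap between an optimal trajectory and an optimal
policy.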
One interesting research direction would be to figure out a way to use
planning graph analysis to attack POMDPs, if that's possible.
Alex
This archive was generated by hypermail 2.1.6 : Tue Apr 29 2003 - 00:29:50 PDT