PGraphplan

From: Alexander Yates (ayates_at_cs.washington.edu)
Date: Tue Apr 29 2003 - 00:28:40 PDT

  • Next message: Nan Li: "MDP in Graphplan"

    Review of "Probabilistic Planning in the Graphplan Framework," by Avrim L.
    Blum and John C. Langford.

    This paper proposes two new algorithms for solving MDPs with finite
    horizons, both of them based on the Graphplan algorithm for classical
    planning. PGraphplan finds an optimal policy but is much slower than
    Graphplan, and TGraphplan finds optimal trajectories and runs nearly as fast
    as Graphplan, but it does not output a complete policy.

    PGraphplan uses a forward-chaining search and a top-down dynamic programming
    procedure to search for an optimal policy. The search uses the planning
    graph structure, but because it is a forward-chaining search, it cannot use
    mutual exclusion relations. Instead, the search uses a heuristic estimate
    of the fastest way the goals can be reached from a state, and it prunes any
    state where this (admissible) heuristic is more than the amount of time
    left. The search also does a kind of reachability analysis to prune nodes
    that can't reach the goal. TGraphplan runs essentially the same search as
    Graphplan, but over a planning graph whose transitions are labeled with
    probabilities, and it outputs the likelihood of the optimal trajectory
    instead of a simple success or failure. The authors see it as a module for other
    procedures that do greedy contingent planning, or as a subroutine for
    finding an optimal policy.
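    The pruning idea in PGraphplan's forward search can be sketched in a few
    lines. This is my own toy illustration, not the authors' code: the domain,
    heuristic, and `forward_search` function are all assumptions, and I use a
    deterministic simplification (the paper's search handles probabilistic
    outcomes and derives its heuristic from the planning graph).

```python
def heuristic(state, goal):
    # Admissible toy heuristic: count of goal propositions not yet true.
    # (PGraphplan instead derives its estimate from the planning graph.)
    return len(goal - state)

def forward_search(initial, goal, actions, horizon):
    """Forward-chaining search; True iff the goal is reachable within horizon."""
    def recurse(state, t):
        if goal <= state:            # all goal propositions achieved
            return True
        remaining = horizon - t
        if heuristic(state, goal) > remaining:
            return False             # prune: even optimistically, too little time left
        for pre, add in actions:
            if pre <= state:         # action applicable in this state
                if recurse(state | add, t + 1):
                    return True
        return False
    return recurse(initial, 0)

# Hypothetical domain: each action is (preconditions, add-effects).
actions = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"b"}), frozenset({"c"})),
]
print(forward_search(frozenset({"a"}), frozenset({"c"}), actions, 2))  # True
print(forward_search(frozenset({"a"}), frozenset({"c"}), actions, 1))  # False
```

    Because the heuristic is admissible (it never overestimates the steps to
    the goal), this pruning never discards a state that could still reach the
    goal in time.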

    It seems like TGraphplan could easily get an agent into trouble, since it
    ignores what happens when transitions don't follow the optimal trajectory.
    For example, an agent using TGraphplan to cross a chasm might take a rickety
    old bridge because that is the quickest route, whereas the optimal policy
    would be to hike up to a steadier bridge a mile away and avoid dying.
    Contingent planning wouldn't really help the agent using TGraphplan. I
    guess this is pretty obvious, since I'm just saying greedy algorithms have
    pitfalls, but I would still dispute their claim that the "important time for
    TGraphplan is the time to find the optimal trajectory rather than the time
    to find a complete policy."
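    A small calculation makes the chasm example concrete. The numbers and the
    utility model below are my own assumptions, not from the paper; they just
    show how a greedy trajectory choice diverges from the expected-value
    comparison an MDP-optimal policy would make.

```python
# Each crossing option: (steps to cross, probability of surviving).
rickety = (1, 0.6)   # quick, but the bridge may collapse
steady  = (5, 1.0)   # a mile's hike, but safe

def trajectory_value(steps, p_success):
    # Expected value: cost of the steps on success, a large assumed
    # penalty for dying on failure.
    DEATH_PENALTY = -1000
    return p_success * (-steps) + (1 - p_success) * DEATH_PENALTY

# A planner fixated on the quickest trajectory takes the rickety bridge,
# but its expected value is far worse once failure is priced in:
print(trajectory_value(*rickety))  # about -400.6
print(trajectory_value(*steady))   # -5.0
```

    The rickety bridge still offers the shortest successful trajectory, which
    is why a trajectory-only criterion keeps picking it.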

    One interesting research direction would be to figure out a way to use
    planning graph analysis to attack POMDPs, if that's possible.

    Alex



    This archive was generated by hypermail 2.1.6 : Tue Apr 29 2003 - 00:29:50 PDT