From: Alexander Yates (ayates_at_cs.washington.edu)
Date: Mon May 05 2003 - 16:23:13 PDT
Review of "Incremental Contingency Planning," by R. Dearden, N. Meuleau, S.
Ramakrishnan, D. Smith, and R. Washington.
This paper proposes a method for incrementally building probabilistic plans
that reason over continuous properties. The basic method is to build a
straight line plan with actions having their most likely effects, and then
add branches to the plan at the places where a branch would have the
highest utility, taking into account the probability that the branch will be
taken.
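The branch-selection step described above can be sketched as follows. This is a minimal, self-contained illustration, not the paper's actual algorithm or representation; the candidate-point fields (p_taken, u_with_branch, u_without) are my own assumptions about what such a scoring step would need.

```python
# Hypothetical sketch: score each candidate branch point by its
# probability-weighted utility gain and pick the best one.

def best_branch_point(candidates):
    """Each candidate records the probability that execution would take
    the branch there, plus plan utility with and without the branch.
    Return the candidate maximizing expected utility gain."""
    def expected_gain(c):
        return c["p_taken"] * (c["u_with_branch"] - c["u_without"])
    return max(candidates, key=expected_gain)

candidates = [
    {"step": 3, "p_taken": 0.10, "u_with_branch": 50.0, "u_without": 20.0},
    {"step": 7, "p_taken": 0.40, "u_with_branch": 30.0, "u_without": 20.0},
]
# Step 3 has the larger raw gain (30 vs. 10), but step 7 wins once the
# gain is weighted by the probability the branch is actually taken.
print(best_branch_point(candidates)["step"])  # → 7
```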
The Mars Rover motivates the idea of planning over continuous time and
resources. This kind of problem is something like an MDP with an
uncountable state set (they assume full observability of states), so
traditional MDP techniques that visit the entire state set to build a full
policy are inappropriate. This research focuses instead on building a plan
for a specific initial state and a set of goal states, rather than a
complete policy, somewhat like PGraphplan and TGraphplan. Also like
TGraphplan, it begins by finding the best trajectory to the goals (the main
branch). It then uses utility estimates to find the best places to add
branches to the plan. The utility estimates are propagated backwards from
the goal nodes through a planning graph, and they are represented as a
piecewise constant function of the resources.
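The backward propagation of a piecewise-constant utility function could look something like the sketch below. The step-function representation as sorted (threshold, value) pairs, and the backup rule V'(r) = sum_o p_o * V(r - c_o) over discrete resource-cost outcomes, are my assumptions about one plausible implementation, not the paper's actual data structures.

```python
# Hypothetical sketch: back up a piecewise-constant value function over a
# resource level through an action with discrete cost outcomes.

def eval_step(vf, r):
    """vf: list of (threshold, value) sorted ascending; each value holds
    for resource levels >= its threshold. Returns 0 below the first one."""
    v = 0.0
    for thresh, val in vf:
        if r >= thresh:
            v = val
    return v

def backup(vf, outcomes):
    """Back up vf through an action that consumes c resource units with
    probability p for each (p, c) in outcomes:
        V'(r) = sum_o p_o * V(r - c_o).
    The result is again piecewise constant, with breakpoints shifted
    upward by the possible costs."""
    breaks = sorted({t + c for t, _ in vf for _, c in outcomes})
    return [(b, sum(p * eval_step(vf, b - c) for p, c in outcomes))
            for b in breaks]

# Goal is worth 10 once at least 2 units of resource remain.
goal_vf = [(2.0, 10.0)]
# Action costs 1 unit with prob 0.75, 3 units with prob 0.25.
print(backup(goal_vf, [(0.75, 1.0), (0.25, 3.0)]))
# → [(3.0, 7.5), (5.0, 10.0)]
```

With 3 to 5 units on hand, only the cheap outcome still reaches the goal, so the expected value is 0.75 * 10; above 5 units both outcomes succeed.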
I don't quite understand why this planning problem is called contingent
planning, since they assume full observability. That conflicts with the
previous notion of contingent planning that we've seen. Branches appear in
these plans not because there is uncertainty about what state the planner
is in, but because it's impossible to enumerate all of the different
outcomes. Also, the paper never discusses probabilistic effects
for state transitions other than the ones affecting the continuous
attributes. I wasn't sure if this meant that all actions were deterministic
except in their resource consumption, or if it meant that any attributes
which might have probabilistic transitions would need to be treated as
resources.
Another approach to this kind of problem would be to automatically
discretize the continuous attributes into bins. It seems like what they are
doing amounts to choosing the most useful bins automatically. A
possible direction for future research is to find other ways of
discretizing the state space, and then applying more standard MDP
approaches. Another possible direction is to look for better heuristics,
perhaps using different representations of the utility function (piecewise
linear or something).
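The binning alternative suggested above could start from something as simple as the sketch below; the bin edges and the battery-charge interpretation are illustrative assumptions. Once the continuous attribute is mapped to a bin index, the state space is finite and standard tabular MDP methods (e.g. value iteration) apply directly.

```python
# Hypothetical sketch: map a continuous resource level to a discrete bin
# index, turning the uncountable state set into a finite one.

def to_bin(resource, edges):
    """edges: sorted ascending bin boundaries. Returns the index of the
    first edge the resource level falls below; values at or beyond the
    last edge land in the final bin."""
    for i, edge in enumerate(edges):
        if resource < edge:
            return i
    return len(edges)

edges = [10.0, 25.0, 50.0]   # e.g. battery-charge thresholds (assumed)
print(to_bin(7.0, edges))    # → 0
print(to_bin(30.0, edges))   # → 2
print(to_bin(80.0, edges))   # → 3
```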
Alex