CSE 574 - Automated Planning
Winter 2003
574 Project Page.
Your first task is to choose (at least one) planner, fire it up, try a
couple of problems, and perhaps write a new problem yourself. After you
have a sense for its strengths and weaknesses, fill out a [Review].
- An ADD-based MDP solver called SPUDD - snazzy, and with a nice
online interface.
- The SAPA and LPG temporal planners are available here.
Other recommended planners are Geffner's GPT
(which handles uncertainty) and Blackbox.
I hope that groups (1-3 people) will form and decide on their specific
projects by Tues April 22. The two default proposals are:
Goal Selection
Modify a classical planner (e.g. LPG, HSPr, Blackbox, Graphplan, or
whatever) to solve the following optimization problem. In addition to
the standard inputs (init state, action schemata, and goals), your
planner will take a resource bound. Each action will have an
associated cost and each goal an associated utility. We'll assume that
the agent's overall utility is a linear combination of the utilities
of the individual goals it achieves.
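Stated a bit more formally (the notation here is just one reading of
the above, not part of the assignment), the problem is:

    \max_{\pi} \sum_{g \in G(\pi)} u_g
    \quad \text{subject to} \quad
    \sum_{a \in \pi} c_a \le B

where G(\pi) is the set of goals achieved by plan \pi, u_g and c_a
are the goal utilities and action costs, and B is the resource bound.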
Your planner should output a plan which maximizes (or comes as close
to optimal as it can) the overall utility achievable without exceeding
the resource bound. The straightforward way of solving this is doubly
exponential, so don't be sad if you don't find a fast solution which
is guaranteed optimal. The ideal project would identify one or more
clever heuristics and/or auxiliary data structures. Is dynamic
programming (i.e. generating minimal resource-usage plans for sets of
one goal, sets of two, etc.) a useful strategy? Can you use a
planning-graph variant to determine bounds on goal interactions? A
brute-force baseline is sketched below.
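Here is the brute-force baseline, just to fix ideas. The planner call
(plan_for) is a stand-in for whatever interface your chosen planner
exposes; nothing below comes from an actual system.

    from itertools import combinations

    def select_goals(init, goals, utility, bound, plan_for):
        # plan_for(init, subset) is an assumed planner interface: it
        # returns (plan, cost) for a cheapest plan achieving every goal
        # in subset, or None if the subset is unachievable.
        best_plan, best_util = None, 0.0
        for k in range(1, len(goals) + 1):
            for subset in combinations(goals, k):
                result = plan_for(init, subset)
                if result is None:
                    continue
                plan, cost = result
                util = sum(utility[g] for g in subset)  # linear utility model
                if cost <= bound and util > best_util:
                    best_plan, best_util = plan, util
        return best_plan, best_util

This enumerates all 2^n goal subsets and calls the planner on each,
which is exactly the blow-up that clever heuristics should avoid.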
Incremental Replanning in an Embedded Agent
Build a world simulator (or modify one from the AIMA textbook
page). The ideal simulator would let you input the world physics with a
set of planning operators and an init state - the same inputs you'd
give to your planner. You'll want to assume that actions have an
associated reward (or cost). The simulator should record the agent's
actions and its performance (accumulated reward stream). Your
simulator should talk to any agent, probably via sockets, maybe using
an XML-based protocol to say which actions are being executed and to
report sensor data back to the agent. You should probably assume that
the world and the agent are synchronous - the world will wait for the
agent to command an action before anything changes. This
eliminates a host of race conditions and means you don't need to
start out focusing on how fast the planner is.
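To make the protocol concrete, here is a minimal Python sketch of the
synchronous loop; the XML tags, the port, and the world object's
methods (done, execute) are all made up for illustration.

    import socket
    import xml.etree.ElementTree as ET

    def serve(world, port=9000):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("localhost", port))
        srv.listen(1)
        conn, _ = srv.accept()  # one agent, fully synchronous
        while not world.done():
            # Block until the agent commands an action, e.g.
            # <act name="move-north"/>; the world stays frozen meanwhile.
            msg = conn.recv(4096).decode()
            action = ET.fromstring(msg).get("name")
            reward, percepts = world.execute(action)  # assumed world API
            # Report sensor data and the accumulated reward back.
            reply = ET.Element("percept", reward=str(reward))
            for fact in percepts:
                ET.SubElement(reply, "fact", name=fact)
            conn.sendall(ET.tostring(reply))
        conn.close()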
Now take a planner (probably GPT or SPUDD) and connect it to the
world. How does it perform?
Finally, extend the planner in some interesting way. My favorite idea
here would be to allow the human user, at any time step, to add
one-shot goals of state achievement (with associated utilities). Can
you get the agent to satisfy these? One way would be to compile them
into the MDP and re-solve using value iteration (see this AAAI-96 paper), but maybe you can figure
out a faster way.
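For reference, the re-solve step is just standard value iteration over
the compiled MDP. This sketch assumes a flat enumeration of states and
actions; P and R are illustrative data structures, not any planner's
actual API.

    def value_iteration(states, actions, P, R, gamma=0.95, eps=1e-4):
        # P[s][a] -> list of (prob, next_state); R[s][a] -> reward.
        # Both are assumed inputs produced by compiling the goals in.
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                best = max(R[s][a] + gamma * sum(p * V[s2]
                                                 for p, s2 in P[s][a])
                           for a in actions)
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < eps:
                return V

Note that every new one-shot goal forces a full re-solve in this
scheme, which is precisely what a faster approach would try to avoid.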
Here are some notes on other ideas.
- Analyze Graphplan in terms of the CSP framework; compare mutexes to
k-consistency; can one bound k?
- Reimplement STAN's TIM analysis and symmetry detection algorithms
- Abstraction methods motivated by McDermott grid world
- Dynamic objects with universally quantified effects in Graphplan
- One-shot goals in an MDP framework
- Planning where actions can take real-valued parameters (and hence the
branching factor is infinite).