Spudd

From: Stefan B. Sigurdsson (stebbi_at_cs.washington.edu)
Date: Fri Apr 18 2003 - 12:26:22 PDT

    SPUDD
    UBC 2003

    Summary:
     - Spudd extends the classical planning problem to actions with uncertain outcomes: each action definition enumerates the possible outcomes and assigns a probability to each.
     - Instead of a sequential plan designed to reach a well-defined goal, Spudd computes a decision tree (or DAG, rather) representing an action policy designed to maximize the agent's expected reward.
     - A key difference between Spudd and classical planners is that Spudd policies seem to support continuous agent operation in domains with outside perturbation, since the agent re-consults the policy at every step (see the sketch after this summary).
     - I downloaded, built, and ran some of the provided examples, but I spent most of my time with Spudd simply reading problem and policy descriptions. The on-line interface helped with the latter.
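
     To make the two ideas above concrete, here is a minimal Python sketch
     (the toy domain and all names are made up for illustration; this is not
     Spudd's actual representation): actions list their possible outcomes
     with probabilities, and a policy is consulted at every step rather than
     followed as a fixed sequence.

        import random

        # Toy example (hypothetical names, not Spudd data structures): an
        # action maps a state to a list of (probability, successor) pairs.
        ACTIONS = {
            "move": lambda s: [(0.8, s | {"at_goal"}), (0.2, s)],  # may fail
            "wait": lambda s: [(1.0, s)],
        }

        def execute(action, state):
            # Sample one outcome according to the listed probabilities.
            outcomes = ACTIONS[action](set(state))
            r, acc = random.random(), 0.0
            for p, nxt in outcomes:
                acc += p
                if r < acc:
                    return nxt
            return outcomes[-1][1]

        # The policy is consulted before every step, so perturbation between
        # steps is absorbed; a sequential plan would break as soon as an
        # outcome diverged from the predicted state.
        def policy(state):
            return "wait" if "at_goal" in state else "move"

        state = set()
        for step in range(5):
            action = policy(state)
            state = execute(action, state)
            print(step, action, sorted(state))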

    Problem descriptions:
     - Spudd has an *old* and a *new* input file format, for encoding problems with binary and multi-valued variables respectively.
     - The high-level structure of the file formats is the same:
        1. Variable definitions; a list of the variables in the problem
        2. Action definitions; a list of one or more actions
        3. Reward definition; a tree specifying the "goodness" of the different combinations of final variable values
        4. A discount factor assignment
        5. A tolerance factor assignment
     - Variables: The old format only allows binary variable values; the new format extends the variable definition list with per-variable lists of allowed values.
     - Actions: The format is as follows:
        <action>: "action" <name> <cost>? (<variable-effect-list>)+
        <name>: self-explanatory
        <cost>: defaults to zero
        <variable-effect-list>: one <variable-effect> for each variable
        <variable-effect>: a tree over the possible states before the action executes, whose leaves enumerate the probabilities of the variable taking each of its possible values afterwards. For binary variables the leaves contain only a single probability estimate (the complementary probability is implied)
     - Reward: Assigns numerical reward values to different final states, using a tree format
     - Discount, tolerance: I didn't figure these out yet; presumably the standard MDP discount factor and a convergence threshold (see the value-iteration sketch after this list).
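
     My best guess is that these are the usual MDP discount factor and the
     value-iteration stopping threshold. Here is a minimal Python sketch of
     plain (flat-table) value iteration showing where both would fit; the
     toy domain and all names are invented, and Spudd itself works over
     decision diagrams rather than enumerated tables:

        # States are tuples of binary variable values; here one variable "x".
        STATES = [(False,), (True,)]

        # TRANSITIONS[action][state] -> list of (probability, next_state),
        # a flattened stand-in for the <variable-effect> trees above.
        TRANSITIONS = {
            "flip": {(False,): [(0.9, (True,)), (0.1, (False,))],
                     (True,):  [(0.9, (False,)), (0.1, (True,))]},
            "stay": {(False,): [(1.0, (False,))],
                     (True,):  [(1.0, (True,))]},
        }
        COST = {"flip": 1.0, "stay": 0.0}        # the optional <cost> field
        REWARD = {(False,): 0.0, (True,): 10.0}  # the reward tree, flattened

        DISCOUNT = 0.9    # weights future rewards relative to immediate ones
        TOLERANCE = 0.01  # stop once no state's value changes more than this

        def value_iteration():
            V = {s: 0.0 for s in STATES}
            while True:
                Q = {}
                for s in STATES:
                    for a in TRANSITIONS:
                        Q[(s, a)] = REWARD[s] - COST[a] + DISCOUNT * sum(
                            p * V[t] for p, t in TRANSITIONS[a][s])
                newV = {s: max(Q[(s, a)] for a in TRANSITIONS)
                        for s in STATES}
                if max(abs(newV[s] - V[s]) for s in STATES) < TOLERANCE:
                    break
                V = newV
            # Extract the policy: the best action in each state.
            return newV, {s: max(TRANSITIONS, key=lambda a: Q[(s, a)])
                          for s in STATES}

        print(value_iteration())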

    Policies:
     - A decision tree of variables (internal nodes), values (edges) and actions (leaves).
     - Each time an action must be taken, the agent resolves the best action by walking the policy tree from the root to a leaf (see the sketch below).
     - The agent should eventually reach a (locally) optimal state, but that doesn't seem to be guaranteed?
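
     Here is a hypothetical encoding of such a policy tree in Python (the
     same structure in spirit, not Spudd's actual output format): internal
     nodes test a variable, edges carry that variable's values, and leaves
     name actions.

        # Internal nodes are (variable, branches) pairs, keyed by that
        # variable's values; leaves are action names.
        POLICY = ("x",
                  {True:  "stay",
                   False: ("y",
                           {True:  "flip",
                            False: "stay"})})

        def choose_action(tree, state):
            # Walk from the root, following the edge that matches the current
            # value of each tested variable, until an action leaf is reached.
            while isinstance(tree, tuple):
                variable, branches = tree
                tree = branches[state[variable]]
            return tree

        print(choose_action(POLICY, {"x": False, "y": True}))  # -> "flip"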

