Paper Review #5 - by Patrick Haluptzok - CSE 573 Fall 2003

From: Patrick Haluptzok (patrickh_at_windows.microsoft.com)
Date: Tue Nov 18 2003 - 22:56:16 PST



     

    Paper title/Author:

    Symbolic Heuristic Search for Factored Markov Decision Processes

    By Zhengzhu Feng & Eric A. Hansen

     

    One-line summary:

    The paper shows that, for factored MDPs, doing state abstraction via an ADD (algebraic decision diagram) representation of the value function, and representing that value function only over reachable states, converges faster and with a smaller ADD than SPUDD or classic approaches that enumerate the complete state space.

     

    The (two) most important ideas in the paper, and why:

    The authors represent the value function with an ADD only over states that are actually reachable. This keeps the value function's ADD from becoming larger and more complicated than necessary. They also use a heuristic estimate of the value function to prune the search, so they never have to represent the value function for the many states that would not be visited when following an optimal policy.
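    To make the pruning idea concrete, here is a rough, non-symbolic sketch in Python of the LAO*-style loop the paper builds on: only states reachable under the current greedy policy are expanded and backed up, while everything else keeps its heuristic value. The hooks succ(s, a), reward(s, a), and heuristic(s) are made-up placeholders; this is not the authors' ADD-based implementation.

        # Rough LAO*-style sketch (explicit states, not the paper's ADDs).
        # Only states reachable under the current greedy policy get exact values;
        # all other states fall back to the admissible heuristic, so they are
        # never represented explicitly.
        def lao_star_sketch(start, actions, succ, reward, heuristic,
                            gamma=0.95, iterations=100):
            V = {}                                    # exact values for expanded states

            def value(s):
                return V.get(s, heuristic(s))         # fringe states use the heuristic

            def greedy(s):
                return max(actions, key=lambda a: reward(s, a) +
                           gamma * sum(p * value(s2) for s2, p in succ(s, a)))

            for _ in range(iterations):
                # collect states reachable from the start under the current greedy policy
                reachable, stack = {start}, [start]
                while stack:
                    s = stack.pop()
                    for s2, _p in succ(s, greedy(s)):
                        if s2 not in reachable:
                            reachable.add(s2)
                            stack.append(s2)
                # back up only those states; unreached states are never touched
                for s in reachable:
                    a = greedy(s)
                    V[s] = reward(s, a) + gamma * sum(
                        p * value(s2) for s2, p in succ(s, a))
            return V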

     

    They also use the ADD representation to avoid explicit state enumeration, so on top of never visiting every state they get state abstraction: states the ADD cannot distinguish share the same value. In fact they use ADDs for most of the quantities maintained in the LAO* search: the value function, the set of states reached so far, the set of states reachable by an action from any current state, and so on. This saves both time and space.
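    As a rough illustration of why a decision-diagram representation abstracts states, here is a toy Python reduction over boolean state variables (not the CUDD-style ADDs the paper actually uses; value_fn and reachable_fn are made-up placeholders):

        # Toy sketch of decision-diagram state abstraction. Assignments whose
        # values agree share a single node, and masking unreachable states to a
        # constant (mirroring multiplication by a reachable-state indicator)
        # lets the whole unreachable region collapse to one leaf.
        def build_diagram(var_names, value_fn, reachable_fn, assignment=()):
            if len(assignment) == len(var_names):
                return value_fn(assignment) if reachable_fn(assignment) else 0.0
            low = build_diagram(var_names, value_fn, reachable_fn, assignment + (0,))
            high = build_diagram(var_names, value_fn, reachable_fn, assignment + (1,))
            # if both branches agree, drop the test: this is the state abstraction
            return low if low == high else (var_names[len(assignment)], low, high)

        # Example: the value varies with x1 and x2, but only x1 = 0 states are
        # reachable, so the masked diagram is smaller than the unrestricted one.
        value = lambda s: float(3 * s[1] + s[2] + 1)
        reachable = lambda s: s[1] == 0
        print(build_diagram(("x0", "x1", "x2"), value, reachable))
        # -> ('x1', ('x2', 1.0, 2.0), 0.0)
        print(build_diagram(("x0", "x1", "x2"), value, lambda s: True))
        # -> ('x1', ('x2', 1.0, 2.0), ('x2', 4.0, 5.0))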

     

    The one or two largest flaws in the paper:

    I found this paper easier to read than most. One flaw, though, is that it mostly combines previously published ideas (SPUDD-style ADD representations and LAO* heuristic search), so it may not be especially original work.

     

    The other flaw is that I don't think this approach would work if the value feedback were stochastic or if the model of the world were unknown. Maybe that is what reinforcement learning handles, but in real-world problems I don't think you have a good model of the transitions that actions produce, and I think the value function feedback is stochastic: your boss doesn't always say "good job" when you do something right.

     

    Identify two important, open research questions on the topic, and why they matter:

     

    Could the value function be represented more generally by a plain neural network or some other function approximator that handles continuous variables? Many problems have continuous state variables, and converting them to discrete ones is somewhat ad hoc in how finely you bucket each variable's range, etc. This matters because it determines how broadly the approach applies.

     

    Can this be extended to domains without a known model, or with stochastic value function feedback? This matters because the real world is often like that.

     

