Acting Optimally in Partially Observable Stochastic Domains

From: Daniel J Klein (djklein_at_u.washington.edu)
Date: Mon Nov 24 2003 - 00:11:42 PST

  • Next message: Daniel Lowd: "Acting Optimally in Partially Observable Stochastic Domains"

    Title: Acting Optimally in Partially Observable Stochastic Domains
    Authors: A. Cassandra, L. Kaelbling, and M. Littman

    Summary: The authors review the basics of MDPs and POMDPs and then present their POMDP algorithm, Witness.

    Main Ideas:

    POMDPs are the main idea throughout the paper. The authors first frame the MDP problem and give a basic explanation of how it is solved. They then move on to partially observable decision processes, going slowly and supporting their explanation with a simple example.

    One concept fundamental to POMDPs is replacing the ordinary state space of an MDP with a belief space, where a belief is a probability distribution over the underlying states. After each action and observation the belief is updated with Bayes' rule, which allows the POMDP to be treated much like an MDP over beliefs. The simple examples were especially useful here for understanding the authors' main ideas.
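
    A minimal Python sketch of the Bayes belief update may make this concrete. It assumes a finite state set and the usual POMDP tables; the names belief_update, T, and O, and the two-state numbers in the usage note, are illustrative assumptions and are not taken from the paper.

```python
def belief_update(belief, action, observation, T, O):
    """Return the new belief after taking `action` and seeing `observation`.

    Assumed (hypothetical) table layout:
      T[a][s][s2] = P(next state s2 | state s, action a)
      O[a][s2][o] = P(observation o | action a, next state s2)
    Update rule: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)
    """
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Prediction step: probability of landing in s2 before observing.
        predicted = sum(T[action][s][s2] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s2][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Illustrative two-state model with one action that leaves the state
# unchanged and yields an observation that is correct 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
new_belief = belief_update([0.5, 0.5], 0, 0, T, O)
```

    Starting from a uniform belief, a single observation pointing at state 0 shifts the belief to roughly (0.85, 0.15). The updated belief then plays the role that the state plays in an ordinary MDP.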

    One Small Flaw:

    This was the easiest paper we have read thus far. The authors stick to basic explanations and examples that are easy to follow. However, the flip side of this simplification is that specific details are left out. In particular, I thought their Witness section should have been more detailed.

    Future Research:

    Compared with the MDP problems we read about recently, the problems considered for their Witness algorithm seem puny. Granted, solving a POMDP is much more difficult than solving an MDP, but I think further research will produce algorithms capable of operating on larger problems.

    The authors point out that it is often much easier to find an approximate policy than the optimal policy. But are these approximate policies ever useful? If an approximate policy is good enough, why bother finding the optimal policy? Is it possible to determine ahead of time whether one of the approximate solutions will be "good enough"? Along the same line of questioning, is it possible to determine the delta in the convergent sequence automatically using some heuristic?
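
    On the last question, one standard result for discounted value iteration gives a partial answer: if successive value functions differ by less than delta in max norm, the greedy policy loses at most 2 * gamma * delta / (1 - gamma) relative to optimal, so delta can be derived from a loss tolerance chosen in advance. The Python sketch below applies this to a small fully observable MDP rather than to the Witness algorithm itself; the function name value_iteration, the table layout, and the one-state example are illustrative assumptions.

```python
def value_iteration(T, R, gamma, loss_tolerance):
    """Run value iteration until the greedy policy is provably near-optimal.

    Assumed (hypothetical) table layout:
      T[a][s][s2] = P(next state s2 | state s, action a)
      R[s][a]     = immediate reward for taking action a in state s
    The stopping threshold delta is derived from the requested loss bound.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    n_states = len(R)
    n_actions = len(R[0])
    # Stop when the max-norm change falls below this derived delta.
    delta = loss_tolerance * (1.0 - gamma) / (2.0 * gamma)
    V = [0.0] * n_states
    while True:
        V_new = [
            max(
                R[s][a]
                + gamma * sum(T[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        residual = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if residual < delta:
            return V, delta


# Illustrative one-state, one-action problem: reward 1 forever, so the
# true value is 1 / (1 - gamma) = 2 when gamma is 0.5.
values, delta = value_iteration([[[1.0]]], [[1.0]], 0.5, 0.01)
```

    This only bounds the loss once the iteration has stopped; it does not say in advance how many iterations a given problem will need.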



    This archive was generated by hypermail 2.1.6 : Mon Nov 24 2003 - 00:11:43 PST