From: Jessica Kristan Miller (jessica_at_cs.washington.edu)
Date: Mon Nov 24 2003 - 11:33:57 PST
Paper Reviewed: Acting Optimally in Partially Observable Stochastic
Domains
Paper Author: Cassandra, Kaelbling, Littman
Reviewed By: Jessica Miller
Summary:
The goal of this paper is to describe an algorithm that produces optimal
policies for POMDPs in a more efficient manner existing algorithms.
Important Ideas:
The authors represent POMDPs as a completely observable continuous-space
MDPs over the belief state space. This representation is important
because the an optimal policy for a MDP on the belief space is an optimal
policy for the POMDP.
Using the formulation of POMDPs as belief MDPs, the witness algorithm uses
a form of value iteration to approximate the optimal value function. The
authors claim that this algorithm is one of the only algorithms that solve
POMDPs in less than exponential time.
I thought that the coolest part of this paper was the final output - the
policy graph. This representation of the result is very clear and
demonstrates how the algorithm has been able to generalize regions of the
state space. However, I don't know if this representation existed before
this paper.
Flaws:
The largest flaw of this paper is it contains no quantifiable results. I
suppose it is because they had to meet some kind of page limit and they
expound upon the results in other papers.
Open Research Questions:
The authors propose extending their algorithm to policy iteration which
might yield further efficiency. Perhaps another area of research could be
how to improve the algorithm so that low tolerance factors don't result in
unstable linear problems...or ways to deal with them gracefully somehow.
This archive was generated by hypermail 2.1.6 : Mon Nov 24 2003 - 11:33:58 PST