From: Masaharu Kobashi (mkbsh_at_cs.washington.edu)
Date: Sun Nov 23 2003 - 23:39:28 PST
Title: Acting Optimally in Partially Observable Stochastic Domains
Authors: Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman
[Summary]
The paper reports a new algorithm for finding optimal or
near-optimal solutions to MDPs in partially observable
environments (POMDPs) and claims that the algorithm improves
efficiency, supported mostly by limited empirical evidence.
[Most Important Ideas]
First, they claim to have developed a new algorithm that
improves on the existing Witness algorithm, although, as I
state below, the improvement is not very clear to me.
Second, although these are properties of POMDPs and the
Witness algorithm rather than inventions of the authors, two
points stand out: actions that affect the environment and
actions that only affect the agent's state of information are
treated uniformly (both pass through the same belief update,
as the sketch below illustrates), and the Witness algorithm
guarantees that the running time never becomes exponential as
long as the number of required vectors is not exponential.
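To make the uniform treatment concrete, here is a minimal
sketch of the standard POMDP belief update (textbook material,
not code from the paper; the function name, array shapes, and
the toy numbers below are my own assumptions). An
information-gathering action and an environment-changing
action go through exactly the same update; they differ only in
their entries in the transition model T and observation model Z.

    import numpy as np

    def belief_update(b, a, o, T, Z):
        # b: belief over states, shape (S,)
        # T[a, s, s'] = P(s' | s, a); Z[a, s', o] = P(o | s', a)
        # Predict: propagate the belief through the transition model.
        predicted = b @ T[a]
        # Correct: weight each successor state by the observation likelihood.
        unnormalized = Z[a, :, o] * predicted
        return unnormalized / unnormalized.sum()

    # Tiny two-state, one-action example (numbers assumed for illustration):
    T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
    Z = np.array([[[0.7, 0.3], [0.4, 0.6]]])
    b = np.array([0.5, 0.5])
    b = belief_update(b, a=0, o=0, T=T, Z=Z)   # -> approx. [0.681, 0.319]

A purely information-gathering action would simply have a
near-identity T[a] while Z[a] sharpens the belief, so no
special case is needed in the agent's update.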
[Largest Flaws]
First, it may be due to my own reading, but I was not able
to grasp how they improved on the many existing algorithms
cited in the paper. Their descriptions of the algorithms seem
to be mostly restatements of already existing ones.
Second, although the authors describe the contents of the
paper as "preliminary", the descriptions are sketchy and have
little appeal because of the lack of in-depth logical
analysis of why and how their new algorithm works better,
as well as the limited experiments and descriptions of real
applications.
[Important Open Research Questions]
One, as noted by the authors, is how to extend their
algorithm to perform policy iteration.
Another is to further improve efficiency by limiting
the search space in some ingenious way, which the authors
state is what they are aiming at.