Name of Reviewer
------------------
Ian Simon

Key Contribution
------------------ 
Summarize the paper's main contribution(s). Address yourself to both the class
and to the authors, both of whom should be able to agree with your summary.

The paper gives an algorithm for multiview stereo that constructs a set of
oriented rectangular patches in 3D from a set of input images.  The algorithm
consists of three main steps:

1. Initialization -- Patches are constructed from local regions that are
photometrically consistent across a few images (usually three).  Candidate
regions are selected using Harris and Difference-of-Gaussians feature
detectors.

2. Expansion -- New patches are added for image regions which are adjacent to
regions that are already represented by patches.  You can think of this step
as trying to extend surfaces along their tangent planes.

3. Filtering -- Outlier patches are detected in three ways and removed.
First, if the image regions corresponding to a patch are better explained by
the patches it occludes, the patch is removed.  Second, patches that are not
visible in enough images (usually three) are removed.  Finally, patches that
violate a local smoothness constraint are removed.

Steps 2 and 3 are repeated several times (three in the paper's experiments).
The final set of 3D patches is then used to fit a polygonal mesh.
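To make the control flow of steps 1 to 3 concrete, here is a toy sketch, not the paper's implementation.  Assumptions, all hypothetical: patches live in a 2D grid of cells, each patch is reduced to a depth and a count of views in which it is visible, `propose` stands in for the photometric-consistency test used during expansion, and the thresholds (`min_views=3`, `max_depth_jump=0.5`, `min_agree=0.25`) are illustrative.  The occlusion-based filter is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class Patch:
    depth: float    # toy stand-in for the 3D patch position
    num_views: int  # number of images in which the patch is considered visible


# Hypothetical hook: returns a new patch for an empty cell next to `parent`,
# or None if the candidate fails the (unspecified) consistency test.
Proposer = Callable[[Cell, Patch], Optional[Patch]]


def neighbors(cell: Cell) -> List[Cell]:
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def expand(patches: Dict[Cell, Patch], propose: Proposer) -> Dict[Cell, Patch]:
    """Step 2: try to fill empty cells adjacent to existing patches (one ring per call)."""
    grown = dict(patches)
    for cell, patch in patches.items():  # iterate the old set, so growth is one ring
        for nb in neighbors(cell):
            if nb not in grown:
                candidate = propose(nb, patch)
                if candidate is not None:
                    grown[nb] = candidate
    return grown


def filter_patches(
    patches: Dict[Cell, Patch],
    min_views: int = 3,
    max_depth_jump: float = 0.5,
    min_agree: float = 0.25,
) -> Dict[Cell, Patch]:
    """Step 3 (partial): visibility and smoothness filters; occlusion filter omitted."""
    kept: Dict[Cell, Patch] = {}
    for cell, patch in patches.items():
        if patch.num_views < min_views:
            continue  # not visible in enough images
        nbs = [patches[nb] for nb in neighbors(cell) if nb in patches]
        if nbs:
            agree = sum(abs(n.depth - patch.depth) <= max_depth_jump for n in nbs)
            if agree / len(nbs) < min_agree:
                continue  # disagrees with most of its neighbors: not locally smooth
        kept[cell] = patch
    return kept


def reconstruct(
    seeds: Dict[Cell, Patch], propose: Proposer, iterations: int = 3
) -> Dict[Cell, Patch]:
    """Step 1 supplies `seeds`; then alternate expansion and filtering."""
    patches = dict(seeds)
    for _ in range(iterations):
        patches = filter_patches(expand(patches, propose))
    return patches
```

With a single seed on a flat 5x5 scene and a `propose` that accepts any in-bounds cell, each iteration grows the patch set by exactly one ring of cells, so after three iterations the four corner cells are still empty.  In this toy, at least, the number of iterations directly limits how far coverage can spread from the seeds.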


Novelty 
--------
Does this paper describe novel work? If you deem the paper to lack novelty
please cite explicitly the published prior work which supports your claim.
Citations should be sufficient to locate the paper and page unambiguously.
Do not cite entire textbooks without a page reference. 

The algorithm itself is novel, though it has the same basic setup as Michael
Goesele's CVPR 2006 paper ("Multi-View Stereo Revisited").  It ends up being
quite similar to Goesele's ICCV 2007 paper ("Multi-View Stereo for Community
Photo Collections"), but that paper was published later and so is not prior
work.  Though the algorithm is novel, it's hard to point to one key insight
that makes it work.  Rather, it appears to be the result of a number of small
observations (patches can occlude each other, a good place to look for new
points on a surface is along its tangent plane, etc.).

Reference to prior work 
-----------------------
Please cite explicitly any prior work which the paper should cite. 

The citations seem fine.

Clarity 
-------
Does it set out the motivation for the work, relationship to previous work,
details of the theory and methods, experimental results and conclusions as
well as can be expected in the limited space available? Can the paper be read
and understood by a competent graduate student? Are terms defined before they
are used? Is appropriate citation made for techniques used? 

There's a fair bit of notation, but most of the steps in the algorithm are
pretty intuitive.  The exception is the section on constructing a polygonal
mesh, which I still do not fully understand.


Technical Correctness 
---------------------
You should be able to follow each derivation in most papers. If there are
certain steps which make overly large leaps, be specific here about which ones
you had to skip. 

The paper doesn't really make formal technical claims (there are no theorems or
long derivations to check), so it's hard to go wrong here.

Experimental Validation
-----------------------
For experimental papers, how convinced are you that the main parameters of the
algorithms under test have been exercised? Does the test set exercise the
failure modes of the algorithm? For theoretical papers, have worked examples
been used to sanity-check theorems? Speak about both positive and negative
aspects of the paper's evaluation. 

Well, they used the Middlebury multiview stereo benchmark, and their algorithm
appears to have performed quite well.  I'm less convinced by their "crowded
scene" data sets, which are small and not particularly crowded, and for which
there are no comparisons to other methods.  They also claim that a data set in
which an entire image is an outlier is a particularly challenging case, and
I'm not sure why this should be so.

Also, I'm not sure how their algorithm compares with others in running time.
For large data sets, they say a single expansion step (of which they run
three) takes between 20 minutes and a few hours.  Is this reasonable?

Overall Evaluation
------------------
Overall, the paper seems decent.  The algorithm is very similar to Michael
Goesele's, in that it begins with feature matches and expands to cover the
rest of the object.  Although (as far as I can tell) Goesele's work doesn't
model occlusions geometrically, it appears to achieve the same effect through
view selection.

Since the paper claims to handle occlusions, the authors should test more
extensively on "crowded scene" data sets.


Questions and Issues for Discussion
-----------------------------------
What questions and issues are raised by this paper? What issues do you think
this paper does not address well?  How can the work in this paper be extended? 

The algorithm in this paper is not posed as a single global optimization
problem.  Would it be difficult to formulate an objective function that
accounts for all of the factors this approach deals with (photometric
consistency, smoothness, occlusion)?

The authors don't mention this, but it seems possible for this algorithm to
oscillate, with filtering removing patches that the next expansion step adds
back.  Also, are three expansion/filtering iterations really enough to
converge?
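One hypothetical way to probe both questions empirically is to treat a single expansion-plus-filtering pass as a function on the patch set and keep applying it until the set stops changing, recording whether it reaches a fixed point or revisits an earlier state.  The sketch is generic: `step`, the frozenset representation of a patch set, and the toy states in the test are assumptions, not anything from the paper.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

State = FrozenSet[Hashable]


def run_until_stable(
    state: State, step: Callable[[State], State], max_iters: int = 50
) -> Tuple[State, str, int]:
    """Apply `step` until a fixed point, a revisited state (cycle), or the cap."""
    seen: Dict[State, int] = {state: 0}  # state -> iteration at which it first appeared
    for i in range(1, max_iters + 1):
        nxt = step(state)
        if nxt == state:
            return nxt, "converged", i
        if nxt in seen:
            # Returned to an earlier state without settling: the process oscillates.
            return nxt, f"oscillating (period {i - seen[nxt]})", i
        seen[nxt] = i
        state = nxt
    return state, "iteration cap reached", max_iters
```

Hashing whole patch sets is only practical at toy sizes; on real data one would more likely compare patch counts or a cheap summary between passes, which can detect non-convergence but not prove a cycle.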

What if a set of images contains multiple rigid objects?  Setting aside the
calibration problem, it seems this algorithm should be able to reconstruct
all of them.

Is there anything that can be done if the object is nonrigid?