Name of Reviewer
------------------
Ian Simon

Key Contribution
------------------
Summarize the paper's main contribution(s). Address yourself to both the class and to the authors, both of whom should be able to agree with your summary.

The paper gives an algorithm for multiview stereo that constructs a set of oriented rectangular patches in 3D from a set of input images. The algorithm consists of 3 main steps:

1. Initialization -- Patches are constructed from local regions which are consistent in a few images (usually 3). Candidate regions are selected using Harris and Difference of Gaussian feature detectors.

2. Expansion -- New patches are added for image regions which are adjacent to regions that are already represented by patches. You can think of this step as trying to extend surfaces along their tangent planes.

3. Filtering -- Outlier patches are detected in three ways and removed. First, if the image regions corresponding to a patch are better explained by occluded patches, the patch is removed. Second, patches that are not visible in enough (usually 3) images are removed. Finally, patches that violate a local smoothness constraint are removed.

Steps 2 and 3 are repeated several times. The final set of 3D patches is then used to fit a polygonal mesh.

Novelty
--------
Does this paper describe novel work? If you deem the paper to lack novelty please cite explicitly the published prior work which supports your claim. Citations should be sufficient to locate the paper and page unambiguously. Do not cite entire textbooks without a page reference.

The algorithm itself is novel, though it has the same basic setup as Michael Goesele's paper from CVPR 2006. It ends up being quite similar to Michael's ICCV 2007 paper, which of course was published later. Though the algorithm is novel, it's hard to point to one key insight that makes it work.
Rather, it appears to be the result of a number of small observations (patches can occlude each other, a good place to look for points on a surface is along the tangent direction, etc.).

Reference to prior work
-----------------------
Please cite explicitly any prior work which the paper should cite.

The citations seem fine.

Clarity
-------
Does it set out the motivation for the work, relationship to previous work, details of the theory and methods, experimental results and conclusions as well as can be expected in the limited space available? Can the paper be read and understood by a competent graduate student? Are terms defined before they are used? Is appropriate citation made for techniques used?

There's a fair bit of notation, but most of the steps in the algorithm are pretty intuitive. That is, until I got to the section on constructing a polygonal mesh, which I still do not fully understand.

Technical Correctness
---------------------
You should be able to follow each derivation in most papers. If there are certain steps which make overly large leaps, be specific here about which ones you had to skip.

The paper isn't really making any technical claims, so it's hard to go wrong here.

Experimental Validation
-----------------------
For experimental papers, how convinced are you that the main parameters of the algorithms under test have been exercised? Does the test set exercise the failure modes of the algorithm? For theoretical papers, have worked examples been used to sanity-check theorems? Speak about both positive and negative aspects of the paper's evaluation.

Well, they used the multiview stereo benchmark set and their algorithm appears to have performed quite well. I'm not quite as convinced by their "crowded scene" data sets, which are small and not particularly crowded, and for which there are no comparisons to other methods.
They also claim that an entire image which is an outlier constitutes a particularly challenging example, and I'm not sure why this should be so. Also, I'm not sure how their algorithm measures up speed-wise. For large datasets, they say a single expansion step (of which they run three) takes between 20 minutes and a few hours. Is this reasonable?

Overall Evaluation
------------------
Overall, the paper seems decent. The algorithm is very similar to Michael Goesele's, in that it begins with feature matches and expands to cover the rest of the object. Though Michael's work doesn't handle occlusions geometrically (I don't think), it appears to achieve the same effect through view selection. Since the paper claims to handle occlusions, the authors should do more extensive testing on "crowded scene" data sets.

Questions and Issues for Discussion
-----------------------------------
What questions and issues are raised by this paper? What issues do you think this paper does not address well? How can the work in this paper be extended?

The algorithm in this paper is not set up as a large optimization problem. Would it be difficult to formulate an objective function that takes into account all of the factors this approach deals with (photometric consistency, smoothness, occlusion)?

The authors don't mention this, but it seems possible for this algorithm to oscillate: patches added in an expansion step could be removed by the next filtering step and then added again. Also, are three expansion/filtering iterations really enough to converge?

What if a set of images contains multiple rigid objects? Ignoring the calibration problem, this algorithm should be able to reconstruct all of them, it seems. Is there anything that can be done if the object is nonrigid?
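
To make the oscillation and convergence questions concrete, here is a toy sketch of an alternating expand/filter loop. It is not the paper's algorithm: patches are stood in for by integer cells on a 1D grid, and every name here (`run_expand_filter`, `expand_neighbors`, `make_surface_filter`, `unstable_filter`) is hypothetical, as are the rules they implement. The only thing it shares with the paper is the shape of the loop and the default of three iterations.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Patches = FrozenSet[int]  # toy stand-in: one integer per patch


def run_expand_filter(
    seeds: Iterable[int],
    expand: Callable[[Patches], Patches],
    keep: Callable[[Patches], Patches],
    max_iters: int = 3,
) -> Tuple[Patches, str, int]:
    """Alternate expansion and filtering; report how the loop ended."""
    current: Patches = frozenset(seeds)
    history = [current]
    for i in range(1, max_iters + 1):
        current = keep(expand(current))
        if current == history[-1]:
            # Nothing changed in this round: a fixed point.
            return current, "converged", i
        if current in history:
            # We are back at an earlier state: the loop is cycling.
            return current, "oscillating", i
        history.append(current)
    # Ran out of iterations while the patch set was still changing.
    return current, "cutoff", max_iters


def expand_neighbors(patches: Patches) -> Patches:
    """Toy expansion: add the cells adjacent to every existing patch."""
    return frozenset(p + d for p in patches for d in (-1, 0, 1))


def make_surface_filter(lo: int, hi: int) -> Callable[[Patches], Patches]:
    """Toy filter: keep only patches lying on the 'surface' [lo, hi]."""
    def keep(patches: Patches) -> Patches:
        return frozenset(p for p in patches if lo <= p <= hi)
    return keep


def unstable_filter(patches: Patches) -> Patches:
    """Contrived filter: once more than 3 patches exist, reject all but cell 5."""
    return patches if len(patches) <= 3 else frozenset({5})


if __name__ == "__main__":
    print(run_expand_filter({5}, expand_neighbors, make_surface_filter(3, 7)))
    print(run_expand_filter({5}, expand_neighbors, make_surface_filter(0, 10)))
    print(run_expand_filter({5}, expand_neighbors, unstable_filter))
```

In this toy setting, a small surface reaches a fixed point within three rounds, a larger one is still growing when the loop stops, and the contrived filter sends the loop back to its starting state. Whether the real filtering rules can produce the third behavior is exactly the open question.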