Name of Reviewer
------------------
Rahul Garg

Key Contribution
------------------
The paper attempts to solve the problem of inferring the temporal order of images. The problem is reduced to a constraint satisfaction problem and solved using local search.

Novelty
--------
The paper claims to be the first to attempt the problem.

Reference to prior work
-----------------------
The nearest prior work cited in the paper is "4D Structure from Motion: A computational algorithm" [M. Ge and M. D'Zmura]. Even so, it differs substantially from the authors' work: Ge and D'Zmura consider collections of images ordered in time, while the authors deal with unordered collections of images.

Clarity
-------
The paper is clear and easy to understand, since it does not involve much technical detail.

Technical Correctness
---------------------
The paper seems to be technically correct. The authors skip over the details of the SfM algorithm, which seems appropriate.

Experimental Validation
-----------------------
The experimental validation is not exhaustive. There are only two results on real-world datasets (a sequence of 6 images and a sequence of 20 images), clearly too small a sample. That said, it is difficult to obtain large and verifiable datasets for this problem.

Overall Evaluation & Questions and Issues
------------------------------------------
The problem being solved is novel, and the authors present a simple approach that reduces it to a constraint satisfaction problem. However, a few points need clarification:

[1] The authors manually detect and match feature points. They could have used existing interest point detectors and descriptors such as SIFT to automate the method. Where would their approach fail if this were done?

[2] The method for determining whether a point is occluded or missing does not seem robust. For instance, consider tall buildings with feature points near their tops. The triangulation is likely to generate triangles that connect points from different buildings, creating spurious occluders while missing many true occluders such as the sides of a building. Perhaps this is why the authors use manually selected feature points?

[3] The authors mention that a number of images can belong to the same "era", giving rise to multiple correct orderings. Can the images belonging to the same era be clustered before finding the ordering, so as to reduce the search space?

[4] What are possible applications of the problem?
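
To make the suggestion in [3] concrete, here is a minimal sketch of the kind of pre-clustering I have in mind. It is not the authors' method. It assumes, hypothetically, that each image can be summarized by the set of feature ids visible in it, and it treats images with identical sets as one candidate "era". This is a deliberately crude proxy: it ignores points that are out of view or occluded, which a real implementation would have to handle. The names `group_by_era` and `visibility` are illustrative only.

```python
from collections import defaultdict


def group_by_era(visibility):
    """Group images whose sets of visible feature ids are identical.

    `visibility` maps an image name to the set of feature ids seen in it.
    This representation is an assumption for illustration, not taken from
    the paper under review.
    """
    groups = defaultdict(list)
    for image, features in visibility.items():
        # frozenset makes the visibility signature usable as a dict key.
        groups[frozenset(features)].append(image)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(members) for members in groups.values())


# Toy input: img_a and img_b see exactly the same features.
visibility = {
    "img_a": {1, 2, 3},
    "img_b": {1, 2, 3},
    "img_c": {2, 3, 4},
    "img_d": {3, 4},
}
eras = group_by_era(visibility)
print(eras)
```

In this toy input, four images collapse to three groups, so the ordering search would run over three items rather than four. Whether such a reduction is meaningful on real data depends on how often images share a visibility pattern, which the paper's experiments would need to show.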