Name of Reviewer
----------------
Andrey Kolobov

Key Contribution
----------------
The paper presents a method for learning a similarity measure that would allow an agent to determine whether two given images depict the same object instance, even if the agent has never seen the depicted object(s) before. The proposed similarity measure embeds domain-specific knowledge. The measure can be trained using image pairs in which images are only marked as "same" or "different" and carry no other labels.

Novelty
-------
I don't know the field too well, but the approach seems novel in that it is designed to perform well when the image pairs in the training set have only very primitive labeling (either "same" or "different").

Reference to prior work
-----------------------
The Related Work section is quite comprehensive.

Clarity
-------
The paper suffers somewhat from a lack of clarity. Most sections are wordy and occasionally deviate unnecessarily from the course of explanation.

Technical Correctness
---------------------
The explanations are very detailed and straightforward, if a bit wordy. Some familiarity with SIFT and SVMs helps a lot in understanding the paper.

Experimental Validation
-----------------------
The experimental base seems very thorough. The authors ran the experiments on four datasets (three of them publicly available) to justify their algorithm design decisions, demonstrate their algorithm's properties, and compare the algorithm to previously proposed ones.

Overall Evaluation
------------------
It is a good paper overall, fairly novel, with reasonably clear explanations and an excellent experimental results section.

Questions and Issues for Discussion
-----------------------------------
Based on the fact that the algorithm performs best when training and testing are done on the same dataset, the authors claim that the algorithm must be embedding category-specific knowledge. This implies, however, that the algorithm may be overfitting to a particular dataset.
Based on the algorithm description, how real is this danger?

The authors are currently extending the algorithm to recognize similar object categories. How might one go about doing that?