590V Vision Seminar
Paper: Inferring Temporal Order of Images From 3D Structure (Schindler, Dellaert, Kang)
Wed 10/10/2007

Name of Reviewer
---------------------
Kathleen Tuite (ktuite@cs)

Key Contribution
------------------
This paper presents the first known method for inferring temporal order of images. The technique applies to images that contain information about 3D structures over time, where some 3D points can be (manually) matched up between images, where the 3D structures come into and go out of existence like urban buildings, and where a structure/building that exists and then gets torn down will never get rebuilt in the same way. It's a neat concept, though it seems to have pretty limited applications right now and would only really work on photo collections of urban landscapes without fog, trees, and other things that could obscure parts of buildings.

Novelty
--------
Yes, this paper is novel. It cited another paper mentioning 4D structure, but I'm not even sure what they mean by 4D structure -- they sure don't explicitly refer to 'time' very often.

Reference to prior work
-----------------------
Work this paper should cite that it doesn't? I have no idea.

Clarity
-------
Yes, this paper is clear. As a new grad, I was afraid it wouldn't be, but it turned out to be surprisingly straightforward. Every tricky case I thought of where the method might not work was addressed in the next paragraph I read. The motivation seems pretty clear -- it's a neat idea, the explanation is clear, and the results (though not as astounding as they could be) are also clearly described.

Technical Correctness
---------------------
Seems fine to me.

Experimental Validation
-----------------------
Here's where things seemed kind of lacking. They mentioned 2 or 3 experiments on real photo sets, though they didn't use photos whose dates they actually knew.
It is hard for me, as a human, to verify that the six-image photo set of the light and dark skyscrapers is in temporal order. The 20-image set is a little easier to appreciate, if only because the greater number of images gives me more context for the scene as a whole. They did set up their own experimental structure to photograph from different angles and at different times, so I believe that their method works; they just haven't used it in interesting/exciting enough experiments yet.

Overall Evaluation
------------------
Definitely a cool new idea that made sense and was easy to follow. Right now the method seems to have a pretty limited application, applying only to photos of urban areas with blocky, prominent buildings. And it's too bad that the feature detection and matching has to be done manually for now. Results of experiments need more information/context for me to find them interesting. I believe the method works, but you'll have to show me it working on bigger/better photo sets and maybe describe the region/city I'm looking at in the photos.

Questions and Issues for Discussion
------------------------------------
What applications could this method be used for? What sort of interesting result could be made with it? What changes/additions to the method itself would yield more interesting results?

I'm envisioning an animation of a city growing and flourishing over time, with black and white grainy images gradually becoming 'newer looking', less grainy and more colorful. If a 3D+time model of the city could be built and animated from a movable user-selected viewpoint, that would be cool. If images are put in temporal order, I want to be able to watch things evolve over time, but I don't want to get whipped around between a hundred different viewpoints.

Obviously this work could be (and is being) extended to deal with more complicated scenes, containing objects that could obscure parts of objects/buildings in an inconvenient way.
If it were a little more robust, could this method be applied to photos of urban scenes from the internet/Flickr? Could there be a way to match up more geographic features like hills, mountains, and coastlines?
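
As a starting point for discussion, the ordering constraint summarized under Key Contribution (a structure appears, may later disappear, and is never rebuilt) can be sketched as a brute-force search. This is only my simplification, not the paper's algorithm: the visibility table, the function names `is_consistent` and `consistent_orderings`, and the use of `None` for "occluded or out of view" are all assumptions made for illustration, and trying every permutation is only feasible for a handful of images.

```python
from itertools import permutations

# Assumed encoding (not from the paper): visibility[p][i] is
#   1    if 3D point p was matched in image i,
#   0    if point p should be in view in image i but is absent,
#   None if point p is occluded or out of view (no evidence either way).

def is_consistent(order, visibility):
    """Return True if no point is seen, then missing, then seen again."""
    for row in visibility:
        # Keep only the images that give evidence about this point.
        seq = "".join(str(row[i]) for i in order if row[i] is not None)
        # After trimming leading/trailing absences, a remaining "0"
        # means the point vanished and came back, which is disallowed.
        if "0" in seq.strip("0"):
            return False
    return True

def consistent_orderings(visibility):
    """Brute-force every image ordering and keep the consistent ones."""
    n_images = len(visibility[0])
    return [p for p in permutations(range(n_images))
            if is_consistent(p, visibility)]

# Toy data: three points (rows) observed in four shuffled images (columns).
visibility = [
    [0, 1, 0, 1],  # point on building A
    [1, 0, 1, 1],  # point on building B
    [1, 0, 1, 0],  # point on building C
]
orderings = consistent_orderings(visibility)
```

On this toy table the search keeps four orderings, among them (1, 3, 0, 2) and its reverse (2, 0, 3, 1). The reversal shows a limit of the simplified constraint: on its own it cannot tell which direction time runs, and images with identical observations (here images 0 and 2) can be swapped freely. That seems relevant to the question above about what additions to the method would yield more interesting results.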