Visual recognition is the process of automatically extracting valuable information from visual data (images and videos). This includes knowing what objects are in a picture, where they are, what is happening, and where it is happening. Given the nature of the problem and the abundance of data available today, visual recognition is typically formulated as a machine learning problem.
In this course we study the problems of instance-level object recognition ("what is this?"), category-level object recognition ("what kind of thing is this?"), attributes ("what is this thing like?"), parts ("what is this thing composed of?"), action recognition ("what's going on?"), pose estimation ("what is the configuration of this thing?"), detection ("where is this thing in the image?"), and segmentation ("where exactly is this thing in the image?"). We will cover a wide range of supervised, semi-supervised, and unsupervised models (trained on fully labeled, partially labeled, and unlabeled data, respectively), as well as transferable models in object and activity recognition (learning from one dataset or task and applying what is learned to a different one).
Students should be familiar with either computer vision or machine learning. If you are interested in this course but have been exposed to neither computer vision nor machine learning, please ask for the instructor's approval. Throughout the quarter I will provide the necessary background material.
This course will consist of lectures, student presentations, and a course project. Students who register for only one credit are not required to do a course project.
- Features and Representations [2.pptx] [3.pptx]
- Standard Tools and Techniques [4.pptx] [5.pptx] [6.pptx] [7.pptx] [8.pptx] [9.pptx]
- Object Detection
  - Main: A Discriminatively Trained, Multiscale, Deformable Part Model
  - Main: Object Detection with Discriminatively Trained Part Based Models
  - Speedup: Cascade Object Detection with Deformable Part Models
  - Analysis: How important are Deformable Parts in the Deformable Parts Model?
  - Analysis: Diagnosing Error in Object Detectors
  - Pictorial Structures: Pictorial structures for object recognition
  - Latent SVM: Support vector machines for multiple-instance learning
- Object Instance Recognition and Matching [Ji]
  - Video Google: A Text Retrieval Approach to Object Matching in Videos
  - Large Scale: Discovering Favorite Views of Popular Places with Iconoid Shift
  - Image Webs: Computing and Exploiting Connectivity in Image Collections
  - Object Retrieval with Large Vocabularies and Fast Spatial Matching
  - City-Scale Location Recognition
- Object Category Recognition [Supasorn]
- Role of Context
- Stuff, Things, Geometry and Scenes [Ezgi, Andre]
  - Learning Spatial Context: Using Stuff to Find Things
  - Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics
  - Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry
  - Scene Semantics from Long-term Observation of People
- Context Based Object Categorization: A Critical Survey
- Weakly supervised Recognition [Yuyin]
- Unsupervised Recognition and Object Discovery [Tanner]
- Large Scale Recognition [Jinna]
- Parts and Attributes [Ila]
- Shared Representations, Transfer Learning
- Deep Architectures [Rob]
- From Video [Yongjin]
- From Still Images [Alex]
- Pose Estimation [Youngdae]
- Objects, Actions, and Poses [Jun]