CSE 547/Stat 548, Spring 2018

Your Course Project

Figure 1: Some important tasks in Computer Vision that this course project considers.¹


Important Dates


Required (and Extra) reading

The following readings are important for understanding the setup, the motivation, and modern techniques. Extra reading:


Introduction and background: In this class, we will work with the MS-COCO dataset [Lin et al. 2014] and study several aspects of building machine learning pipelines that must process large amounts of high-dimensional data. MS-COCO is one of the most widely used datasets for developing and benchmarking methods in several computer vision tasks, including (a) semantic segmentation, (b) object localization and detection, (c) multi-label learning, and (d) image retrieval, amongst other interesting and practically relevant problems. Homeworks 1 and 2 were warmups to get familiar with the dataset for binary and multi-label classification. For the project milestone due on May 17, you will build an object detector on the small dataset. The final project will build on this with nearest neighbor retrieval and/or metric learning.

Setup: The instructors have set up the dataset so that complications stemming from data and feature extraction are abstracted away, maximizing the time you can spend developing and evaluating machine learning models. Features have been provided from a convolutional neural network, following ideas from [Girshick 2015; He et al. 2015].
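Fast R-CNN [Girshick 2015] obtains a fixed-length feature for an arbitrary rectangle by max-pooling the convolutional feature map under that rectangle into a fixed grid of bins (RoI pooling). The following is a minimal NumPy sketch of that idea, not the provided feature extractor: the (C, H, W) layout, the single `stride` relating image pixels to feature cells, and the helper name `roi_max_pool` are assumptions for illustration.

```python
import numpy as np

# Sketch of Fast R-CNN-style RoI max pooling (hypothetical helper, not starter code).
# Assumptions: feature_map is (C, H, W); one stride maps image pixels to feature cells.
def roi_max_pool(feature_map, box, stride, out_size=(2, 2)):
    """Pool the features under box = (x1, y1, x2, y2) (pixels) into a fixed-size vector."""
    C, H, W = feature_map.shape
    # Project the box onto the feature grid, keeping at least one cell in each direction.
    x1 = min(int(np.floor(box[0] / stride)), W - 1)
    y1 = min(int(np.floor(box[1] / stride)), H - 1)
    x2 = min(max(int(np.ceil(box[2] / stride)), x1 + 1), W)
    y2 = min(max(int(np.ceil(box[3] / stride)), y1 + 1), H)
    region = feature_map[:, y1:y2, x1:x2]
    rh, rw = region.shape[1], region.shape[2]

    out_h, out_w = out_size
    pooled = np.empty((C, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Row range of bin i (floor start, ceil end); every bin covers at least one row.
        r0 = (i * rh) // out_h
        r1 = max(((i + 1) * rh + out_h - 1) // out_h, r0 + 1)
        for j in range(out_w):
            c0 = (j * rw) // out_w
            c1 = max(((j + 1) * rw + out_w - 1) // out_w, c0 + 1)
            pooled[:, i, j] = region[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled.reshape(-1)  # flattened, length C * out_h * out_w

# Usage: a 1-channel 4x4 feature map with stride 16 (i.e. a 64x64-pixel image).
fm = np.arange(16, dtype=float).reshape(1, 4, 4)
print(roi_max_pool(fm, (0, 0, 64, 64), stride=16))  # whole map pooled into 2x2 bins
```

Because every box maps to a vector of length C × out_h × out_w regardless of its size, a single classifier can score proposals of any shape.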

Dataset splits: MS-COCO has 12 super-categories, each containing multiple categories. First, we consider only 2 super-categories and treat every downstream task as a simple three-way prediction problem: category (a), category (b), or none. Then, we will consider all categories belonging to these 2 super-categories for multi-label classification and object detection.
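The three-way setup and the later multi-label setup differ mainly in how targets are built. A small sketch, assuming a hypothetical pair of super-categories (`vehicle` and `animal`, each truncated to three of its member categories) purely for illustration; the actual pair and category lists come from the course data, and the helper names are made up.

```python
import numpy as np

# Hypothetical super-category -> category mapping, truncated for illustration;
# use the category lists that ship with the course data instead.
SUPER_CATEGORIES = {
    "vehicle": ["bicycle", "car", "motorcycle"],   # plays the role of (a)
    "animal": ["bird", "cat", "dog"],              # plays the role of (b)
}
SUPER_NAMES = list(SUPER_CATEGORIES)
CATEGORIES = [c for cats in SUPER_CATEGORIES.values() for c in cats]
NONE_LABEL = len(SUPER_NAMES)  # index used for "neither super-category"

def three_way_label(category):
    """First-stage target for one object or region: 0 = (a), 1 = (b), 2 = none."""
    for k, name in enumerate(SUPER_NAMES):
        if category in SUPER_CATEGORIES[name]:
            return k
    return NONE_LABEL

def multi_hot(image_categories):
    """Second-stage target: one bit per category in the two super-categories."""
    y = np.zeros(len(CATEGORIES), dtype=np.int64)
    for c in image_categories:
        if c in CATEGORIES:  # categories outside both super-categories are ignored
            y[CATEGORIES.index(c)] = 1
    return y

print(three_way_label("dog"), three_way_label("pizza"))  # labels for single objects
print(multi_hot(["car", "dog", "pizza"]))                # label vector for one image
```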

Starter code: Image features (see above) and code to read them were provided with hw1. We have also provided code that allows you to (a) find rectangles that are likely to contain objects in each image ("regions of interest") using the selective search functionality in OpenCV, and (b) extract features corresponding to any rectangle in an image. You are welcome to use your own code instead.
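Selective search returns candidate boxes, but training a detector also requires deciding which candidates count as objects. A common approach, used in Fast R-CNN [Girshick 2015], is to give each proposal the label of the ground-truth box it overlaps most, measured by intersection-over-union (IoU). This sketch assumes boxes in (x, y, w, h) form, as OpenCV's selective search returns them; the function names and thresholds are assumptions to tune, not course requirements. Fast R-CNN used a foreground threshold of 0.5 and a background lower bound of 0.1; here the lower bound defaults to 0.0 so that images with no objects of interest still yield "none" examples.

```python
# Proposal labeling by IoU, in the style of Fast R-CNN (hypothetical helpers).
# Assumptions: boxes are (x, y, w, h), as returned by OpenCV selective search;
# fg_thresh and bg_lo are tunable defaults, not values fixed by the course.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_proposals(proposals, gt_boxes, gt_labels, background="none",
                    fg_thresh=0.5, bg_lo=0.0):
    """Label each proposal with its best-matching ground-truth class.

    Returns one entry per proposal: a class label (IoU >= fg_thresh), `background`
    (bg_lo <= IoU < fg_thresh), or None (IoU < bg_lo; left out of training).
    """
    labels = []
    for p in proposals:
        overlaps = [iou(p, g) for g in gt_boxes]
        best = max(range(len(overlaps)), key=overlaps.__getitem__) if overlaps else None
        best_iou = overlaps[best] if overlaps else 0.0
        if best_iou >= fg_thresh:
            labels.append(gt_labels[best])
        elif best_iou >= bg_lo:
            labels.append(background)
        else:
            labels.append(None)
    return labels

gt = [(10, 10, 40, 40)]
props = [(12, 12, 40, 40), (30, 30, 40, 40), (200, 200, 20, 20)]
print(label_proposals(props, gt, ["car"]))              # default: low overlap -> "none"
print(label_proposals(props, gt, ["car"], bg_lo=0.1))   # Fast R-CNN-style lower bound
```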

Task details: Here is a brief overview of the project. Broadly, the project is composed of two parts: the first part deals with object detection/localization; the second part consists of nearest neighbor search and retrieval.
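For the retrieval part, the simplest baseline is exact brute-force search, which compares a query against every database vector; a hashing data structure trades exactness for speed by re-ranking only a candidate set. The sketch below pairs the two, assuming Euclidean distance on the provided features. Random-hyperplane LSH is just one possible data structure (the handout does not prescribe one), and the class and function names are hypothetical.

```python
import numpy as np

# Hypothetical retrieval helpers. Assumptions: features are rows of a float array,
# similarity is Euclidean distance, and random-hyperplane LSH stands in for
# "a data structure". Hyperplanes pass through the origin, so center the features
# (subtract the mean) first if they are all non-negative.

def knn_brute_force(db, queries, k):
    """Exact k-NN: (Q, k) indices of the database rows nearest each query, nearest first."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, evaluated for all pairs with one matrix product.
    d2 = (np.sum(queries ** 2, axis=1)[:, None]
          - 2.0 * queries @ db.T
          + np.sum(db ** 2, axis=1)[None, :])
    idx = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]  # k smallest, unordered
    order = np.argsort(np.take_along_axis(d2, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)

class HyperplaneLSH:
    """Approximate k-NN: hash by the sign pattern of random projections, then re-rank."""

    def __init__(self, db, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        self.db = db
        # One set of n_bits random hyperplanes per hash table.
        self.planes = rng.standard_normal((n_tables, n_bits, db.shape[1]))
        self.tables = []
        for t in range(n_tables):
            table = {}
            for i, key in enumerate(self._keys(db, t)):
                table.setdefault(key, []).append(i)
            self.tables.append(table)

    def _keys(self, x, t):
        """Bucket keys for the rows of x under table t."""
        bits = (x @ self.planes[t].T) > 0
        return [row.tobytes() for row in bits]

    def query(self, q, k):
        """Indices of up to k candidates nearest to the single query vector q."""
        cand = set()
        for t, table in enumerate(self.tables):
            cand.update(table.get(self._keys(q[None, :], t)[0], []))
        cand = np.fromiter(cand, dtype=np.int64)
        dist = np.linalg.norm(self.db[cand] - q, axis=1)  # exact distances, candidates only
        return cand[np.argsort(dist)[:k]]

rng = np.random.default_rng(1)
db = rng.standard_normal((1000, 32))
print(knn_brute_force(db, db[:2], k=3))     # each query's first hit is itself
print(HyperplaneLSH(db).query(db[0], k=3))  # approximate neighbors of row 0
```

Raising `n_tables` enlarges the candidate set (better recall, slower queries), while raising `n_bits` shrinks each bucket (faster queries, lower recall).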

Optional things you can try: In addition to the mandatory choice of one of the two project options (or both, if you are working in a group), you are welcome to try additional methods. Some ideas are:

Useful pointers: AWS credits must be rationed and used throughout the course. Running out of AWS credits is a sign that they have not been managed carefully enough. The assignments and projects are designed so that a student who budgets carefully will not run out of credits. Always use your own laptop for debugging and unit testing your code. Use AWS only when you are confident, to the best of your knowledge, that the code is stable and free of bugs.


Project options:

You must choose one of the two following options. See the next section for full details.

You are of course free to do both. You may also work in groups of up to two, in which case you must complete both options above: object detection, plus retrieval on the large dataset using a data structure, plus metric learning for brute-force retrieval on the smaller dataset. Note that the object detection task uses the same dataset regardless of whether you choose option 1 or option 2.
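The handout leaves the metric learning method open. One common choice, assumed here for illustration, is a linear embedding z = Lx trained with a triplet hinge loss, so that same-label pairs end up closer than different-label pairs by a margin; brute-force retrieval is then run on the embedded features. All names, the margin of 1, and the optimizer (plain gradient descent on freshly sampled triplets) are assumptions.

```python
import numpy as np

# Hypothetical metric-learning sketch. Assumptions: a linear embedding z = L x,
# a triplet hinge loss with margin 1, and gradient descent on random triplets.

def sample_triplets(X, y, n, rng):
    """Draw n (anchor, positive, negative) triplets; positives share the anchor's label."""
    _, counts = np.unique(y, return_counts=True)
    if len(counts) < 2 or counts.max() < 2:
        raise ValueError("need >= 2 classes and some class with >= 2 examples")
    a, p, neg = [], [], []
    while len(a) < n:
        i = rng.integers(len(X))
        same = np.flatnonzero((y == y[i]) & (np.arange(len(y)) != i))
        diff = np.flatnonzero(y != y[i])
        if len(same) == 0:
            continue  # anchor's class has no other member; redraw
        a.append(i)
        p.append(rng.choice(same))
        neg.append(rng.choice(diff))
    return X[a], X[p], X[neg]

def triplet_loss_and_grad(L, xa, xp, xn, margin=1.0):
    """Mean of max(0, margin + ||L(xa-xp)||^2 - ||L(xa-xn)||^2) and its gradient in L."""
    dp, dn = xa - xp, xa - xn            # (T, d) difference vectors
    zp, zn = dp @ L.T, dn @ L.T          # the same differences in the embedding
    viol = margin + np.sum(zp ** 2, axis=1) - np.sum(zn ** 2, axis=1)
    active = viol > 0                    # only violated triplets contribute gradient
    loss = np.mean(np.maximum(viol, 0.0))
    # d/dL ||L u||^2 = 2 (L u) u^T, summed over active triplets, then averaged.
    grad = 2.0 * (zp[active].T @ dp[active] - zn[active].T @ dn[active]) / len(xa)
    return loss, grad

def fit_linear_metric(X, y, dim, steps=200, lr=0.05, n_triplets=256, seed=0):
    """Learn L (dim x d), starting from a truncated identity."""
    rng = np.random.default_rng(seed)
    L = np.eye(dim, X.shape[1])
    for _ in range(steps):
        xa, xp, xn = sample_triplets(X, y, n_triplets, rng)
        _, grad = triplet_loss_and_grad(L, xa, xp, xn)
        L -= lr * grad
    return L

# Toy data: only feature 0 carries the label; the other features are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.standard_normal((200, 5))
X[:, 0] = y + 0.1 * rng.standard_normal(200)
L = fit_linear_metric(X, y, dim=5)
Z = X @ L.T  # embedded features to hand to a brute-force k-NN search
```

After training, embed both the database and the queries with `X @ L.T` and run brute-force k-NN search in the embedded space.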

Once the project reports are submitted, we will post a summary on Canvas, along with excerpts showing the highest-performing methods for each option (as judged by the evaluation metrics below).

Further Instructions

Instructions specific to option 1:

Evaluation Metric:


Instructions specific to option 2:

Evaluation Metric:




The grading of the final project is split amongst three deliverables:

Your final report will be evaluated by the following criteria:


Poster Session

We will hold a poster session in the Atrium of the Paul Allen Center. Each team will be given a stand to present a poster summarizing the project motivation, methodology, and results. The poster session will give you a chance to show off the hard work you put into your project, and to learn about the projects of your peers.

Here are some details on the poster format:

You must submit your poster on Canvas.


Project Report

Your final submission will be a project report on Canvas. Your write-up should be at most 8 pages, not including references, and must use the NIPS LaTeX format. Describe the task you solved, your approach, the algorithms, the results, and the conclusions of your analysis. Note that, as with any conference, the page limit is strict: papers over the limit will not be considered.


1. Source: Fei-Fei Li, Andrej Karpathy & Justin Johnson (2016), cs231n, Lecture 8, Slide 8: Spatial Localization and Detection.