This is Project 4 for UW CSE P576 Computer Vision.
In this project, you will build a model to perform inference at pixel level in images. You may choose to tackle a regression problem where you estimate a continuous value at each pixel (e.g., RGB→Depth, Grey→RGB) or a classification problem where you estimate a label (e.g., RGB or RGBD→segmentation).
This is an open-ended project. You are free to choose your own dataset, deep learning framework and compute setup, or follow one of the suggestions given below. See the section below for advice on choosing a suitable problem and dataset. You can then follow an approach similar to Project 3 to generate baseline results and extend them to build an accurate model.
Assessment: To obtain full marks for this project, you should follow a logical approach, and clearly document your experimental process and findings. The marks are broken down into 5 project tasks: (1) Introduction and Problem Specification [5%], (2) Dataset Preparation and Understanding/Visualization [20%], (3) Establishing Baseline Performance [15%], (4) Model Design, Training and Analysis [40%], (5) Model Evaluation and Improvement [20%].
To obtain full marks, you should clearly explain your methodology and experimental results for each stage of the process (2-5). Clear methodology and analysis are more important than high quality results for the purposes of this project. The notebook sections below describe in detail what is needed for each part.
It is expected that you will look at other approaches and research general techniques for solving your problem; however, the code that you write and your model design and analysis should be your own.
What to turn in: Hand in a zip file with a PDF report describing your experiments, and any source files or notebooks you created. To make your presentation clear, please use the same section headings (1-5) as given in the notebook below to organise your report. We have prepared templates to make this easier: docx, tex.
version 051620
The following are suggested projects that should generate good results using relatively small models and datasets:
downsampled_imagenet
Tensorflow dataset) or use your own images. You may also choose any other regression or classification problem that takes a single image or multiple images as input, but it should generate a single output image, with appropriate ground truth for evaluation. For example:
Several useful datasets are available as Tensorflow datasets, e.g., imagenet, downsampled_imagenet, coco and caltech_birds.
If you want to choose a project that does not fall under one of these categories, or are unsure if your project meets the criteria, please post a "customized project request" via Piazza (deadline: 05/24) that briefly explains your project to the instructors, so they can confirm it is within the scope of the assignment. You must get approval from the instructor before starting your own project.
Write a short introduction and problem statement for your task.
Write code to prepare your data as needed, e.g., downsample, extract patches, perform data augmentation etc. This may not be needed if you are using an existing dataset and data input pipeline (e.g., tensorflow dataset).
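As one possible starting point for the preparation step above, the sketch below extracts aligned patches from an input image and its per-pixel target, then applies a random horizontal flip to both. It uses NumPy on synthetic arrays; the function names `extract_patch_pairs` and `random_flip`, the patch size and the stride are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def extract_patch_pairs(image, target, patch_size, stride):
    """Cut aligned square patches from an image and its per-pixel target.

    Hypothetical helper: image is (H, W, C), target is (H, W).
    """
    height, width = image.shape[:2]
    inputs, targets = [], []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            inputs.append(image[top:top + patch_size, left:left + patch_size])
            targets.append(target[top:top + patch_size, left:left + patch_size])
    return np.stack(inputs), np.stack(targets)

def random_flip(image_patch, target_patch, rng):
    """Flip input and target together so that pixels stay aligned."""
    if rng.random() < 0.5:
        image_patch = image_patch[:, ::-1]
        target_patch = target_patch[:, ::-1]
    return image_patch, target_patch

rng = np.random.default_rng(0)
image = rng.random((64, 96, 3))   # synthetic RGB input
target = rng.random((64, 96))     # synthetic per-pixel target, e.g. depth

patches_x, patches_y = extract_patch_pairs(image, target, patch_size=32, stride=16)
print(patches_x.shape, patches_y.shape)  # (15, 32, 32, 3) (15, 32, 32)

aug_x, aug_y = random_flip(patches_x[0], patches_y[0], rng)
```

The important detail for pixel-level tasks is that any geometric augmentation is applied identically to the input and the target; photometric changes such as brightness jitter would normally be applied to the input only.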
Next, you should visualise your datasets and generate relevant statistics that can help with the training process. Some thoughts to consider: what is the data distribution? How can it be visualised? What is the label/target range/distribution? Are distributions spatially varying? What statistics are relevant to compute (e.g., mean, variance, other)? Use tables and/or figures where appropriate to show your findings.
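To make the questions above concrete, here is a minimal sketch of statistics one might compute for a regression target. The data is synthetic (random values with an artificial top-to-bottom gradient, standing in for something like depth), and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for N target maps of shape (N, H, W). A vertical
# gradient is added so that the statistics vary with image row.
targets = rng.random((100, 16, 16)) + np.linspace(0.0, 1.0, 16)[None, :, None]

# Global statistics, useful for normalising the targets.
global_mean = targets.mean()
global_std = targets.std()

# Spatially varying statistics: one mean and variance per pixel location.
per_pixel_mean = targets.mean(axis=0)   # shape (H, W)
per_pixel_var = targets.var(axis=0)     # shape (H, W)

# Mean target value for each image row, to expose the vertical trend.
row_profile = per_pixel_mean.mean(axis=1)

# Histogram of all target values, to inspect range and distribution.
hist, bin_edges = np.histogram(targets, bins=10)

print(f"mean={global_mean:.3f} std={global_std:.3f}")
print("row profile (first, last):", row_profile[0], row_profile[-1])
```

For a classification problem, the analogous first step would be a per-class pixel count (for example with `np.bincount` on the flattened integer labels), which shows how imbalanced the classes are. `per_pixel_mean` and `hist` can be passed to any plotting library for the report figures.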
If you are using an existing dataset, more marks will be awarded for understanding/visualising the data. If you generate your own dataset, marks will be awarded for your generation approach in addition to data exploration.
Establish baselines for performance on your dataset, for example using linear and/or nearest neighbour models (e.g., see Project 3). Consider if basic statistical methods or non-learned approaches might provide suitable baselines, e.g., mean depth per pixel for depth estimation, bilinear or bicubic interpolation for Super-Resolution, etc. Show results of your baseline on validation and test sets.
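The mean-per-pixel baseline mentioned above can be sketched in a few lines. This is an illustration on synthetic data, assuming targets are stored as an array of shape (N, H, W); `fit_mean_baseline` and `rmse` are hypothetical helper names, and RMSE is just one reasonable choice of metric.

```python
import numpy as np

def fit_mean_baseline(train_targets):
    """Non-learned baseline: average training target at each pixel location."""
    return train_targets.mean(axis=0)

def rmse(prediction, truth):
    """Root-mean-square error over all pixels and images."""
    return float(np.sqrt(np.mean((prediction - truth) ** 2)))

rng = np.random.default_rng(0)
train_y = rng.random((200, 8, 8))   # synthetic training targets
val_y = rng.random((50, 8, 8))      # synthetic validation targets

mean_map = fit_mean_baseline(train_y)

# The same (H, W) map is predicted for every validation image;
# the leading axis of size 1 broadcasts against the batch dimension.
val_rmse = rmse(mean_map[None], val_y)
print(f"mean-per-pixel baseline, validation RMSE: {val_rmse:.3f}")
```

Any learned model should beat this number on the same split; if it does not, that usually points to a problem in the data pipeline or training setup rather than in the architecture.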
Design a new model for your pixel labelling problem. Describe your model clearly, using figures or tables if helpful. Consider the size of your input images, and effective receptive fields of your filters. Explore model architecture and hyperparameters, plotting validation accuracy as a function of model parameters to study their effect on performance. Try out different optimization strategies and loss functions. Summarise your explorations showing a range of model designs and validation performance in tabular form. If you can, identify trends and suggest explanations as to why some models are more effective than others.
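When relating input size to receptive field, it can help to compute the theoretical receptive field of a stack of convolution and pooling layers. The sketch below uses the standard recurrence (each layer adds `(kernel - 1) * dilation` times the accumulated stride); the function name and the layer tuple format are assumptions made for this example. Note that the effective receptive field observed in trained networks is generally smaller than this theoretical value.

```python
def receptive_field(layers):
    """Theoretical receptive field, in input pixels, of one output unit.

    layers: list of (kernel_size, stride, dilation) tuples, ordered from
    input to output. Hypothetical helper for illustration.
    """
    rf = 1      # a single input pixel sees itself
    jump = 1    # distance in input pixels between adjacent units
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf

# Three 3x3 convolutions with stride 1.
print(receptive_field([(3, 1, 1)] * 3))                        # 7

# 3x3 conv, 2x2 pooling with stride 2, then another 3x3 conv.
print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)]))      # 8

# A single 3x3 convolution with dilation 2.
print(receptive_field([(3, 1, 2)]))                            # 5
```

Comparing this number with the patch or image size gives a quick check on whether the model can, in principle, use enough context for the task.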
Hint: To train models with limited compute in reasonable time, start using small images (possibly grayscale), or extract patches from your image inputs and targets. Also start with small models (e.g., small extensions to your baseline model), which will allow you to iterate fast.
As you increase model size, monitor training and validation error to make sure you are not overfitting. As you converge on a good model design, try using larger models and datasets, possibly with data augmentation, to get the best final performance.
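One simple way to monitor overfitting is to track the validation loss per epoch and stop once it has not improved for a fixed number of epochs. The following is a framework-independent sketch; `early_stopping_epoch`, the `patience` value and the loss curves are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a list of validation losses.

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss seen so far. Hypothetical helper for illustration.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up curves: training loss keeps falling, validation loss turns upward.
train_losses = [1.0, 0.7, 0.5, 0.35, 0.25, 0.15, 0.1]
val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1]

best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
gaps = [v - t for t, v in zip(train_losses, val_losses)]

print("best epoch:", best_epoch, "stopped at:", stop_epoch)   # best epoch: 2 stopped at: 4
print("train/val gap at stop:", round(gaps[stop_epoch], 2))
```

A widening gap between training and validation loss, as in the made-up curves here, is the usual sign that the model has more capacity than the data supports, which is when more data or augmentation tends to help.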
Evaluate your model on the test set and compare with the baseline model. Plot model output for success and failure cases. Can you identify systematic errors or difficult cases for your model? Do the loss scores correlate well with your perception of quality? Visualise the first layer weights or use feature visualisation techniques to explore some other aspect of the model. Can you identify any interesting structure in your model? How do you think the model could be improved if you had more time or more compute power? Make a ranked list of ideas you think could improve the performance.
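To select success and failure cases for plotting, one option is to rank test images by their individual error. The sketch below does this with a per-image mean squared error on synthetic data; `rank_by_error` is a hypothetical helper, and the predictions are constructed by adding known offsets to the targets rather than coming from a real model.

```python
import numpy as np

def rank_by_error(predictions, targets):
    """Return image indices sorted from lowest to highest per-image MSE."""
    squared_error = (predictions - targets) ** 2
    per_image_mse = squared_error.reshape(len(targets), -1).mean(axis=1)
    return np.argsort(per_image_mse), per_image_mse

rng = np.random.default_rng(0)
targets = rng.random((4, 8, 8))              # synthetic ground truth

# Synthetic "predictions": each image is off by a known constant.
offsets = np.array([0.3, 0.0, 0.1, 0.2])
predictions = targets + offsets[:, None, None]

order, per_image_mse = rank_by_error(predictions, targets)
best_cases = order[:2]     # candidates for success-case figures
worst_cases = order[-2:]   # candidates for failure-case figures
print("best:", best_cases, "worst:", worst_cases)   # best: [1 2] worst: [3 0]
```

Looking at the worst-ranked images side by side with their targets is often the quickest way to spot systematic errors, and comparing the ranking with your own visual judgement addresses the question of whether the loss correlates with perceived quality.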