CSE 546: Mini-project Guidelines
Instead of a final exam, you should complete a mini-project. It can be on any ML-related topic, including those we have not covered in class. Examples are listed below.
Given the time constraints, the goal is to put in roughly as much time as it would take to do 1.5-2 homeworks. Of course, this is more challenging since you also have to define the project yourself. But we hope it is also much more fun!
Collaboration: You can work alone or in groups of two. Groups are expected to do twice the work.
Proposal Date: Send Luke a short email describing your proposed project as soon as you have an idea, but definitely before the end of day on Monday, Feb. 26. Please come to office hours, or contact us if you need help deciding on a topic.
Due Date: Friday Mar. 16th, 5pm.
Submit: A final project report and a single compressed file containing source code, with instructions describing how it should be run. The project report should be in PDF format with no more than 4 pages of primary content. You are allowed unlimited space for citations and appendices, starting on page 5, but your story should be complete and understandable without reading this extra material. Group projects can have 6 pages of primary content. You should upload the files to the CSE 546 DropBox.
Project Ideas: A strong project will demonstrate understanding of topics in machine learning that are beyond the scope of what we covered in class. This can be done by, for example:
- Implementing an algorithm that we didn't have time to cover, from an ML book or research paper (see list below). As an intermediate step, be sure to demonstrate interesting learning behavior on toy or simulated data. Here, you might explore issues such as overfitting, model selection, etc. A further goal would be to replicate the results from a paper, but this can be surprisingly difficult to achieve in practice.
- Applying an existing algorithm to a new problem (see list of software below). In this case, you are welcome to use data from your own research. A strong project would carefully describe the new problem, explain why the application is appropriate, report the results achieved, and include a summary of what was learned from the exercise. Negative results can be interesting if you describe why you originally thought the approach would work.
- Teaching yourself new ML topics and completing existing homework assignments in this area. For example, you might study reinforcement learning and complete the Pac-man RL homework from last year's CSE 573 course. Any topic is fine, and you could also design your own assignment, as long as it demonstrates that you learned new ML topics. In this case, you could submit a relatively short write-up describing what you did, along with the homework and your solutions.
- Other ideas of similar size and complexity are welcome. Feel free to pitch them to Luke if you are unsure.
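To give a sense of the kind of toy experiment the first option describes, here is a minimal sketch (purely illustrative, not a required approach or a template): fitting polynomials of increasing degree to noisy simulated data and comparing training and held-out error, which is one simple way to demonstrate overfitting and motivate model selection. All names and parameters here are our own choices for illustration.

```python
# Toy overfitting / model-selection demo on simulated data.
# Data: y = sin(2*pi*x) + Gaussian noise; models: polynomials of varying degree.
import numpy as np

rng = np.random.default_rng(0)

# Simulate 30 noisy samples of a sine curve.
x = rng.uniform(0, 1, size=30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

# Hold out the last 10 points as a test set for model selection.
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

def poly_fit_mse(degree):
    """Least-squares polynomial fit; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Training error keeps falling as the degree grows, while test error
# eventually rises: the classic overfitting picture.
for d in (1, 3, 9, 15):
    tr, te = poly_fit_mse(d)
    print(f"degree {d:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

A write-up built around an experiment like this would plot both error curves against model complexity and discuss where, and why, the gap between them opens up.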
Research Papers (in no particular order; feel free to suggest others)
Supervised Classification
- Ryan Rifkin and Aldebaro Klautau, In Defense of One-vs-All Classification. Journal of Machine Learning Research, Volume 5 (Jan): 101-141, 2004.
- Andrew Y. Ng and Michael I. Jordan, On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. Advances in Neural Information Processing Systems, 2001.
- Yoav Freund and Robert E. Schapire, Large Margin Classification Using the Perceptron Algorithm. Machine Learning, Volume 37, Issue 3, 1999.
- Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee, Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics, Volume 26, Issue 5, 1998.
- Thorsten Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Proceedings of the European Conference on Machine Learning, 1998.
Semi-supervised Learning
Unsupervised Learning
- Lawrence K. Saul and Sam T. Roweis, Think Globally, Fit Locally: Unsupervised Learning of Low-Dimensional Manifolds, Journal of Machine Learning Research, Volume 4, 119-155, 2003.
- Michael E. Tipping and Christopher M. Bishop, Probabilistic Principal Component Analysis, Journal of the Royal Statistical Society, Series B, Volume 61, Part 3, 611-622, 1999.
Machine Learning and Vision
Structured Prediction Models for Tagging in NLP
E-mail Spam Filtering
Software Packages (feel free to suggest others)