Description

Have you enjoyed coding up your own machine learning models? Do you want a taste of being a real-world ML practitioner? Then this Extra Credit Final Project is for you!

  • This project will take place as a Kaggle competition. You can log into Kaggle with your UW/CSE email address.
  • You will use a real-world dataset of student performance in HarvardX-MITx courses to predict whether each student will be certified in their course.
  • You can use ANY models you’d like – try the models you’ve learnt in this course, or feel free to go above and beyond the course content!
  • You’ll have access to a train dataset with labels, and a test dataset without labels – this is designed to mimic the fact that in the real world, the evaluation of your model will be on unseen data.
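
Because the test labels are hidden, a common way to estimate how a model will do on unseen data is to hold out part of the labeled training set for validation. The following is a minimal, standard-library-only sketch under stated assumptions: the toy rows, the feature name `x`, the label encoding (1 = certified, 0 = not certified), and the helper names are all hypothetical and are not taken from the competition files. It also computes a majority-class baseline, which is worth checking first: if one class dominates the labels, predicting that class everywhere can already score a high accuracy.

```python
import random
from collections import Counter


def train_val_split(rows, val_fraction=0.2, seed=0):
    """Shuffle labeled rows and hold out a fraction as a stand-in for unseen test data."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


def majority_baseline(train_labels):
    """Return the most common training label; a real model should beat this."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: (features, label) pairs where label 1 means "certified".
# Real feature columns come from the competition's data files.
data = [({"x": i}, 1 if i % 5 == 0 else 0) for i in range(100)]
train, val = train_val_split(data, val_fraction=0.2, seed=42)
guess = majority_baseline([y for _, y in train])
val_acc = accuracy([y for _, y in val], [guess] * len(val))
print(f"majority-class guess: {guess}, validation accuracy: {val_acc:.2f}")
```

In practice you would replace the toy rows with the competition's training data and the baseline with your actual model; fixing `seed` keeps the split reproducible across teammates.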

This extra credit assignment is worth 10 points: 6 are based directly on your model’s accuracy, and 4 on a detailed report describing the approach(es) you used. More information on evaluation is available on the Kaggle competition page. You’ll submit your final report (as a notebook) on Gradescope. It is due the same day as the final homework, Sunday, June 5, at 11:59 PM, and no late days may be used.

You can work in a team of 4.

What you need to submit on Gradescope:

  • A Python / Jupyter notebook file with fully commented code.
  • A writeup pdf with the following information:
    • Your reasoning for the entire model training pipeline (what this dataset is about, what led you to choose certain models, any tradeoffs you had to weigh, etc.)
    • Answers to the discussion questions outlined in the Colab notebook
    • Any difficulties your team faced during this process, and what you learned from the experience.

Note: groups with the top accuracies or the highest-quality writeups will also receive additional extra credit (up to 5 points), so please aim as high as possible on the leaderboard and analyze this dataset as thoroughly as you can.

Ready to jump in? Here is the link to the competition!