Frequently Asked Questions

Q: What are the prerequisites for the course?

A: Students are expected to have good programming skills. Knowledge of Python/Java is especially important since you will be writing lots of Spark code. In addition to programming, basic knowledge of probability/statistics and linear algebra is required.

Q: What constitutes a large dataset for this class? How large is large enough?

A: If your project uses very simple machine learning algorithms and solves a simpler task (e.g., supervised one-vs-all classification), the size of the dataset and quality/complexity of the problem must make up for it. If there is high complexity in your model or task (e.g., protein folding predictions), we understand that datasets must be much smaller.

Q: How big is big enough for a big data class?

A: It depends. The datasets listed on the website are examples from which feasible projects can be constructed. For online datasets, in-depth modeling typically corresponds to smaller data sizes. For example, one student found detailed MLR images, but the dataset contained only a couple hundred data points. An ideal dataset would have some complexity and non-trivial aspects. The key is to find the right tradeoff between complexity and data size. You are encouraged to check in during any of the staff's office hours.

Q: How novel does the algorithm need to be for this class?

A: There is no expectation of novelty, as this is a course project intended to engage students with the material and reinforce concepts learned in class. However, a project that does aim for novelty would be encouraged by the teaching staff.

Q: What makes for a good project in this class?

A: A good project is one that can be evaluated well. You need to be able to tell whether you were successful or to what degree. This is generally easier in supervised learning tasks, where you have access to labeled data. Subjective interpretations of clustering without additional validation or comparisons to ground truth labeled data are not good enough.
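As a rough illustration of "being able to tell to what degree you were successful" in a supervised setting, the sketch below compares a model's accuracy against a trivial majority-class baseline. The labels, the predictions, and the helper names (`accuracy`, `majority_baseline`) are made up for illustration and are not part of the course materials; a real project would use its own metric and stronger baselines.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty label list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n_predictions):
    """Predict the most frequent training label for every test example."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n_predictions


# Hypothetical toy data, invented purely for this example.
y_train = ["ham", "ham", "spam", "ham", "spam"]
y_test = ["ham", "spam", "ham", "ham", "spam"]
model_pred = ["ham", "spam", "ham", "spam", "spam"]  # assumed model output

baseline_pred = majority_baseline(y_train, len(y_test))
model_acc = accuracy(y_test, model_pred)
baseline_acc = accuracy(y_test, baseline_pred)

print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
```

A model that does not clearly beat such a baseline on held-out labeled data has not yet demonstrated success, whatever its complexity.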

Q: Where can we find example projects or example datasets?

A: Example projects are described at the bottom of the page here.

Q: Could we rerun existing models on new datasets? How novel is this?

A: As stated previously, novelty is not a requirement for the course project. Projects should contribute to at least one of the following areas. Great projects will make contributions to two or more:

  • Complexity of Model/Algorithmic Development (e.g., are you using more traditional/simple architectures like ResNet50 for vision tasks, or more complex/modern Transformers for NLP, or contrastive learning models like OpenCLIP?)
  • Dataset Size and Complexity (e.g., for vision tasks a long-tail dataset such as iNaturalist may be more complex than balanced . Both datasets are much larger than CIFAR-10.)
  • Comprehensiveness and Depth of Evaluation. Is your method compared well to baseline methods? Is it evaluated on multiple tasks or datasets?
  • Strength and Complexity of Insights to Application domain. Are there insights that are backed up empirically (with the evaluation from Point 3) with possible theoretical grounding?
    Q: What if I just run algorithms from the Spark library (or some other popular model library) on an interesting dataset?

    A: This is okay if you have a comprehensive and detailed evaluation or strong insights in the application domain. There should also be a discussion of what makes the dataset, research question, and implications interesting.

    Q: How do I submit my assignments?

    A: All students should submit their assignments electronically via Gradescope. Simply sign up on the Gradescope website and use the course code X3WYKY. Please use your UW NetID if possible. No handwritten work will be accepted. Math formulas must be typeset using LaTeX or other word processing software that supports mathematical symbols (e.g., Google Docs, Microsoft Word).

    All students will be given two no-questions-asked late periods. Only one late period can be used per homework, and late periods cannot be used for project deliverables. A late period lasts 48 hours from the original deadline (so if an assignment is due on Thursday at 11:59 pm, the late period extends to Saturday at 11:59 pm Pacific Time). Homework submitted after the late period will not be graded and will receive zero credit.
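The late-period arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated 48-hour rule, not an official tool: the function names and the example date are hypothetical, and all times are assumed to be naive datetimes already expressed in Pacific Time.

```python
from datetime import datetime, timedelta

# Length of one late period, as stated in the policy above.
LATE_PERIOD = timedelta(hours=48)


def late_deadline(original_deadline: datetime) -> datetime:
    """Return the cutoff when one late period is applied to a homework."""
    return original_deadline + LATE_PERIOD


def is_accepted(submitted_at: datetime,
                original_deadline: datetime,
                use_late_period: bool) -> bool:
    """Check whether a submission falls on or before the applicable cutoff."""
    if use_late_period:
        cutoff = late_deadline(original_deadline)
    else:
        cutoff = original_deadline
    return submitted_at <= cutoff


# Hypothetical deadline: a Thursday at 11:59 pm (times assumed to be Pacific).
due = datetime(2024, 1, 11, 23, 59)
extended = late_deadline(due)
print(extended.strftime("%A %Y-%m-%d %H:%M"))
```

Because at most one late period applies to a given homework, the cutoff never moves more than 48 hours past the original deadline.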

    For the non-coding component of the homework, you should upload a PDF rather than submitting images. We will use Gradescope for the submission of code as well. Please make sure to tag each part correctly on Gradescope so it is easier for us to grade. There will be a small point deduction for each mistagged page and for each question that includes code. Put all the code for a single question into a single file and upload it. Only files in text format (e.g., .txt, .py, .java) will be accepted. There will be no credit for coding questions without submitted code on Gradescope, or for code submitted after the deadline, so please remember to submit your code.

    Q: How do I submit a regrade request?

    A: We take great care to ensure that grading is fair and consistent. Since we will always use the same grading procedure, any grades you receive are unlikely to change significantly. However, if you feel that your work deserves a regrade, please submit a request on Gradescope within one week of receiving your grade.

    Before requesting a regrade, please prepare a clear and concise argument for your stance by doing the following:

    Then submit your regrade request via Gradescope. We reserve the right to regrade the entirety of any homework for which any regrade is requested.

    Q: Who makes the decision for a regrade request?

    A: Every time you submit a regrade request for a problem, an email is sent to both Prof. Althoff and the TA who graded your problem. Submitting multiple regrade requests on the same problem set will result in multiple emails being sent. All TAs will be able to see your request, but the original grader of the problem has the final say in determining your grade: after reading 100 solutions to the same problem, they become the expert on which answers are right and which are wrong. (In particularly ambiguous cases, the original grader will usually consult with other TAs before replying to your request, but they will still make the final decision.)

    The head TA does not technically have the power to override the original grader. However, they can make strong recommendations to the original grader, if they disagree with their decision.

    Q: What actions will be taken after a regrade request?

    Regrade requests will only be honored in cases where the TA made a clear error in grading your problem set.

    If a TA gives back points to someone who submitted a regrade request, the TA must give back points to all people who had a similar deduction, even people who did not submit a regrade request. If a TA violates this policy, you should email the head TA.

    If you are not sure whether your regrade request is justified or not, come to office hours and speak to a TA.

    Q: What are some good and bad regrade requests?

    Examples of good regrade requests include

    Examples of bad regrade requests include

    Q: How do I register for EdDiscussion?

    A: Navigate to Please use your UW email address for registering (