Colabs (20%)

There will be 10 Colabs in total: Colab 0 (Spark tutorial) and Colabs 1 through 9 (released weekly). Each Colab is worth 2%. Colab 0 will be solved in real time during the first recitation session.

Homeworks (40%)

There will be four longer homework assignments (10% each). Homework assignments should be submitted on Gradescope as a PDF. In addition, you should upload all the code associated with your assignment to Gradescope. No handwritten work will be accepted. Math formulas must be typeset using LaTeX or other word-processing software that supports mathematical symbols (e.g., Google Docs, Microsoft Word).

To register for Gradescope,

Students also need to upload their code to Gradescope before the assignment due date. Put all the code for a single question into a single file and upload it. Only files in text format (e.g., .txt, .py, .java) will be accepted.

Course Project (40%)

The course project is a group project. Its grade is split into a proposal (20%), a milestone (20%), a final report (50%), and a project presentation (10%). See details on the project page.

Extra Credit (up to 2%)

Students may be awarded extra credit for actions that improve other students' CS547 experience. Examples include:

Students with 0 points on the extra credit sheet will receive no extra credit.

Students with a nonzero number of points on the extra credit sheet will be ranked by the number of extra credit points they have. The amount of extra credit you receive will depend on your rank in this list, as in the following example calculation:

Raw points
Amount of extra credit given

Note: Identity management

Please use the same username (your UW or CSE NetID) for all of your coursework and when registering for Gradescope and EdDiscussion, so that we can give you appropriate credit for your work.

Example: If your UW email address is, then

If you are having trouble registering for these services under your UW NetID, please send an email to the course staff mailing list during the first week of class, so that we may make a note of it.

Homework Policies


There will be four homeworks, released every two weeks, involving programming, working with Spark, and numerical/theoretical problems.

Questions: We try very hard to make questions unambiguous, but some ambiguities may remain. If you are confused, ask (i.e., post a question on EdDiscussion) or state your assumptions explicitly. Reasonable assumptions will be accepted for ambiguous questions. As per the extra credit policy, you may receive extra credit for pointing out ambiguities in course material.

Collaboration Policy & Honor Code: We take the honor code extremely seriously. We strongly encourage students to form study groups. Students may verbally discuss and work on homework problems in groups. However, each student must write down the solutions and the code independently. Students may not share work or programs (on paper, electronically, or in any other form) with anyone else. Importantly, in their submissions, each student should list the people they interacted with, including anyone not taking this class or not working at UW (excluding the instructor and TA(s)). Students should appropriately cite any helpful material they find in the published literature or on the web, including any answer or partial answer to an assignment. Students should not claim to have come up with an idea that wasn't originally theirs; instead, they should explain it in their own words and make it clear where it came from.

Since we occasionally reuse problem set questions from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is cheating to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. We may run your code through a system for detecting software plagiarism.

Finally, we consider it an academic integrity violation to post your homework solutions anywhere other students can easily access them. This includes uploading your solutions to publicly viewable repositories such as those on GitHub.

Late assignments: Each student will have a total of two late periods to use for homeworks. Late periods may not be used on any course project deliverables. A late period lasts 48 hours from the original deadline (for example, if an assignment is due on Thursday, the late period extends to Saturday at 11:59 PM Pacific Time). No assignments will be accepted after the late period ends.

Assignment submission: All students should submit their assignments via Gradescope by 11:59 PM on the due date. You can typeset or scan your assignment, but you should upload a PDF rather than images.

We will use Gradescope for the submission of code as well. Please make sure to tag each part correctly on Gradescope so that it is easier for us to grade. There will be a small point deduction for each mistagged page and for each question that includes code. Put all the code for a single question into a single file and upload it. Only files in text format (e.g., .txt, .py, .java) will be accepted. Coding questions will receive no credit if their code is not submitted on Gradescope or is submitted after the deadline, so please remember to submit your code.