Welcome to DATA516/CSED516: Scalable Data Systems and Algorithms!

In this course, we will study the specialized systems and algorithms that have been developed to work with data at scale, including parallel database systems, MapReduce and its contemporaries, graph systems, and streaming systems. We will also go over core techniques of cloud platforms and important scalable algorithms.


Course Staff

Instructor: Jack Khuu; Office hours: Thursday 9:30PM - 10:30PM, by zoom (see canvas).

TA: Punya Prakash Shetty; Office hours: Wednesday 3:30PM - 4:30PM, by zoom (see canvas).

TA: Nayan Kaushal; Office hours: Tuesday 1:00PM - 2:00PM, by zoom (see canvas).


Instruction

Lecture: Tuesdays, 5pm-7:50pm, GWN 301

Section: Tuesdays, 8pm-8:50pm, GWN 301 (same room)


COVID-19 FACE COVERING POLICY

UW recommends that everyone wear a mask in the classroom. See here.

Homework Assignments and Project

See Calendar: here

Homework Assignments: use Gitlab

Project: here

Gitlab Instructions: here

Your gitlab repository: https://gitlab.cs.washington.edu/csed516-2022au/csed516-YourUwIdHere


Reading Assignments

See the list here


Communication

Canvas (grades and panopto video recordings): here


Workload and Grading (note: subject to change!)

Reading assigned papers and writing short statements (15%)

Each statement should be at most one page in length, written as a set of bullet points. The statement should demonstrate that you read and thought about the paper.

Statements are due before the lecture. There are no late days.

Homework assignments (60%)

Three assignments involving three big data systems: Redshift, Spark, and Snowflake.

Some mini-assignments using stream and/or graph databases

You have up to 4 late days per quarter (for unexpected events), with a maximum of 2 late days per homework. No late submissions are accepted beyond that.

Short final project (25%)

Projects are in teams of 1 or 2. Milestones are due on time.

There are no late days.

Class participation

I reserve the right to add or subtract points based on your participation in class.


Sections

Each week, after lecture, we will have a 50-minute section with hands-on demonstrations and tutorials of various big data systems and cloud services. Please bring your laptop. Each section will be connected to either a full assignment or a mini-assignment.