In this course, we will study the specialized systems and algorithms that have been developed to work with data at scale, including parallel database systems, MapReduce and its contemporaries, graph systems, and streaming systems. We will also go over core techniques of cloud platforms and important scalable algorithms.
UW recommends that everyone wear a mask in the classroom.
Each statement should be at most one page in length, written as a set of bullet points. The statement should demonstrate that you read and thought about the paper.
Reviews are due before the lecture. There are no late days.
Three assignments involving three big data systems: Redshift, Spark, and Snowflake.
Some mini-assignments using stream and/or graph databases.
You have up to 4 late days per quarter (for unexpected events), with a maximum of 2 late days per homework. No late days afterwards.
Projects are in teams of 1 or 2. Milestones are due on time.
There are no late days.
I reserve the right to add or subtract points based on your participation in class.
Each week, after lecture, we will have a 50-minute section that will give you hands-on demonstrations and tutorials of various big data systems and cloud services. Please bring your laptop. Each section will be connected to either a full assignment or a mini-assignment.