In this course, we will study the specialized systems and algorithms that have been developed to work with data at scale, including parallel database systems, MapReduce and its contemporaries, graph systems, and streaming systems. We will also go over core techniques of cloud platforms and important scalable algorithms.
UW requires everyone to wear a mask in the classroom. See here.
The instructor may temporarily remove their face covering when formally instructing in a large space from behind a podium or in a stage-like setting. Physical distance of at least six feet from others is required.
Each statement should be at most one page in length, written as a set of bullet points. The statement should demonstrate that you read and thought about the paper.
Reviews are due before the lecture. There are no late days.
Three assignments involving three big data systems: Redshift, Spark, and Snowflake.
Some mini-assignments using stream and/or graph databases.
You have up to 4 late days per quarter (for unexpected events), with a maximum of 2 late days per homework. No late days are available beyond these.
Projects are in teams of 1 or 2. Milestones are due on time.
There are no late days.
I reserve the right to add or subtract points based on your participation in class.
Each week, after lecture, we will have a 50-minute section that will give you hands-on demonstrations and tutorials of various big data systems and cloud services. Please bring your laptop. Each section will be connected to either a full assignment or a mini-assignment.