In this advanced graduate course, we will analyze the design and study the effectiveness and performance of a selection of big data management systems. We will study both batch- and stream-processing systems.

Instructor: Magdalena (Magda) Balazinska, magda at
Office hour: Mondays 12pm-1pm in CSE 584.

TA: Cyrus Rashtchian, cyrash at
Office hour: Fridays 9:30am-10:30am in the theory lab (CSE 306).

Lectures: Mondays and Wednesdays -- 9am-10:20am

Location: MGH 251

The workload in the class involves the following:



Link to DROP BOX. Please use the dropbox to submit your project idea, milestone, and final paper.

We created a Slack channel for the course. The channel is called cse599c-17sp. It is a good place to ask questions if you have problems with any of the tools, or if you have any other questions or comments.

An exciting component of this course is the practical, hands-on tutorials held in class. All tutorial materials are publicly available on GitHub in the following repository:

Note that this schedule is subject to change, so please check this website regularly for updates. 

How it all fits together?

Week 1: Parallel DBMSs & MapReduce

Week 2: Best of Both Worlds Integration

Week 3: Column-store DBMSs

Week 4: In-memory analytics

Week 5: Parallel DBMS on Hadoop

Week 6: University of Washington Big Data Engine

Week 7: Machine-Learning Focused Systems

Week 9: Stream and Batch Processing

* Subscription: If you are registered for this class, your email address will automatically be added to the class mailing list (refreshed daily). You can set up a forwarding address at or change your subscription address here.

* Archive: You can access the archive for the class mailing list HERE.