In this advanced graduate course, we will analyze the design and study the effectiveness and performance of a selection of big data management systems. We will study both batch- and stream-processing systems.

Instructor: Magdalena (magda) Balazinska, magda at
Office hour: Mondays 12pm-1pm in CSE 584.

TA: Cyrus Rashtchian, cyrash at
Office hour: Fridays 9:30am-10:30am in the theory lab (CSE 306).

Lectures: Mondays and Wednesdays -- 9am-10:20am

Location: MGH 251

The workload in the class involves the following:



Link to DROP BOX. Please use the dropbox to submit your project idea, milestone, and final paper.

An exciting component of this course is the series of practical, hands-on tutorials in class. All tutorial materials are publicly available on GitHub in the following repository:

Note that this schedule is subject to change, so please check this website regularly for updates. 

How it all fits together?

Week 1: Parallel DBMSs & MapReduce

Week 2: Best of Both Worlds Integration

Week 3: Column-store DBMSs

Week 4: In-memory analytics

Week 5: Parallel DBMS on Hadoop

Week 6: University of Washington Big Data Engine

Week 7: Machine-Learning Focused Systems

Week 9: Stream and Batch Processing

* Subscription: If you are registered for this class, your email address will automatically be added to the class mailing list (refreshed daily). You can set up a forwarding address at or change your subscription address here.

* Archive: You can access the archive for the class mailing list HERE.