In this course, we will study the specialized systems and algorithms that have been developed to work with data at scale, including parallel database systems, MapReduce and its contemporaries, graph systems, streaming systems, and others. We will also cover the core techniques of cloud platforms and important scalable algorithms.

Instructor: Magdalena (Magda) Balazinska, magda at
Office hour: Tuesdays, 4pm-5pm, in CSE 584.

TA: Parmita Mehta, parmita at
Office hour: Thursdays, 4pm-5pm, in CSE 482.

Lectures: Tuesdays, 5pm-7:50pm

Sections: Tuesdays, 8pm-8:50pm

Location: CSE 403

The workload in the class involves the following:

All assignments and projects are to be done individually.

Link to DROPBOX.


Link to Final Project Presentation Schedule.

Each week, after lecture, we will have a 50-minute section with hands-on demonstrations of, and tutorials on, various big data systems and cloud services.

The schedule is subject to change, so please check this website regularly for updates.

How does it all fit together?

Week 1: Relational Database Management Systems (review)

Week 2: Parallel shared-nothing DBMSs & Cloud Deployments

Week 3: MapReduce (MapReduce/Hadoop)

Week 4: Best of Both Worlds Integration

Week 5: In-memory analytics

Week 6: In-depth Spark tutorial

Week 7: Column-store DBMSs and Big Data Systems Wrap-Up

Week 8: Graph Processing

Week 9: Stream Processing

* Subscription: If you are registered for this class, your email address will automatically be added to the class mailing list (refreshed daily). You can set up a forwarding address at or change your subscription address here.

* Archive: You can access the archive for the class mailing list here.