In this advanced graduate course, we will analyze the design and study the effectiveness and performance of a selection of big data management systems. We will study both batch- and stream-processing systems.

Instructor: Magdalena (Magda) Balazinska, magda at cs.washington.edu. Office hour: Mondays 12pm-1pm in CSE 584.

TA: Cyrus Rashtchian, cyrash at cs.washington.edu. Office hour: Fridays 9:30am-10:30am in the theory lab (CSE 306).



Lectures: Mondays and Wednesdays -- 9am-10:20am

Location: MGH 251

The workload in the class involves the following:

Link to FINAL PROJECTS REPOSITORY.

Link to GRADEBOOK.

Link to DROP BOX. Please use the drop box to submit your project idea, milestone, and final paper.

An exciting component of this course is the series of practical, hands-on tutorials held in class. All tutorial materials are publicly available on GitHub in the following repository: https://github.com/mbalazin/cse599c-17sp-tutorials

Note that this schedule is subject to change, so please check this website regularly for updates. 

How does it all fit together?





Week 1: Parallel DBMSs & MapReduce

Week 2: Best of Both Worlds Integration

Week 3: Column-store DBMSs

Week 4: In-memory analytics

Week 5: Parallel DBMS on Hadoop

Week 6: University of Washington Big Data Engine

Week 7: Machine-Learning Focused Systems

Week 9: Stream and Batch Processing



* Subscription: If you are registered for this class, your @u.washington.edu email address will automatically be added to the class mailing list (refreshed daily). You can set up a forwarding address at myuw.washington.edu or change your subscription address here.

* Archive: You can access the archive for the class mailing list HERE.