CSE 544M: Readings

As part of 544M, we ask you to read 5 papers related to the material we cover in class. For each paper, we ask you to submit a 1- to 2-page (single-spaced) write-up that answers a few high-level questions about the paper. This material corresponds to a graduate-level database course. In fact, we read these papers (and more) in the graduate 544 course.

Each write-up will be graded as CREDIT/NO-CREDIT. To get credit, the write-up must demonstrate that you read the paper and that you reflected on it.

 

See course calendar for deadlines.

Paper 1: Data Models

Michael Stonebraker and Joseph Hellerstein. What Goes Around Comes Around. In "Readings in Database Systems" (aka the Red Book), 4th ed. Focus on Sections 1-4 and skim the rest. [pdf]

While reading this paper, try to focus on the following questions

 

Paper 2: DBMS Architecture

Joseph Hellerstein and Michael Stonebraker. The Anatomy of a Database System. In Red Book (4th ed). Focus on Sections 1-4 and skim the rest. [pdf]

For this paper, we do not post any specific questions. Please write a summary of some of the key points in the paper, and make sure that your summary demonstrates that you reflected on it. For example, don't just state facts such as "Some systems use application-level threads while others use processes"; instead, summarize the key advantages of each design choice.

 

Paper 3: Query Optimization

Before reading this paper, you may want to read the book chapters on query optimization.

P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access Path Selection in a Relational Database Management System. Proceedings of ACM SIGMOD, 1979. Pages 23-34. Also in the Red Book (3rd ed and 4th ed). [pdf]

While reading this paper, try to focus on the following questions

Papers 4 & 5: Parallel Data Processing

Dave DeWitt and Jim Gray. Parallel Database Systems: The Future of High Performance Database Systems. Communications of the ACM, 1992. Also in the Red Book (4th ed). Read Sections 1 and 2 only. [pdf]

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004. [pdf]

Please submit a single write-up for both papers (no more than 2 pages in length). In your write-up, please discuss the similarities and differences between parallel DBMSs and MapReduce systems.