"Most current researchers [and probably you] were not around for many of the previous eras [of database development], and have limited (if any) understanding of what was previously learned. There is an old adage that he who does not understand history is condemned to repeat it. By presenting "ancient history", we hope to allow future researchers to avoid replaying history." - Stonebraker, Hellerstein
As part of 544M, we ask you to read five papers related to the material we cover in class. For each reading assignment, we ask you to submit a 1- to 2-page (single-spaced) write-up that answers a few high-level questions about the reading. The last two papers share a single write-up.
The selected material corresponds to a graduate-level database course. In fact, we read these papers (and more) in the graduate 544 course. The papers collected here are originally sourced from the book "Readings in Database Systems", which is commonly referred to as the "Red Book" within the database community.
Each write-up will be graded as CREDIT/NO-CREDIT. To get credit, the write-up must demonstrate that you read the paper and that you reflected on it.
See the course calendar for deadlines.
"What Goes Around Comes Around" by Michael Stonebraker and Joseph Hellerstein [PDF] (focus on sections 1-4 and skim over the rest)
This article was originally published in the Red Book.
While reading this paper, focus on the following questions:
DUE Jan 11th
"The Anatomy of a Database System" by Joseph Hellerstein and Michael Stonebraker [PDF] (focus on sections 1-4 and skim over the rest)
This article was originally published in the Red Book.
For this paper, we do not pose any specific questions. Instead, write a summary of some of the paper's key points, and make sure the summary demonstrates that you reflected on the paper. For example, don't simply state that "Some systems use application-level threads while others use processes"; instead, summarize the key advantages of each design choice.
DUE Jan 29th
Before reading this paper, you may want to read the book chapters on query optimization.
"Access Path Selection in a Relational Database Management System" by P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price [PDF]
This paper was originally published in the Proceedings of ACM SIGMOD, 1979.
While reading this paper, focus on the following questions:
DUE Feb 12th
"Parallel Database Systems: The Future of High Performance Database Systems" by Dave DeWitt and Jim Gray [PDF] (focus on sections 1 and 2 only)
"MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat [PDF]
These papers were originally published in Communications of the ACM, 1992 and OSDI, 2004, respectively.
Please submit a single write-up for both papers (no more than 2 pages in length). In your write-up, please discuss the similarities and differences between parallel DBMSs and MapReduce systems.
DUE Mar 12th