Submit paper reviews here.

Date Description

Data Models

Sept 29

Lecture 1: Introduction and the Relational Model

Reading: None

Additional resources:

  • For each lecture, additional resources refer to extra readings that provide more information about the material we cover in class.
  • In the case of book chapters from the R&G book, the chapters listed provide a less detailed and more accessible overview than the papers. It is a good idea to take a look at these book chapters if the papers become confusing. You should still strive to get a depth of understanding at the level of the papers.
  • Chapters 1 and 3 through 5 in the R&G book.
  • E.F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 1970 [pdf]
Oct 10

Lecture 2: Relational Algebra and SQL

Reading: None

Additional resources:

  • Chapter 4, "Relational Algebra and Calculus," and Chapter 5, "SQL DML" in R&G.
Oct 11

Lecture 3: Schema Normalization

Reading: None

Additional resources:

  • Chapter 19, "Schema Refinement and Normal Forms" in R&G.
  • Chapter 2, "Introduction to Database Design," and Chapter 3.5, "Logical Database Design: ER to Relational" in R&G
Oct 13

Lecture 4: Data Models: A Never-ending Story

Reading: Stonebraker and Hellerstein, "What Goes Around Comes Around." In "Readings in Database Systems" 4th ed. [pdf]. Read only sections 1-4 and 8-11.
Please focus on the following questions in your review:

  • What is physical and logical data independence?
  • Briefly discuss physical and logical data independence in IMS, Codasyl, and the relational model.
  • Speculate what led to the decline of IMS / Codasyl and rise of the relational model.

Submit your paper review here (please use plain text or pdf).

Additional resources:

  • Carey and DeWitt, "Of Objects and Databases: A Decade of Turmoil," VLDB 1996 [pdf] (Very interesting read for those who are interested in object databases).
  • Copeland and Maier, "Making Smalltalk a Database System," SIGMOD 1984 [pdf] (Origin of the term "impedance mismatch").
  • Cattell, "Scalable SQL and NoSQL Data Stores," SIGMOD Record, December 2010 (original publication date).
Oct 18

Lecture 5: Datalog (1)

Additional resources:

  • Chapter 24 in R&G.
Oct 20

Lecture 6: Datalog (2)

Reading: Hellerstein, "The Declarative Imperative," SIGMOD Record 2010; Sections 1-3 only [pdf].
Please focus on the following questions in your review:

  • What is the urgency of parallelism?
  • How does the Datalog approach to parallelism differ from other approaches?
  • If you were to build networking switches or distributed systems, would you use Datalog-like languages as described in the paper? Why or why not?

Submit your paper review here (please use plain text or pdf).

Additional resources:

  • Chapter 24 in R&G.

Query Execution

Oct 25

Lecture 7: Lifecycle of a Query Plan

Reading: Sec. 4 from Hellerstein and Stonebraker, "The Anatomy of a Database System." In "Readings in Database Systems" 4th ed. [pdf], or Sec. 4 from "Architecture of a Database System" (slightly more detailed version of the Red book article) [pdf].

You don't need to turn in a review for this lecture.

Additional resources:

  • Shapiro, "Join processing in database systems with large main memories." ACM Transactions on Database Systems 11(3), 1986. Sections 1 and 2 only [pdf].
  • Chapters 12-14 in R&G.
  • Graefe, "Query Evaluation Techniques for Large Databases." ACM Computing Surveys 25(2), 1993 [pdf].
Oct 27

Lecture 8: Query Optimization

Reading: Selinger et al., "Access Path Selection in a Relational Database Management System." Proceedings of ACM SIGMOD, 1979. Pages 22-34 [pdf].

You don't need to turn in a review for this lecture.

Additional resources:

  • Chaudhuri, "An Overview of Query Optimization in Relational Systems," Proceedings of ACM PODS, 1998 [pdf].

Database Theory

Nov 1

Lecture 9: Structural Query Optimization

Recommended Reading: Database Theory book Chapter 6.4

Nov 3

Lecture 10: The AGM Bound

Nov 8

Lecture 11: Generic Join

Recommended Reading: Hung Q. Ngo, Christopher Ré, Atri Rudra: Skew strikes back: new developments in the theory of join algorithms. SIGMOD Record, 2013

Parallel Data Processing

Nov 15

Lecture 12: MapReduce and Spark

Recommended Reading: DeWitt and Stonebraker, "MapReduce: A major step backwards," The Database Column, January 2008 (make sure you skim through the comments in addition to the article) [online article].

Recommended Reading: Zaharia et al., "Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing," Proceedings of NSDI 2012. Sections 2, 3, and 4 only [pdf].

  • Do you agree with the arguments made in the DeWitt and Stonebraker paper? Why or why not?
  • Is Spark just another MapReduce in terms of being a programming model for data analytics? How do they differ?

Nov 17

Lecture 13: Parallel Databases

Recommended Reading: DeWitt and Gray, "Parallel Database Systems: The Future of High Performance Database Systems," Communications of the ACM. 1992. Sections 1 and 2 only [pdf].

  • What are parallel databases good for?
  • Where do MapReduce and Spark fit into the taxonomy of parallel databases?

Additional resources:

  • Chapter 22 in R&G.
  • DeWitt et al., "The Gamma Database Machine Project," IEEE Transactions on Knowledge and Data Engineering, Volume 2, Issue 1, March 1990, pp. 44-62 [pdf].
Nov 22

Lecture 14: Data Warehouses and Column Stores

Reading: Abadi et al., "The Design and Implementation of Modern Column-Oriented Database Systems," Foundations and Trends® in Databases (Vol 5, Issue 3, 2012, pp 197-280). Sections 1, 2, and 4 (read 4.1, 4.4, and 4.5; skim the rest of Section 4 and skim Section 3) [pdf].
Please focus on the following questions in your review:

  • What are the differences between column and row oriented data stores?
  • Discuss at least one technique from Section 4.
  • What are column stores good for? (Hint: see background reading below)

Submit your paper review here (please use plain text or pdf).

Background on data analytics:

  • Chapter 25 in R&G.
  • Gray et al. "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals," Data Mining and Knowledge Discovery 1, p. 29-53 (1997) [pdf]

Transactions

Nov 29

Lectures 15-16: Transactions: Concurrency Control (Part 1)

Reading: Franklin, "Concurrency Control and Recovery," from The Handbook of Computer Science and Engineering, A. Tucker, ed., CRC Press, Boca Raton, 1997. [pdf]. Note: review due on Dec. 6.

Additional resources:

  • Chapters 16 and 17 in R&G.
Dec 1

Lectures 15-16: Transactions: Concurrency Control (Part 2)

Reading: Franklin, "Concurrency Control and Recovery," from The Handbook of Computer Science and Engineering, A. Tucker, ed., CRC Press, Boca Raton, 1997. [pdf]. Note: review due on Dec. 6.

Dec 6

Lectures 17-18: Transactions: Recovery (Part 1)

Reading: Franklin, "Concurrency Control and Recovery," from The Handbook of Computer Science and Engineering, A. Tucker, ed., CRC Press, Boca Raton, 1997. [pdf].

Submit your paper review here (please use plain text or pdf).

  • What is a transaction and what are the ACID properties?
  • What is serializability? What is a serializable schedule?
  • How does two-phase locking (2PL) work?
  • What is the "phantom problem"?
  • What are some benefits and drawbacks of providing the notion of a transaction in a DBMS?

Additional resources:

  • Chapter 18 in R&G.
Dec 8

Lectures 17-18: Transactions: Recovery (Part 2)

Reading: Franklin, "Concurrency Control and Recovery," from The Handbook of Computer Science and Engineering, A. Tucker, ed., CRC Press, Boca Raton, 1997. [pdf].

Dec 13

PROJECT POSTERS 2-4:30pm (no lecture)

Dec 15

No lecture