Please do not repost or otherwise distribute the materials
available from this website. Some material is freely available on
the web, some is behind a paywall, and some is private: we only
have permission to use it in class, not to distribute it.
- Review 1 Due date: October 6.
Submit here.
Problems with SQL. Read sections 1-3; skim over 4-6.
A Case Against SQL: csenetid or uwnetid. Skip slides 32-35 and 65-end.
Some suggested topics for discussion in your review:
- What was the original motivating principle behind the
design of SQL? Did it age well?
- What are the strengths and weaknesses of SQL? What
does the paper try to change and what does it try to preserve?
- Give an example of a SQL query in standard syntax and show
the same query in the proposed improved syntax.
- Give an example of two SQL queries that are equivalent
when tables do not contain NULLs, but become inequivalent when
the tables contain NULLs.
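As a starting point for the last topic above, here is a minimal sketch of one well-known pair: `NOT IN` versus `NOT EXISTS`. It is not taken from the readings. The tables `r` and `s`, their column `a`, and the helper `run` are hypothetical names chosen for illustration, and the sketch assumes Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical two-table schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (a INTEGER);
    CREATE TABLE s (a INTEGER);
    INSERT INTO r VALUES (1), (2);
    INSERT INTO s VALUES (2);
""")

NOT_IN = "SELECT a FROM r WHERE a NOT IN (SELECT a FROM s) ORDER BY a"
NOT_EXISTS = (
    "SELECT a FROM r WHERE NOT EXISTS "
    "(SELECT 1 FROM s WHERE s.a = r.a) ORDER BY a"
)

def run(sql):
    """Return the single output column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]

# Without NULLs the two queries return the same answer.
without_nulls = (run(NOT_IN), run(NOT_EXISTS))

# One NULL in s makes "a NOT IN (...)" evaluate to unknown for every
# non-matching row, so the NOT IN query returns nothing.
conn.execute("INSERT INTO s VALUES (NULL)")
with_nulls = (run(NOT_IN), run(NOT_EXISTS))

print(without_nulls)  # ([1], [1])
print(with_nulls)     # ([], [1])
```

The difference comes from three-valued logic: `1 NOT IN (2, NULL)` is unknown rather than true, while the correlated `NOT EXISTS` subquery simply finds no matching row.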
- Review 2 Due date: October 13.
Submit here.
What goes around
Read sections 1-5 and 10. The other sections are not
recommended and we will not discuss them in class.
Some suggested topics for discussion in your review:
- What is physical and logical data independence?
- Briefly discuss physical and logical data independence in IMS, Codasyl, and the relational model.
- Speculate what led to the decline of IMS / Codasyl and rise of the relational model.
- Optional: read this recent follow-up paper and comment on it instead (we won't discuss it in class)
- Review 3 Due date: October 20.
Submit here.
PAX Read sections 1-4; we
will discuss them in class. Sec. 5-7 are optional and will not
be covered in class.
Column Store Read sections 1 and 2; skim over Sec. 3. If you want to learn about column stores in detail, read Sec. 4; we don't have time to cover this topic in class.
Some suggested topics for discussion in your review:
- Describe the main tradeoffs between row-store and column-store.
- Explain the significance of Fig. 1.2 in the column-store paper.
- What is the difference between PAX and column-store? What
are their pros and cons?
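To make the three layouts in the topics above concrete, here is a toy sketch. It is a simplification under stated assumptions, not the papers' actual page formats: a "page" is just a Python list, `page_size` counts records rather than bytes, and the function names and sample records are hypothetical.

```python
# Hypothetical records: (id, name, age).
rows = [(1, "ann", 30), (2, "bob", 41), (3, "cy", 25), (4, "dee", 37)]

def row_store(rows, page_size):
    """Row layout: each page holds whole records, one after another."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]

def column_store(rows):
    """Column layout: each attribute is stored separately, across all records."""
    return [list(col) for col in zip(*rows)]

def pax(rows, page_size):
    """PAX layout: the same records per page as the row layout, but inside
    each page the values are grouped by attribute (one minipage each)."""
    return [[list(col) for col in zip(*page)]
            for page in row_store(rows, page_size)]

print(row_store(rows, 2))
# [[(1, 'ann', 30), (2, 'bob', 41)], [(3, 'cy', 25), (4, 'dee', 37)]]
print(column_store(rows))
# [[1, 2, 3, 4], ['ann', 'bob', 'cy', 'dee'], [30, 41, 25, 37]]
print(pax(rows, 2))
# [[[1, 2], ['ann', 'bob'], [30, 41]], [[3, 4], ['cy', 'dee'], [25, 37]]]
```

In this sketch, reading only the ages touches every page in the row layout, a single list in the column layout, and one minipage per page in PAX, while reconstructing a full record stays within one page for both the row layout and PAX.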
- Review 4 Due date: October 27.
Snowflake. Submit here.
Read sections 1,2,3, skim over 4, and read Sec. 6.
Suggestions for topics to address in your review:
- What is elasticity, why is it important, and how is it supported in Snowflake?
- How is data storage handled in Snowflake, and why? What would have been the alternatives?
- How are worker failures handled in Snowflake?
- How does Snowflake handle semi-structured data?
- Review 5 Due date: November 5 (originally November 3).
Submit here.
Query Compiler Read sections 1, 2, 3; skim over section 4 (in particular, check out Fig. 6 and make sure you understand it).
Some topics to focus on (you may use them in your review):
- Our focus in this paper is on the two query evaluation models: Volcano-style (pull-based) and data-driven (push-based). The most important piece is Fig. 3.
- The Futamura projection is less relevant to us (but a great concept to learn, given the opportunity).
- The Volcano model is covered in all textbooks, and we cover it in class too.
- The data-driven model is a new model (introduced in HyPer), which is well described in this paper.
- How do these two models work? What is their difference?
- A separate (but important) topic is whether the query engine is interpreted or compiled. How was the original System R implemented? How are most commercial and open-source database systems implemented today? What are the pros and cons?
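The two evaluation models in the topics above differ mainly in who drives the control flow. The sketch below is an interpreted toy in Python, not code from the paper or from HyPer, and every class and function name in it is hypothetical. It runs the same scan-then-filter plan both ways: in the pull-based version the consumer calls `next()` on its child, while in the push-based version the scan loop calls its consumer.

```python
# Pull-based (Volcano-style): each operator exposes open() and next().
class Scan:
    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos == len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

class Select:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def open(self):
        self.child.open()

    def next(self):
        # Keep pulling from the child until a row passes the predicate.
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

def pull(root):
    root.open()
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out

# Push-based (data-driven): the producer loops and calls its consumer.
def push_scan(rows, consume):
    for row in rows:
        consume(row)

def push_select(pred, consume):
    def on_row(row):
        if pred(row):
            consume(row)
    return on_row

data = [3, 8, 5, 12]
pred = lambda x: x > 4

pulled = pull(Select(Scan(data), pred))
pushed = []
push_scan(data, push_select(pred, pushed.append))

print(pulled)  # [8, 5, 12]
print(pushed)  # [8, 5, 12]
```

Both versions compute the same result. The sketch only illustrates the direction of control flow, and says nothing about the interpreted versus compiled question raised in the last topic.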
Optional: Vectorized vs. Compiled Sections 1 and 2 are a short introduction to vectorized vs. data-driven query processing.
- Review 6 Due date: Wednesday, November 17.
How good are they? Submit here.
Read the entire paper carefully. We will discuss most of it in
class.
Optional: read this short follow-up paper, written ten years later.