Please do not repost or otherwise distribute the materials that
you download from this website. Some of them are behind a paywall; others
are private, and we only have permission to use them in class,
not to distribute them.
- Review 1 Due date: Wed. January 15.
Submit here.
Problems with SQL. Read sections 1-3; skim over 4-6.
A Case Against SQL. Skip slides 32-35 and 65-end.
Some suggested topics for discussion in your review:
- What was the original motivating principle behind the
design of SQL? Did it age well?
- What are the strengths and weaknesses of SQL? What
does the paper try to change and what does it try to preserve?
- Give an example of a SQL query in standard syntax and show
the same query in the proposed improved syntax.
- Give an example of two SQL queries that are equivalent
when tables do not contain NULLs, but become inequivalent when
the tables contain NULLs.
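As a starting point for the last topic, here is a minimal sketch (not taken from either reading) using Python's built-in sqlite3 module. The table `t(x)` and the helper `run_both` are hypothetical names chosen for illustration. The two queries count the same rows as long as `x` is never NULL, because every non-NULL value is either equal to 1 or different from 1; under SQL's three-valued logic, a NULL row satisfies neither comparison and is filtered out.

```python
import sqlite3


def run_both(values):
    """Load `values` into a one-column table t(x) and run two queries on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])

    # Query 1: count every row.
    q1 = "SELECT count(*) FROM t"
    # Query 2: looks like a tautology, but NULL = 1 and NULL <> 1 are both
    # unknown, so rows with x IS NULL do not pass the WHERE clause.
    q2 = "SELECT count(*) FROM t WHERE x = 1 OR x <> 1"

    r1 = conn.execute(q1).fetchone()[0]
    r2 = conn.execute(q2).fetchone()[0]
    conn.close()
    return r1, r2


without_nulls = run_both([1, 2, 3])
with_nulls = run_both([1, 2, None])
print(without_nulls)  # (3, 3)
print(with_nulls)     # (3, 2)
```

Other pairs behave similarly (for example, NOT IN versus NOT EXISTS), so try to find your own example for the review.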
- Review 2 Due date: Wed. January 22.
Submit here.
What goes around
Read sections 1-5 and 10. The other sections are not
recommended and we will not discuss them in class.
Some suggested topics for discussion in your review:
- What is physical and logical data independence?
- Briefly discuss physical and logical data independence in IMS, Codasyl, and the relational model.
- Speculate what led to the decline of IMS / Codasyl and rise of the relational model.
- Optional: read this recent followup paper and comment on it instead (we won't discuss it in class)
- Review 3 Due date: Wed. January 29.
Submit here.
PAX Read sections 1-4; we
will discuss them in class. Sec. 5-7 are optional and will not
be covered in class.
Column Store Read sections 1 and 2, skim over Sec. 3; we may read Sec. 4 later (time permitting).
Some suggested topics for discussion in your review:
- Describe the main tradeoffs between row-store and column-store.
- Explain the significance of Fig. 1.2 in the column-store paper.
- What is the difference between PAX and column-store? What
are their pros and cons?
- Review 4 Due date: Wed. February 5.
Submit here.
Query Compiler Read
sections 1,2,3; skim over section 4 (in particular, check out
Fig. 6, which should become clear).
Some topics to focus on (you may use them in your review):
- Our focus in this paper is on the two query evaluation
models: Volcano-style (pull-based) and data-driven
(push-based). The most important piece is Fig. 3.
- The Futamura projection is less relevant to us (but a great
concept to learn, given the opportunity).
- The Volcano model is covered in all textbooks, and we
cover it in class too.
- The data-driven model is a new model (introduced in HyPer),
which is well described in this paper.
- How do these two models work? What is their difference?
- A separate (but important) topic is whether the query engine
is interpreted or compiled. How was the
original System R implemented? How are most commercial and
open-source database systems implemented today? What are the
pros and cons?
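To make the pull/push distinction concrete, here is a rough sketch in Python. It is not the code from the paper and does not reproduce its figures; all function names (`scan_pull`, `select_push`, and so on) and the sample table are hypothetical. In the pull-based version the root operator asks its child for the next row; in the push-based version the scan drives the loop and hands each row to its consumer. Both versions are interpreted here, so the sketch says nothing about the interpreted versus compiled question.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]

# --- Pull-based (Volcano-style): each operator is an iterator over its child.


def scan_pull(table: Iterable[Row]) -> Iterator[Row]:
    for row in table:
        yield row


def select_pull(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:          # pull the next row from the child
        if pred(row):
            yield row


def project_pull(child: Iterator[Row], fn: Callable[[Row], str]) -> Iterator[str]:
    for row in child:
        yield fn(row)


# --- Push-based (data-driven): each operator is a callback that forwards rows
# --- to its consumer; the scan at the bottom drives the whole pipeline.


def project_push(fn: Callable[[Row], str], consumer: Callable[[str], None]):
    def on_row(row: Row) -> None:
        consumer(fn(row))
    return on_row


def select_push(pred: Callable[[Row], bool], consumer: Callable[[Row], None]):
    def on_row(row: Row) -> None:
        if pred(row):
            consumer(row)      # push the row to the next operator
    return on_row


def scan_push(table: Iterable[Row], consumer: Callable[[Row], None]) -> None:
    for row in table:
        consumer(row)


TABLE: List[Row] = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]


def is_even(row: Row) -> bool:
    return row[0] % 2 == 0


def name_only(row: Row) -> str:
    return row[1]


# Pull: the consumer (list) repeatedly asks the root operator for rows.
pulled = list(project_pull(select_pull(scan_pull(TABLE), is_even), name_only))

# Push: the scan loops over the table and pushes rows up to the sink.
pushed: List[str] = []
scan_push(TABLE, select_push(is_even, project_push(name_only, pushed.append)))

print(pulled)  # ['b', 'd']
print(pushed)  # ['b', 'd']
```

Note where the loop lives in each version: in every operator for the pull model, and only in the scan for the push model.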
Optional: Vectorized vs. Compiled Sections 1 and 2 are a short introduction to
vectorized vs. data-driven query processing.
- Review 5 Due date: Wed. February 12.
Submit here.
How good are they?
Read the entire paper. We will discuss most of it in class.
Optional reading: a more recent analysis on SQL Server
- Review 6 Due date: Wed. February 19.
Submit here.
Machine Learning in SQL.
Topics to focus on:
- Machine Learning can be expressed in SQL, but it's painful!
- ML needs recursive queries. Recursion is supported in SQL
using CTEs (common table expressions), but it is cumbersome and limited.
- Can you think of other applications of SQL that might
require recursion?
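For reference, here is a small sketch of a recursive CTE, run through Python's built-in sqlite3 module. It is not from the paper: the `edge` table, its contents, and the CTE name `reach` are made up for illustration. The query computes the nodes reachable from node 1 in a directed graph, which is one classic non-ML use of recursion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (5, 6)],  # node 5 and 6 are not reachable from 1
)

query = """
    WITH RECURSIVE reach(node) AS (
        SELECT 1                      -- base case: the start node
        UNION                         -- UNION removes duplicates, so cycles terminate
        SELECT e.dst
        FROM reach r JOIN edge e ON e.src = r.node
    )
    SELECT node FROM reach ORDER BY node
"""
reachable = [row[0] for row in conn.execute(query)]
conn.close()

print(reachable)  # [1, 2, 3, 4]
```

Even this simple query needs a base case, a recursive case, and care about duplicates, which gives a sense of why the recursive syntax feels cumbersome for iterative ML algorithms.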
- Review 7 Due date: Mon. February 26.
Submit here.
LSM (watch this short video first)
This is a very informative, but rather dense paper. Read and
review sections 1,2,3. I recommend reading the rest of the paper
too, but it is optional and we will not discuss it in class.
Optional: watch the conference presentation of this
paper here, starting at 52:00.