Week of |
Paper(s) |
Questions to Address
|
March 26 |
Optional: Query Language Primer
Optional: Finite Model Theory (Ch. 1 and 2) |
|
April 2 |
Required: Answering queries using views |
Sections 1, 2, 3 only.
- What are some key applications?
- What is the difference between an equivalent rewriting and a maximally contained rewriting?
- What is the difference between query answering and query rewriting?
|
Optional: Optimizing queries using materialized views |
|
Required: What goes around comes around |
Sections 1, 2, 3, 4, 8, 9, 10, 11 (in other words, skip 5-7) |
April 9 |
Required: The Anatomy of a Database System |
Sections 1, 2, 3, 4 only
- What are the advantages/disadvantages of using a thread v.s. a process per connection?
- What are the three kinds of parallel database architectures? what are their pros/cons?
- Why do database systems use a different buffer pool from that of the OS?
- What is the iterator model for query execution?
|
Required: Join processing in database systems with large main memories |
Sections 1,2 only
- What is the "fudge factor"?
- The main memory hash-join algorithm (called classic hashing in the paper) stores one relation in main memory. What happens to the algorithm if the amount of main memory is smaller than the size of the relation?
- What is grace-hash-join?
- What is hybrid-hash-join?
|
April 18 |
Required: Consistently Estimating the Selectivity of Conjuncts of Predicates |
Sections 1,2,3 only
- What are the challenges in using multidimensional histograms in cardinality estimation (and plan cost estimation)?
- Suppose we have one dimensional histograms on R.A, R.B, R.C, R.AB, and R.BC. How do current systems estimate the size of a query SELECT * FROM R where A='a' and B='b' and C='c'? Discuss separately DB2 and SQL Server.
- How does the approach in the paper estimate the size of the query?
|
April 25 |
Required: MapReduce |
Sections 1,2,3,4 only. To answer the questions below you may need to do Homework 2 first.
- What is the difference between a map job and a map task?
- How is the number of map tasks determined? How is the number of reduce tasks determined?
- What are the pros/cons in choosing a large/small number of map tasks and reduce tasks?
- A reduce task has three phases: copy, sort, and reduce. Which of these phases may be started while the map tasks are still running, and which phases cannot be started until all map tasks have completed?
|
April 30 |
Required: The Declarative Imperative |
Sections 1,2,3 only
- What is the urgency of parallelism?
- How does the datalog approach to parallelism differ from other approaches?
- What are the three types of rules in Dedalus? See Fig. 1.
|
Optional: Datalog with negation (Chapter 14) |
|
May 7 |
Optional: book on Finite Model Theory (ps, pdf) |
|
May 14 |
|
|
May 21 |
Required: Provenance Semirings |
Sections 1,2,3,4 only |
May 28 |
Required: Private Data Analysis |
Read the entire paper. |