CSE 544: Reading Assignments

The drop box for paper reviews is https://catalyst.uw.edu/collectit/dropbox/summary/pkoutris/21136

Week of
Paper(s)
Questions to Address
March 26
Optional: Query Language Primer
Optional: Finite Model Theory (Ch. 1 and 2)
April 2 Required: Answering queries using views

Sections 1, 2, 3 only.

  • What are some key applications?
  • What is the difference between an equivalent rewriting and a maximally contained rewriting?
  • What is the difference between query answering and query rewriting?
Optional: Optimizing queries using materialized views
Required: What goes around comes around
Sections 1, 2, 3, 4, 8, 9, 10, 11 (in other words, skip 5-7)
April 9 Required: The Anatomy of a Database System

Sections 1, 2, 3, 4 only

  • What are the advantages/disadvantages of using a thread v.s. a process per connection?
  • What are the three kinds of parallel database architectures? what are their pros/cons?
  • Why do database systems use a different buffer pool from that of the OS?
  • What is the iterator model for query execution?
Required: Join processing in database systems with large main memories

Sections 1,2 only

  • What is the "fudge factor"?
  • The main memory hash-join algorithm (called classic hashing in the paper) stores one relation in main memory. What happens to the algorithm if the amount of main memory is smaller than the size of the relation?
  • What is grace-hash-join?
  • What is hybrid-hash-join?
April 18 Required: Consistently Estimating the Selectivity of Conjuncts of Predicates

Sections 1,2,3 only

  • What are the challenges in using multidimensional histograms in cardinality estimation (and plan cost estimation)?
  • Suppose we have one dimensional histograms on R.A, R.B, R.C, R.AB, and R.BC. How do current systems estimate the size of a query SELECT * FROM R where A='a' and B='b' and C='c'? Discuss separately DB2 and SQL Server.
  • How does the approach in the paper estimate the size of the query?
April 25 Required: MapReduce

Sections 1,2,3,4 only. To answer the questions below you may need to do Homework 2 first.

  • What is the difference between a map job and a map task?
  • How is the number of map tasks determined? How is the number of reduce tasks determined?
  • What are the pros/cons in choosing a large/small number of map tasks and reduce tasks?
  • A reduce task has three phases: copy, sort, and reduce. Which of these phases may be started while the map tasks are still running, and which phases cannot be started until all map tasks have completed?
April 30 Required: The Declarative Imperative

Sections 1,2,3 only

  • What is the urgency of parallelism?
  • How does the datalog approach to parallelism differ from other approaches?
  • What are the three types of rules in Dedalus? See Fig. 1.
Optional: Datalog with negation (Chapter 14)
May 7 Optional: book on Finite Model Theory (ps, pdf)
May 14  
May 21 Required: Provenance Semirings
Sections 1,2,3,4 only
May 28 Required: Private Data Analysis
Read the entire paper.