Reading List for 544M

The master's portion of 544M will consist of four reading assignments, each comprising one or two research papers. You will turn in a review for each reading assignment (suggested length: 2 pages) on a specific date, usually a Friday. I will hold a special office hour on the Wednesday before each due date, at 11:30am in CSE 605. At the end of the quarter you will bundle the reviews into a 9-10 page report on Advanced Data Management Techniques (you may give it a different title if you'd like).

What You Will Learn

(1) Advanced database techniques, and (2) how to read and evaluate research papers. Try to understand as many technical details as possible, but remember that your goal is to understand what each paper is about: we will not test your knowledge of the papers' technical content.

Reading Assignments


  1. Views

    A.Y. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 10(4), 2001. Sections 1-6 and 9.

    You may restrict your discussion (and reading) to the following sections: Sec. 1, Sec. 2 (without 2.4), Sec. 3, Sec. 6.1, and Sec. 6.3.

    Questions to Address:

    1. What is the problem of answering queries using views?
    2. What are its main applications?
    3. Illustrate with some examples the difference between an equivalent rewriting and a maximally contained rewriting.
    4. Illustrate through an example how to find a rewriting using the inverse-rules algorithm.

    Office hour: Wednesday, 10/15/2008, 11:30am (right after class) in CSE 605 (the database lab).

    Due date: Monday, 10/20/2008. Email it to me, or bring it to my office.


  2. Transactions

    Michael J. Cahill, Uwe Röhm, Alan David Fekete: Serializable isolation for snapshot databases. SIGMOD Conference 2008: 729-738

    Questions to Address [TO BE UPDATED]:

    1. What is snapshot isolation, and who uses it?
    2. Why is snapshot isolation not serializable?
    3. What are the main approaches to making snapshot isolation serializable?
    4. What approach does the paper take?

    Office hour: Wednesday, 10/29/2008, 11:30am (right after class) in CSE 605 (the database lab).

    Due date: Friday, 10/31/2008, in my office (NOTE: this is the same date as the midterm)


  3. Database tuning

    Surajit Chaudhuri, Vivek R. Narasayya: Self-Tuning Database Systems: A Decade of Progress. VLDB 2007: 3-14

    Surajit Chaudhuri, Vivek R. Narasayya: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. VLDB 1997: 146-155

    Note: These papers come in tandem: the original 1997 paper contains the technical material, while the 2007 paper was written as a perspective on that work, ten years later.

    Suggested Questions to Address:

    1. What is the database tuning problem, why is it important, and why is it hard?
    2. What are the most important physical design structures considered by modern database systems?
    3. Describe a few key technical innovations in the cost-driven index selection algorithm.

    Office hour: Wednesday, 11/12/2008, 11:30am (right after class) in CSE 605 (the database lab).

    Due date: Friday, 11/14/2008, in my office.

  4. Data management in a cluster

    Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004: 137-150

    Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. VLDB 2008

    Suggested Questions to Address [TO BE UPDATED]:

    1. What are the major differences and similarities between MapReduce and SCOPE?
    2. What extensions and restrictions does SCOPE consider for SQL, and why?
    3. Comment on the way SCOPE is integrated with C# (is this useful or not, good or bad?).
    4. Discuss how SCOPE approaches query optimization and evaluation.

    Office hour: Wednesday, 11/26/2008, 11:30am (right after class) in CSE 605 (the database lab).

    Due date: Monday, 12/1/2008.


  Final report

    Due date: Friday, 12/5/2008.