Reading List for 544M

The master's portion of 544M will consist of four reading assignments, each consisting of one or two research papers. You will turn in a review for each reading assignment (suggested length: 2 pages) by specific due dates, usually on Fridays. Office hours for these readings will be held on an as-needed basis. At the end of the quarter you will bundle the reviews into a 9-10 page report on Advanced Data Management Techniques (you may give it a different title if you like).

What You Will Learn

(1) Advanced database techniques, and (2) how to read and evaluate research papers. Try to understand as many technical details as possible, but remember that your goal is to understand what the paper is about: we will not test your knowledge of the technical content of the paper.

Reading Assignments


  1. Views

    A.Y. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 10(4), Sections 1-6, and 9.

    You may restrict your discussion (and reading) to the following sections: 1, 2 (excluding 2.4), 3, 6.1, and 6.3.

    Questions to Address:

    1. What is the problem of answering queries using views?
    2. What are its main applications?
    3. Illustrate with some examples the difference between an equivalent rewriting and a maximally contained rewriting.
    4. Illustrate through an example how to find a rewriting using the inverse-rules algorithm.

    Due date: Friday, 4/23/2010, by e-mail.


  2. Transactions

    Michael J. Cahill, Uwe Röhm, Alan David Fekete: Serializable isolation for snapshot databases. SIGMOD Conference 2008: 729-738

    Questions to Address:

    1. What is snapshot isolation, and who uses it?
    2. Why is snapshot isolation not serializable?
    3. What are the main approaches to making snapshot isolation serializable?
    4. Which approach does the paper take?

    Due date: Friday, 5/7/2010, by e-mail.


  3. Probabilistic Databases

    Nilesh Dalvi, Dan Suciu: Efficient Query Evaluation on Probabilistic Databases. VLDB 2004

    Questions to Address:

    1. What is the semantics of a probabilistic database?
    2. What are the main approaches to evaluating queries over probabilistic databases?
    3. Give an example of a query that is easy to compute over a probabilistic database, and one that is hard to compute.

    Due date: Friday, 5/21/2010, by e-mail.


  4. Data management in a cluster

    Note: The readings in this section are subject to change

    Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004: 137-150

    Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. VLDB 2008

    Suggested Questions to Address:

    1. What are the major differences and similarities between MapReduce and SCOPE?
    2. What extensions to, or restrictions of, SQL does SCOPE adopt, and why?
    3. Comment on the way SCOPE is integrated with C# (is this useful or not, good or bad?).
    4. Discuss how SCOPE approaches query optimization and evaluation.

    Due date: Friday, 6/4/2010, by e-mail.


Final Report

Due date: Friday, 6/4/2010, by e-mail.