Reading List for 544M

The master's portion of 544M will consist of four reading assignments, each comprising one or two research papers. You will turn in a review for each reading assignment (suggested length: 2 pages) on a specific date, usually a Friday. Office hours for these readings will be held on an as-needed basis. At the end of the quarter you will bundle the reviews into a 9-10 page report on Advanced Data Management Techniques (you may give it a different title if you like).

What You Will Learn

(1) Advanced database techniques, and (2) how to read and evaluate research papers. Try to understand as many technical details as possible, but remember that your goal is to understand what each paper is about: we will not test your knowledge of the technical content of the papers.

Reading Assignments

Views

A.Y. Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 10(4), Sections 1-6, and 9.

You may restrict your discussion (and reading) to the following sections: Sec. 1, Sec. 2 (without 2.4), Sec. 3, Sec. 6.1, and Sec. 6.3.

Questions to Address:
1. What is the problem of answering queries using views?
2. What are its main applications?
3. Illustrate with some examples the difference between an equivalent rewriting and a maximally contained rewriting.
4. Illustrate through an example how to find a rewriting using the inverse-rules algorithm.
Due date: Friday, 4/23/2010, by e-mail.

Transactions

Michael J. Cahill, Uwe Röhm, Alan David Fekete: Serializable isolation for snapshot databases. SIGMOD Conference 2008: 729-738

Questions to Address:
1. What is snapshot isolation and who uses it?
2. Why is snapshot isolation not serializable?
3. What are the main approaches to making snapshot isolation serializable?
4. What approach is taken by the paper?

Due date: Friday, 5/7/2010, by e-mail.

Probabilistic Databases

Nilesh Dalvi, Dan Suciu: Efficient Query Evaluation on Probabilistic Databases. VLDB 2004

Questions to Address:
1. What is the semantics of a probabilistic database?
2. What are the main approaches to evaluating queries over probabilistic databases?
3. Give an example of a query that is easy to compute over a probabilistic database, and one that is hard to compute.
Due date: Friday, 5/21/2010, by e-mail.

Data management in a cluster

Note: The readings in this section are subject to change

Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004: 137-150

Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. VLDB 2008

Suggested Questions to Address:
1. What are the major differences and similarities between MapReduce and SCOPE?
2. What extensions to and restrictions of SQL does SCOPE consider, and why?
3. Comment on the way SCOPE is integrated with C# (is this useful or not, good or bad?).
4. Discuss how SCOPE approaches query optimization and evaluation.

Due date: Friday, 6/4/2010, by e-mail.

Final report

Due date: Friday, 6/4/2010, by e-mail.
