CSE 599C Hot Topics in Data Management Systems
Announcements
- 03/03/06: Just a reminder that your short reports are due at the end of next week. Please see the evaluation section below.
- 01/13/06: We will discuss two papers on Tuesday, January 17th. They are both quite long, but only some of the sections are required reading (see outline for details).
- 01/09/06: Added reading questions for lecture on active databases. Reading questions will always be posted at least 24 hours before class.
- 01/03/06: Please visit this page regularly for announcements and updates.
Administration
Instructor: Magdalena Balazinska
Meeting times: Tuesdays and Thursdays, 12pm-1:20pm. Feel free to bring your lunch to class.
Location: CSE 503.
Overview
Advances in the area of sensor networks have created the need for systems capable of processing continuous streams of information. The dramatic rise in the number of applications based on the Internet and the World Wide Web has increased the need for efficient and scalable mechanisms to manage distributed data repositories. In today's systems, data repositories are also frequently owned by different autonomous parties and contain data in many different formats: structured, semi-structured, text, maps, images, video, etc. In this seminar, we will examine how modern data management systems cope with these new challenges, and explore open questions.
Format
One or two papers will be assigned for each class. Please read the papers and come prepared to discuss them. For each paper, a few reading questions will be provided to help you prepare.
Evaluation
The seminar will be graded as credit or no credit. To get credit, you must read the papers, come to class, and participate in the discussions. In addition, you should pick one of the following:
- If you are taking the class to get some breadth, select three papers discussed in the seminar and hand in written answers to the reading questions. Please do not write more than a total of three pages.
- If you are interested in one topic covered in the seminar in particular, identify an open question related to that topic and briefly discuss the problem, the related work, and some possible solutions. Do not write more than a total of three pages.
- If you would like to get more seriously involved in a topic covered in the seminar, you can start a research project on that topic. Please come and see me for a list of possible projects.
DEADLINE: March 10th, 2006, at 6pm.
Course Calendar
Date | Topic and readings |
01/03 Background |
Topic: Class introduction and introduction to relational database management systems.
Readings: None assigned. |
01/05 Background |
Topic: Fundamentals of query evaluation in relational database management systems.
Readings: None assigned.
Slides in html (these slides are only available from within cs.washington.edu). |
01/10 Background |
Topic: Active databases.
Readings:
- Eric N. Hanson, Chris Carnes, Lan Huang, Mohan Konyala, Lloyd Noronha, Sashi Parthasarathy, J. B. Park, and Albert Vernon. Scalable Trigger Processing. ICDE 1999. [pdf]
Reading questions
Slides in html (these slides are only available from within cs.washington.edu). |
01/12 Streams |
Topic: Stream processing overview.
Readings:
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. PODS 2002. [pdf]
Reading questions
No slides today. |
01/17 Streams |
Topic: Stream data models and operators.
Readings:
- D. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal 12(2), 2003. [pdf] (Required reading: Sections 2 and 5)
- Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal (to appear). [pdf] (Required reading: Sections 3, 4, 5, and 6)
Reading questions
Slides in OpenOffice format (.sxi) and html |
01/19 Streams |
Topic: Continuous and adaptive query processing.
Readings:
- Samuel R. Madden, Mehul A. Shah, Joseph M. Hellerstein, and Vijayshankar Raman. Continuously Adaptive Continuous Queries over Streams. SIGMOD 2002. [pdf]
Background:
- Joseph M. Hellerstein and Ron Avnur. Eddies: Continuously Adaptive Query Processing. SIGMOD 2000. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
01/24 Streams |
Topic: Approximate stream processing.
Readings:
- Required: N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. Load Shedding in a Data Stream Manager. VLDB 2003. [pdf]
- Optional (we will not discuss it in class): Theodore Johnson, S. Muthukrishnan, and Irina Rozenbaum. Sampling Algorithms in a Stream Operator. SIGMOD 2005. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
01/26 Streams |
Topic: Distributed stream processing. Managing load and resource utilization.
Readings:
- Peter Pietzuch, Jonathan Ledlie, Jeffrey Shneidman, Mema Roussopoulos, Matt Welsh, and Margo Seltzer. Network-Aware Operator Placement for Stream-Processing Systems. ICDE 2006. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
01/31 Streams |
Topic: Distributed stream processing. Fault-tolerance.
Readings:
- Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Ugur Cetintemel, Michael Stonebraker, and Stan Zdonik. High-Availability Algorithms for Distributed Stream Processing. ICDE 2005. [pdf]
- Magdalena Balazinska, Hari Balakrishnan, Samuel Madden, and Michael Stonebraker. Fault-Tolerance in the Borealis Distributed Stream Processing System. SIGMOD 2005. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
02/02 Streams & device heterogeneity |
Topic: Sensor networks.
Readings:
- Samuel Madden, Michael Franklin, Joseph Hellerstein, and Wei Hong. TinyDB: An Acquisitional Query Processing System for Sensor Networks. TODS 30(1), 2005. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
02/07 Distributed data management |
Topics:
- Overview of distributed data management.
- Introduction to traditional distributed database systems.
Readings: None assigned.
Slides in OpenOffice format (.sxi) and html |
02/09 Background |
Topic: Transactions (background needed to further discuss distributed databases).
Readings:
- Michael J. Franklin. Concurrency Control and Recovery. The Handbook of Computer Science and Engineering, A. Tucker, ed., CRC Press, Boca Raton, 1997. [pdf]
No reading questions.
Slides in OpenOffice format (.sxi) and html |
02/14 Distributed data management |
Topic: Traditional distributed databases. Uniformity and tight coupling.
Readings:
- C. Mohan, B. Lindsay, and R. Obermarck. Transaction Management in the R* Distributed Database Management System. ACM Transactions On Database Systems 11(4), 1986. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
02/16 Distributed data management |
Topic: Federated databases. Autonomy and incentives.
Readings:
- M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and A. Yu. Mariposa: A Wide-area Distributed Database System. VLDB Journal 5(1), 1996. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
02/21 Distributed data management |
Topic: Federated systems. Autonomy and heterogeneity.
Readings:
- U. Srivastava, J. Widom, K. Munagala, and R. Motwani. Query Optimization over Web Services. Technical Report, Stanford University, Oct 2005. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
02/23 Distributed data management |
Topic: Federated systems. Fault-tolerance.
Readings:
- R. Barga, D. Lomet, G. Shegalov, and G. Weikum. Recovery Guarantees for Internet Applications. ACM Transactions on Internet Technology, 2004. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
02/28 Distributed data management |
Topic: Peer-to-peer systems. Large-scale distribution.
Readings:
- Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, and Ion Stoica. Querying the Internet with PIER. VLDB 2003. [pdf]
Background:
- Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM 2001. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
03/02 Distributed data management |
Topic: Caching, replication, and disconnected operation.
Readings:
- Required: Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. The Dangers of Replication and a Solution. ACM SIGMOD Record 25(2), 1996. [pdf]
- Optional (we will not discuss it in class): Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, and Jonathan Goldstein. Relaxed Currency and Consistency: How to Say "Good Enough" in SQL. SIGMOD 2004. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
03/07 Data type heterogeneity |
Topic: Structured and semi-structured data.
Readings:
- Matthias Nicola and Bert van der Linden. Native XML Support in DB2 Universal Database. VLDB 2005. [pdf]
Reading questions
Slides in OpenOffice format (.sxi) and html |
03/09 Data type heterogeneity |
Topic: Unstructured data and more...
Readings:
- Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. WWW7, 1998. [pdf]
- Google services and tools.
Reading questions
Slides in OpenOffice format (.sxi) and html |