Database Seminar

Organised by: Magda Balazinska

The Database Group meets on Mondays, 2:30pm-3:20pm, in CSE 405, Allen Center.

This quarter's theme is Parallel Data Processing.

Upcoming talks are announced on uw-db@cs. Please sign up for the mailing list.

Schedule

Date Presenter Talk Title
Oct 03 Paris Overview of modern parallel data processing engines
Oct 10 Kristi Parallel Data Processing Engines: GAMMA
Oct 17 Prasang Parallel Data Processing Engines: BUBBA
Oct 24 Shengliang Parallel Data Processing Engines: VOLCANO
Oct 31 Cancelled for SIGMOD
Nov 07 Emad Query Optimization in Parallel DBMSs
Nov 14 -> Nov 18 YongChul Skew in Parallel DBMSs
Dec 05 Abhay Theory in Parallel DBMSs
Dec 12 Nodira Scheduling in Parallel DBMSs
Nov 16 Jingjing Fault-tolerance in Parallel DBMSs

Details

Overview of modern parallel data processing engines

Teradata, Greenplum, and Netezza architectures. Presentation based on online documentation.

Parallel Data Processing Engines: GAMMA

The Gamma Database Machine Project, D. J. DeWitt et al., IEEE Transactions on Knowledge and Data Engineering, Volume 2 Issue 1, March 1990.

Parallel Data Processing Engines: BUBBA

Data placement in Bubba, George Copeland et al., SIGMOD ’88.

The following is an overview paper on Bubba, but we will not discuss it:

Prototyping Bubba, A Highly Parallel Database System, H. Boral et al., IEEE Transactions on Knowledge and Data Engineering, Volume 2 Issue 1, March 1990. http://dl.acm.org/citation.cfm?id=627396 http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=50903

Parallel Data Processing Engines: VOLCANO

Volcano - An Extensible and Parallel Query Evaluation System, G. Graefe, IEEE Transactions on Knowledge and Data Engineering, Volume 6 Issue 1, February 1994. http://dl.acm.org/citation.cfm?id=627558

Query Optimization in Parallel DBMSs

Suggested papers are below. There are many papers on this topic, so please feel free to pick a better one.

The following paper looks at shared-memory systems but introduces the key idea of two-phase optimization, so it is worth reading.

Optimization of parallel query execution plans in XPRS, W. Hong and M. Stonebraker. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=183106&tag=1

The following paper optimizes for runtime instead of throughput: Query optimization for parallel execution, Sumit Ganguly et al., SIGMOD ’92.

Scheduling in Parallel DBMSs

Multi-dimensional resource scheduling for parallel queries, Minos N. Garofalakis and Yannis E. Ioannidis, SIGMOD ’96.

Or the following VLDB ’97 paper by the same authors instead: Parallel Query Scheduling and Optimization with Time- and Space-Shared Resources.

Skew in Parallel DBMSs

A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins, Christopher B. Walton et al., VLDB ’91.

Fault-tolerance in Parallel DBMSs

Fault Tolerance Issues in Data Declustering for Parallel Database Systems, Leana Golubchik and Richard R. Muntz, Bulletin of the Technical Committee on Data Engineering, 1994.

Theory in Parallel DBMSs

Neil Immerman. Expressibility and Parallel Complexity. SIAM J. Comput. 18(3): 625-638 (1989)

Dan Suciu, Val Tannen. A Query Language for NC. PODS 1994: 167-178

Feel free to send comments to Prasang.