Database Seminar

Organised by: Magda Balazinska

The Database Group meets on Mondays at 3.30pm-4.20pm in CSE 405, Allen Center.

This quarter we discuss papers from SIGMOD, 2012 and PODS, 2012.

Upcoming talks are announced on uw-db@cs. Please sign up for the mailing list.

Schedule

Date Presenter Talk Title
Sep 24 VLDB discussion
Oct 1 Prasang Datalog basics
Oct 8 Jingjing Evaluation of Datalog
Oct 15 Paris The three semantics of negation
Oct 22 Emad Parallel Datalog
Oct 29 George Fletcher A Structural Approach to Indexing Triples in SAINT-DB
Nov 5 Jennifer and Konstantin Datalog systems
Nov 12 Cancelled SIGMOD deadline
Nov 19 Shengliang Back to SQL
Nov 26 Parallel Datalog
Dec 3 Two choices

Details

Datalog basics

1) If you don't know anything about Datalog, please read:

Section 5.4 of “Database Systems: the Complete Handbook”, by Hector Garcia-Molina, Jennifer Widom, and Jeffrey Ullman.

or

Chapter 24 from Database Management Systems by Ramakrishnan and Gehrke (Third Edition).

or

Database Encyclopedia entry on “DATALOG”. Grigoris Karvounarakis.

2) The main paper that we will read and discuss is:

What you always wanted to know about datalog (and never dared to ask), by Ceri, Gottlob, and Tanca. IEEE Transactions on Knowledge and Data Engineering, Vol I , No 1, March 1989

Comments: This paper is not an easy read. Sections 6.E and 6.F are obsolete

3) If you want more details, then read Chapter 12 from this book:

Foundations of Databases by Abiteboul-Hull-Vianu:

Comments: this is the ultimate text, containing a complete treatment of all things datalog, although it is not an easy read. We will discuss chapters from this book in the next two seminars, so you may want to start reading.

Evaluation of Datalog (Semi-naive evaluation and Magic sets optimizations)

Chapter 13 of Foundations of Databases by Abiteboul-Hull-Vianu.

Comments: we should nail down once and forever this often-cited, but ill understood optimization technique. Some questions to address: what is the connection to semi-join reduction? What are the distinctions between the different types magic set optimizations (if any), and should we care?

Negation (The three semantics of negation)

Some subset of chapters 14 and 15 of Foundations of Databases by Abiteboul-Hull-Vianu.

Comments: Both stratified and inflationary-fixpoint semantics are popular today; virtually every paper on datalog positions itself w.r.t. these two semantics. Partial-fixpoint is important too: some systems today implement it without calling it that way (e.g. Dedalus). All these three semantics are easy to understand. The only difficult step is understanding that inflationary datalog can express stratified datalog: few people understand this important result and its significance, and we should discuss it in detail. It is covered in AHV, but it's not an easy read. We can restrict our discussion to the key example that proves the result: that the complement of transitive closure can be expressed in inflationary datalog. If time, we should also discuss the move-win game that separates stratified datalog from inflationary datalog.

Parallel datalog

S. Ganguly, A. Silberschatz, and S. Tsur. Parallel bottom-up processing of datalog queries. J. Log. Program., 14(1&2):101–126, 1992.

Comments: Pay special attention to the idea of “redundant” or “partially redundant” computation. The idea is that we want to compute a conjunctive query (aka datalog rule) in parallel, and we can do this in one step by replicating the data. This idea has been exploited recently by Afrati and Ullman, and a series of papers on computing queries in MapReduce. A question: are the techniques in this paper restricted only to conjunctive queries, or are there any techniques that exploit the iterative nature of datalog?

External Visitor

Visitor: George Fletcher, Eindhoven University of Technology, Netherlands. Talk: A Structural Approach to Indexing Triples in SAINT-DB

Datalog systems

1) Discussion of new datalog systems: Dyna, maybe BOOM, logicblox.

Dyna: Extending Datalog For Modern AI? Jason Eisner and Nathaniel W. Filardo Datalog 2.0 http:www.cs.jhu.edu nwfdatalog20-paper.pdf

2) Discussion of old datalog systems

R. Ramakrishnan and J. D. Ullman. A survey of deductive database systems. J. Log. Program., 23(2):125–149, 1995.

Comment: About half of the paper is a survey of topics that we already know by now (magic sets…), but the second half has a discussion of the systems at that time. Skim that second part.

Also related: A Survey of Research on Deductive Database Systems, Ramakrishnan and Ullman, Journal of Logic Programming, 1993

Comment: Most of the paper is a survey, and is totally subsumed by “What you always wanted…” The most interesting part is the extensive list of datalog systems available at the time, which is included at the end of the paper.

Back to SQL

Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman: Upper and Lower Bounds on the Cost of a Map-Reduce Computation. CoRR abs/1206.4377 (2012)

A light weight version is here: http:dl.acm.org/citation.cfm?id=2331042&picked=prox

Comment: This paper is not about recursion, but it is about a single MapReduce step. They discover an interesting tradeoff: on one hand we want to move from few, large reducers to many, small reducers; on the other hand this can be done only at the expense of increasing the amount of data replication. The paper explains this phenomenon in detail, showing several example algorithms. The question to us is: what lesson can be applied to iterative query processing?

See also:

Siddharth Suri, Sergei Vassilvitskii: Counting triangles and the curse of the last reducer. WWW 2011: 607-614

Comment: This is a simple and fundamental MR algorithm. We probably wont be able to study and discuss all algorithms covered in “Upper and lower bounds…”, but we should discuss “counting triangles”.

Parallel Datalog

Transitive Closure and Recursive Datalog Implemented on Clusters, Afrati and Ullman, EDBT 2012

Two options

Option A) We can discuss the various recent systems that provide iterative processing including HaLoop, Twister, Stratosphere, Hyracks, PrIter, Rex, and Daytona.

Option B) Discuss recursive queries in SQL.

Y. Ioannidis. On the computation of the transitive closure of relational operators. In Proc. of the 12th Int. Conf. on Very Large DataBases (VLDB), pages 403–411, 1986.

Optimization of Linear Recursive Queries in SQL February 2010 (vol. 22 no. 2) pp. 264-277 Carlos Ordonez, University of Houston, Houston

C. Ordonez, “Optimizing recursive queries in SQL,” in Proceedings of the 2005 ACM SIGMOD international conference on Management of data, SIGMOD ’05, 2005, pp. 834–839.

Jagadish has half a dozen papers on transitive closure, we need to take a look at these. One of particular interest is:

Rakesh Agrawal, H. V. Jagadish: Multiprocessor Transitive Closure Algorithms. IEEE Data Eng. Bull. 12(1): 30-36 (1989)

Feel free to send comments to Prasang.