The Database Group meets on Mondays at 3.30pm-4.20pm in CSE 405, Allen Center.
This quarter we discuss papers from SIGMOD, 2012 and PODS, 2012.
Upcoming talks are announced on uw-db@cs. Please sign up for the mailing list.
Date | Presenter | Talk Title |
Sep 24 | VLDB discussion | |
Oct 1 | Prasang | Datalog basics |
Oct 8 | Jingjing | Evaluation of Datalog |
Oct 15 | Paris | The three semantics of negation |
Oct 22 | Emad | Parallel Datalog |
Oct 29 | George Fletcher | A Structural Approach to Indexing Triples in SAINT-DB |
Nov 5 | Jennifer and Konstantin | Datalog systems |
Nov 12 | Cancelled | SIGMOD deadline |
Nov 19 | Shengliang | Back to SQL |
Nov 26 | Parallel Datalog | |
Dec 3 | Two choices |
1) If you don't know anything about Datalog, please read:
Section 5.4 of “Database Systems: the Complete Handbook”, by Hector Garcia-Molina, Jennifer Widom, and Jeffrey Ullman.
or
Chapter 24 from Database Management Systems by Ramakrishnan and Gehrke (Third Edition).
or
Database Encyclopedia entry on “DATALOG”. Grigoris Karvounarakis.
2) The main paper that we will read and discuss is:
What you always wanted to know about datalog (and never dared to ask), by Ceri, Gottlob, and Tanca. IEEE Transactions on Knowledge and Data Engineering, Vol I , No 1, March 1989
Comments: This paper is not an easy read. Sections 6.E and 6.F are obsolete
3) If you want more details, then read Chapter 12 from this book:
Foundations of Databases by Abiteboul-Hull-Vianu:
Comments: this is the ultimate text, containing a complete treatment of all things datalog, although it is not an easy read. We will discuss chapters from this book in the next two seminars, so you may want to start reading.
Chapter 13 of Foundations of Databases by Abiteboul-Hull-Vianu.
Comments: we should nail down once and forever this often-cited, but ill understood optimization technique. Some questions to address: what is the connection to semi-join reduction? What are the distinctions between the different types magic set optimizations (if any), and should we care?
Some subset of chapters 14 and 15 of Foundations of Databases by Abiteboul-Hull-Vianu.
Comments: Both stratified and inflationary-fixpoint semantics are popular today; virtually every paper on datalog positions itself w.r.t. these two semantics. Partial-fixpoint is important too: some systems today implement it without calling it that way (e.g. Dedalus). All these three semantics are easy to understand. The only difficult step is understanding that inflationary datalog can express stratified datalog: few people understand this important result and its significance, and we should discuss it in detail. It is covered in AHV, but it's not an easy read. We can restrict our discussion to the key example that proves the result: that the complement of transitive closure can be expressed in inflationary datalog. If time, we should also discuss the move-win game that separates stratified datalog from inflationary datalog.
S. Ganguly, A. Silberschatz, and S. Tsur. Parallel bottom-up processing of datalog queries. J. Log. Program., 14(1&2):101–126, 1992.
Comments: Pay special attention to the idea of “redundant” or “partially redundant” computation. The idea is that we want to compute a conjunctive query (aka datalog rule) in parallel, and we can do this in one step by replicating the data. This idea has been exploited recently by Afrati and Ullman, and a series of papers on computing queries in MapReduce. A question: are the techniques in this paper restricted only to conjunctive queries, or are there any techniques that exploit the iterative nature of datalog?
Visitor: George Fletcher, Eindhoven University of Technology, Netherlands. Talk: A Structural Approach to Indexing Triples in SAINT-DB
1) Discussion of new datalog systems: Dyna, maybe BOOM, logicblox.
Dyna: Extending Datalog For Modern AI? Jason Eisner and Nathaniel W. Filardo Datalog 2.0 http:www.cs.jhu.edu nwfdatalog20-paper.pdf
2) Discussion of old datalog systems
R. Ramakrishnan and J. D. Ullman. A survey of deductive database systems. J. Log. Program., 23(2):125–149, 1995.
Comment: About half of the paper is a survey of topics that we already know by now (magic sets…), but the second half has a discussion of the systems at that time. Skim that second part.
Also related: A Survey of Research on Deductive Database Systems, Ramakrishnan and Ullman, Journal of Logic Programming, 1993
Comment: Most of the paper is a survey, and is totally subsumed by “What you always wanted…” The most interesting part is the extensive list of datalog systems available at the time, which is included at the end of the paper.
Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman: Upper and Lower Bounds on the Cost of a Map-Reduce Computation. CoRR abs/1206.4377 (2012)
A light weight version is here: http:dl.acm.org/citation.cfm?id=2331042&picked=prox
Comment: This paper is not about recursion, but it is about a single MapReduce step. They discover an interesting tradeoff: on one hand we want to move from few, large reducers to many, small reducers; on the other hand this can be done only at the expense of increasing the amount of data replication. The paper explains this phenomenon in detail, showing several example algorithms. The question to us is: what lesson can be applied to iterative query processing?
See also:
Siddharth Suri, Sergei Vassilvitskii: Counting triangles and the curse of the last reducer. WWW 2011: 607-614
Comment: This is a simple and fundamental MR algorithm. We probably wont be able to study and discuss all algorithms covered in “Upper and lower bounds…”, but we should discuss “counting triangles”.
Transitive Closure and Recursive Datalog Implemented on Clusters, Afrati and Ullman, EDBT 2012
Option A) We can discuss the various recent systems that provide iterative processing including HaLoop, Twister, Stratosphere, Hyracks, PrIter, Rex, and Daytona.
Option B) Discuss recursive queries in SQL.
Y. Ioannidis. On the computation of the transitive closure of relational operators. In Proc. of the 12th Int. Conf. on Very Large DataBases (VLDB), pages 403–411, 1986.
Optimization of Linear Recursive Queries in SQL February 2010 (vol. 22 no. 2) pp. 264-277 Carlos Ordonez, University of Houston, Houston
C. Ordonez, “Optimizing recursive queries in SQL,” in Proceedings of the 2005 ACM SIGMOD international conference on Management of data, SIGMOD ’05, 2005, pp. 834–839.
Jagadish has half a dozen papers on transitive closure, we need to take a look at these. One of particular interest is:
Rakesh Agrawal, H. V. Jagadish: Multiprocessor Transitive Closure Algorithms. IEEE Data Eng. Bull. 12(1): 30-36 (1989)
Feel free to send comments to Prasang.