Notes
Slide Show
Outline
1
Lecture 5: Relational Algebra and XML
  • Monday, April 26th, 2004
2
Course Agenda
  • Today, XML and relational algebra
  • Next two weeks: the internals of DBMS.
    • Covered in gory detail in the book, but stay tuned for reading assignments.
  • May 20th (not 17th!): Phil Bernstein on meta-data management.
  • May 24th: data integration.
  • May 27th: final exam.
3
Agenda
  • Relational algebra
  • XML:
    • What is it and why do we care?
    • Data model
    • Query language: XPath
    • Real query language: XQuery.
    • General ruminations about XML.
4
Relational Algebra
  • Formalism for creating new relations from existing ones
  • Its place in the big picture:
5
Relational Algebra
  • Five operators:
    • Union: È
    • Difference: -
    • Selection: s
    • Projection: P
    • Cartesian Product: ´
  • Derived or auxiliary operators:
    • Intersection, complement
    • Joins (natural,equi-join, theta join, semi-join)
    • Renaming: r
6
1. Union and 2. Difference
  • R1 È R2
  • Example:
    • ActiveEmployees È RetiredEmployees


  • R1 – R2
  • Example:
    • AllEmployees -- RetiredEmployees
7
What about Intersection ?
  • It is a derived operator
  • R1 Ç R2 = R1 – (R1 – R2)
  • Also expressed as a join (will see later)
  • Example
    • UnionizedEmployees Ç RetiredEmployees
8
3. Selection
  • Returns all tuples which satisfy a condition
  • Notation:  sc(R)
  • Examples
    •  sSalary > 40000 (Employee)
    •  sname = “Smith” (Employee)
  • The condition c can be =, <, £, >, ³, <>
9
 
10
4. Projection
  • Eliminates columns, then removes duplicates
  • Notation:   P A1,…,An (R)
  • Example: project social-security number and names:
    •   P SSN, Name (Employee)
    • Output schema:   Answer(SSN, Name)



11
 
12
5. Cartesian Product
  • Each tuple in R1 with each tuple in R2
  • Notation: R1 ´ R2
  • Example:
    • Employee ´ Dependents
  • Very rare in practice; mainly used to express joins
13
 
14
Renaming
  • Changes the schema, not the instance
  • Notation: r B1,…,Bn (R)
  • Example:
    •  rLastName, SocSocNo (Employee)
    • Output schema:
      Answer(LastName, SocSocNo)
15
Renaming Example
16
Natural Join
  • Notation: R1 ⋈ R2


  • Meaning:  R1 ⋈ R2 = PA(sC(R1 ´ R2))


  • Where:
    • The selection sC checks equality of all common attributes
    • The projection eliminates the duplicate common attributes
17
 
18
Natural Join
  • R=                                S=




  • R ⋈ S=


19
Natural Join
  • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ?


  • Given R(A, B, C),  S(D, E), what is R ⋈ S  ?


  • Given R(A, B),  S(A, B),  what is  R ⋈ S  ?
20
Theta Join
  • A join that involves a predicate
  • R1 ⋈ q R2   =  s q (R1 ´ R2)
  • Here q can be any condition
21
Eq-join
  • A theta join where q is an equality
  • R1 ⋈A=B R2   =  s A=B (R1 ´ R2)
  • Example:
    • Employee ⋈SSN=SSN Dependents

  • Most useful join in practice
22
Semijoin
  • R ⋉ S  = P A1,…,An (R ⋈ S)
  • Where A1, …, An are the attributes in R
  • Example:
    • Employee ⋉ Dependents

23
Semijoins in Distributed Databases
  • Semijoins are used in distributed databases
24
Complex RA Expressions

  •      Person         Purchase          Person          Product
25
Operations on Bags
  • A bag = a set with repeated elements
  • All operations need to be defined carefully on bags
  • {a,b,b,c}È{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
  • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
  •  sC(R): preserve the number of occurrences
  •  PA(R): no duplicate elimination
  • Cartesian product, join: no duplicate elimination
  • Important ! Relational Engines work on bags, not sets !
26
Finally: RA has Limitations !
  • How do we compute “transitive closure”?











  • Find all direct and indirect relatives of Fred


27
XML
28
XML
  • eXtensible Markup Language
  • XML 1.0 – a recommendation from W3C, 1998
  • Roots: SGML (a very nasty language).
  • After the roots: a format for sharing data
29
Why XML is of Interest to Us
  • XML is just syntax for data
    • Note: we have no syntax for relational data
    • But XML is not relational: semistructured
  • This is exciting because:
    • Can translate any data to XML
    • Can ship XML over the Web (HTTP)
    • Can input XML into any application
    • Thus: data sharing and exchange on the Web
30
XML Data Sharing and Exchange
31
From HTML to XML
32
HTML
  • <h1> Bibliography </h1>
  • <p> <i> Foundations of Databases </i>
  •          Abiteboul, Hull, Vianu
  •          <br> Addison Wesley, 1995
  • <p> <i> Data on the Web </i>
  •          Abiteoul, Buneman, Suciu
  •          <br> Morgan Kaufmann, 1999
33
XML
  • <bibliography>
    •   <book>    <title> Foundations… </title>
    •                   <author> Abiteboul </author>
    •                   <author> Hull </author>
    •                   <author> Vianu </author>
    •                   <publisher> Addison Wesley </publisher>
    •                   <year> 1995 </year>
    •   </book>
    •   …
  • </bibliography>
34
Web Services
  • A new paradigm for creating distributed applications?
  • Systems communicate via messages, contracts.
  • Example: order processing system.
  • MS .NET, J2EE – some of the platforms
  • XML – a part of the story; the data format.
35
XML Terminology
  • tags: book, title, author, …
  • start tag: <book>,  end tag: </book>
  • elements: <book>…<book>,<author>…</author>
  • elements are nested
  • empty element: <red></red> abbrv. <red/>
  • an XML document: single root element
36
More XML: Attributes
  • <book price = “55” currency = “USD”>
  •    <title> Foundations of Databases </title>
  •    <author> Abiteboul </author>
  •     …
  •    <year> 1995 </year>
  • </book>
37
More XML: Oids and References
  • <person id=“o555”>  <name> Jane </name> </person>


  • <person id=“o456”>  <name> Mary </name>
  •                                   <children idref=“o123 o555”/>
  • </person>


  • <person id=“o123” mother=“o456”><name>John</name>
  • </person>
38
XML Semantics: a Tree !
39
XML Data
  • XML is self-describing
  • Schema elements become part of the data
    • Reational schema: persons(name,phone)
    • In XML <persons>, <name>, <phone> are part of the data, and are repeated many times
  • Consequence: XML is much more flexible
  • XML = semistructured data
40
Relational Data as XML
  • <person>
    • <row> <name>John</name>
    •            <phone> 3634</phone></row>
    •  <row> <name>Sue</name>
    •              <phone> 6343</phone>
    •  <row> <name>Dick</name>
    •              <phone> 6363</phone></row>
  • </person>
41
XML is Semi-structured Data
  • Missing attributes:





  • Could represent in
    a table with nulls
42
XML is Semi-structured Data
  • Repeated attributes





  • Impossible in tables:
43
XML is Semi-structured Data
  • Attributes with different types in different objects






  • Nested collections (no 1NF)
  • Heterogeneous collections:
    • <db> contains both <book>s and <publisher>s
44
Document Type Definitions
DTD
  • part of the original XML specification
  • an XML document may have a DTD
  • XML document:
    • well-formed = if tags are correctly closed
    • Valid = if it has a DTD and conforms to it
  • validation is useful in data exchange
45
Very Simple DTD
46
Very Simple DTD
47
DTD: The Content Model
  • Content model:
    • Complex = a regular expression over other elements
    • Text-only = #PCDATA
    • Empty = EMPTY
    • Any = ANY
    • Mixed content = (#PCDATA | A | B | C)*
48
DTD: Regular Expressions
49
Querying XML Data
  • XPath = simple navigation through the tree
  • XQuery = the SQL of XML
  • XSLT = recursive traversal
    • will not discuss in class
50
Sample Data for Queries
  • <bib>
    <book>  <publisher> Addison-Wesley </publisher>
                  <author> Serge Abiteboul </author>
                  <author> <first-name> Rick </first-name>
                                  <last-name> Hull </last-name>
                  </author>
                  <author> Victor Vianu </author>
                  <title> Foundations of Databases </title>
                  <year> 1995 </year>
    </book>
    <book price=“55”>
                 <publisher> Freeman </publisher>
                  <author> Jeffrey D. Ullman </author>
                  <title> Principles of Database and Knowledge Base Systems </title>
                  <year> 1998 </year>
    </book>
  • </bib>
51
Data Model for XPath
52
XPath: Simple Expressions

  • Result:  <year> 1995 </year>
  •              <year> 1998 </year>



  • Result:  empty          (there were no papers)
53
XPath: Restricted Kleene Closure

  • Result:<author> Serge Abiteboul </author>
  •               <author> <first-name> Rick </first-name>
  •                               <last-name> Hull </last-name>
  •               </author>
  •               <author> Victor Vianu </author>
  •               <author> Jeffrey D. Ullman </author>




  • Result:  <first-name> Rick </first-name>
54
Xpath: Text Nodes

  • Result:   Serge Abiteboul
  •                    Jeffrey D. Ullman


  • Rick Hull doesn’t appear because he has firstname, lastname


  • Functions in XPath:
    • text()    = matches the text value
    • node()  = matches any node (= * or @* or text())
    • name() = returns the name of the current tag
55
Xpath: Wildcard


  • Result: <first-name> Rick </first-name>
  •               <last-name> Hull </last-name>


  • * Matches any element
56
Xpath: Attribute Nodes

  • Result: “55”


  • @price means that price is has to be an attribute
57
Xpath: Predicates

  • Result: <author> <first-name> Rick </first-name>
  •                               <last-name> Hull </last-name>
  •               </author>


58
Xpath: More Predicates



  • Result: <lastname> … </lastname>
  •               <lastname> … </lastname>
59
Xpath: More Predicates




60
Xpath: Summary
  • bib matches a bib element
  • * matches any element
  • / matches the root element
  • /bib matches a bib element under root
  • bib/paper matches a paper in bib
  • bib//paper matches a paper in bib, at any depth
  • //paper matches a paper at any depth
  • paper|book matches a paper or a book
  • @price matches a price attribute
  • bib/book/@price matches price attribute in book, in bib
  • bib/book/[@price<“55”]/author/lastname  matches…
61
Comments on XPath?
  • What’s good about it?
  • What can’t it do that you want it to do?
  • How does it compare, say, to SQL?
62
XQuery
  • Based on Quilt, which is based on XML-QL
  • Uses XPath to express more complex queries
63
FLWR (“Flower”) Expressions
  • FOR ...
  • LET...
  • WHERE...
  • RETURN...
64
XQuery
  • Find all book titles published after 1995:
65
XQuery
  • Find book titles by the coauthors of “Database Theory”:
66
XQuery
  • Same as before, but eliminate duplicates:
67
XQuery: Nesting
  • For each author of a book by Morgan Kaufmann, list all books she published:
68
XQuery
  • <result>
  •           <author>Jones</author>
  •           <title> abc </title>
  •           <title> def </title>
  •    </result>
  •    <result>
  •            <author> Smith </author>
  •            <title> ghi </title>
  •    </result>
69
XQuery
  • FOR $x in expr  -- binds $x to each value in the list expr


  • LET $x = expr   -- binds $x to the entire list expr
    • Useful for common subexpressions and for aggregations
70
XQuery
71
XQuery
  • Find books whose price is larger than average:
72
XQuery
  • Summary:
  • FOR-LET-WHERE-RETURN  = FLWR
73
FOR v.s. LET

  • FOR
  • Binds node variables à iteration


  • LET
  • Binds collection variables à one value
74
FOR v.s. LET
75
Collections in XQuery
  • Ordered and unordered collections
    • /bib/book/author  = an ordered collection
    • Distinct(/bib/book/author) = an unordered collection
  • LET $a = /bib/book   à  $a is a collection
  • $b/author  à a collection (several authors...)
76
Collections in XQuery
  • What about collections in expressions ?
  • $b/price                       à list of n prices
  • $b/price * 0.7              à list of n numbers
  • $b/price * $b/quantity à list of n x m numbers ??
  • $b/price * ($b/quant1 + $b/quant2)  ¹
      $b/price * $b/quant1 + $b/price * $b/quant2    !!
77
Sorting in XQuery
78
If-Then-Else
79
Existential Quantifiers
80
Universal Quantifiers
81
Other Stuff in XQuery
  • BEFORE and AFTER
    • for dealing with order in the input
  • FILTER
    • deletes some edges in the result tree
  • Recursive functions
    • Currently: arbitrary recursion
    • Perhaps more restrictions in the future ?
82
Final Comments on XML
  • How are we going to process XML efficiently?
    • Special purpose XML engines, or
    • Add functionality to relational engines?
  • Need to manage XML streams.
  • Here, data management is much closer to other programming tasks.