|
1
|
|
|
2
|
- Today, XML and relational algebra
- Next two weeks: the internals of DBMS.
- Covered in gory detail in the book, but stay tuned for reading
assignments.
- May 20th (not 17th!): Phil Bernstein on meta-data management.
- May 24th: data integration.
- May 27th: final exam.
|
|
3
|
- Relational algebra
- XML:
- What is it and why do we care?
- Data model
- Query language: XPath
- Real query language: XQuery.
- General ruminations about XML.
|
|
4
|
- Formalism for creating new relations from existing ones
- Its place in the big picture:
|
|
5
|
- Five operators:
- Union: È
- Difference: -
- Selection: s
- Projection: P
- Cartesian Product: ´
- Derived or auxiliary operators:
- Intersection, complement
- Joins (natural,equi-join, theta join, semi-join)
- Renaming: r
|
|
6
|
- R1 È R2
- Example:
- ActiveEmployees È
RetiredEmployees
- R1 – R2
- Example:
- AllEmployees -- RetiredEmployees
|
|
7
|
- It is a derived operator
- R1 Ç R2 = R1 – (R1
– R2)
- Also expressed as a join (will see later)
- Example
- UnionizedEmployees Ç RetiredEmployees
|
|
8
|
- Returns all tuples which satisfy a condition
- Notation: sc(R)
- Examples
- sSalary > 40000 (Employee)
- sname = “Smith” (Employee)
- The condition c can be =, <, £, >, ³, <>
|
|
9
|
|
|
10
|
- Eliminates columns, then removes duplicates
- Notation: P A1,…,An (R)
- Example: project social-security number and names:
- P SSN, Name
(Employee)
- Output schema: Answer(SSN,
Name)
|
|
11
|
|
|
12
|
- Each tuple in R1 with each tuple in R2
- Notation: R1 ´ R2
- Example:
- Very rare in practice; mainly used to express joins
|
|
13
|
|
|
14
|
- Changes the schema, not the instance
- Notation: r B1,…,Bn (R)
- Example:
- rLastName, SocSocNo
(Employee)
- Output schema:
Answer(LastName, SocSocNo)
|
|
15
|
|
|
16
|
- Notation: R1 ⋈ R2
- Meaning: R1 ⋈ R2 = PA(sC(R1
´ R2))
- Where:
- The selection sC checks equality of all common attributes
- The projection eliminates the duplicate common attributes
|
|
17
|
|
|
18
|
|
|
19
|
- Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈
S ?
- Given R(A, B, C), S(D, E), what
is R ⋈ S ?
- Given R(A, B), S(A, B), what is R ⋈ S ?
|
|
20
|
- A join that involves a predicate
- R1 ⋈ q R2
= s q (R1 ´ R2)
- Here q can be any condition
|
|
21
|
- A theta join where q is an equality
- R1 ⋈A=B R2
= s A=B (R1 ´ R2)
- Example:
- Employee ⋈SSN=SSN Dependents
- Most useful join in practice
|
|
22
|
- R ⋉ S = P A1,…,An
(R ⋈ S)
- Where A1, …, An are the attributes in R
- Example:
|
|
23
|
- Semijoins are used in distributed databases
|
|
24
|
- Person Purchase Person Product
|
|
25
|
- A bag = a set with repeated elements
- All operations need to be defined carefully on bags
- {a,b,b,c}È{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}
- {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
- sC(R): preserve the
number of occurrences
- PA(R): no duplicate
elimination
- Cartesian product, join: no duplicate elimination
- Important ! Relational Engines work on bags, not sets !
|
|
26
|
- How do we compute “transitive closure”?
- Find all direct and indirect relatives of Fred
|
|
27
|
|
|
28
|
- eXtensible Markup Language
- XML 1.0 – a recommendation from W3C, 1998
- Roots: SGML (a very nasty language).
- After the roots: a format for sharing data
|
|
29
|
- XML is just syntax for data
- Note: we have no syntax for relational data
- But XML is not relational: semistructured
- This is exciting because:
- Can translate any data to XML
- Can ship XML over the Web (HTTP)
- Can input XML into any application
- Thus: data sharing and exchange on the Web
|
|
30
|
|
|
31
|
|
|
32
|
- <h1> Bibliography </h1>
- <p> <i> Foundations of Databases </i>
- Abiteboul, Hull, Vianu
- <br> Addison
Wesley, 1995
- <p> <i> Data on the Web </i>
- Abiteoul, Buneman, Suciu
- <br> Morgan
Kaufmann, 1999
|
|
33
|
- <bibliography>
- <book> <title> Foundations… </title>
- <author>
Abiteboul </author>
- <author>
Hull </author>
- <author>
Vianu </author>
- <publisher>
Addison Wesley </publisher>
- <year>
1995 </year>
- </book>
- …
- </bibliography>
|
|
34
|
- A new paradigm for creating distributed applications?
- Systems communicate via messages, contracts.
- Example: order processing system.
- MS .NET, J2EE – some of the platforms
- XML – a part of the story; the data format.
|
|
35
|
- tags: book, title, author, …
- start tag: <book>, end tag:
</book>
- elements: <book>…<book>,<author>…</author>
- elements are nested
- empty element: <red></red> abbrv. <red/>
- an XML document: single root element
|
|
36
|
- <book price = “55” currency = “USD”>
- <title> Foundations of
Databases </title>
- <author> Abiteboul </author>
- …
- <year> 1995 </year>
- </book>
|
|
37
|
- <person id=“o555”> <name>
Jane </name> </person>
- <person id=“o456”> <name>
Mary </name>
-
<children idref=“o123 o555”/>
- </person>
- <person id=“o123” mother=“o456”><name>John</name>
- </person>
|
|
38
|
|
|
39
|
- XML is self-describing
- Schema elements become part of the data
- Reational schema: persons(name,phone)
- In XML <persons>, <name>, <phone> are part of the
data, and are repeated many times
- Consequence: XML is much more flexible
- XML = semistructured data
|
|
40
|
- <person>
- <row> <name>John</name>
- <phone>
3634</phone></row>
- <row> <name>Sue</name>
- <phone>
6343</phone>
- <row> <name>Dick</name>
- <phone>
6363</phone></row>
- </person>
|
|
41
|
- Missing attributes:
- Could represent in
a table with nulls
|
|
42
|
- Repeated attributes
- Impossible in tables:
|
|
43
|
- Attributes with different types in different objects
- Nested collections (no 1NF)
- Heterogeneous collections:
- <db> contains both <book>s and <publisher>s
|
|
44
|
- part of the original XML specification
- an XML document may have a DTD
- XML document:
- well-formed = if tags are correctly closed
- Valid = if it has a DTD and conforms to it
- validation is useful in data exchange
|
|
45
|
|
|
46
|
|
|
47
|
- Content model:
- Complex = a regular expression over other elements
- Text-only = #PCDATA
- Empty = EMPTY
- Any = ANY
- Mixed content = (#PCDATA | A | B | C)*
|
|
48
|
|
|
49
|
- XPath = simple navigation through the tree
- XQuery = the SQL of XML
- XSLT = recursive traversal
- will not discuss in class
|
|
50
|
- <bib>
<book> <publisher>
Addison-Wesley </publisher>
<author>
Serge Abiteboul </author>
<author>
<first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author>
Victor Vianu </author>
<title>
Foundations of Databases </title>
<year>
1995 </year>
</book>
<book price=“55”>
<publisher>
Freeman </publisher>
<author>
Jeffrey D. Ullman </author>
<title>
Principles of Database and Knowledge Base Systems </title>
<year>
1998 </year>
</book>
- </bib>
|
|
51
|
|
|
52
|
- Result: <year> 1995 </year>
- <year> 1998
</year>
- Result: empty (there were no papers)
|
|
53
|
- Result:<author> Serge Abiteboul </author>
- <author> <first-name>
Rick </first-name>
- <last-name>
Hull </last-name>
- </author>
- <author>
Victor Vianu </author>
- <author>
Jeffrey D. Ullman </author>
- Result: <first-name> Rick
</first-name>
|
|
54
|
- Result: Serge Abiteboul
- Jeffrey D.
Ullman
- Rick Hull doesn’t appear because he has firstname, lastname
- Functions in XPath:
- text() = matches the text
value
- node() = matches any node (= *
or @* or text())
- name() = returns the name of the current tag
|
|
55
|
- Result: <first-name> Rick </first-name>
- <last-name>
Hull </last-name>
- * Matches any element
|
|
56
|
- Result: “55”
- @price means that price is has to be an attribute
|
|
57
|
- Result: <author> <first-name> Rick </first-name>
- <last-name>
Hull </last-name>
- </author>
|
|
58
|
- Result: <lastname> … </lastname>
- <lastname> …
</lastname>
|
|
59
|
|
|
60
|
- bib matches a bib element
- * matches any element
- / matches the root element
- /bib matches a bib element under root
- bib/paper matches a paper in bib
- bib//paper matches a paper in bib, at any depth
- //paper matches a paper at any depth
- paper|book matches a paper or a book
- @price matches a price attribute
- bib/book/@price matches price attribute in book, in bib
- bib/book/[@price<“55”]/author/lastname matches…
|
|
61
|
- What’s good about it?
- What can’t it do that you want it to do?
- How does it compare, say, to SQL?
|
|
62
|
- Based on Quilt, which is based on XML-QL
- Uses XPath to express more complex queries
|
|
63
|
- FOR ...
- LET...
- WHERE...
- RETURN...
|
|
64
|
- Find all book titles published after 1995:
|
|
65
|
- Find book titles by the coauthors of “Database Theory”:
|
|
66
|
- Same as before, but eliminate duplicates:
|
|
67
|
- For each author of a book by Morgan Kaufmann, list all books she
published:
|
|
68
|
- <result>
- <author>Jones</author>
- <title> abc </title>
- <title> def </title>
- </result>
- <result>
- <author> Smith
</author>
- <title> ghi </title>
- </result>
|
|
69
|
- FOR $x in expr -- binds $x to
each value in the list expr
- LET $x = expr -- binds $x to the
entire list expr
- Useful for common subexpressions and for aggregations
|
|
70
|
|
|
71
|
- Find books whose price is larger than average:
|
|
72
|
- Summary:
- FOR-LET-WHERE-RETURN = FLWR
|
|
73
|
- FOR
- Binds node variables à iteration
- LET
- Binds collection variables à one value
|
|
74
|
|
|
75
|
- Ordered and unordered collections
- /bib/book/author = an ordered
collection
- Distinct(/bib/book/author) = an unordered collection
- LET $a = /bib/book à $a is a collection
- $b/author à a collection (several authors...)
|
|
76
|
- What about collections in expressions ?
- $b/price à list of n prices
- $b/price * 0.7 à list of n numbers
- $b/price * $b/quantity à list of n x m numbers ??
- $b/price * ($b/quant1 + $b/quant2)
¹
$b/price * $b/quant1 + $b/price
* $b/quant2 !!
|
|
77
|
|
|
78
|
|
|
79
|
|
|
80
|
|
|
81
|
- BEFORE and AFTER
- for dealing with order in the input
- FILTER
- deletes some edges in the result tree
- Recursive functions
- Currently: arbitrary recursion
- Perhaps more restrictions in the future ?
|
|
82
|
- How are we going to process XML efficiently?
- Special purpose XML engines, or
- Add functionality to relational engines?
- Need to manage XML streams.
- Here, data management is much closer to other programming tasks.
|