================================================================
CSE 344 -- Spring 2011
Lecture 11: XML
================================================================
Note:
-- we will cover only subsets of XML, XPath, XQuery.
-- this is sufficient for fairly advanced use
-- the details we leave out can be found in the optional readings.
================================================================
Show sample-xml.xml (from http://www.w3.org/TR/xquery-use-cases/)
Key concepts
-- element
-- tag: begin-tag must match end-tag
-- elements are "nested"
-- must have a "root" element
-- empty elements: <tag></tag> can be abbreviated as <tag/>
-- attributes
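A quick way to see these concepts is to parse a tiny document with Python's standard `xml.etree.ElementTree`. This is a sketch with a hypothetical document, not the sample file: it has one root, nested elements, an attribute, and an empty element written both ways.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root (<bib>), nesting, an attribute (year),
# and an empty element written as <note></note> and as <note/>.
doc = """<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <note></note>
    <note/>
  </book>
</bib>"""

root = ET.fromstring(doc)        # raises ET.ParseError if not well formed
book = root.find("book")         # a nested element
print(root.tag)                  # bib
print(book.get("year"))          # 1994 (attribute values are strings)
print(book.find("title").text)   # TCP/IP Illustrated

# Both spellings of the empty element parse to the same thing:
# no text and no children.
first, second = book.findall("note")
print(first.text, second.text)   # None None
print(len(first), len(second))   # 0 0
```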
Elements vs. attributes
-- ordered vs. unordered
-- repeated vs. unique
-- nested vs. flat
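A small sketch of the difference, again with hypothetical element names: repeated child elements are allowed and keep their document order, while attributes form a flat name-to-string mapping in which a name may occur at most once (XML gives no meaning to attribute order).

```python
import xml.etree.ElementTree as ET

# Elements: may repeat, and the parser keeps them in document order.
doc = "<book><author>First</author><author>Second</author></book>"
authors = [a.text for a in ET.fromstring(doc).findall("author")]
print(authors)                   # ['First', 'Second']

# Attributes: a flat name -> string mapping, no nesting.
book = ET.fromstring('<book year="1994" isbn="123"/>')
print(book.attrib)               # {'year': '1994', 'isbn': '123'}

# A repeated attribute name is not well formed, so the parser rejects it.
duplicate_rejected = False
try:
    ET.fromstring('<book year="1994" year="2000"/>')
except ET.ParseError as err:
    duplicate_rejected = True
    print("rejected:", err)
```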
Text
-- #PCDATA ("Parsed Character Data") = the text inside elements
-- CDATA ("Character Data") = the text inside attributes
-- in a DTD you write #PCDATA (with #) and CDATA (without #);
   there is no #CDATA and no plain PCDATA
Well formed and Valid XML Documents
-- "well formed" XML document: tags match and nest properly,
   and there is exactly one root element
-- "valid" XML document: well formed and conforms to a given DTD (discussed later)
Use http://validator.w3.org/check to validate
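Well-formedness can also be checked locally with any XML parser. The sketch below uses Python's `xml.etree.ElementTree`; note that this parser only checks well-formedness, it does not validate against a DTD. The helper name `is_well_formed` is just for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML.
    Checks well-formedness only; no DTD validation is done."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>x</b></a>"))   # True
print(is_well_formed("<a><b>x</a></b>"))   # False: tags cross
print(is_well_formed("<a/><b/>"))          # False: two root elements
```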
IDs and IDREFs
-- an ID attribute: its value must be unique within the document
-- an IDREF attribute: its value must be the ID of some element in the document
-- the DTD defines which attribute(s) are ID/IDREFs
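Following an IDREF amounts to a lookup in an index of ID values. A minimal sketch, assuming hypothetical data where `id` is the ID attribute and `student` is an IDREF: `ElementTree` does not read the DTD, so which attributes play these roles is hard-coded here.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; in a real document the DTD would declare
# person/@id as ID and takes/@student as IDREF.
doc = """<db>
  <person id="p1"><name>Alice</name></person>
  <person id="p2"><name>Bob</name></person>
  <takes student="p2" course="c344"/>
</db>"""

root = ET.fromstring(doc)

# Build the ID index, enforcing that ID values are unique.
by_id = {}
for elem in root.iter():
    key = elem.get("id")
    if key is not None:
        if key in by_id:
            raise ValueError(f"duplicate ID {key!r}")
        by_id[key] = elem

# Dereference the IDREF on <takes>.
takes = root.find("takes")
student = by_id[takes.get("student")]
print(student.find("name").text)   # Bob
```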
CDATA Section:
-- <![CDATA[ text ]]> : the text is not parsed as markup,
   so < and & may appear literally
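A sketch of what a parser does with a CDATA section, using Python's `xml.etree.ElementTree`: the content becomes ordinary text. When the element is written back out, this library escapes the special characters instead of keeping the CDATA wrapper.

```python
import xml.etree.ElementTree as ET

# Inside <![CDATA[ ... ]]>, < and & are not treated as markup.
elem = ET.fromstring("<code><![CDATA[if (a < b && c) return;]]></code>")
print(elem.text)          # if (a < b && c) return;

# On output, ElementTree escapes the text rather than re-emitting CDATA.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)         # <code>if (a &lt; b &amp;&amp; c) return;</code>
```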
Entity references:
-- &lt; means <
-- &gt; means >
-- &amp; means &
-- that's all we need; the general case is very messy
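A short sketch of entity references in both directions with the Python standard library: the parser replaces them with the characters they stand for, and `xml.sax.saxutils.escape` produces them when embedding raw text in markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: &lt; &gt; &amp; become < > &.
elem = ET.fromstring("<expr>a &lt; b &amp;&amp; b &gt; c</expr>")
print(elem.text)                   # a < b && b > c

# Writing: escape the characters before placing text inside markup.
escaped = escape("a < b && b > c")
print(escaped)                     # a &lt; b &amp;&amp; b &gt; c
```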
================================================================
DISCUSSION
*** Question: how do the relational and the XML data models compare ?
*** Which data model would you choose ?
-- University records: students, courses, grades, etc.
Relations or XML? Why?
-- University Web site: news, academics, admissions, events, research, etc.
Relations or XML? Why?
-- A genealogy database (family tree)
Relations or XML? Why?
----------------
Relational data model =
-- rigid flat structure (tables)
-- schema must be fixed in advance
-- binary representation: good for performance, bad for exchange
-- query language based on Relational Calculus
Semistructured data model / XML
-- flexible, nested structure (trees)
-- does not require predefined schema ("self describing")
-- text representation: good for exchange, bad for performance
-- query language borrows from automata theory
================================================================
The semistructured data model = A tree !
(show tree for sample-data.xml in class)
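One way to see the tree is to walk a parsed document and print one line per node. This sketch uses a tiny hypothetical fragment, not the sample file; attributes are shown with an `@` prefix, and text between sibling elements (ElementTree's "tail") is ignored for brevity.

```python
import xml.etree.ElementTree as ET

def show(elem: ET.Element, depth: int = 0) -> list[str]:
    """Return one line per node (element tag, @attribute, non-blank text),
    indented by its depth in the tree. Tails are not shown."""
    pad = "  " * depth
    lines = [pad + elem.tag]
    for name, value in elem.attrib.items():
        lines.append(f"{pad}  @{name} = {value}")
    if elem.text and elem.text.strip():
        lines.append(f"{pad}  '{elem.text.strip()}'")
    for child in elem:
        lines.extend(show(child, depth + 1))
    return lines

doc = '<bib><book year="1994"><title>TCP/IP Illustrated</title></book></bib>'
print("\n".join(show(ET.fromstring(doc))))
```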
*** In class: mappings between the relational data model and the
semistructured data model
Student(sid, name)
Takes(sid, cid, grade)
Course(cid, title, instructor)
Represent in XML in two ways (in class)
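A sketch of two possible mappings, assuming Course is keyed by cid (the attribute Takes refers to) and using made-up rows. The flat mapping emits one element per tuple and keeps foreign keys as values; the nested mapping places each student's Takes tuples inside the student, so sid becomes implicit in the tree. Nesting Takes under Course instead would be an equally valid choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows for Student(sid, name), Takes(sid, cid, grade),
# Course(cid, title, instructor).
students = [(1, "Alice"), (2, "Bob")]
takes = [(1, "c344", "4.0"), (2, "c344", "3.5")]
courses = [("c344", "Databases", "Smith")]

def add_courses(db: ET.Element) -> None:
    for cid, title, instructor in courses:
        ET.SubElement(db, "course", cid=cid, title=title, instructor=instructor)

def flat() -> ET.Element:
    """Way 1: one element per tuple; keys stay as attribute values."""
    db = ET.Element("db")
    for sid, name in students:
        ET.SubElement(db, "student", sid=str(sid), name=name)
    for sid, cid, grade in takes:
        ET.SubElement(db, "takes", sid=str(sid), cid=cid, grade=grade)
    add_courses(db)
    return db

def nested() -> ET.Element:
    """Way 2: Takes tuples nested under their student (no sid needed)."""
    db = ET.Element("db")
    for sid, name in students:
        s = ET.SubElement(db, "student", sid=str(sid), name=name)
        for t_sid, cid, grade in takes:
            if t_sid == sid:
                ET.SubElement(s, "takes", cid=cid, grade=grade)
    add_courses(db)
    return db

print(ET.tostring(flat(), encoding="unicode"))
print(ET.tostring(nested(), encoding="unicode"))
```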
================================================================
Why do we call it "semistructured" ?
-- missing attributes
   John, 1234    (a name and a number)
   Joe           (a name only: the number is missing)
can we do this in the relational data model ?
-- multiple attributes
   Mary, 1234, 5678    (one name, two numbers)
can we do this in the relational data model ?
-- attributes have different types in different objects:
   a name that is structured in one object (first name John,
   last name Smith) but plain text in another; number 1234
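A sketch of how a program copes with this irregularity. The tag names (`person`, `name`, `phone`, `first`, `last`) are hypothetical, chosen only to mirror the John/Joe/Mary cases: the query must tolerate zero, one, or many phones per person, and a name that is either plain text or has sub-elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags; the values mirror the cases listed above.
doc = """<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>1234</phone><phone>5678</phone></person>
  <person><name><first>John</first><last>Smith</last></name><phone>1234</phone></person>
</people>"""

rows = []
for person in ET.fromstring(doc).findall("person"):
    # itertext() collects the text whether <name> is flat or structured
    full_name = " ".join(person.find("name").itertext())
    # findall() naturally handles a missing or repeated <phone>
    phones = [p.text for p in person.findall("phone")]
    rows.append((full_name, phones))

for row in rows:
    print(row)
```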
================================================================
DTD = Document Type Definition
(show tree for sample-data-with-dtd.xml in class)
-- goal: impose a structure on the XML document
-- rather old and arcane; to be replaced by XML Schema, but that is
TOOOOO complex
An element is declared as <!ELEMENT name content>
where content is one of:
-- Text-only = (#PCDATA)
-- Empty = EMPTY
-- Any = ANY
-- Mixed content = (#PCDATA | A | B | C)*
-- Complex = a regular expression over other elements, built with
   , (sequence), | (choice), * (zero or more), + (one or more),
   ? (optional). EXAMPLES IN CLASS
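To make "a regular expression over other elements" concrete, here is a sketch that translates a DTD content model into a Python regex over the sequence of child tag names, then checks an element's children against it. This is an illustration only: Python's `xml.etree.ElementTree` does not validate DTDs, the helper names are hypothetical, and the translator ignores #PCDATA, EMPTY, ANY and text between children.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model: str) -> re.Pattern:
    """Translate a content model such as "(name, phone*, email?)" into a
    regex over space-terminated child tag names. Handles names, ',', '|',
    '*', '+', '?' and parentheses only."""
    out = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[(),|*+?]", model):
        if tok == ",":
            continue                     # sequence = concatenation
        elif tok == "(":
            out.append("(?:")            # non-capturing group
        elif tok in (")", "|", "*", "+", "?"):
            out.append(tok)              # same meaning in both notations
        else:
            out.append(f"(?:{re.escape(tok)} )")   # one child element
    return re.compile("".join(out))

def conforms(elem: ET.Element, model: str) -> bool:
    """True if the sequence of elem's child tags matches the model."""
    children = "".join(child.tag + " " for child in elem)
    return model_to_regex(model).fullmatch(children) is not None

model = "(name, phone*, email?)"   # hypothetical model for <person>
ok = ET.fromstring("<person><name>Mary</name><phone>1</phone><phone>2</phone></person>")
bad = ET.fromstring("<person><phone>1</phone><name>Mary</name></person>")
print(conforms(ok, model), conforms(bad, model))   # True False
```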