================================================================
CSE 344 -- Spring 2011
Lecture 11: XML
================================================================

Note:
-- we will cover only subsets of XML, XPath, XQuery.
-- this is sufficient for quite advanced use
-- the details we leave out can be found in the optional readings.

================================================================

Show sample-xml.xml (from http://www.w3.org/TR/xquery-use-cases/)

Key concepts
-- element
-- tag: begin-tag must match end-tag
-- elements are "nested"
-- must have a "root" element
-- empty elements: <tag></tag> = <tag/>
-- attributes

Elements vs. attributes
-- ordered vs. unordered
-- repeated vs. unique
-- nested vs. flat

Text
-- #PCDATA ("Parsed Character Data") = the text inside elements
-- CDATA ("Character Data") = the text inside attributes
-- There is no #CDATA and no PCDATA

Well-formed and valid XML documents
-- "well-formed" XML document -- when tags are matching
-- "valid" XML document -- matches a given DTD (discussed later)

Use http://validator.w3.org/check to validate

IDs and IDREFs
-- an attribute is called an ID attribute if its value is unique
-- an attribute is called IDREF if it references an ID
-- the DTD defines which attribute(s) are ID/IDREFs

CDATA section: <![CDATA[ ... ]]>

Entity references:
-- &lt; means <
-- &gt; means >
-- that's all we need; the general case is very messy

================================================================

DISCUSSION

*** Question: how do the relational and the XML data models compare ?

*** Which data model would you choose ?

-- University records: students, courses, grades, etc.
   Relations or XML? Why?

-- University Web site: news, academics, admissions, events,
   research, etc.
   Relations or XML? Why?

-- A genealogy database (family tree)
   Relations or XML? Why?
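
Aside, going back to the key concepts listed earlier
(well-formedness, attributes vs. elements, entity references, CDATA
sections): a minimal Python sketch that checks them with the standard
library parser xml.etree.ElementTree. The element and attribute names
(book, title, author, year, lang, cmp) and the helper is_well_formed
are made up for illustration; they do not come from sample-xml.xml.
Note that this parser checks well-formedness only, not validity
against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Matching, properly nested tags under a single root: well-formed.
ok = is_well_formed('<book year="2011"><title>XML</title><note/></book>')

# End-tags in the wrong order: not well-formed.
mismatched = is_well_formed("<book><title>XML</book></title>")

# Two top-level elements, so there is no single root: not well-formed.
two_roots = is_well_formed("<a/><b/>")

# Attributes are unique per element; repeating one is an error.
duplicate_attribute = is_well_formed('<book year="2011" year="2012"/>')

# Elements, in contrast, may repeat, and their order is preserved.
book = ET.fromstring(
    '<book year="2011" lang="en"><author>A</author><author>B</author></book>'
)
authors = [a.text for a in book.findall("author")]

# Entity references and a CDATA section are two ways to write the
# same text; after parsing, both give the plain characters.
escaped = ET.fromstring("<cmp>1 &lt; 2 &gt; 0</cmp>").text
cdata = ET.fromstring("<cmp><![CDATA[1 < 2 > 0]]></cmp>").text

print(ok, mismatched, two_roots, duplicate_attribute)
print(authors, book.attrib)
print(escaped == cdata)
```

The parser reports every violation as ET.ParseError, so the helper
only needs one except clause; it cannot tell you whether a document
is "valid", since that requires a DTD.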
----------------

Relational data model =
-- rigid flat structure (tables)
-- schema must be fixed in advance
-- binary representation: good for performance, bad for exchange
-- query language based on Relational Calculus

Semistructured data model / XML =
-- flexible, nested structure (trees)
-- does not require predefined schema ("self describing")
-- text representation: good for exchange, bad for performance
-- query language borrows from automata theory

================================================================

The semistructured data model = A tree !

(show tree for sample-data.xml in class)

*** In class: Mappings between the relational data model and the
    semistructured data model

   Student(sid, name)
   Takes(sid, cid, grade)
   Course(cid, title, instructor)

Represent in XML in two ways (in class)

================================================================

Why do we call it "semistructured" ?

-- missing attributes

      John   1234
      Joe                 <-- second value is missing

   can we do this in the relational data model ?

-- multiple attributes

      Mary   1234   5678  <-- same attribute occurs twice

   can we do this in the relational data model ?

-- attributes have different types in different objects:

      John Smith   1234   <-- the name has two parts here,
                              one part in the records above

================================================================

DTD = Document Type Definition

(show tree for sample-data-with-dtd.xml in class)

-- goal: impose a structure on the XML document
-- rather old and arcane; to be replaced by XML Schema,
   but that is TOOOOO complex

Element declaration:   <!ELEMENT name content>

where content is one of:
-- Complex       = a regular expression over other elements
-- Text-only     = #PCDATA
-- Empty         = EMPTY
-- Any           = ANY
-- Mixed content = (#PCDATA | A | B | C)*
-- the regular expression uses , (sequence), | (choice),
   * (zero or more), ? (optional)

EXAMPLES IN CLASS
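
A complex content model is a regular expression over the names of an
element's children, so it can be tried out with an ordinary regex
engine. The sketch below is only an illustration of that idea, not a
DTD validator: it handles element-only models (no #PCDATA, EMPTY or
ANY). The tag names person, name and phone, the two models, and the
helpers content_model_to_regex and children_match are assumptions made
up for this example. It reuses the John / Joe / Mary records from the
"semistructured" discussion to show that a model with * accepts both
missing and repeated children.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate an element-only content model such as "(name, phone*)"
    into a compiled regex over a string of child names, where each
    child name is followed by ';'."""
    # Every element name becomes a group matching "name;".
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        model,
    )
    # The DTD sequence operator "," is plain concatenation in a regex;
    # whitespace carries no meaning. The operators | * ? keep theirs.
    pattern = re.sub(r"[,\s]", "", pattern)
    return re.compile(pattern)


def children_match(element, model):
    """True if the ordered child tags of `element` match `model`."""
    sequence = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(sequence) is not None


DOC = """
<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>1234</phone><phone>5678</phone></person>
</people>
"""

root = ET.fromstring(DOC)

FLEXIBLE = "(name, phone*)"  # zero or more phones
RIGID = "(name, phone)"      # exactly one phone, like a relational row

flexible_ok = [children_match(p, FLEXIBLE) for p in root]
rigid_ok = [children_match(p, RIGID) for p in root]

print(flexible_ok)  # [True, True, True]
print(rigid_ok)     # [True, False, False]
```

With the rigid model only the John record passes, which is the XML
counterpart of a table with exactly one phone column; the starred
model admits all three records.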