review: the essence of XML

From: Joe Xavier (joexav@microsoft.com)
Date: Wed Apr 14 2004 - 21:35:46 PDT


    Simeon, Wadler: The essence of XML

    *******************************

    The authors present a formal approach to an XML type system that is lucid and easy to follow, with brief discussions of XML modeling concepts along the way. Throughout the paper they compare their semantics to XML Schema, which makes it easier to understand.

    I like their treatment of the distinction between named and structural types.

    What I like best about the paper is that it provides the semantics for implementing an XML type system without getting entrenched in the semantics of XML Schema. They achieve this by abstracting it into a few simple concepts and treating each in detail. Their discussion of values and types in Section 3 corresponds loosely to the Structures part of XML Schema, which is unnecessarily complicated IMO. Section 4 deals with type derivation in a very simple manner.

    Section 7's description of validation essentially describes the concept of the PSVI (post-schema-validation infoset) in simple terms, without getting into details. I would have liked a bit more here.

     

    The title: I actually think the title is quite appropriate. I don't think of XML as purely a data-exchange format. XML is quite powerful as a data model for representing semi-structured data. If XML is to be anything more than a bunch of pointy tags with text interspersed, then it needs a type system, and that's precisely what makes XML powerful IMO. Ergo, a paper describing an XML type system captures the essence of XML =)

     

    Critique:

    The section on roundtripping could have been dealt with in more detail. For example, they didn't deal with what happens in the case of union types. This is one of the trickier parts and they neatly gloss over it.
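
    To make the union-type concern concrete, here is a minimal sketch in Python. It is not taken from the paper: erase and validate_union are hypothetical helpers, and the rule that the first matching member of the union (integer | string) wins is an assumption made for illustration.

```python
def erase(value):
    """Drop the type and keep only the text, as serialization would."""
    return str(value)

def validate_union(text):
    """Validate text against the union (integer | string).

    Assumption: members are tried in order, so integer wins whenever it fits.
    """
    try:
        return int(text)
    except ValueError:
        return text

original = "42"  # a string value that happens to look like an integer
round_tripped = validate_union(erase(original))

# The value comes back as an integer, so the round trip did not preserve it.
print(type(original).__name__, "->", type(round_tripped).__name__)
```

    Under these assumptions a string that merely looks like an integer does not survive the round trip, which is the kind of case I would have liked to see discussed.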

    The section on Sensibility didn't add much value over what had been discussed in the previous sections.

     

    Thoughts on some of Dr. Suciu's questions:

     

    1. Consider untyped XML (a.k.a. unvalidated XML, or well-formed XML). Does it also fail to be self-describing and to offer round-tripping?

    Untyped XML is self-describing with regard to a relational notion of a schema, but it loses all other information, i.e. there is no type information. As a result it's not round-trippable: two foo elements could have completely different types, but once the document is untyped they look the same. Just a bunch of strings between pointy tags.
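
    A small illustration of the point, sketched in Python with the standard xml.etree.ElementTree module. The erase helper is hypothetical; it just stands in for serializing a typed value to untyped XML.

```python
import xml.etree.ElementTree as ET

def erase(tag, value):
    """Serialize a typed value as untyped XML; the type is dropped."""
    elem = ET.Element(tag)
    elem.text = str(value)
    return ET.tostring(elem, encoding="unicode")

int_doc = erase("foo", 1)    # foo holding an integer
str_doc = erase("foo", "1")  # foo holding a string

# Both values produce the same untyped document.
same = int_doc == str_doc

# Reading it back yields only a string; the original type is gone.
recovered = ET.fromstring(int_doc).text
```

    Here same is true and recovered is a plain string, so nothing in the untyped document tells you which of the two foo values you started with.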

     

    2. Why do we need to validate XML documents?

    To ensure data validity, plus a bunch of other obvious reasons. It can also be used to enforce business logic, e.g. element age is an integer with a facet (such as minExclusive) that ensures that age has to be above 18.
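
    As a rough sketch of the age rule, here is a hand-written check in Python. It is a stand-in for what a schema validator would do, not an XML Schema API; validate_age and its min_exclusive parameter are names made up for this example.

```python
import xml.etree.ElementTree as ET

def validate_age(xml_text, min_exclusive=18):
    """Return True if xml_text is an <age> element holding an integer above min_exclusive."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "age":
        return False
    try:
        value = int((elem.text or "").strip())
    except ValueError:
        return False  # content is not an integer at all
    return value > min_exclusive

for doc in ("<age>21</age>", "<age>18</age>", "<age>abc</age>"):
    print(doc, validate_age(doc))
```

    Only the first document passes: 18 is not above 18, and abc is not an integer.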

     

    3. You probably have encountered subtyping or inheritance in other programming languages (or both). How do they relate to, or differ from, restriction and extension in XML Schema?

    This was covered in Dr. Suciu's primer (which was pretty good btw).

     

    4. What are the most important practical implications of the Validation Theorem?

    It tells you how to implement validation.



    This archive was generated by hypermail 2.1.6 : Wed Apr 14 2004 - 21:35:53 PDT