Essence of XML

From: Alexander Moshchuk (anm@cs.washington.edu)
Date: Tue Apr 13 2004 - 23:23:21 PDT

  • Next message: Danny Wyatt: "review of The Essence of XML"

    The Essence of XML

    The paper provides a formalization of a simplified subset of XML
    Schema, a new standard for describing XML document structure. Whereas
    previous work has focused exclusively on structural types, this paper
    also considers the names aspects of XML Schema and how they relate to
    structural types. The goal of all the formalizations is to enhance
    our understanding of how the complex XML Schema standard works using a
    simple but rigorous model, which in turn helps in writing Schema
    implementations as well as discovering weaknesses in the standards.

    Most concepts used in the model's type system are already present in
    some form in other programming languages. In particular, type
    restriction is like subtyping in OO languages applied on a coarser
    granularity; type extension is like class inheritance except that the
    fields cannot be overridden. Unlike many OO languages, checking a
    type requires knowledge of the complete type hierarchy. The paper
    goes on to introduce some fundamental operations on values and types
    in their model; most importantly, these are matching, erasure, and
    validation. Matching is essentially typechecking a value with a type.
    Erasure amounts to taking a typed value and stripping the type off as
    well as converting the value to a general (string) representation.
    Validation of an external (typeless) value against a type provides an
    internal value or fails, depending on whether the external value
    really matches the provided type.

    The paper provides formal judgments defining the semantics of these
    concepts. These formal judgments can them be used to prove
    interesting things about XML Schema, such as the Validation Theorem,
    which connects validation to matching and erasure. Understanding XML
    validation is important, because it allows to impose a certain meaning
    and order to the data in an XML document according to the schema
    specification, allowing document manipulations like querying
    to use the knowledge of the document's internal representation. I
    guess using the type model and the validation theorem, XML validation
    can essentially be reduced to an ordinary typechecking problem.

    I thought the paper was well-written, the reasoning was intuitive, and
    the numerous examples helped understand the concepts quickly. It is
    interesting and unusual (in a good way) that the formal model
    extending the one in the paper is a part of the official specification
    for XQuery and XPath standards. On a side note, I thought the title
    was a bit misleading, as the paper didn't focus on XML itself, but
    instead on XML Schema.


  • Next message: Danny Wyatt: "review of The Essence of XML"

    This archive was generated by hypermail 2.1.6 : Tue Apr 13 2004 - 23:23:21 PDT