Review: Essence of XML

From: Li Yan (lanti@u.washington.edu)
Date: Wed Apr 14 2004 - 11:49:21 PDT

  • Next message: Steven Balensiefer: "Essence of XML"

    Review of the Essence of XML

    XML Schema is a fairly complex type system. It uses named
    typing, an approach widely adopted by programming
    languages.

    + XML without Schema

    An XML document is well-formed if it has a single root
    element and all of its tags are properly nested (the XML
    declaration at the start is recommended but optional). But
    to obtain type information about individual elements, the
    XML must be accompanied by a DTD or an XML Schema. An XML
    document that comes without a DTD or XML Schema is neither
    self-describing nor round-tripping. It is not
    self-describing since there is no way to infer the type of
    an element from the XML data alone. Round-tripping is also
    hard, or impossible, because the lack of type information
    leads to multiple possible external representations of the
    same data, and conversely to multiple internal
    representations when the data is converted back. With a
    DTD or XML Schema available, we can validate XML data.
    Validation associates the data with a type, so matching
    the data against that type will succeed afterwards. This
    is crucial for self-description and round-tripping: given
    the type information, we can restrict the internal
    representation of the data to one that complies with its
    type, and the erasure of a typed value back to an untyped
    value can be specified without ambiguity, so
    round-tripping is achieved.
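    A minimal Python sketch of the two directions described
    above, assuming a toy "validate" (untyped text to typed
    value) and "erase" (typed value back to text) for three
    built-in simple types. Both functions are hypothetical
    illustrations, not part of any XML library. Without a type,
    the text "1" has several readings; with the type known,
    validating and then erasing gives the text back, up to its
    canonical lexical form.

```python
# Toy model of validation and erasure for a few XML Schema built-in
# simple types. These functions are illustrative assumptions only.

def validate(text, type_name):
    """Map an untyped string to a (type, value) pair for a simple type."""
    if type_name == "xs:integer":
        return (type_name, int(text))
    if type_name == "xs:boolean":
        if text in ("true", "1"):
            return (type_name, True)
        if text in ("false", "0"):
            return (type_name, False)
        raise ValueError(f"not a boolean: {text!r}")
    if type_name == "xs:string":
        return (type_name, text)
    raise ValueError(f"unknown type: {type_name}")

def erase(typed_value):
    """Map a typed value back to its canonical untyped string."""
    type_name, value = typed_value
    if type_name == "xs:boolean":
        return "true" if value else "false"
    return str(value)

# Without a schema, the text alone does not say which reading is meant.
readings = [validate("1", t) for t in ("xs:integer", "xs:boolean", "xs:string")]
print(readings)  # [('xs:integer', 1), ('xs:boolean', True), ('xs:string', '1')]

# With the type known, validate-then-erase returns the canonical form.
print(erase(validate("42", "xs:integer")))  # 42
print(erase(validate("01", "xs:integer")))  # 1
```

    Erasing the boolean reading yields "true" rather than "1",
    which is one way the same data ends up with more than one
    external form when its type is not fixed.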

    + Scope

    The distinction between global and local declarations
    offers some flexibility: the same element name can have
    different types in different places. Anonymous types come
    in handy where the type can be identified by its element,
    so a separate type name is unnecessary.
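    A simplified Python model of scoping, assuming every element
    not declared locally inside its parent refers to a global
    declaration. The element names ("book", "auction", "price",
    "title") and the type labels are hypothetical. The same
    element name, "price", gets a different type depending on
    where it appears, and one of those types is anonymous.

```python
# Hypothetical global element declarations.
GLOBAL_ELEMENTS = {
    "title": "xs:string",
}

# Hypothetical local declarations, scoped to the parent's content.
LOCAL_ELEMENTS = {
    # Inside <book>, <price> is declared locally as a decimal.
    "book": {"price": "xs:decimal"},
    # Inside <auction>, <price> has an anonymous complex type
    # defined inline, so it has no name of its own.
    "auction": {"price": "<anonymous type of auction/price>"},
}

def element_type(parent, child):
    """Use the parent's local declaration if there is one; otherwise
    treat the child as a reference to a global declaration."""
    local = LOCAL_ELEMENTS.get(parent, {})
    if child in local:
        return local[child]
    if child in GLOBAL_ELEMENTS:
        return GLOBAL_ELEMENTS[child]
    raise KeyError(f"no declaration for <{child}> inside <{parent}>")

print(element_type("book", "price"))     # xs:decimal
print(element_type("auction", "price"))  # <anonymous type of auction/price>
print(element_type("book", "title"))     # xs:string
```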

    + Derivation

    Derivation by restriction on simple types looks like
    creating a subtype of that simple type, e.g. "define type
    feet restricts xs:integer". A value of one subtype of
    integer cannot be used in place of a different subtype,
    but a value of any such subtype can be used wherever
    integer is expected.
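    A small Python sketch of named typing for simple types,
    assuming each type records only its base type (facets such
    as value ranges are left out). "feet" comes from the text;
    "meters" is a hypothetical second restriction of integer.
    Matching follows the derivation chain of the value's type
    annotation, not just whether the number fits.

```python
# Simple types as name -> base type; facets are omitted.
BASE = {
    "xs:integer": None,
    "feet": "xs:integer",
    "meters": "xs:integer",  # hypothetical sibling of feet
}

def derives_from(t, s):
    """True if type t equals s or is (transitively) derived from s."""
    while t is not None:
        if t == s:
            return True
        t = BASE[t]
    return False

def matches(annotated_type, expected_type):
    """A typed value matches only if its annotation derives from the
    expected type."""
    return derives_from(annotated_type, expected_type)

print(matches("feet", "xs:integer"))  # True: subtype where integer expected
print(matches("feet", "meters"))      # False: one subtype for another
print(matches("xs:integer", "feet"))  # False: integer is not annotated feet
```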

    Type derivation from a complex type resembles class
    inheritance in Java: whenever the base type is expected,
    the derived type can be used instead. But it runs in the
    reverse direction with respect to fields, since here the
    base complex type is the more "general" one. To be more
    specific, it may have more fields, or one of its fields
    may be a regular expression whose language "covers" the
    corresponding field in the derived type. For all the
    fields the base and derived complex types have in common,
    they either agree completely or the field in the derived
    type describes a language that is a subset of the
    corresponding field in the base type. This observation has
    a serious consequence for type checking: one has to check
    against ALL types derivable from the given type before one
    can claim the success or failure of type checking. Because
    type definitions may contain regular expressions, the
    number of possible derivations grows exponentially with
    the number of fields in a given type in the worst case,
    and the min/max occurrence notation in XML Schema further
    complicates the problem. Note that this is very different
    from most object-oriented languages like C++ and Java, in
    which type checking against a base type, or class, has
    nothing to do with the derived classes at all.
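    A rough Python sketch of the subset relation and the
    counting argument above, under strong simplifying
    assumptions: a content model is reduced to per-field
    (min, max) occurrence ranges, ignoring order, choice and
    nesting, and every max is bounded. This is not the full XML
    Schema rule for valid restrictions; the "book" fields are
    hypothetical.

```python
from math import prod

# Simplified content model: field name -> (min, max) occurrences.
book = {"title": (1, 1), "author": (1, 3), "note": (0, 2)}

def is_restriction(derived, base):
    """Check that `derived` only narrows `base`: each derived field's
    range lies inside the base range, and every base field that
    `derived` drops is optional (min 0) in the base."""
    for name, (lo, hi) in derived.items():
        if name not in base or lo > hi:
            return False
        base_lo, base_hi = base[name]
        if lo < base_lo or hi > base_hi:
            return False
    return all(base[name][0] == 0 for name in base if name not in derived)

def count_restrictions(base):
    """Count the (min, max) combinations a derived type may choose.
    A base range holding k values allows k*(k+1)//2 sub-ranges, so
    the product over fields grows exponentially with the field count."""
    return prod((hi - lo + 1) * (hi - lo + 2) // 2 for lo, hi in base.values())

print(is_restriction({"title": (1, 1), "author": (1, 1)}, book))  # True
print(is_restriction({"title": (1, 1)}, book))  # False: author is required
print(count_restrictions(book))  # 36
print(count_restrictions({f"f{i}": (0, 2) for i in range(4)}))  # 1296
```

    Even in this stripped-down model, four fields with range
    0..2 already admit 6^4 occurrence combinations; real
    content models with nested regular expressions only add to
    this.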

    An analogy can be drawn between XML complex type derivation
    and the template mechanism in C++, in the way the derived
    types are produced :). A complex type can have a regular
    expression in its element content specification, and each
    instance of that regular expression generates a different
    derived type. Similarly, instantiating a type parameter
    generates a concrete type from a given template in C++.

    + The Validation Theorem

    The Validation Theorem implies both round-tripping and
    reverse round-tripping for unambiguous types, and
    fortunately XML Schema rules out ambiguous types.
    Ambiguity can arise with unions or lists of simple types,
    and with choice in the content of complex types. In such
    cases XML Schema requires always taking the first match,
    which may or may not be what we want when converting
    between external and internal values.
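    A minimal Python sketch of the first-match rule for a union
    of simple types, assuming only two members (integer and
    string) with simplified parsers; the function names are
    hypothetical. Because every integer literal is also a valid
    string, member order decides the result, and a typed string
    "42" does not survive reverse round-tripping through an
    integer-first union.

```python
def parse_integer(text):
    return int(text)  # raises ValueError if text is not an integer

def parse_string(text):
    return text       # any text is a valid string

MEMBERS = {"xs:integer": parse_integer, "xs:string": parse_string}

def validate_union(text, member_types):
    """Try union members in order and keep the first that accepts."""
    for t in member_types:
        try:
            return (t, MEMBERS[t](text))
        except ValueError:
            continue
    raise ValueError(f"{text!r} matches no member of the union")

def erase(typed_value):
    """Map a typed value back to text."""
    return str(typed_value[1])

print(validate_union("42", ["xs:integer", "xs:string"]))  # ('xs:integer', 42)
print(validate_union("42", ["xs:string", "xs:integer"]))  # ('xs:string', '42')

# Reverse round trip: a typed string comes back as an integer.
original = ("xs:string", "42")
print(validate_union(erase(original), ["xs:integer", "xs:string"]))  # ('xs:integer', 42)
```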

