The Essence of XML Indeed

From: Charles Giefer (cgiefer@cs.washington.edu)
Date: Wed Apr 14 2004 - 02:53:09 PDT

    First of all, this paper was unlike anything I've read before.
    Unfortunately, I am only marginally knowledgeable about databases, XML, and
    programming languages. Also, the style was different from papers that I
    usually read (architecture, systems, and design papers). With little
    foundation, I still found this paper interesting and elegant in the way it
    described a formal semantics for XML Schema.

    I think there are several important points that this paper makes. First,
    that XML Schema is overly complex and XML is a somewhat inadequate form for
    data representation. Second, that well-defined formal semantics can clear
    up many of these complexities.

    A format for representing data should be self-describing and should
    round-trip. XML achieves neither of these very well. On top of that, the
    standard is complex, running to more than 300 printed pages. Nevertheless,
    XML Schema is a widely used standard. This paper states these facts and
    proposes methods for cleaning up the typing semantics.

    These formal semantics allow for efficient translations between external XML
    data and internal strongly-typed data representations. Translating
    external data into internal data is called validation, while the
    translation in the other direction is called erasure. The goal here is
    twofold. First, the semantics must enforce (as strongly as possible)
    round-tripping: data must have only one interpretation in each direction.
    Second, the semantics must be simple and powerful. This is achieved using
    several inference rules and definitions.
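
    A minimal Python sketch of the two directions, assuming only two atomic
    types ("integer" and "string"). The names validate, erase, and round_trips
    are hypothetical and chosen for illustration; this is not the paper's
    inference rules, just a way to see what the round-tripping requirement
    asks for.

```python
def validate(text: str, type_name: str):
    """External text -> internal typed value (illustrative, two types only)."""
    if type_name == "integer":
        return int(text)      # "42" -> 42, but "007" -> 7
    if type_name == "string":
        return text
    raise ValueError(f"unsupported type: {type_name}")


def erase(value) -> str:
    """Internal typed value -> external text."""
    return str(value)


def round_trips(text: str, type_name: str) -> bool:
    """True when erasing the validated value gives back the original text."""
    return erase(validate(text, type_name)) == text
```

    Under these assumptions, round_trips("42", "integer") holds, while
    round_trips("007", "integer") does not, because the integer value 7 no
    longer remembers how it was written.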

    The authors are obviously thorough and very intelligent; however, there is
    one aspect of their reasoning that I didn't quite understand. The bulk of
    their remaining round-tripping problems come from translation to and from
    integer representations. Obviously, numbers have an infinite number of
    equivalent representations (leading zeros, different radixes, etc.).
    The data in the XML document is written with a certain format in mind. For
    example, if I were James Bond, I would like my name to be 007, not simply 7.
    Their method of translation loses this information. It seems, however,
    that type information could still be assigned and enforced without this loss
    of information. One proposal would be to make the integer type contain
    both the number and the formatting information (which could be as simple as
    the string representation of that number). Something informally like:
    betterInt :== int
            | int, string
            | int, radix
            | int, precision
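
    As a rough sketch of the "int, string" alternative only, here is one way
    this could look in Python. BetterInt, validate_int, and erase_int are
    hypothetical names made up for illustration; the radix and precision
    alternatives are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetterInt:
    value: int     # typed value, used for computation and type checking
    lexical: str   # original text, kept so erasure can reproduce it


def validate_int(text: str) -> BetterInt:
    """External text -> typed value that also remembers its formatting."""
    return BetterInt(value=int(text), lexical=text)


def erase_int(b: BetterInt) -> str:
    """Typed value -> external text, using the remembered formatting."""
    return b.lexical


bond = validate_int("007")
```

    Here bond.value is the integer 7, so the type can still be enforced, while
    erase_int(bond) gives back "007". The cost is that two values can be
    numerically equal and still differ in their lexical part, so equality
    would have to be defined with some care.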

    Something that confused me about this paper was the meaning of the different
    phrases "XML Schema," "XML," and "Schema." While I have a vague notion of
    what is intended by each of these terms, the authors seemed to use the three
    names interchangeably at times.

    All-in-all, this paper accomplished its immediate goals: express the
    formality missing from XML Schema and propose a set of powerful semantics to
    express this formality. It also appeared, though, to have some subtle
    goals, including discrediting atomic types other than ints and strings. The
    matching syntax also made me want to start writing in OCaml again.

