From: Charles Giefer (cgiefer@cs.washington.edu)
Date: Wed Apr 14 2004 - 02:53:09 PDT
First of all, this paper was unlike anything I've read before.
Unfortunately, I am only marginally knowledgeable about databases, XML, and
programming languages. Also, the style was different from papers that I
usually read (architecture, systems, and design papers). Even with so little
background, I found this paper interesting and elegant in the way it
describes a formal semantics for XML Schema.
I think there are several important points that this paper makes. First,
that XML Schema is overly complex and XML is a somewhat inadequate form for
data representation. Second, that well-defined formal semantics can clear
up many of these complexities.
A format for representing data must be self-describing and must round-trip.
XML does neither of these very well. On top of that, the standard is
complex, running to more than 300 printed pages. Nevertheless, XML Schema
is a widely used standard. The paper states these facts and proposes
methods for cleaning up the typing semantics.
These formal semantics allow for efficient translation between external XML
data and internal strongly-typed data representations. Translating external
data into internal data is called validation, while translating in the
other direction is called erasure. The goals here are twofold. First, the
semantics must enforce round-tripping as strongly as possible: data must
have only one interpretation in each direction. Second, the semantics must
be simple and powerful. This is achieved using several inference rules and
definitions.
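
A minimal sketch of the two directions, in Python rather than the paper's
inference rules. The function names, the two-type universe ("string" and
"integer"), and the regular expression are assumptions made purely for
illustration; they are not taken from the paper.

```python
import re

# Assumed lexical form for integers: optional sign followed by digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def validate(text, type_name):
    """External text -> internal typed value (hypothetical, simplified)."""
    if type_name == "string":
        return text
    if type_name == "integer":
        if _INTEGER.fullmatch(text) is None:
            raise ValueError(f"{text!r} is not a valid integer")
        return int(text)
    raise ValueError(f"unknown type {type_name!r}")


def erase(value):
    """Internal typed value -> external text, dropping the type."""
    return str(value)


def round_trips_from_internal(value, type_name):
    """Erase, then validate: do we get the same value back?"""
    return validate(erase(value), type_name) == value


def round_trips_from_external(text, type_name):
    """Validate, then erase: do we get the same text back?"""
    return erase(validate(text, type_name)) == text
```

Round-tripping in this sketch means that both helper checks return True for
the data in question; the two checks correspond to the two directions of
translation.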
The authors are obviously thorough and very intelligent; however, there is
one aspect of their reasoning that I didn't quite understand. The bulk of
their remaining round-tripping problems come from translation to and from
integer representations. Obviously, numbers have an infinite number of
equivalent representations (leading zeros, different radixes, etc.).
The data in an XML document is written with a certain format in mind. For
example, if I were James Bond, I would like my name to be 007, not simply 7.
Their method of translation loses this information. It seems, however,
that type information could still be assigned and enforced without this loss
of information. One proposal would be to make the integer type contain
both the number and the formatting information (which could be as simple as
the string representation of that number). Something informally like:
betterInt ::= int
            | int, string
            | int, radix
            | int, precision
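
A hedged sketch of the "int, string" alternative above, in Python. The names
BetterInt, validate_int, and erase_int are hypothetical, and comparing two
values by their numeric part only is one possible design choice, not
something the paper specifies.

```python
import re
from dataclasses import dataclass, field

# Assumed lexical form for integers: optional sign followed by digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


@dataclass(frozen=True)
class BetterInt:
    value: int                            # the number itself
    lexical: str = field(compare=False)   # the text as originally written


def validate_int(text):
    """External text -> BetterInt, remembering the original spelling."""
    if _INTEGER.fullmatch(text) is None:
        raise ValueError(f"{text!r} is not a valid integer")
    return BetterInt(value=int(text), lexical=text)


def erase_int(number):
    """BetterInt -> external text, reproducing the original spelling."""
    return number.lexical


bond = validate_int("007")
print(bond.value)        # numeric value is still available for typing
print(erase_int(bond))   # original formatting survives erasure
```

Because the lexical form is excluded from comparison, "007" and "7" still
compare equal as numbers, while each erases back to its own spelling.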
Something that confused me about this paper was the meaning of the different
terms "XML Schema," "XML," and "Schema." While I have a vague notion of
what is intended by each, the authors seemed to use these three names
interchangeably at times.
All in all, this paper accomplished its immediate goals: to supply the
formality missing from XML Schema and to propose a powerful set of semantics
that expresses it. It also appeared, though, to have some subtler
goals, including discrediting atomic types other than ints and strings. The
matching syntax also made me want to start writing in OCaml again.