From: Alexander Moshchuk (anm@cs.washington.edu)
Date: Tue Apr 13 2004 - 23:23:21 PDT
The Essence of XML
The paper provides a formalization of a simplified subset of XML
Schema, a new standard for describing XML document structure. Whereas
previous work has focused exclusively on structural types, this paper
also considers the names aspects of XML Schema and how they relate to
structural types. The goal of all the formalizations is to enhance
our understanding of how the complex XML Schema standard works using a
simple but rigorous model, which in turn helps in writing Schema
implementations as well as discovering weaknesses in the standards.
Most concepts used in the model's type system are already present in
some form in other programming languages. In particular, type
restriction is like subtyping in OO languages applied on a coarser
granularity; type extension is like class inheritance except that the
fields cannot be overridden. Unlike many OO languages, checking a
type requires knowledge of the complete type hierarchy. The paper
goes on to introduce some fundamental operations on values and types
in their model; most importantly, these are matching, erasure, and
validation. Matching is essentially typechecking a value with a type.
Erasure amounts to taking a typed value and stripping the type off as
well as converting the value to a general (string) representation.
Validation of an external (typeless) value against a type provides an
internal value or fails, depending on whether the external value
really matches the provided type.
The paper provides formal judgments defining the semantics of these
concepts. These formal judgments can them be used to prove
interesting things about XML Schema, such as the Validation Theorem,
which connects validation to matching and erasure. Understanding XML
validation is important, because it allows to impose a certain meaning
and order to the data in an XML document according to the schema
specification, allowing document manipulations like querying
to use the knowledge of the document's internal representation. I
guess using the type model and the validation theorem, XML validation
can essentially be reduced to an ordinary typechecking problem.
I thought the paper was well-written, the reasoning was intuitive, and
the numerous examples helped understand the concepts quickly. It is
interesting and unusual (in a good way) that the formal model
extending the one in the paper is a part of the official specification
for XQuery and XPath standards. On a side note, I thought the title
was a bit misleading, as the paper didn't focus on XML itself, but
instead on XML Schema.
This archive was generated by hypermail 2.1.6 : Tue Apr 13 2004 - 23:23:21 PDT