From: Li Yan (lanti@u.washington.edu)
Date: Wed Apr 14 2004 - 11:49:21 PDT
Review of the Essence of XML
XML Schema is a fairly complex type system. It uses a
named-type approach, which is widely adopted by programming
languages.
+ XML without Schema
XML data is well-formed if it has a single root element and
all tags are properly nested; it usually also begins with an
XML declaration. But to obtain type information about
individual elements, a DTD must accompany the XML, or an XML
Schema must be supplied. An XML document that comes without
a DTD or XML Schema is neither self-describing nor
round-tripping. It is not self-describing because there is
no way to infer the type of an element from the XML data
alone. Round-tripping is also hard, or impossible, because
without type information one internal value can have
multiple external representations, and conversely one
external representation can be converted back into multiple
internal values. With a DTD or XML Schema available, we can
validate XML data. Validation associates data with a type,
and hence a match between data and type will succeed
afterwards. This is crucial for self-describing and
round-tripping: given the type information, we can restrict
the internal representation of a datum to values that comply
with its type, and the erasure of typedValue back to
untypedValue can also be specified without ambiguity. Thus
round-tripping is achieved.
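The argument above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: a toy
universe of two types, with `validate` and `erase` as
hypothetical helper names (not an XML Schema API) and
Python's `int()` standing in for the xs:integer lexical
rules.

```python
def validate(text, type_name):
    """Attach a type to untyped text, yielding a typed value."""
    if type_name == "xs:integer":
        # int() is a stand-in for the real xs:integer lexical rules.
        return (type_name, int(text))
    if type_name == "xs:string":
        return (type_name, text)
    raise ValueError("unknown type: " + type_name)


def erase(typed_value):
    """Drop the type and return the external (untyped) text."""
    _type_name, value = typed_value
    return str(value)


# Without a type, the same text is compatible with several internal values.
as_integer = validate("42", "xs:integer")
as_string = validate("42", "xs:string")

# With the type fixed, erasing and then validating gives back the same value.
round_trip = validate(erase(as_integer), "xs:integer")
```

Both typed values erase to the same text, which is why the
text alone cannot tell us which internal value was meant.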
+ Scope
The notion of global and local declarations offers some
flexibility: the same element name in different places can
have different types. Anonymous types come in handy in
circumstances where a type name can be inferred from an
element.
+ Derivation
Derivation by restriction on a simple type looks like
creating a subtype of that simple type, e.g.
  define type feet restricts xs:integer
One subtype of integer cannot be used in place of another
subtype, but either can be used where type integer is
expected.
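A minimal sketch of this named-type rule, assuming a second
hypothetical type `miles` that also restricts xs:integer.
The `BASE` table and `derives_from` function are
illustrative names only.

```python
# Each named type maps to the type it restricts (None marks the root).
BASE = {
    "xs:integer": None,
    "feet": "xs:integer",
    "miles": "xs:integer",   # hypothetical sibling type, for illustration
}


def derives_from(sub, sup):
    """Return True if `sub` is `sup` or is derived from it by restriction."""
    while sub is not None:
        if sub == sup:
            return True
        sub = BASE[sub]
    return False
```

Under this reading, `feet` is accepted where xs:integer is
expected, but not where `miles` is expected, even though
both carry integer values: the check follows declared names,
not structure.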
Type derivation from a complex type resembles type
inheritance in Java, but in the reverse direction, in the
sense that whenever the former is expected, the latter can
be used instead. Here the complex type that is derived from
is more "general" than the derived type. More specifically,
it might have more fields, or one of its fields might be a
regular expression describing a language that "covers" the
corresponding field in the derived type. However, for every
field common to both the "general" and the derived complex
type, the two either agree completely or the field in the
derived type describes a language that is a subset of the
corresponding field in the "general" type. This observation
has a serious consequence for type checking: one has to
check against ALL types derivable from the given type before
one can claim success or failure. Given the availability of
regular expressions in type definitions, the number of
distinct derivations grows exponentially with the number of
fields in a given type in the worst case. The min/max
notation in XML Schema further complicates this problem.
Note this is very different from most object-oriented
languages like C++ and Java, in which type checking against
a base type or class has nothing to do with the derived
classes at all.
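The subset condition on fields can be sketched with min/max
occurrence bounds alone. This is a deliberately simplified
model, not the XML Schema restriction rules: it ignores
element order and nested content, treats a missing field as
(0, 0), and uses hypothetical field names.

```python
import math


def is_restriction(base, derived):
    """Check that every field's occurrence range in `derived`
    lies inside the corresponding range in `base`."""
    for name in set(base) | set(derived):
        b_min, b_max = base.get(name, (0, 0))
        d_min, d_max = derived.get(name, (0, 0))
        if not (b_min <= d_min and d_max <= b_max):
            return False
    return True


# Hypothetical "general" type: one title, any number of authors.
general = {"title": (1, 1), "author": (0, math.inf)}

narrower = {"title": (1, 1), "author": (1, 3)}   # tightens author
no_author = {"title": (1, 1)}                    # drops an optional field
extra = {"title": (1, 1), "isbn": (1, 1)}        # adds a field
```

Here `narrower` and `no_author` each accept a subset of what
`general` accepts, while `extra` does not, because `general`
allows no `isbn` field at all.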
An analogy can be drawn between XML complex type derivation
and the type template mechanism in C++, in the way derived
types are derived :). A complex type can have a regular
expression in its element type specification, and each
instance of that regular expression generates a different
derived type. Similarly, a type parameter instantiation
generates a concrete type for a given template type in C++.
+ The Validation Theorem
The Validation Theorem implies both round-tripping and
reverse round-tripping for unambiguous types, and
fortunately XML Schema prohibits ambiguous types. Ambiguity
arises in union or list types for simple types, and when
choice is used in the element content of complex types. XML
Schema requires always returning the first match in case of
ambiguity, which may or may not be what we want when
converting between external and internal values.
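The first-match rule for a union can be sketched as follows.
The function `validate_union` is a hypothetical name, and
Python's `int` and `str` stand in for the xs:integer and
xs:string member types.

```python
def validate_union(text, members):
    """Try each member type in declaration order; return the first match."""
    for type_name, parse in members:
        try:
            return (type_name, parse(text))
        except ValueError:
            continue   # this member rejects the text, try the next one
    raise ValueError("no member type accepts " + repr(text))


integer_first = [("xs:integer", int), ("xs:string", str)]
string_first = [("xs:string", str), ("xs:integer", int)]
```

The text "42" matches both members, so the declaration order
alone decides whether the internal value is an integer or a
string.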