I have mentioned separating structure from presentation several times in the course of discussing XML, XHTML, etc. This is generally thought to be a good thing, because it allows you to provide different presentations of the data without having to rewrite the basic structure of the information itself. Depending on the needs of the user, data can be presented with a different color scheme, in different fonts, linearly or in floating blocks, and so on, all without changing the basic content files themselves.
There is another level of decomposition that is sometimes helpful to apply. That is splitting out the information content of the basic data from the structure of the basic data. That is to say, in some sense the structure is also a variable attribute of the data, and there are times when the same information might be structured one way for one usage, and another way for another usage. In both cases, we want to retain the ability to further style the content for presentation.
This situation is illustrated in this diagram taken from the article An Introduction to Client-Side XSLT on Digital Web Magazine.
Initially, data (information content), structure, and design (presentation) were all managed using the ad-hoc tools of HTML, as designed during the browser wars by competing programmers at Microsoft and Netscape. The advent of CSS and XHTML meant that we could separate the presentation from the information content and structure.
There is one further tool that has been defined and is very useful, both in creating web pages and in the more general application of structured XML data files: Extensible Stylesheet Language Transformations (XSLT). With XSLT, we can transform the structure of an XML file into another XML file, an XHTML file, or essentially any arbitrary format that is needed. If the resulting file is intended for presentation to the user, it can be further styled using CSS. If the result is intended for another application, then it can be used as is.
XSL Transformations (XSLT) is an XML language for transforming XML documents from one syntax to another. An XSLT document can be thought of as a program that is interpreted by a processor. The program is supplied an input XML document (the source tree) and it produces a second document (the result tree) as output. The format of the result tree is entirely dependent on the XSLT program (or stylesheet).
XSLT is an XML application; that is, an XSLT stylesheet file is written in a dialect of XML. XSLT is a relatively simple pattern-based language, although it can be confusing at first. An XSLT stylesheet consists of a collection of template rules which define the transformations to be performed. These template rules can be applied recursively.
The XSLT processor checks which template rules can be applied and executes the associated transformations based on a sequence of priorities. XSLT processors can be standalone programs (eg, Apache Xalan, MSXSL) or part of a browser or other program (eg, Firefox, IE 6, libxslt). Either way, the source file is read in, the transformation is applied, and the result is made available in whatever medium was specified.
As an example, recall the simple XML file ex11/twoelements.xml from the previous set of notes. That file has no attached transformation or style sheet, so the browser (Firefox and IE6, at least) displays the XML directly.
For these XSL notes, I've written a short XSLT transformation, ex12/build-element-page.xsl, and associated it with the XML file by adding an <?xml-stylesheet ...?> processing instruction in the data file. Loading the new version of the file causes it to be transformed before being displayed. For example, see ex12/twoelements.xml and ex12/allelements.xml. Both of these files are transformed using XSLT, then the CSS presentation styles are applied using ex12/elements.css.
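As a rough illustration of how a data file is tied to its transformation, the sketch below shows what such a file might look like. The element names (PERIODIC_TABLE, ATOM, NAME, SYMBOL, ATOMIC_NUMBER) are the ones the templates in these notes refer to, and the stylesheet name comes from the text above. The two atoms, and the assumption that each ATOM has only these three children, are illustrative guesses rather than the actual contents of ex12/twoelements.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The processing instruction below names the transformation to apply.
     It must appear before the document's root element. -->
<?xml-stylesheet type="text/xsl" href="build-element-page.xsl"?>
<!-- Hypothetical data: the real file may hold different or additional fields. -->
<PERIODIC_TABLE>
  <ATOM>
    <NAME>Hydrogen</NAME>
    <SYMBOL>H</SYMBOL>
    <ATOMIC_NUMBER>1</ATOMIC_NUMBER>
  </ATOM>
  <ATOM>
    <NAME>Helium</NAME>
    <SYMBOL>He</SYMBOL>
    <ATOMIC_NUMBER>2</ATOMIC_NUMBER>
  </ATOM>
</PERIODIC_TABLE>
```

The href is resolved relative to the data file, so in this sketch the stylesheet is assumed to sit in the same directory.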
The key concept with XSL is the idea that template rules are matched to particular nodes in the source tree, and the output specified by the template is then written out. So there are two important issues: how a template is matched to nodes, and what output the template produces.
A template is an XML element. The name of the element is xsl:template. The element has an attribute named match. The value of the match attribute is an XPath expression that specifies the nodes in the source tree to match. So, for example, the following template will be applied to the root node:
<xsl:template match="/">
...
</xsl:template>
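For context, templates do not stand alone: they sit inside an xsl:stylesheet element that declares the XSLT namespace. The following is a minimal sketch of that wrapper, assuming XSLT 1.0; the two templates inside it are placeholders chosen for illustration, not the contents of build-element-page.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of the outer wrapper; version="1.0" is an assumption. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Applied to the root node of the source tree. -->
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Applied to every ATOM element, wherever it appears. -->
  <xsl:template match="ATOM">
    <xsl:value-of select="NAME"/>
  </xsl:template>

</xsl:stylesheet>
```

The xsl: prefix is only a convention; what the processor looks at is the namespace URI it is bound to.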
The result tree (the output document) is constructed by processing a list initially containing just the root node. A node is processed by finding all the template rules with patterns that match the node, and choosing the best amongst them. Then the chosen rule's template is instantiated with the node as the current node and with the list of source nodes as the current node list. A template typically contains instructions that select an additional list of source nodes for processing. The process of matching, instantiation and selection is continued recursively until no new source nodes are selected for processing.
So the root node / is matched first. In our example, the template for the root node is:
<xsl:template match="/">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <xsl:attribute name="xml:lang">en</xsl:attribute>
    <xsl:attribute name="lang">en</xsl:attribute>
    <head>
      <meta http-equiv="content-type" content="text/html;charset=utf-8" />
      <meta http-equiv="content-style-type" content="text/css" />
      <link>
        <xsl:attribute name="href">elements.css</xsl:attribute>
        <xsl:attribute name="rel">stylesheet</xsl:attribute>
        <xsl:attribute name="type">text/css</xsl:attribute>
      </link>
      <title>Elements Table</title>
    </head>
    <xsl:apply-templates/>
  </html>
</xsl:template>
The template is defined by the start and end tags <xsl:template match="/">...</xsl:template>. Everything in between those tags is the template. Most of the text is output as given, with some changes for the attributes of the html and link tags. The element that "selects an additional list of source nodes" is <xsl:apply-templates/>. This tells the XSLT processor to compare each child node of the current source node (in this case, the root) against the templates in the style sheet and, if a match is found, to output the template for the matched node.
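The paragraph above covers the case where a match is found. When no template in the style sheet matches a node, an XSLT 1.0 processor falls back on built-in rules. The sketch below writes those rules out explicitly, as they are described in the XSLT 1.0 specification; a real stylesheet never needs to include them, and the wrapper is there only to make the fragment a complete document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the root node: emit nothing, keep descending. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy their string value to the output. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: produce no output. -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

This is why a bare <xsl:apply-templates/> on unmatched elements tends to dump their text content into the result, and why selecting specific children with a select attribute keeps stray text out.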
When the <xsl:apply-templates/> element is processed, the processor finds that there is a child element PERIODIC_TABLE. It finds the appropriate template and applies that.
<xsl:template match="PERIODIC_TABLE">
  <body xmlns="http://www.w3.org/1999/xhtml">
    <table>
      <tr>
        <th>Name</th>
        <th>Symbol</th>
        <th>Number</th>
      </tr>
      <xsl:apply-templates select="ATOM">
        <xsl:sort select="SYMBOL"/>
      </xsl:apply-templates>
    </table>
  </body>
</xsl:template>
The processor emits the first part of the body of the html document, then encounters another <xsl:apply-templates> element. There is an xsl:sort child element enclosed in the body of the <xsl:apply-templates> element, so the selected ATOM nodes are sorted according to their SYMBOL elements. Then, applying the known templates to the ATOM children of the PERIODIC_TABLE, the processor finds two matches and processes those.
<xsl:template match="ATOM">
  <xsl:if test="ATOMIC_NUMBER &lt; 10">
    <tr xmlns="http://www.w3.org/1999/xhtml">
      <td><xsl:value-of select="NAME"/></td>
      <td><xsl:value-of select="SYMBOL"/></td>
      <td><xsl:value-of select="ATOMIC_NUMBER"/></td>
    </tr>
  </xsl:if>
</xsl:template>
The ATOM template outputs selected values from the children of each ATOM node, along with some html markup to build a table.
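The ATOM template filters with xsl:if after the node has already been selected. One alternative, sketched here as a design variation and not as what build-element-page.xsl does, is to filter at selection time with an XPath predicate, so that atoms with larger atomic numbers never reach the ATOM template at all. The table markup is trimmed to keep the sketch short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PERIODIC_TABLE">
    <table xmlns="http://www.w3.org/1999/xhtml">
      <!-- The predicate in [...] keeps only ATOMs whose number is below 10. -->
      <xsl:apply-templates select="ATOM[ATOMIC_NUMBER &lt; 10]">
        <xsl:sort select="SYMBOL"/>
      </xsl:apply-templates>
    </table>
  </xsl:template>

  <!-- No xsl:if needed: every ATOM that arrives here already passed the filter. -->
  <xsl:template match="ATOM">
    <tr xmlns="http://www.w3.org/1999/xhtml">
      <td><xsl:value-of select="NAME"/></td>
      <td><xsl:value-of select="SYMBOL"/></td>
      <td><xsl:value-of select="ATOMIC_NUMBER"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

The two approaches give the same rows here. The predicate form keeps the ATOM template unconditional, which can be easier to reuse; the xsl:if form keeps the decision next to the output it controls.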
The ATOM template does not select any more nodes, so recursive processing stops at this level and returns to the enclosing PERIODIC_TABLE template. The closing tags for the table and the body are emitted, and then that template is exhausted. Processing control returns to the initial root template, which emits the closing html tag, and then we're done!
A complete html document has been created based on an original xml document and an associated transformation. Notice that the created html document actually contains a reference to a CSS style sheet, so the presentation is polished off using the styles defined in that style sheet.
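To make the end result concrete, here is roughly what the result tree would look like when serialized, assuming the source holds exactly two ATOM elements, hydrogen and helium. That data is an assumption made for illustration, not a statement about what ex12/twoelements.xml contains. Whitespace and indentation will differ from one processor to another, and a processor may also add an XML declaration at the top.

```xml
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
    <meta http-equiv="content-style-type" content="text/css"/>
    <link href="elements.css" rel="stylesheet" type="text/css"/>
    <title>Elements Table</title>
  </head>
  <body>
    <table>
      <tr>
        <th>Name</th>
        <th>Symbol</th>
        <th>Number</th>
      </tr>
      <!-- Rows appear in SYMBOL order because of the xsl:sort element. -->
      <tr>
        <td>Hydrogen</td>
        <td>H</td>
        <td>1</td>
      </tr>
      <tr>
        <td>Helium</td>
        <td>He</td>
        <td>2</td>
      </tr>
    </table>
  </body>
</html>
```

Note that the xsl:attribute instructions have turned into ordinary attributes on the html and link elements, and that no trace of the xsl: elements remains in the output.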
There are numerous elements defined in the XSLT language. Refer to the various XSL resources for tutorials and more details. The following tags will get you started.
<xsl:apply-templates/>: Process all of the children of the current node. The select attribute can be used to process nodes selected by an expression instead of processing all children.
<xsl:value-of select="NAME"/>: Compute the string value of something and copy that text value into the output document. The required select attribute is an expression; this expression is evaluated and the resulting object is converted to a string as if by a call to the string function. A simple expression like the name of a child node (eg, NAME in the example) is converted by taking the value of the element content (in this case, whatever the name of the ATOM is).
<xsl:attribute name="lang">en</xsl:attribute>: Create an attribute name and value and add it to the element currently being constructed. Note that if the attribute is always the same, it can be included directly in the template, but if the value must be calculated using information in the input document, this element can calculate the needed strings. In the case of the example given here, the attribute could have been specified directly in the html tag rather than separately.
<xsl:if test="ATOMIC_NUMBER &lt; 10">...</xsl:if>: Optionally include certain content in the output file. The test attribute contains an expression that evaluates to a boolean (true or false). If true, the contents of the xsl:if are output. Otherwise, they're not. Note that the < in the test is encoded as &lt; because the XSL file must be well-formed XML and cannot contain a bare < symbol in an attribute value. See also xsl:choose for a way to select one of several possible outputs.
<xsl:sort select="SYMBOL"/>: Instead of processing the selected nodes in document order, sort the nodes according to the specified sort key(s) and then process them in sorted order. An xsl:sort element appears as a child (an enclosed element) of an <xsl:apply-templates>...</xsl:apply-templates> element or a <xsl:for-each>...</xsl:for-each> element. Note that the enclosing element has to use a begin and end tag in order to enclose the xsl:sort element. The select attribute defines the key used to sort the element's output. More than one xsl:sort element can be specified. The first xsl:sort child specifies the primary sort key, the second xsl:sort child specifies the secondary sort key, and so on.
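The tags above can be combined in one template. The sketch below is a hypothetical variation on the notes' example, not part of build-element-page.xsl: it uses xsl:for-each with two sort keys, an xsl:attribute whose value is computed from the input, and xsl:choose to pick one of several outputs. The list markup, the title attribute, and the period labels are all choices made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="PERIODIC_TABLE">
    <ul xmlns="http://www.w3.org/1999/xhtml">
      <xsl:for-each select="ATOM">
        <!-- Primary key: atomic number, compared as a number, not as text. -->
        <xsl:sort select="ATOMIC_NUMBER" data-type="number"/>
        <!-- Secondary key: symbol, compared as text. -->
        <xsl:sort select="SYMBOL"/>
        <li>
          <!-- Attribute value computed from the input document. -->
          <xsl:attribute name="title">
            <xsl:value-of select="SYMBOL"/>
          </xsl:attribute>
          <!-- Exactly one branch is used: the first xsl:when whose test is true. -->
          <xsl:choose>
            <xsl:when test="ATOMIC_NUMBER &lt; 3">
              <xsl:text>Period 1: </xsl:text>
            </xsl:when>
            <xsl:when test="ATOMIC_NUMBER &lt; 11">
              <xsl:text>Period 2: </xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text>Later period: </xsl:text>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:value-of select="NAME"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

Without data-type="number", the sort would compare atomic numbers as strings, so 10 would sort before 2. The xsl:sort elements must come first inside xsl:for-each, before any output is produced.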