Finding Lots of Stuff: XPath and Query

Query is the complement of database: you dump a bunch of stuff in a big pile, then you rummage through the pile trying to find something. Traditionally, query was supported by the craft handiwork of subject experts such as indexers, who built tools such as back-of-the-book indexes and library card catalogs to support your search.

When databases first appeared, query became a matter of programming a route up and down hierarchies or through the maze of a network database (remember Bachman and "The Programmer as Navigator"?).

Query was transformed in the modern database period by Ted Codd (remember relational databases?), who suggested targeting information by specifying values or ranges for one or more attributes of a relation (translation: specifying a value for a column of a relational table, e.g., flavor = "chocolate"). His work led to the development of SQL, the Structured Query Language. Here is an example of a SQL query:

	SELECT name FROM personal_info WHERE salary > 55000
	which means: select the name data from the personal_info 
	table where the salary value is greater than 55000
	
	Abstractly, a typical SQL query is:
	
	SELECT "column_name" FROM "table_name" 
	WHERE "simple condition" {[AND|OR] "simple condition"}*
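
To see a query of this shape actually run, here is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the example above; the three rows are made up for illustration, and the salary threshold is passed as a bound parameter rather than typed into the SQL text.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personal_info (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO personal_info (name, salary) VALUES (?, ?)",
    [("Ann", 72000), ("Bob", 41000), ("Cy", 55000)],
)

# SELECT column FROM table WHERE simple condition
rows = conn.execute(
    "SELECT name FROM personal_info WHERE salary > ? ORDER BY name",
    (55000,),
).fetchall()
names = [name for (name,) in rows]
print(names)
conn.close()
```

Cy earns exactly 55000, so the strict "greater than" leaves him out.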

The "AND" and "OR" are Boolean operators, named in honor of George Boole, an English mathematician.
With these Boolean operators you can build complex queries such as
((flavor = "chocolate" OR flavor = "vanilla") AND (topping = "nuts" OR topping = "cherry")) OR dessert = "Bananas"
Is this stuff beyond human comprehension? Probably. Only George would ask for a dessert this way. Pity his poor mother.
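
If you would rather let a database untangle George's order, here is a rough sketch with Python's sqlite3 module. The orders table and its four rows are hypothetical, invented only to exercise each branch of the condition. Note that standard SQL writes string literals in single quotes, so the sketch does too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, flavor TEXT, topping TEXT, dessert TEXT)"
)
# Hypothetical orders, one per interesting case.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "chocolate", "nuts", "Sundae"),        # flavor AND topping match
        (2, "vanilla", "sprinkles", "Sundae"),     # flavor matches, topping does not
        (3, "strawberry", "cherry", "Bananas"),    # matches on dessert alone
        (4, "strawberry", "cherry", "Sundae"),     # topping matches, flavor does not
    ],
)

query = """
    SELECT id FROM orders
    WHERE ((flavor = 'chocolate' OR flavor = 'vanilla')
           AND (topping = 'nuts' OR topping = 'cherry'))
       OR dessert = 'Bananas'
    ORDER BY id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
conn.close()
```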

April 2001
Get to Know XPath--the Key to Unlocking XML Data
by Mike D. Jones

You can't deny that in a very short time, XML has stormed the development world. The new MSXML 3.0 parser includes full support for XML (DOM and SAX), XSLT and XPath.

However, if you want to accomplish anything with the XML technologies, you'll also need to become familiar with XPath. Just like you wouldn't think of programming a database application without learning how to manipulate ADO recordsets, so too it's a good idea to become familiar with XPath if you want to use XML.

The future of XML documents and relational databases
Jon Udell
July 25, 2003

Query Strategies

The foundation of all XML-oriented query strategies is XPath, a syntax built to descend treelike structures and to lop off branches. When an XSLT stylesheet transforms an XML document, it uses XPath to isolate fragments of the document. Relational databases that support XML queries -- including stalwarts Oracle, DB2, and SQL Server, newcomers such as OpenLink Software's Virtuoso, but not yet MySQL -- use XPath in the same way.

XPath

XPath is the technology for branching through the DOM (Document Object Model) of an XML document and finding information. XPath specifies a "path" into an XML document to the desired information. For example:

<?xml version="1.0" encoding="UTF-8"?>
<Stooges>
	<Name>Larry</Name>
	<Name>Moe</Name>
	<Name>Curley</Name>
</Stooges>

What would Mr. Squirrel find if he were to run down these branches?

	"Stooges/Name" selects all three Name elements; asking for its value gives the first one, "Larry"
	"Stooges/Name[1]" gives "Larry"
	"Stooges/Name[2]" gives "Moe"
	"Stooges/Name[3]" gives "Curley"

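Mr. Squirrel's runs can be checked with Python's standard xml.etree.ElementTree module, which understands a subset of XPath. This is only a sketch: ElementTree evaluates paths relative to an element, so the root is wrapped in a stand-in "document" element (an assumption of this example, not part of the XML) to let the paths begin with Stooges.

```python
import xml.etree.ElementTree as ET

STOOGES_XML = """<Stooges>
    <Name>Larry</Name>
    <Name>Moe</Name>
    <Name>Curley</Name>
</Stooges>"""

# Stand-in for the document node so paths can start at "Stooges/...".
doc = ET.Element("document")
doc.append(ET.fromstring(STOOGES_XML))

# Without an index the path selects every Name element, in document order.
all_names = [n.text for n in doc.findall("Stooges/Name")]

# XPath positions count from 1, not 0.
second = doc.find("Stooges/Name[2]").text
third = doc.find("Stooges/Name[3]").text
print(all_names, second, third)
```
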
Elements of XPath Grammar

<?xml version="1.0" encoding="UTF-8"?>
<King_County>
	<Animal_Control>
		<Telephone>296-PETS</Telephone>
	</Animal_Control>
</King_County>

Selecting XML Elements

King_County/Animal_Control/Telephone targets 296-PETS
King_County/*/Telephone targets 296-PETS
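
With only one department in the document, the two paths land on the same spot. The difference shows up once there is a second branch. The sketch below uses Python's xml.etree.ElementTree (a subset of XPath) and adds a hypothetical Parks element with a made-up number purely to show what the wildcard picks up; the "document" wrapper is also an assumption, there so the paths can start at King_County.

```python
import xml.etree.ElementTree as ET

# Parks and its number are hypothetical additions for illustration.
COUNTY_XML = """<King_County>
    <Animal_Control>
        <Telephone>296-PETS</Telephone>
    </Animal_Control>
    <Parks>
        <Telephone>555-0100</Telephone>
    </Parks>
</King_County>"""

doc = ET.Element("document")  # stand-in for the document node
doc.append(ET.fromstring(COUNTY_XML))

# Named step: only Telephone elements under Animal_Control.
explicit = [t.text for t in doc.findall("King_County/Animal_Control/Telephone")]

# Wildcard step: Telephone elements under any child of King_County.
wildcard = [t.text for t in doc.findall("King_County/*/Telephone")]
print(explicit, wildcard)
```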

<?xml version="1.0" encoding="UTF-8"?>
<King_County>
	<Animal_Control>
		<Telephone>296-7387</Telephone>
		<Telephone>296-PETS</Telephone>
	</Animal_Control>
</King_County>

Indexing into a collection

King_County/*/Telephone[2] targets 296-PETS
King_County/Animal_Control/Telephone[last()] targets 296-PETS
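
A small sketch of the same indexing, assuming Python's xml.etree.ElementTree (which supports integer positions and last()). The "document" wrapper element is an assumption of the example so that the paths can begin with King_County.

```python
import xml.etree.ElementTree as ET

COUNTY_XML = """<King_County>
    <Animal_Control>
        <Telephone>296-7387</Telephone>
        <Telephone>296-PETS</Telephone>
    </Animal_Control>
</King_County>"""

doc = ET.Element("document")  # stand-in for the document node
doc.append(ET.fromstring(COUNTY_XML))

# Positions are 1-based and count among siblings with the same name.
first = doc.find("King_County/Animal_Control/Telephone[1]").text
by_position = doc.find("King_County/*/Telephone[2]").text

# last() picks the final Telephone without knowing how many there are.
by_last = doc.find("King_County/Animal_Control/Telephone[last()]").text
print(first, by_position, by_last)
```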

<?xml version="1.0" encoding="UTF-8"?>
<Government>
	<County>
		<Name>King</Name>
		<Animal_Control>
			<Telephone>296-7387</Telephone>
			<Telephone>296-PETS</Telephone>
		</Animal_Control>
	</County>
	<County>
		<Name>Snohomish</Name>
		<Animal_Control>
			<City>
				<Name>Everett</Name>
				<Telephone>256-6000</Telephone>
			</City>
			<City>
				<Name>Lynnwood</Name>
				<Telephone>787-2500</Telephone>
			</City>
		</Animal_Control>
	</County>
</Government>

Using Filters

Government/County[Name = 'King']/Animal_Control/Telephone[2] targets 296-PETS
Government/County[Name = 'Snohomish']/Animal_Control/City[Name = 'Lynnwood']/Telephone targets 787-2500
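
The two filtered paths can be tried out with Python's xml.etree.ElementTree, which supports the [child='text'] form of predicate. This is a sketch under two assumptions: the "document" wrapper element is added so the paths can start at Government, and the predicates are written without spaces around the equals sign, which keeps them safe for older versions of the module.

```python
import xml.etree.ElementTree as ET

GOVERNMENT_XML = """<Government>
    <County>
        <Name>King</Name>
        <Animal_Control>
            <Telephone>296-7387</Telephone>
            <Telephone>296-PETS</Telephone>
        </Animal_Control>
    </County>
    <County>
        <Name>Snohomish</Name>
        <Animal_Control>
            <City><Name>Everett</Name><Telephone>256-6000</Telephone></City>
            <City><Name>Lynnwood</Name><Telephone>787-2500</Telephone></City>
        </Animal_Control>
    </County>
</Government>"""

doc = ET.Element("document")  # stand-in for the document node
doc.append(ET.fromstring(GOVERNMENT_XML))

# Filter on the county's Name child, then index into its Telephone list.
king_second = doc.find(
    "Government/County[Name='King']/Animal_Control/Telephone[2]"
).text

# Two filters in one path: first the county, then the city inside it.
lynnwood = doc.find(
    "Government/County[Name='Snohomish']/Animal_Control/City[Name='Lynnwood']/Telephone"
).text
print(king_second, lynnwood)
```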

XSL Transformations (XSLT)

Sending Mr. Squirrel down various branches of the DOM is fine, but what should he do with the goodies he finds? Will he use the information to create another XML document? HTML document? Feed the information into the backdoor of a database? Write a plain ASCII file? etc., etc.?

Something to think about (or maybe realize at this point):

Your raw data is marked up semantically in XML, and you are about to use a transformation technology, XSL, that dips into the data with XPath and plucks out whatever you like. XSL permits you to create any sort of document you like: another XML document, an MS Word document (Bob Boiko could show you how to do that), input for a database, a Java class, a C# namespace, etc.

There is something deeply fundamental about a process that is capable of feeding data into so many applications.

XSLT (Extensible Stylesheet Language Transformations) is the technology that mixes the XML information found by XPath with other markup such as HTML.

We're going to transform XML into HTML so our XSL stylesheet looks like this:


<xsl:stylesheet 
  version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:template match="/">
<html>
	<head>
		<title></title>
	</head>
	<body>
		... Your HTML and XPath go in here
	</body>
</html>
</xsl:template>
</xsl:stylesheet>

XSL Grammar

An example of XSL grammar is the xsl:value-of element. Our XPath fits inside its select attribute, e.g. select="Stooges/Name[3]". Mr. Squirrel runs down this branch and brings back the text value of the XML element at that location.

	<xsl:value-of select="Stooges/Name[3]"></xsl:value-of>
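
To get a feel for what xsl:value-of hands back, here is a rough Python imitation built on xml.etree.ElementTree. It is not an XSLT processor; value_of is a hypothetical helper that copies the XSLT 1.0 behavior of returning the text of the first selected node, or an empty string when nothing matches. The "document" wrapper and the Shemp element name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def value_of(context, path):
    """Text of the first node selected by path, or '' if nothing matches."""
    node = context.find(path)
    return "" if node is None else "".join(node.itertext())


doc = ET.Element("document")  # stand-in for the document node
doc.append(ET.fromstring(
    "<Stooges><Name>Larry</Name><Name>Moe</Name><Name>Curley</Name></Stooges>"
))

third = value_of(doc, "Stooges/Name[3]")       # one node selected
first_of_many = value_of(doc, "Stooges/Name")  # three selected, first one wins
missing = value_of(doc, "Stooges/Shemp")       # nothing selected
print(third, first_of_many, repr(missing))
```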

Combining XML, XPath and XSL to create HTML

Step One: Build A StyleSheet

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
	xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:template match="/">
<html>
	<head>
		<title>Class Example</title>
	</head>
	<body>
	<h1>Dog Catcher</h1>
	<p>In Snohomish County there are two numbers to call when animals are running loose:</p>
	<ul>
	<li>	
	<xsl:value-of 
		select="Government/County[Name='Snohomish']/Animal_Control/City[Name='Everett']/Telephone">
		</xsl:value-of></li>
	<li>
	<xsl:value-of 
		select="Government/County[Name='Snohomish']/Animal_Control/City[Name='Lynnwood']/Telephone">
		</xsl:value-of></li>
	</ul>
	</body>
</html>
</xsl:template>
</xsl:stylesheet>

Step Two: Anchor Your Stylesheet in the XML Document

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="classExample.xsl"?>
<Government>
	<County>
		<Name>King</Name>
		<Animal_Control>
			<Telephone>296-7387</Telephone>
			<Telephone>296-PETS</Telephone>
..... omitted for brevity

Step Three: Point Your IE 6.0 Browser at the XML Document

Editorial Comment: Don't let all this flash and glitter fool you into thinking that serious HTMLers would ever transform XML in the client browser. No way! Too uncontrolled a process. We're doing it for pedagogical reasons: you don't need to be able to program, and all you need is a browser.
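
For the curious, the same two phone numbers can be pulled out away from the browser by any program with an XPath library. The sketch below assumes Python's standard library, which has no XSLT processor, so it builds the list by hand instead of applying classExample.xsl. The "document" wrapper element is an assumption so the paths can start at Government, and only the Snohomish branch of the data is reproduced.

```python
import xml.etree.ElementTree as ET
from html import escape

GOVERNMENT_XML = """<Government>
    <County>
        <Name>Snohomish</Name>
        <Animal_Control>
            <City><Name>Everett</Name><Telephone>256-6000</Telephone></City>
            <City><Name>Lynnwood</Name><Telephone>787-2500</Telephone></City>
        </Animal_Control>
    </County>
</Government>"""

doc = ET.Element("document")  # stand-in for the document node
doc.append(ET.fromstring(GOVERNMENT_XML))

items = []
for city in ["Everett", "Lynnwood"]:
    # Same path the stylesheet uses in its select attributes.
    path = (
        "Government/County[Name='Snohomish']"
        f"/Animal_Control/City[Name='{city}']/Telephone"
    )
    # findtext returns the default when the path selects nothing.
    items.append(f"<li>{escape(doc.findtext(path, default=''))}</li>")

html_list = "<ul>" + "".join(items) + "</ul>"
print(html_list)
```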