CSE 344 section 04 -- Practice with XML and XPath
=================================================
How is homework 3 going?
This section, you will practice using XML and querying it with XPath,
which is a way to specify elements within an XML document using a
notation reminiscent of the Unix file system.
We will use the Mondial database, which you are using in homework 4,
due next week. Note that for homework 4, you will use XQuery instead
of XPath, but every XPath expression is also an XQuery query.
Here is a simplified example of Mondial:
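<mondial>
  <continent>
    <name>Europe</name>
    <area>9562488</area>
  </continent>
  <country car_code="D">
    <name>Germany</name>
    <population>83536115</population>
    <population_growth>0.67</population_growth>
    <gdp_total>1452200</gdp_total>
    <indep_date>1871-01-18</indep_date>
    <government>federal republic</government>
    <province>
      <name>Baden Wurttemberg</name>
      <city>
        <name>Stuttgart</name>
        <longitude>9.1</longitude>
        <latitude>48.7</latitude>
        <population>588482</population>
      </city>
    </province>
  </country>
  <river>
    <name>Thjorsa</name>
    <area>7530</area>
    <length>230</length>
    <source>
      <longitude>-20.8</longitude>
      <latitude>63.9</latitude>
    </source>
  </river>
</mondial>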
We will use the Saxon XQuery/XPath processor. You can download it from here:
http://saxon.sourceforge.net
Select "Saxon Home Edition" (Saxon-HE). We'll use the Java version.
Download and unzip the file to reveal the Java JAR file. Don't unzip by
double-clicking on the Mac, because you might end up unzipping the JAR
file too.
Then you can run Saxon on the command line. Cd to the directory
with Saxon and say:
java -cp saxon9he.jar net.sf.saxon.Query YOUR_QUERY_FILE.xq
Unfortunately Saxon pushes all the output onto one line.
An easy way to nicely display all the output is by redirecting it
to a .xml file and then opening the XML file in a web browser:
... net.sf.saxon.Query YOUR_QUERY_FILE.xq > YOUR_QUERY_RESULT.xml
firefox YOUR_QUERY_RESULT.xml
IE and Firefox can display any XML file in a nicely readable form.
Safari cannot, and only later versions of Chrome can.
Note that no browser can display XML that is not well-formed. Whenever
an XPath query returns a sequence of items rather than a single item,
Saxon's output has no single root element, so it is not well-formed.
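If you would rather stay on the command line than open a browser, a few
lines of Python can re-indent one-line XML. This is only a sketch using
the standard library's xml.dom.minidom; the one_line string is a made-up
stand-in for Saxon's output, not something Saxon actually printed.

```python
from xml.dom import minidom

# Hypothetical one-line result, standing in for what Saxon might print.
one_line = '<answer><country car_code="D"><name>Germany</name></country></answer>'

# parseString requires well-formed input (exactly one root element).
pretty = minidom.parseString(one_line).toprettyxml(indent="  ")
print(pretty)
```

Like a browser, minidom refuses input that is not well-formed, so this
only works on output that has a single root element.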
Practice XPath Queries for today:
0. List the entire contents of Mondial.
doc("mondial.xml")/mondial
You can specify the file to read data from with the doc() XPath function.
Then you can specify the root XML element with doc()/ELEMENT_NAME, just
like a Unix path. (It's customary to omit the doc() on paper.)
1. Give a list of all the countries in XML.
doc("mondial.xml")//country
The double slash // before "country" tells XPath to match country
elements at any depth below the node named to the left of the //,
not just its immediate children. Because the // is preceded here by
the doc(), this matches country elements anywhere in Mondial.
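To see the difference between / and // without installing anything, here
is a small sketch in Python. It uses xml.etree.ElementTree, which
supports only a limited subset of XPath (it is not Saxon), and the SAMPLE
document is a made-up fragment shaped like Mondial, not the real data.

```python
import xml.etree.ElementTree as ET

# Made-up miniature of Mondial, for illustration only.
SAMPLE = """
<mondial>
  <country car_code="D">
    <name>Germany</name>
    <province>
      <name>Baden Wurttemberg</name>
      <city><name>Stuttgart</name></city>
    </province>
  </country>
  <river><name>Thjorsa</name></river>
</mondial>
"""

root = ET.fromstring(SAMPLE)  # root is the <mondial> element

# "./name": only <name> elements that are immediate children of <mondial>.
children = [e.text for e in root.findall("./name")]

# ".//name": <name> elements at any depth below <mondial>.
descendants = [e.text for e in root.findall(".//name")]

print(children)     # []
print(descendants)  # ['Germany', 'Baden Wurttemberg', 'Stuttgart', 'Thjorsa']
```

The single slash finds nothing because no name element sits directly
under mondial, while the double slash picks up every name in the tree.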
Note that this will return badly formed XML. We can fix this by
wrapping it in a dummy "answer" element:
<answer>{ doc("mondial.xml")//country }</answer>
Note that the curly braces are needed here to tell Saxon that you
actually want to evaluate the XPath expression, instead of including
it as a literal string.
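The sketch below shows why the wrapper matters. It uses Python's
xml.etree.ElementTree parser as a stand-in for a browser; the sequence
string and the is_well_formed helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two sibling elements with no enclosing root, as a multi-item result looks.
sequence = (
    "<country><name>Germany</name></country>"
    "<country><name>France</name></country>"
)


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


bare_ok = is_well_formed(sequence)
wrapped_ok = is_well_formed("<answer>" + sequence + "</answer>")
print(bare_ok, wrapped_ok)  # False True
```

The bare sequence is rejected because it has two top-level elements;
the wrapped version has exactly one root and parses.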
2. Give a list of the countries that Germany borders.
<answer>{ doc("mondial.xml")//country[@car_code="D"]/border }</answer>
You can filter the elements returned with a boolean expression
in square brackets []. Here, we ask for the country elements
whose "car_code" attribute (@car_code) is equal to "D", and
then get the border elements that are their immediate children.
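Here is the same kind of attribute filter as a Python sketch, again
using xml.etree.ElementTree's limited XPath subset rather than Saxon.
The SAMPLE data is made up, and the "country" attribute on each border
element is an assumption about how the neighbor is recorded.

```python
import xml.etree.ElementTree as ET

# Made-up fragment; the border/@country attribute is an assumption.
SAMPLE = """
<mondial>
  <country car_code="D">
    <name>Germany</name>
    <border country="F"/>
    <border country="A"/>
  </country>
  <country car_code="F">
    <name>France</name>
    <border country="D"/>
  </country>
</mondial>
"""

root = ET.fromstring(SAMPLE)

# Keep only the country whose car_code is "D", then step to its
# immediate <border> children.
borders = root.findall(".//country[@car_code='D']/border")
codes = [b.get("country") for b in borders]
print(codes)  # ['F', 'A']
```

France's own border element is not returned, because the predicate
discards every country whose car_code is not "D" before the /border step.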
The rest will have answers posted online later.
3. Give the names of all the countries with population at least 10 million.
4. Find all cities located in countries that are partially or fully
part of Europe. (The cities themselves don't have to be in Europe.)
5. Find the names of all rivers that start north of the equator
(at a positive latitude).
6. Find the names of all rivers that start in Iceland.
7. Get the names of all countries in both Asia and Europe.
8. Challenge problem: Get the name of every country that borders France *and*
has either population greater than 20 million *or* GDP greater than 10000.