CSE 344 section 04 -- Practice with XML and XPath ================================================= How is homework 3 going? This section, you will practice using XML and querying it with XPath, which is a way to specify elements within an XML document using a notation reminiscent of the Unix file system. We will use the Mondial database, which you are using in homework 4, due next week. Note that for homework 4, you will use XQuery instead of XPath, but every XPath expression is also an XQuery query. Here is a simplified example of Mondial: Europe 9562488 Germany 83536115 0.67 6 1452200 1871-01-18 federal republic Baden Wurttemberg Stuttgart 9.1 48.7 588482 Thjorsa 7530 230 -18 65 -20.8 63.9 We will use the Saxon XQuery/XPath processor. You can download it from here: http://saxon.sourceforge.net Select "Saxon Home Edition" (Saxon-HE). We'll use the Java version. Download and unzip the file to reveal the Jav JAR file. Don't unzip by double-clicking on the Mac, because you might end up unzipping the JAR file too. Then you can run Saxon on the command line. Cd to the directory with Saxon and say: java -cp saxon9he.jar net.sf.saxon.Query YOUR_QUERY_FILE.xq Unfortunately Saxon pushes all the output onto one line. An easy way to nicely display all the output is by redirecting it to a .xml file and then opening the XML file in a web browser: ... net.sf.saxon.Query YOUR_QUERY_FILE.xq > YOUR_QUERY_RESULT.xml firefox YOUR_QUERY_RESULT.xml IE and Firefox can display any XML file in a nicely readable form. Safari cannot, and only later versions of Chrome can. Note that no browser can display badly-formed XML; however, whenever an XPath query returns a list of items rather than just one item, Saxon will return badly formed XML in the output. Practice XPath Queries for today: 0. List the entire contents of Mondial. doc("mondial.xml")/mondial You can specify the file to read data from with the doc() XPath function. Then you can specify the root XML element with doc()/ELEMENT_NAME, just like a Unix path. (It's customary to omit the doc() on paper.) 1. Give a list of all the countries in XML. doc("mondial.xml")//country The double slash // before "country" tells XPath to find elements at any point in the XML data tree below the point that precedes the double //. Because the // is preceded here by the doc(), this means to find anywhere in Mondial. Note that this will return badly formed XML. We can fix this by wrapping it in a dummy "answer" element: { doc("mondial.xml")//country } Note that the curly braces are needed here to tell Saxon that you actually want to evaluate the XPath expression, instead of including it as a literal string. 2. Give a list of the countries that Germany borders. { doc("mondial.xml")//country[@car_code="D"]/border } You can filter the elements returned by a boolean expression in square brackets []. Here, we ask for elements whose "car_code" attribute (@car_code) is equal to "D", and then get the elements who are the immediate children. These rest will have answers posted online later. 3. Give the names of all the countries with populaion at least 10 million. 4. Find all cities located in countries that are partially or fully part of Europe. (The cities themselves don't have to be in Europe.) 5. Find the names of all rivers that start north of the equator (at a positive latitude). 6. Find the names of all rivers that start in Iceland. 7. Get the names of all countries in both Asia and Europe. 8. Challenge problem: Get the name of every country that borders France *and* has either population greater than 20 million *or* GDP greater than 10000.