CSE 444 Homework 3
- Objectives:
- To be able to manipulate XML: query it with XQuery.
- Reading Assignments:
- Lecture notes on XML and XQuery.
- Number of points:
- 100, 10 for each sub-question.
- Due date:
- Saturday, May 10, 2008, 9 pm
- Turn-in format:
- You should turn in your answers in a single text file containing the solutions to the questions in numeric order. Each query should be preceded by a comment giving the problem number. Below each query, in a comment, list the first 3 items it returns (if there are fewer than 3, list them all). XQuery comments look like (: this :).
For example, the first question is:
(1) Retrieve the names of all cities located in Peru, sorted alphabetically.
- Your file should then contain:
(: Problem 1. :)
(Insert your XQuery here)
(: Results
<city> abc </city>
<city> def </city>
<city> ghi </city>:)
- We should be able to extract your answer to any of the problems and place it in a separate file that can be run to verify your solution. For instance, if one of your answers is placed in a file named X.xq and the following command is run:
galax-run X.xq
the correct query result for that problem should be printed to standard output.
- Turn in link:
- Please turn in your assignment using the regular assignment drop box.
- Assignment Tools:
- XQuery (Galax). (Tutorials are available for using Galax on Linux and for using Galax on Windows.)
Problems:
[100 points, 10 for each sub-question] Consider the XML data instance Mondial, available here (about 1.8 MB). Write XQueries to answer the following questions. To formulate your queries, you need to understand how the various elements are nested: e.g., what is under a country, under which element a city appears, etc. To work this out, it helps to inspect the DTD (ignore the warning that the data is not valid) or to inspect the data directly.
1. Retrieve the names of all cities located in Peru, sorted alphabetically.
2. For each province of China, return its capital. Order the result by
province name.
3. Find all countries with more than 20 provinces.
4. For each province (state) in the United States, compute the ratio of its population to its area, and return each province's name and its computed ratio, ordered by ratio.
5. Find all ethnic groups that live in more than 10 countries.
6. Find the countries adjacent to the sea named 'Pacific Ocean'.
7. Find all the provinces (states) of the United States with a population of more than 11,000,000. Compute the ratio of each qualifying state's population to the population of the whole country. Return each state's name and the ratio, ordered by ratio in descending order.
8. For each river that crosses at least 2 countries, return its name and the names of the countries it crosses. Order by the number of countries crossed.
9. Find the names of all countries that have at least 3 mountains over 2000 m high, and list the names and heights of all mountains in these countries (regardless of their height). Note: the height attribute is in meters, so you don't have to do any conversions.
10. One user is interested in long rivers. Produce the following view of the data, containing only rivers longer than 2000 km (all lengths are in km), in the format described below:
- The root element is user and contains several river elements.
- Each river contains a name element with the river's name, and several country elements, one for each country through which it flows. (Note: some rivers may not have any country, due to noise in the data. It is OK to include these rivers, even if they appear to flow through no country at all.)
- Each country element contains only the name of the country (a string).
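As an illustration only, a view that follows the format above might have the shape below. The values abc, def, ghi, and jkl are placeholders in the style of the Problem 1 example, not actual Mondial data. The number of river and country elements depends on the data. The second river shows the allowed case of a river with no country.

```xml
<user>
  <river>
    <name>abc</name>
    <country>def</country>
    <country>ghi</country>
  </river>
  <river>
    <name>jkl</name>
    <!-- no country elements: allowed, due to noise in the data -->
  </river>
</user>
```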
For this question you need to run Galax. For a brief introduction to getting started with Galax, see the tutorial for using Galax on Linux or the tutorial for using Galax on Windows.