CSE 344 Homework 4
- Objectives:
- To be able to manipulate XML: query it with XQuery.
- Reading Assignments:
- Lecture notes on XML and XQuery.
- Number of points:
- 100: 10 for each of the 9 problems, plus 10 for the HTML file in problem 8.
- Due date:
- Wednesday, April 27th, 2011 - 11:59 pm
- Turn in format:
- Turn in your queries in 9 separate XQuery
files, where each file is named "Problem[problem #].xq". For example, the
solution to problem 1 would be saved in "Problem1.xq". You will also
need to turn in the resulting HTML file for problem 8; call it "Problem8.html". The
header of each XQuery file should be a comment containing your name and course.
Below the XQuery, in a comment, list the first 3 items it returned (if there are fewer
than 3, list them all). XQuery comments look like (: this :).
For example, the first question is:
(1) Retrieve all the names of all cities located in Peru, sorted
alphabetically.
- Your file should contain
(: Name
CSE 344
Other metadata...
:)
(: Problem 1. :)
(Insert your XQuery here)
(: Results
<result>
<country>
<name>Peru</name>
<city>
<name>Abancay</name>
</city>
<city>
<name>Arequipa</name>
</city>
<city>
<name>Ayacucho</name>
</city>
...
</country>
</result>
:)
- We should be able to run your XQuery file for any of the problems and
save the result to a separate file in order to verify your solution.
For instance, if your answer to problem X is in the file ProblemX.xq and the following
command is run,
java -cp saxon9he.jar net.sf.saxon.Query ProblemX.xq
the correct query result for problem X should be printed to
standard output.
- Turn in link:
- Please turn in your assignment in the dropbox.
- Assignment Tools:
- XQuery via Saxon (which has both Java and .NET versions).
- Please first install Saxon on your PC (the Java version of Saxon only requires unzipping the jar
file).
- Download the Mondial XML dataset and the Mondial DTD from here.
- Follow the brief tutorial to get started (Saxon with Java+Linux).
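Before working with the dataset, it can help to confirm that Saxon itself runs. The query below is a minimal sketch for that purpose only: the file name Hello.xq is hypothetical, the query is not one of the homework problems, and it reads no input document.

```xquery
(: Hello.xq: a hypothetical smoke test, not one of the homework problems.
   It reads no input document, so it only checks that Saxon itself runs. :)
<result>{
  for $i in (1, 2, 3)
  return <item n="{ $i }">{ $i * $i }</item>
}</result>
```

Saved as Hello.xq next to saxon9he.jar, it can be run with the same kind of command as above (java -cp saxon9he.jar net.sf.saxon.Query Hello.xq) and should print a single <result> element with three <item> children.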
Problems
[100 points: 10 pts for each of the 9 problems (8 pts for the correct answer, 2
pts for following the DTD), plus 10 pts for the correct HTML file in problem 8] Consider the XML data instance
Mondial, available here (about 1.8 MB).
Write XQueries to answer the following questions. In formulating your queries, you need to understand how
the various elements are nested: e.g., what is under a country, or
under which element a city appears. For that, it helps to inspect the Mondial DTD
(ignore the warning that the data is not valid) or to inspect the data directly.
Moreover, the output of each XQuery should follow the associated DTD
provided after each question. We will visually inspect whether your output follows the DTD, except for problem 9, where we will validate your output automatically.
Furthermore, the output of each XQuery should be well-formed XML once the
standard XML headers (<?xml version="1.0" encoding="UTF-8" ?>, etc.)
have been added. That is, the output of the first question should be
along the lines of:
<result>
<country>
<name>Peru</name>
<city>
<name>Abancay</name>
</city>
<city>
<name>Arequipa</name>
</city>
<city>
<name>Ayacucho</name>
</city>
...
</country>
</result>
Note: The amount of white space does not matter.
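As a hedged illustration of how a query can produce one well-formed document of this nested shape, the sketch below wraps a sorted sequence in a single enclosing element. The inline $data fragment and its element names (library, shelf, label, book, title) are made up for the example and are not the Mondial schema; the file name in the comment is likewise an assumption.

```xquery
(: Sketch only: $data is a made-up inline fragment, not the Mondial schema.
   A real submission would read the dataset instead, for example with
   doc("mondial.xml") if the file is saved under that (assumed) name. :)
let $data :=
  <library>
    <shelf>
      <label>Databases</label>
      <book><title>Views</title></book>
      <book><title>Indexes</title></book>
    </shelf>
    <shelf>
      <label>Compilers</label>
      <book><title>Parsing</title></book>
    </shelf>
  </library>
return
  (: One enclosing element keeps the output a single well-formed document. :)
  <result>{
    for $shelf in $data/shelf[label = "Databases"]
    return
      <shelf>
        <name>{ string($shelf/label) }</name>
        {
          (: Sort the nested items before constructing their elements. :)
          for $title in $shelf/book/title
          order by string($title)
          return <book><name>{ string($title) }</name></book>
        }
      </shelf>
  }</result>
```

Without the outer <result> constructor the query would return a sequence of sibling elements, which is not a well-formed document once the XML header is added.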
To check that your results are well-formed or that they
follow the appropriate DTD, you can use the
W3C Markup Validator. Instructions
for how to use this validator are provided here.
You must perform this validation for problem 9; for the other problems it is optional.
(1) Retrieve all the names of all cities located in Peru, sorted
alphabetically.
<!ELEMENT result (country)>
<!ELEMENT country (name, city+)>
<!ELEMENT city (name)>
<!ELEMENT name (#PCDATA)>
(2) For each province of China, return its capital. Order the result by
province name.
<!ELEMENT result (country)>
<!ELEMENT country (name, province+)>
<!ELEMENT province (name, capital)>
<!ELEMENT capital (name)>
<!ELEMENT name (#PCDATA)>
(3) Find all countries with more than 20 provinces. Order by the number
of provinces.
<!ELEMENT result (country*)>
<!ELEMENT country (name)>
<!ATTLIST country num_provinces CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
(4) For each province (state) in the United States, compute the ratio of
its population to its area. Return each state's name and its computed
ratio, ordered by ratio.
<!ELEMENT result (country)>
<!ELEMENT country (name, state+)>
<!ELEMENT state (name, population_density)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT population_density (#PCDATA)>
(5) Find all ethnic groups that live in more than 10 countries.
<!ELEMENT result (ethnicgroups+)>
<!ELEMENT ethnicgroups (name)>
<!ATTLIST ethnicgroups num_countries CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
(6) Find all the provinces (states) of the United States with a
population of more than 11,000,000. Compute the ratio of each qualifying
state's population to the whole population of the country. Return each
state's name and the ratio. Order by the ratio in descending order.
<!ELEMENT result (country)>
<!ELEMENT country (name, state+)>
<!ELEMENT state (name, population_ratio)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT population_ratio (#PCDATA)>
(7) Find the names of all countries that have at least 3 mountains over
2000m high, and list the names and heights of all mountains in these
countries (regardless of their height). Note: the height attribute is
in meters, so you don't have to do any conversions.
<!ELEMENT result (country+)>
<!ELEMENT country (name, mountains+)>
<!ELEMENT mountains (name, height)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT name (#PCDATA)>
(8) For each river that crosses at least 2 countries, return its name
and the names of the countries it crosses. Order by the number of
countries crossed. Place your results into an HTML file, and check
whether you can view them in your web browser. Turn in the HTML file
along with your query.
<!ELEMENT html (head, body)>
<!ELEMENT head (title)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (h1, ul)>
<!ELEMENT h1 (#PCDATA)>
<!ELEMENT ul (li+)>
<!ELEMENT li (#PCDATA | font | ol)*>
<!ELEMENT ol (li+)>
<!ELEMENT font (#PCDATA)>
The <li> content model allows a font and an ol element so that
the output looks roughly like:
...
<ul>
<li>
<font>River name</font>
<ol>
<li>Country crossed #1</li>
<li>Country crossed #2</li>
...
</ol>
</li>
...
</ul>
Note: Use the country attribute of the <river> tag to find the respective countries.
For this problem you need to turn in two files: Problem8.xq and Problem8.html (the output of your query).
(9) Find the countries adjacent to the 'Pacific Ocean' (sea).
For this question you are required to validate your output using the
W3C Markup Validator; we will also validate your answer ourselves. Follow the instructions above for validating XML in the
validator to validate your results for this problem.
Whitespace matters when validating the XML, so do not format your
query to include whitespace while validating.
<!ELEMENT result (waterbody)>
<!ELEMENT waterbody (name, adjacent_countries+)>
<!ELEMENT adjacent_countries (country+)>
<!ELEMENT country (name)>
<!ELEMENT name (#PCDATA)>
Note: Use the country attribute of the <sea> tag to find the respective countries.
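The notes about the country attribute suggest following references from one element to others. The sketch below shows one common way to do that in XQuery, under the assumption that such an attribute holds a whitespace-separated list of id values matching an id attribute elsewhere in the document; whether Mondial is shaped this way should be checked in the DTD or the data. The inline fragment and its names (school, teacher, course, teachers) are made up and unrelated to Mondial.

```xquery
(: Sketch only: made-up data, assuming an attribute that holds a
   whitespace-separated list of ids that match @id on other elements. :)
let $data :=
  <school>
    <teacher id="t1"><name>Teacher A</name></teacher>
    <teacher id="t2"><name>Teacher B</name></teacher>
    <teacher id="t3"><name>Teacher C</name></teacher>
    <course teachers="t1 t3"><name>Logic</name></course>
    <course teachers="t2"><name>Compilers</name></course>
  </school>
return
  <result>{
    for $course in $data/course
    (: Split the attribute value into a sequence of individual ids. :)
    let $ids := tokenize(normalize-space($course/@teachers), " ")
    (: A general comparison (=) is true if @id equals any item of $ids. :)
    let $teachers := $data/teacher[@id = $ids]
    return
      <course>
        <name>{ string($course/name) }</name>
        { for $t in $teachers return <teacher>{ string($t/name) }</teacher> }
      </course>
  }</result>
```

The predicate [@id = $ids] relies on XQuery's general comparison, which matches when any value on one side equals any value on the other, so no explicit inner loop over the ids is needed.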