CSE373—Data Structures and Algorithms
for Nonmajors
Programming Assignment #5
due: Wednesday, 3/12/08, 10:00 PM
Introduction:
You may have heard terms
such as ontologies, OWL, the Semantic Web, XML, etc. The World Wide Web and HTML are great for
sharing content, but the content is meant to be readable by a person. On the Semantic Web, the content is meant to
be readable by a computer. Already there
is much XML content on the web as part of RSS feeds, accumulated by mashups,
etc. However a challenge is that the
information provided by one site is largely independent of information provided
by another site; there exists no meta-knowledge unifying their information. The goal of the Semantic Web is for
everything to be represented in a form which a computer can reason about. Obviously, we are nowhere close, but great
strides are being made in biology on smaller domains focusing on anatomy,
biochemistry, genetics, etc.
Why is it so important for computers to be able
to reason? Here’s a simple example in
the domain of anatomy. Suppose a doctor
has a patient with a heart defect and wants to research it. A query on Google or PubMed for “heart
defect” might turn up relevant information.
Suppose another doctor had a patient with a heart defect in the left
ventricle and wrote a paper on it, but the paper got indexed under “left
ventricle defect.” The first doctor
would never find that paper using the standard dumb search algorithms. But if the computer knew that the left ventricle
was part of the heart, then that paper could be found!
In this project, you will be working with the
Foundational Model of Anatomy (FMA), an acclaimed knowledge base developed at
the
The current release of the Foundational Model of
Anatomy has roughly 78,000 entities, 200 types of predicates, in excess of a hundred
thousand instances (many just synonyms but some reified relations), and over a
million subject-predicate-object triples.
Its subclass taxonomy has a depth of 19 (Dorsal digital vein of left big
toe, along with 17 other veins of the feet, are of that depth). We’ll be accessing only a few of the
predicates.
Subject-Predicate-Object
triples:
Your access to the FMA is through working with
triples that read much like a sentence.
For example, the FMA contains the triples (“Human body”,HAS_SUBCLASS,”Female human body”) and (“Human body”,HAS_SUBCLASS,”Male human
body”). The FMA also contains the
triple (“Heart”,
IS_PART_OF,”Cardiovascular system”).
You work with these triples by giving a subject s and a predicate p, and
it will return each object o (stored
in a list) such that (s,p,o) is a
triple in the ontology. So a query of getObjects(“Human body”,HAS_SUBCLASS) will return all direct
subclasses of “Human body.”
This representation might remind you of a
graph. In fact, an ontology is a
graph. Each item in the ontology is a
labeled node, and a predicate is a directed edge (with a label) connecting a
subject to an object (the source vertex to the target vertex). The getObjects method in effect is
acting as an adjacency list.
The FMA has been defined robustly such that
every predicate we’ll use has an inverse, and if (s,p,o) is in the ontology, then (o,inv(p),s) is also in the ontology. Note that the FMA is not yet complete, so
don’t be alarmed if you think something should be there or be connected somehow
and it’s not!
Programming
(80 points):
You should use good style, whitespace, naming
conventions, etc., and comment your code.
You may use ANYTHING in java.util.*. You may add as many helper methods as you
want. Do not add any other instance or class
variables. Feel free to change main...
we will test your methods on the same and other input parameters.
The predicates to use are defined as constants
at the top of the code. Desired outputs
are in main. When the output is large, a
checksum (64-bit hash) is shown instead as a way of authenticating/validating
your results. (Checksums are frequently
used to make sure that downloads haven’t been corrupted/tampered.) However you still should look at the outputs
to see what they are and if they make sense.
You may even learn something about anatomy!
To compile and run, you’ll need to add “fma.jar”
to the classpath. For example on the
command-line,
javac –cp fma.jar;. FMAQueries.java and java –cp fma.jar;. FMAQueries
Implement
the following methods however you want:
0) directSubclasses(String
s). Returns a Collection of Strings containing
the direct subclasses of s. The Collection must iterate in alphabetical
order. Already implemented for you!
1) transitiveParts(String
s). Returns a Collection of Strings containing
items that are transitively part of s, in other words, all
items obtained from recursively asking for the parts of s. The Collection must
iterate in alphabetical order.
2) certainOrgansInPart(String
s). Returns a Collection of Strings containing
items that are transitively part of s and are descendents of
(transitively subclasses of) “Organ with organ cavity” or “Parenchymatous
organ”. The Collection must iterate in
alphabetical order.
3) commonPartsOfKids(String
s). Returns a Collection of Strings containing
items that are transitively part of each subclass of s but are not transitively part of s. A way to interpret
this list is that it is the list of all items that are redundant and should be
defined more abstractly in the common parent.
For example, when “Human body” is passed in, it computes everything that
is recursively part of both “Male human body” and “Female human body” but is
not part of “Human body.” The Collection
returned must iterate in alphabetical order.
4) shortestPath(String
s, String t). Returns a Collection of Strings that when
iterated, produces any shortest path from s to t (including the endpoints) via the IS_CONTINUOUS_WITH
predicate. There is no weighting of
edges, i.e. each edge costs 1. For this
problem, we will use just the different myocardial zones as possible
input. The numbers for the different
zones are 1..20 plus 13a..17a.
Cardiologists have demarcated the zones based on what coronary arteries
supply blood to where.
5) allShortestPaths(String
s, String t). Returns a Collection of Collections of
Strings that when iterated, produces ALL shortest paths from s to t (including the
endpoints) via the IS_CONTINUOUS_WITH predicate. The paths should appear in lexicographical
order, that is, one Collection is before another if the first differing element
between them is smaller in the former (see the sample output in main). For this problem, we will use just the
different myocardial zones as possible input.
Turn in
your work electronically from the “assignments” link on the class webpage. If you did not receive an email receipt, you
did not turn in your project.