So, we draw to the end of Scheme. This is unfortunate, because you have scarcely begun to appreciate it. In general, the quality of a language can only be dimly glimpsed in trivial examples; it is only by building significant projects in a language that one can really understand its virtues or vices. In one class, I can't show you a significant project in Scheme, but let's build a small and interesting program that uses most of the features that you have studied so far: a query function for XML-like data.
XML is a markup language for "semi-structured" data. This means that, with minor differences, XML is exactly like a Lisp s-expression. Here's what XML looks like:
<xml> <person> <name> <firstname>Keunwoo</firstname> <lastname>Lee</lastname> </name> <webpage>http://www.cs.washington.edu/homes/klee/index.html</webpage> </person> <person> <name> <firstname>Justin</firstname> <lastname>Huff</lastname> </name> <email>jjhuff@cs</email> </person> </xml>
It is easy to see that the above is equivalent to this Scheme expression:
(define atree '(xml (person (name (firstname "Keunwoo") (lastname "Lee")) (webpage "http://www.cs.washington.edu/homes/klee/index.html")) (person (name (firstname "Justin") (lastname "Huff") (email "jjhuff@cs")))))
A common task one might perform on XML data is process pieces of it into some other form---for example, a web page (HTML, the language of web pages, is a precursor of XML). This commonly proceeds in two steps:
<!-- <p> is the HTML tag for "paragraph", and <em> is the HTML tag for "use emphasis" --> <p>Name 1: <em>Keunwoo Lee</em></p> <p>Name 2: <em>Justin Huff</em></p>
But you already know all about "extract" and "foreach"---they're slight variants on the familiar filter and map operations. In this section, we'll pretend that we have a parser that can translate XML into Lisp s-expressions, and we'll develop a little library for doing tree queries and manipulation.
I'll call our pseudo-XML language "SexpML". Consider the simplest kind of query you might perform over our s-expression: "Give me all the subtrees whose first elements are marked by this symbol."
I'll call this function extract-subtrees
. It will
take two parameters: atree, a SexpML tree, and
tagname, a symbol whose subtree we should extract. Given
the SexpML tree from the previous page, we might get the
following:
(extract-subtrees atree 'name) => ((name (firstname "Keunwoo") (lastname "Lee")) (name (firstname "Justin") (lastname "Huff"))) (extract-subtrees atree 'email) => ((email "jjhuff@cs"))
How would you implement extract-subtrees
? Let's
break it down into pieces:
tag-matches
that takes a sexpml tree and a tagname,
determines whether the top-level tag matches the tagname:
(tag-matches '(foo ....) 'bar) ; => #f (tag-matches '(foo ....) 'foo) ; => #t
extract-subtrees
function to
them. However, notice that some children are simply atoms (for
example, the email strings in the above), and some children are
actually subtrees of elements. So, it would be handy to have a
function that, given a list, extracts only the list
elements. Write such a function; call it
take-lists
:
(take-lists '((foo "bar") "bif" (bang (bong "boo")))) => ((foo "bar") (bang (bong "boo"))) (take-lists "hi" "bye") => ()
extract-subtrees
to each of them, but the result
will be a list of lists of nodes. We would like to join
all these lists into one list---in other words, we need to
"hoist" each node in the lists of lists one level up. Write a
function join-lists
that joins the elements in a
list of lists into one list:
(join-lists '((foo bar) (1 2 3) ("hi" (bye)))) => (foo bar 1 2 3 "hi" (bye)) (join-lists '(() (()) (a))) => (() a) ; Notice that an empty list at top-level is simply omitted
extract-subtrees
; here's some pseudocode to get you
started:
extract-subtrees(atree tagname): if atree is nil, return nil, else: let atree-root-tag = first element of atree children = remaining elements of atree child-lists = children that are lists child-results = result of extract-subtree on all child-lists in assemble result for current node with result for child nodes
How would you perform some action over all elements returned by
extract-subtrees
? Can you write a higher-order
fuction that encapsulates this idea, for convenience?
I basically led you by the hand in thinking through this example. The reason is that having you think it through from scratch would take longer than we have in section. However, we can take several lessons away from this solution:
We probably won't have time to do extensions to this query engine in class. Some things to think about:
Sometimes, you don't want to just extract a tag based on the
name; you care about the enclosing tags. For example, consider
the expression bound to people
in the following:
(define people '(sexpml (student (name "Alice") (email "alice@cs") (type "grad")) (student (name "Bob") (email "bob@cs" "robert@u") (type "ugrad")) (faculty (name "Carol") (email "carol@cs")) (faculty (name "David") (email "david@cs") (office 225))))
We might want to get the email addresses of only students, or only faculty. We could phrase this query as a "path" expression, which would return the last node on each satisfying path:
(extract-path people '(sexpml student email)) => ((email "alice@cs") (email "bob@cs" "robert@u")) (extract-path people '(name)) => () ; No path BEGINS with '(name ...)
But how about a more complex, application-specific version of map---one that maps over paths of SexpML expressions? In other words, what if we wanted to perform some series of actions as we descended the SexpML tree? We might want to print one thing when we reach the 'student node, and another when we reach the 'email node.