341 : 24 Jan 2002 : Scheme wrapup

So, we draw to the end of Scheme. This is unfortunate, because you have scarcely begun to appreciate it. In general, the quality of a language can only be dimly glimpsed in trivial examples; it is only by building significant projects in a language that one can really understand its virtues or vices. In one class, I can't show you a significant project in Scheme, but let's build a small and interesting program that uses most of the features that you have studied so far: a query function for XML-like data.

XML and queries

XML is a markup language for "semi-structured" data. This means that, with minor differences, XML is exactly like a Lisp s-expression. Here's what XML looks like:

 <xml>
   <person>
      <name>
        <firstname>Keunwoo</firstname>
        <lastname>Lee</lastname>
      </name>
      <webpage>http://www.cs.washington.edu/homes/klee/index.html</webpage>
   </person>
   <person>
      <name>
        <firstname>Justin</firstname>
        <lastname>Huff</lastname>
      </name>
      <email>jjhuff@cs</email>
   </person>
 </xml>

It is easy to see that the above is equivalent to this Scheme expression:

 (define atree
   '(xml
     (person
       (name (firstname "Keunwoo")
             (lastname  "Lee"))
       (webpage "http://www.cs.washington.edu/homes/klee/index.html"))
     (person
       (name (firstname "Justin")
             (lastname  "Huff")
       (email "jjhuff@cs")))))

A common task one might perform on XML data is process pieces of it into some other form---for example, a web page (HTML, the language of web pages, is a precursor of XML). This commonly proceeds in two steps:

  1. Extract some subtree of the XML structure, for example the <name> parts of the above XML document.
  2. Massage each of the extracted bits slightly---for example, turn the above names into
        <!-- <p> is the HTML tag for "paragraph", and
             <em> is the HTML tag for "use emphasis" -->
        <p>Name 1: <em>Keunwoo Lee</em></p>
        <p>Name 2: <em>Justin Huff</em></p>
    

But you already know all about "extract" and "foreach"---they're slight variants on the familiar filter and map operations. In this section, we'll pretend that we have a parser that can translate XML into Lisp s-expressions, and we'll develop a little library for doing tree queries and manipulation.

Querying SexpML

I'll call our pseudo-XML language "SexpML". Consider the simplest kind of query you might perform over our s-expression: "Give me all the subtrees whose first elements are marked by this symbol."

I'll call this function extract-subtrees. It will take two parameters: atree, a SexpML tree, and tagname, a symbol whose subtree we should extract. Given the SexpML tree from the previous page, we might get the following:

  (extract-subtrees atree 'name)
  => ((name (firstname "Keunwoo") (lastname "Lee"))
      (name (firstname "Justin") (lastname "Huff")))

  (extract-subtrees atree 'email)
  => ((email "jjhuff@cs")) 

How would you implement extract-subtrees? Let's break it down into pieces:

Performing functions over matching elements

How would you perform some action over all elements returned by extract-subtrees? Can you write a higher-order fuction that encapsulates this idea, for convenience?

Lessons, and extensions

I basically led you by the hand in thinking through this example. The reason is that having you think it through from scratch would take longer than we have in section. However, we can take several lessons away from this solution:

Extending the query engine

We probably won't have time to do extensions to this query engine in class. Some things to think about:

Path queries

Sometimes, you don't want to just extract a tag based on the name; you care about the enclosing tags. For example, consider the expression bound to people in the following:

(define people
  '(sexpml
    (student (name "Alice") (email "alice@cs") (type "grad"))
    (student (name "Bob") (email "bob@cs" "robert@u") (type "ugrad"))
    (faculty (name "Carol") (email "carol@cs"))
    (faculty (name "David") (email "david@cs") (office 225))))

We might want to get the email addresses of only students, or only faculty. We could phrase this query as a "path" expression, which would return the last node on each satisfying path:

  (extract-path people '(sexpml student email))
  => ((email "alice@cs") (email "bob@cs" "robert@u"))

  (extract-path people '(name))
  => ()       ; No path BEGINS with '(name ...)

Further customized map

But how about a more complex, application-specific version of map---one that maps over paths of SexpML expressions? In other words, what if we wanted to perform some series of actions as we descended the SexpML tree? We might want to print one thing when we reach the 'student node, and another when we reach the 'email node.


cse341-webmaster@cs.washington.edu
Last modified: Thu Jan 24 10:29:18 PST 2002