CSE341 Notes for Monday, 5/17/10

I began by discussing how to define a function with a variable number of arguments in Scheme. This concept is sometimes referred to as varargs (short for "variable arguments"). For comparison, we first looked at the following Java program:
        // short program to demonstrate variable number of arguments for a method.
        
        import java.util.*;
        
        public class Varargs {
            public static void main(String[] args) {
                printAll(3, 8, 19.4, "hello");
                printAll(74.5, "hi", 19);
                List<Integer> data = Arrays.asList(3, 19, 24, 79, 202);
                System.out.println(data);
            }
        
            public static void printAll(Object... data) {
                for (int i = 0; i < data.length; i++)
                    System.out.println(data[i]);
                System.out.println();
            }
        }
It defines a method printAll that takes an indefinite number of objects. Java provides the arguments in the form of an array. It also includes a call on the method Arrays.asList that takes a variable number of arguments.

In Scheme you can either specify exactly how many parameters a function has, in which case Scheme enforces that number, or indicate that it takes an indefinite number of parameters (zero or more). You choose between the two by including or omitting the parentheses around the parameter name when you define a function using the lambda form, as in:

        (define f1 (lambda (n) (+ n 1)))
        (define f2 (lambda n (display n)))
We saw that f1 required exactly one parameter while f2 would take an indefinite number of parameters. In that case, the parameters are provided to f2 as a list.
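A minimal sketch of how these definitions behave when called. The helpers sum-all and count-extra are hypothetical names added for illustration and were not part of the lecture; count-extra assumes the standard dotted-parameter notation, which requires some arguments and collects any others in a list.

```scheme
;; f1 takes exactly one argument; f2 receives all of its arguments as a list.
(define f1 (lambda (n) (+ n 1)))
(define f2 (lambda n (display n)))

;; Hypothetical helper: nums is the list of all arguments, summed with apply.
(define sum-all (lambda nums (apply + nums)))

;; Hypothetical helper: the dot means "at least one argument (x), with any
;; remaining arguments collected in the list others".
(define (count-extra x . others) (length others))

(f1 41)                 ; 42
(f2 1 2 3)              ; displays (1 2 3)
(sum-all)               ; 0, because zero arguments are allowed
(sum-all 3 8 19)        ; 30
(count-extra 'a 'b 'c)  ; 2
```

Calling f1 with any number of arguments other than one is an error, while f2 and sum-all accept any number, including none.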

Then I said that I wanted to look at various examples that involved delayed evaluation of code. To explore this, we began with the following function that displays a message every time it is called:

        (define (work x)
          (display "work called: ")
          (display x)
          (newline)
          (+ x 1))
For example, we have discussed the fact that if does not evaluate its second and third arguments unless it has to: it evaluates the test first and then only the branch that is selected. Notice how in this call we see a message only for the call on work with the parameter 3:

        > (if (< 2 3) (work 3) (work 4))
        work called: 3
        4
While in this case we see just a call on work with the parameter 4:

        > (if (> 2 3) (work 3) (work 4))
        work called: 4
        5
We considered what happens when you write a function with a similar set of parameters to if (a test, a first value and a second value):

        (define (test t e1 e2)
          (if t
              (+ e1 e1 e1)
              e2))
As we've discussed, when you write an ordinary function like this, Scheme fully evaluates every argument exactly once before executing the function body, so it's not surprising that we see both messages when we call it:

        > (test (< 2 3) (work 3) (work 4))
        work called: 3
        work called: 4
        12
        > (test (> 2 3) (work 3) (work 4))
        work called: 3
        work called: 4
        5
The first variation I discussed was the idea of delaying evaluation by making these parameters thunks. A thunk is a function of zero arguments (a lambda) that is used to wrap up an expression to delay its evaluation. So for the test function, instead of taking parameters e1 and e2 that are already evaluated, we assume they are thunks (lambdas of zero arguments) that need to be called:

        (define (test2 t e1 e2)
          (if t
              (+ (e1) (e1) (e1))
              (e2)))
In calling the test2 function, we now have to wrap up each expression in a lambda:

        > (test2 (< 2 3) (lambda () (work 3)) (lambda () (work 4)))
        work called: 3
        work called: 3
        work called: 3
        12
        > (test2 (> 2 3) (lambda () (work 3)) (lambda () (work 4)))
        work called: 4
        5
Notice that we have duplicated one of the properties of if: we evaluate only the expression that we end up being interested in. But notice that we end up evaluating e1 three separate times.
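One simple way to avoid the repeated calls, sketched here as an illustration rather than something from lecture, is to call the thunk once and save the result with let. The name test2b is hypothetical, and work is repeated so that the example is self-contained.

```scheme
(define (work x)
  (display "work called: ")
  (display x)
  (newline)
  (+ x 1))

;; Hypothetical variant of test2: call the thunk e1 once, then reuse v.
(define (test2b t e1 e2)
  (if t
      (let ((v (e1)))
        (+ v v v))
      (e2)))

;; Prints "work called: 3" a single time and returns 12.
(test2b (< 2 3) (lambda () (work 3)) (lambda () (work 4)))
```

This works, but it puts the burden on the author of every function that takes a thunk.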

Another way to approach this is to use the built-in delay and force. They operate on a data type known as a "promise". For example:

        > (define x (delay (+ 2 2)))
        > x
        #<struct:promise:x>
This says to delay the evaluation of the expression until later. You then request the evaluation by calling force:

        > (force x)
        4
You might think that after this call x has been set to 4, but that's not true. x still refers to the promise:

        > x
        #<struct:promise!4>
But you can always get the value again by calling force and, as we'll see, Scheme uses memoization to ensure that the code is not evaluated multiple times:

        > (force x)
        4
We rewrote the test function to use calls on force:

        (define (test3 t e1 e2)
          (if t
              (+ (force e1) (force e1) (force e1))
              (force e2)))
When we called it, we now had to wrap up the expressions in a call on delay, as in:

        > (test3 (< 2 3) (delay (work 3)) (delay (work 4)))
        work called: 3
        12
        > (test3 (> 2 3) (delay (work 3)) (delay (work 4)))
        work called: 4
        5
Notice that with the combination of force and delay, we have the same properties that we had with if. We only evaluate arguments if we need them and we only evaluate them once, even if the result is referred to multiple times.
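To see how memoization can give this behavior, here is a rough sketch built from an ordinary thunk and set!. The names make-lazy, lazy-force, counted-work, and calls are hypothetical, and this is not how delay and force are actually implemented; it only imitates the "evaluate at most once" property.

```scheme
;; Wrap a thunk so that it runs at most once; later calls reuse the result.
(define (make-lazy thunk)
  (let ((done #f)
        (value #f))
    (lambda ()
      (cond ((not done)
             (set! value (thunk))
             (set! done #t)))
      value)))

;; Forcing a lazy value just means calling the wrapper.
(define (lazy-force p) (p))

;; Count evaluations instead of printing so the effect is easy to check.
(define calls 0)
(define (counted-work x)
  (set! calls (+ calls 1))
  (+ x 1))

(define p (make-lazy (lambda () (counted-work 3))))
(define total (+ (lazy-force p) (lazy-force p) (lazy-force p)))
;; total is 12 and calls is 1: counted-work ran only once.
```

Unlike delay, make-lazy is an ordinary function, so the caller still has to wrap the expression in a lambda by hand.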

I said that I wanted to finish up our discussion of Scheme by exploring some of the areas of the language that we hadn't had time to explore in detail.

I briefly discussed some of the applications people have made of Scheme and its predecessor, Lisp. Richard Stallman, the founder of the GNU project and an early pioneer of the free software movement, built GNU emacs around a dialect of Lisp, and much of emacs is still written in Lisp. For example, here are some entries from my emacs initialization file (called .emacs):

        ;; set some defaults for text mode
        (setq default-major-mode 'text-mode)
        (setq initial-major-mode 'text-mode)
        (setq default-fill-column 79)
        
        ;; inhibit startup message
        (setq inhibit-startup-message 1)
        
        (setq load-path (cons "/Users/stuartreges/" load-path))
The setq form in Lisp assigns a value to a variable, much as define (or set!) does in Scheme.

Then we spent some time talking about the idea of macros. In an editor like emacs, you can define a macro that performs a sequence of editing commands. I personally use emacs macros all the time to get my work done. The commands for defining macros in emacs are fairly simple:

        Emacs Command    Description
        CTRL-X (         begin defining a macro
        CTRL-X )         end macro definition
        CTRL-X e         execute the macro

In programming, macros are used to transform code. For example, the original C and C++ implementations did not include a boolean type, but you could "simulate" a boolean type by including a file called bool.h that included macro definitions that said that the word "bool" should be replaced with "int", the word "true" should be replaced with "1", and the word "false" should be replaced with "0".

It can be tricky to get macros right. For example, if you were going to change all occurrences of "true" to "1", you wouldn't want to change "construed" to "cons1d" and you wouldn't want to replace the word if it appeared in a quoted string or a comment. Any reasonable macro mechanism would be careful to avoid such obvious mistakes.

But there are lots of other tricky cases. Consider, for example, this C macro:

        #define SUM(x, y) x*x + y*y
The idea is that you might often want to find the sum of squares of two values and, rather than use a function, you might want to have the expansion happen in-line in the code itself. I asked people where this might cause a problem and someone pointed out that you might pass a sum like "x + 1":

        z = SUM(x + 1, y);
This would be expanded by the macro into:

        z = x + 1*x + 1 + y*y;
That's not what you would expect. How do we fix it? By adding parentheses to the macro:

        #define SUM(x, y) (x)*(x) + (y)*(y)
Now our code expands to:

        z = (x + 1)*(x + 1) + (y)*(y);
When you write macros in C and C++, you often have to include a lot of parentheses. We saw that even this level of parentheses is not enough. For example, this expression:

        z = c * SUM(a, b);
is expanded into:

        z = c * (a)*(a) + (b)*(b);
Because multiplication binds more tightly than addition, c multiplies only the first term. So the macro should really be:

        #define SUM(x, y) ((x)*(x) + (y)*(y))
Even this isn't enough in some cases where someone might pass a value like x++. Because the macro pastes its argument in twice, the increment would appear twice in the expanded code. The situation gets even worse if you try to introduce a variable declaration inside the macro because you might declare a variable like "foo" that shadows another variable called "foo". So people tend to declare very bizarre variable names and hope that there is no conflict.

The point is that macros can be tricky and the macro facility in C and C++ is very weak. Scheme, on the other hand, has a very powerful macro facility that doesn't require you to worry about adding extra parentheses or using weird variable names. The Scheme mechanism provides what programming language researchers refer to as hygienic macros.
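As a rough preview of what hygiene buys you, here is a sketch using the standard define-syntax and syntax-rules forms, which had not been covered at this point in the course. The macro names sum-squares and swap! and the variables result, tmp, and other are chosen for illustration.

```scheme
;; Counterpart of the C SUM macro: no extra parentheses are needed, and
;; binding the arguments with let evaluates each argument only once.
(define-syntax sum-squares
  (syntax-rules ()
    ((_ x y)
     (let ((a x) (b y))
       (+ (* a a) (* b b))))))

;; This macro introduces a temporary variable named tmp.
(define-syntax swap!
  (syntax-rules ()
    ((_ x y)
     (let ((tmp x))
       (set! x y)
       (set! y tmp)))))

(define result (* 2 (sum-squares (+ 1 1) 3)))  ; 2 * (4 + 9) = 26

;; The caller also has a variable named tmp. Hygiene keeps the macro's
;; tmp separate from this one, so the swap still works.
(define tmp 1)
(define other 2)
(swap! tmp other)
;; Now tmp is 2 and other is 1.
```

With plain textual substitution, the macro's tmp would capture the caller's tmp and the swap would silently fail, which is the shadowing problem described above for C.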


Stuart Reges
Last modified: Tue May 18 09:18:40 PDT 2010