MiniScheme Interpreter

Due: Friday, January 25, 2002.

1. Overview

For this assignment, you will be writing an interpreter for a language called MiniScheme (which we'll sometimes call MS). MS is an applicative language that looks a lot like Scheme. We call it MiniScheme because it only defines a subset of the Scheme language. An interpreter is just a program that reads input from a user, evaluates it according to some rules, and returns the result of that evaluation. In other words, the main loop for any interpreter looks like this:

while (TRUE) {
  Expr  := ReadExpression();
  Value := Evaluate(Expr, Environment);
  Print(Value);
}

You'll be writing your interpreter in Scheme, which is nice because MiniScheme shares Scheme syntax, so we can take advantage of Scheme's reader and printer to take care of inputting and outputting the MiniScheme expressions. Relieved of the chore of figuring out how to parse the user's input, we can focus entirely on the definition of the function Evaluate, which we'll call EVAL.

2. Introduction to MiniScheme

MiniScheme looks and behaves a lot like Scheme, it's just smaller. It consists of 4 special forms and about a dozen built-in primitive operators. Below, we'll introduce MS by example, highlighting any differences along the way. Unless we say otherwise, the built-in primitive functions and special forms behave just like their counterparts in Scheme.

Data types: booleans, numbers, symbols, lists.

Literal data: MS provides the special form QUOTE and the quote mark (') as syntactic sugar for this form.

'x                     => X
(QUOTE x)              => X
'(1 2 3)               => (1 2 3)
(QUOTE (1 2 3))        => (1 2 3)

List operations: CAR, CDR, and CONS.

(car '(1 2 3))         => 1
(cons 1 (cdr '(2 3)))  => (1 3)
(cons 1 ())            => (1)

S-expression predicates: NULL? and ATOM?. ATOM? doesn't exist in normal Scheme. Essentially, anything that's not a list is an atom, with the wrinkle that the empty list is also treated as an atom.
```
(null? ())             => #t
(null? (cdr '(1)))     => #t
(null? '(1 2))         => #f
(atom? 'a)             => #t
(atom? '(1 2))         => #f
(atom? ())             => #t
```
"Variables": MS provides DEFINE, which binds a name to a value. DEFINE places a binding (from name to value) onto the top frame of the current environment. Frames and environments will be discussed in greater detail below. It is impossible to write a function in MS that has a side-effect on a global variable, because DEFINE will just add a new binding to the frame of the function's exectution, rather than rebind the global name to a new value. This property is shared by functional languages (which are sometimes also said to be non-destructive).
```
(define a 10)
(define b 20)
(define c 'x)
```

Identity testing: EQ? which is like Schemes's EQ?

(eq? a b)     => #f  ;; see the above defines for definitions of a,b,c
(eq? c 'x)    => #t
(eq? 'z 'z)   => #t

Control forms: MS provides just COND.

(cond ((eq? c 'z) 'hello)
      ((eq? c 'x) 'goodbye)
      (#T         'goodnight))      => GOODBYE

User defined functions: LAMBDA is used to define an anonymous function. LAMBDAs actually evaluate to lexical closures, which we'll talk more about later. Together with DEFINE, the function can be given a name. Notice in the examples below that we use exactly the same syntax for binding variable names to their data as we do for binding function names to their function bodies.
```
(define second (lambda (x) (car (cdr x))))

(define plus1 (lambda (x) (+ x 1)))

(define length 
  (lambda (x)
    (cond ((null? x) 0)
	  (#T (+ 1 (length (cdr x)))))))

(define map                 
  (lambda (fun alist)       
    (cond ((null? alist) ())
	  (#T (cons (fun (car alist)) (map fun (cdr alist)))))))
```

Function application: In MS (as in Scheme) the first expression in the list representing the function call is actually evaluated (which looks up the function body associated with the function name).

(second '(10 20 30))                   => 20
(length '(a b c d))                    => 4
(map plus1 '(1 2 3 4))                 => (2 3 4 5)  
(map (lambda (x) (+ 1 x)) '(1 2 3 4))  => (2 3 4 5)  
((lambda (x) (+ 10 x)) 20)             => 30         ;; anonymous application

Operators for numbers: MS provides arithmetic operators (+, -, /, *) and numerical relational operators (=, <, >)
```
(= a 10)        => #T
(< (+ 5 10) 11) => #F
(> (- 10 5) 4)  => #T
```

3. Syntax

The basic syntax for the language is very simple. Everything in MiniScheme is an S-expression. An S-expression is either an atom or a list. More formally:

<s-expression> :== <atom> | <list>

An atom is composed of one or more non-special characters. The special characters are space, tab, newline, open paren, closed paren, and single quote. More formally:

<character>    :== any character not in {<space>, \t, \n, (, ), '}
<atom>         :== <character>+

A list is composed of zero or more S-expressions:

<list> :== (<s-expression>*)

Luckily, you won't have to worry too much about the syntax, because the Scheme reader will take care of parsing the input for us. However, if we were writing the interpreter in a language like C or Ada, we would have to write our own parser and data structure for representing S-expressions.

4. Semantics

Introduction and Definitions

To understand the working of the interpreter, you'll need to understand the relationship between the functions EVAL and APPLY as well as the notion of an environment. The core of our MS is a function called EVAL. EVAL is a function that takes an s-expression and an environment as arguments, evaluates that s-expression with respect to the given environment, and returns the resulting value (which is just another s-expression).

Conceptually, an environment is just a stack of frames, where each frame is a table of symbol-data bindings. When the interpreter starts up, the environment has just one frame (the global frame) which contains initial bindings for the names of the built in functions. Generally speaking, when the user defines new variables, the bindings are placed in the top frame of the environment. This means that when the user defines variables at the interpreter prompt, the bindings go into the global frame. Variables are looked up in an environment by attempting to find the value associated with the variable's name in the top frame. If the binding is not found in that frame, the next frame down is examined and so on until either the binding is found, or no frames remain. If no binding is found, then the variable is undefined.

EVAL takes care of evaluating self-evaluating atoms, looking up variable bindings, and evaluating special forms. If EVAL's argument is a function call, it uses an important helper function called APPLY to apply the function to its arguments.

When APPLY applies a built-in function (like CAR), it simply looks up the primitive Scheme code which implements that function and applies that code to the arguments. Applying a user-defined function is trickier. First, APPLY extends the function's environment, by adding a new frame to the environment of the function's definition. This frame contains bindings between the function's formal parameters and the values that the function is being applied to. Finally, APPLY calls EVAL on the expressions which make up the body of the function and the new, extended environment.

We'll describe the semantics of MiniScheme by giving the specification for the functions EVAL and APPLY. In the discussion that follows, we'll make use of the following terms:

Sexpr: Any S-expression.
Atom: Any atom.
Self-evaluating: Booleans, numbers and the empty list are self-evaluating atoms.
Symbol: Any atom which is not a self-evaluating atom.
Environment, or just env: A stack of frames, where each frame is a table of bindings from symbols to s-expressions (which are either functions or data).

EVAL

This is the specification for the function EVAL:

EVAL(S:s-expression, Env:environment) returns s-expression

Here is the definition of EVAL, more or less in English. If sexpr is:

An atom:

If the atom is self-evaluating, return it. Otherwise, if the atom is (probably) a variable name, so look up the name and return the data it is bound to in env. If no binding is found, signal an error.

A special form:

If sexpr is of the form:

(define symbol sexpr): Add a binding of symbol to EVAL(sexpr, env) to the top frame of env. Return symbol.
(quote sexpr): Return sexpr.
(cond (test1 e11..e1N) (test2 e21..e2N) ... (testM eM1..eMN)): Evaluate the s-expressions eQ1..eQN, and return the value of eQN, where testQ was the first test to evaluate to non-false. If no test evaluated to non-false return false.
(lambda (arg1 arg2 ... argN) sexpr1 sexpr2 ... sexprN): A lambda-expression evaluates to a closure, which is just a coupling of instructions (a function body) and a lexical environment (or scope). Return the list (CLOSURE ((arg1 arg2 ... argN) sexpr1 sexpr2 ... sexprN) env). How this object is used will become clear when you see the definition of APPLY.

Otherwise:

The S-expression must be a function application, which means it must have the form, (sexpr1 sexpr2 ... sexprN). Apply the function as follows: Let Code be the code for the function, which we find by looking it up in the environment, by calling EVAL(sexpr1, env). Let Values be the list that results from recursively evaluating the argument list (sexpr2 ... sexprN) (all of the s-expressions but the first one in the list). Then call APPLY(Code, Values). APPLY is defined below.

APPLY

A function call has the form (sexpr1 sexpr2 ... sexprN). EVAL calls APPLY on EVAL(sexpr1, env) (which should return the code for the function) and the list formed by the results of EVAL called upon each of sexpr2 ... sexprN (which we call the value-list). More formally:

APPLY(Code: s-expression, Value-list: s-expression) returns s-expression

Here is the definition for APPLY, again, more or less in English: If Code equals:

car

Return first(first(Value-list)).

cdr

Return rest(first(Value-list)).

cons

Return the list formed by prepending first(Value-list) onto second(Value-list).

null?

If first(Value-list) is the empty list return true else return false.

atom?

If first(Value-list) is any atom (#t, #f, (), number, symbol) return true, else return false.

eq?

If first(Value-list) is identical (underlying Scheme eq?) to second(Value-list) return true, else return false.

<numeric-relational-operator>

The numeric relational operators are <, >, and =. These are implemented directly by calling the analogous Scheme relational operators on first(Value-list) and second(Value-list).

<arithmetic-operator>

The arithmetic operators supported are +, -, /, and *. These are implemented directly by calling the analogous Scheme arithmetic operators on first(Value-list) and second(Value-list).

exit

Exit the interpreter.

(CLOSURE (formal-parm-list sexpr1 sexpr2 ... sexprN) env)

This is the application of a user defined function. Apply the function as follows:

Extend the environment by creating a new stack frame which contains bindings between positionally associated symbols from formal-parm-list and values from Value-list.
Push this stack frame onto env. We'll call this new environment env'.
Finally, for each statement, sexprJ in the instruction part of the closure call EVAL(sexprJ, env'). Return the result of the evaluation of sexprN.

5. Implementation

Implementing the interpreter consists of implementing EVAL and APPLY, building a data structure for managing the environment, and implementing about a dozen primitive operations (CAR, CDR, CONS, EQ?, NULL?, ATOM?, EXIT, =, +, -, /, *, <, >) and the 4 special forms (QUOTE, LAMBDA, COND, and DEFINE). Luckily, you won't have to implement the whole interpreter from scratch. We provide you with the environment data structure and some basic operations on it. We also provide you with a full implementation of EVAL and a partial implementation of APPLY, as well as the implementation of two of the special forms (DEFINE and LAMBDA) and 7 of the primitive functions (ATOM?, EQ?, EXIT, +, -, =, and <). Your main challenge will be to implement the application of user defined functions. The implementation we provide you with has the functionality of a very brain dead desk calculator.

Development Environment

I've made every effort to make the code "portable" between the various Scheme environments out there. It's been tested on DrScheme, Guile, and MIT Scheme. The preferred platform for the class is DrScheme (or MzScheme, which is the same thing) and we will be grading on that platform. It is your responsibility to check your work on that environment before you turn it in. However, you're free to do your development on other environments. We've provided some support for these environments (see the starter code for information on using other environments).

Solution Path

The following is a possible path to solution. While you are not required to follow it, we think it will be helpful for many of you. Before writing any code, read this document, in its entirety, several times. Then read the provided Scheme code and be sure you have a good feel for how it does what it does. Run the provided Scheme code and experiment with the interpreter. Be sure to test your interpreter thoroughly between each of the following steps. Note that while this is a conceptually tough assignment, you should only have to write about a page of Scheme code.

To help in your testing efforts (once you get the full interpreter working) we've provided some test code. You should be able to cut-and-paste the code into your interpreter. Remember, just because it runs this code doesn't mean that everything is A-OK. You should come up with your own test cases as well...

Hints

Add the rest of the arithmetic and numeric relational operators (/, *, >). This should give you a feel for extending the interpreter's primitive function table. At this point, your interpreter should have the full functionality of a desk calculator.
Implement the special form QUOTE. When you have done this, you'll be able use symbols and lists as literal data. When this is working add the final S-expression predicate (NULL?) and the three list operators (CAR, CDR, CONS). At this point the interpreter should be able to handle simple operations on lists.
Fix APPLY. The version of APPLY we give you only handles primitive function application. To get APPLY working fully, it has to handle user defined functions as well. This will probably be the hardest part of the project. Be sure you understand the description of APPLY (above), especially with regards to applying user defined functions. When you get this finished, you're almost home: You'll be able to apply your own functions and apply them.
A language isn't interesting without control constructs. To this end, you'll have to implement the special form COND. This is somewhat tricky, so again be sure to read and understand the description of COND given above. When you've finished this step, your interpreter should be able to evaluate and properly execute all of the sample code we've provided.

Deliverables

Please stay tuned for details. We'll probably expect some sort of electronic turnin for this assignment.

Extra Credit

There are tons of options for extra credit on this assignment. Before doing any extra credit, be sure your basic interpreter works well. Also, because this course is graded on a curve, the course policy is to reward extra credit with only minimal points. Extra credit is for your own enrichment and fun. Below are some sample projects, roughly in order from easy to hard:

Change the implementation of the arithmetic operators so that they work for an arbitrary number of operands (rather than exactly two).
Add the logical operators AND, OR, and NOT to your interpreter. AND and OR should be short-circuiting, so you should implement them as special forms.
Add the special forms LET and LET* to your interpreter.
Learn how to use hash tables and then change the implementation of tables we've provided to use hash tables rather than lists. This might result in a faster implementation.
Change the scoping model from lexical to dynamic.
Add destructive operators. See the instructor if you want to do this.

dugan@cs.washington.edu (Last Update: )