MiniScheme Interpreter
Due: Friday, January 25, 2002.
1. Overview
For this assignment, you will be writing an interpreter for a language
called MiniScheme (which we'll sometimes call MS). MS is an
applicative language that looks a lot like Scheme. We call it
MiniScheme because it only defines a subset of the Scheme language.
An interpreter is just a program that reads input from a user,
evaluates it according to some rules, and returns the result of that
evaluation. In other words, the main loop for any interpreter looks
like this:
while (TRUE) {
Expr := ReadExpression();
Value := Evaluate(Expr, Environment);
Print(Value);
}
You'll be writing your interpreter in Scheme, which is nice
because MiniScheme shares Scheme syntax, so we can take advantage of
Scheme's reader and printer to take care of inputting and outputting the
MiniScheme expressions. Relieved of the chore of figuring out how to
parse the user's input, we can focus entirely on the definition of the
function Evaluate
, which we'll call EVAL.
2. Introduction to MiniScheme
MiniScheme looks and behaves a lot like Scheme, it's just smaller. It
consists of 4 special forms and about a dozen built-in primitive
operators. Below, we'll introduce MS by example, highlighting any
differences along the way. Unless we say otherwise, the built-in
primitive functions and special forms behave just like their
counterparts in Scheme.
- Data types: booleans, numbers, symbols, lists.
- Literal data: MS provides the special form
QUOTE and the quote mark (') as syntactic sugar for this form.
'x => X
(QUOTE x) => X
'(1 2 3) => (1 2 3)
(QUOTE (1 2 3)) => (1 2 3)
- List operations: CAR, CDR, and CONS.
(car '(1 2 3)) => 1
(cons 1 (cdr '(2 3))) => (1 3)
(cons 1 ()) => (1)
- S-expression predicates: NULL? and ATOM?. ATOM? doesn't exist in
normal Scheme. Essentially, anything that's not a list is an atom, with
the wrinkle that the empty list is also treated as an atom.
(null? ()) => #t
(null? (cdr '(1))) => #t
(null? '(1 2)) => #f
(atom? 'a) => #t
(atom? '(1 2)) => #f
(atom? ()) => #t
- "Variables": MS provides DEFINE, which binds a name to a value.
DEFINE places a binding (from name to value) onto the top frame of the
current environment. Frames and environments will be discussed in
greater detail below. It is impossible to write a function in MS that
has a side-effect on a global variable, because DEFINE will just add a
new binding to the frame of the function's exectution, rather than
rebind the global name to a new value. This property is shared by
functional languages (which are sometimes also said to be
non-destructive).
(define a 10)
(define b 20)
(define c 'x)
- Identity testing: EQ? which is like Schemes's EQ?
(eq? a b) => #f ;; see the above defines for definitions of a,b,c
(eq? c 'x) => #t
(eq? 'z 'z) => #t
- Control forms: MS provides just COND.
(cond ((eq? c 'z) 'hello)
((eq? c 'x) 'goodbye)
(#T 'goodnight)) => GOODBYE
- User defined functions: LAMBDA is used to define an anonymous
function. LAMBDAs actually evaluate to lexical closures,
which we'll talk more about later. Together with DEFINE, the function
can be given a name. Notice in the examples below that we use exactly
the same syntax for binding variable names to their data as we do for
binding function names to their function bodies.
(define second (lambda (x) (car (cdr x))))
(define plus1 (lambda (x) (+ x 1)))
(define length
(lambda (x)
(cond ((null? x) 0)
(#T (+ 1 (length (cdr x)))))))
(define map
(lambda (fun alist)
(cond ((null? alist) ())
(#T (cons (fun (car alist)) (map fun (cdr alist)))))))
- Function application: In MS (as in Scheme) the first expression in
the list representing the function call is actually evaluated (which
looks up the function body associated with the function name).
(second '(10 20 30)) => 20
(length '(a b c d)) => 4
(map plus1 '(1 2 3 4)) => (2 3 4 5)
(map (lambda (x) (+ 1 x)) '(1 2 3 4)) => (2 3 4 5)
((lambda (x) (+ 10 x)) 20) => 30 ;; anonymous application
- Operators for numbers: MS provides arithmetic operators
(+, -, /, *) and numerical
relational operators (=, <, >)
(= a 10) => #T
(< (+ 5 10) 11) => #F
(> (- 10 5) 4) => #T
3. Syntax
The basic syntax for the language is very simple. Everything in
MiniScheme is an S-expression. An S-expression is either an atom or a
list. More formally:
<s-expression> :== <atom> | <list>
An atom is composed of one or more non-special
characters. The special characters are space, tab, newline,
open paren, closed paren, and single quote. More formally:
<character> :== any character not in {<space>, \t, \n, (, ), '}
<atom> :== <character>+
A list is composed of zero or more S-expressions:
<list> :== (<s-expression>*)
Luckily, you won't have to worry too much about the syntax, because
the Scheme reader will take care of parsing the input for us. However,
if we were writing the interpreter in a language like C or Ada, we
would have to write our own parser and data structure for representing
S-expressions.
4. Semantics
Introduction and Definitions
To understand the working of the interpreter, you'll need to
understand the relationship between the functions EVAL and APPLY as
well as the notion of an environment. The core of our MS is a
function called EVAL. EVAL is a function that takes an s-expression
and an environment as arguments, evaluates that s-expression with
respect to the given environment, and returns the resulting value
(which is just another s-expression).
Conceptually, an environment is just a stack of frames, where each
frame is a table of symbol-data bindings. When the interpreter starts
up, the environment has just one frame (the global frame) which
contains initial bindings for the names of the built in functions.
Generally speaking, when the user defines new variables, the bindings
are placed in the top frame of the environment. This means that when
the user defines variables at the interpreter prompt, the
bindings go into the global frame. Variables are looked up in an
environment by attempting to find the value associated with the
variable's name in the top frame. If the binding is not found in that
frame, the next frame down is examined and so on until either the
binding is found, or no frames remain. If no binding is found, then
the variable is undefined.
EVAL takes care of evaluating self-evaluating atoms, looking up
variable bindings, and evaluating special forms. If EVAL's argument
is a function call, it uses an important helper function called APPLY
to apply the function to its arguments.
When APPLY applies a built-in function (like CAR), it simply looks up
the primitive Scheme code which implements that function and applies
that code to the arguments. Applying a user-defined
function is trickier. First, APPLY extends the function's
environment, by adding a new frame to the environment of the
function's definition. This frame contains bindings between the
function's formal parameters and the values that the function is being
applied to. Finally, APPLY calls EVAL on the expressions which make
up the body of the function and the new, extended environment.
We'll describe the semantics of MiniScheme by giving the specification
for the functions EVAL and APPLY. In the discussion that follows,
we'll make use of the following terms:
- Sexpr
- Any S-expression.
- Atom
- Any atom.
- Self-evaluating
- Booleans, numbers and the empty list are self-evaluating atoms.
- Symbol
- Any atom which is not a self-evaluating atom.
- Environment, or just env
- A stack of frames, where
each frame is a table of bindings from symbols to s-expressions (which
are either functions or data).
EVAL
This is the specification for the function EVAL:
EVAL(S:s-expression, Env:environment) returns s-expression
Here is the definition of EVAL, more or less in English.
If sexpr is:
- An atom:
-
If the atom is self-evaluating, return it. Otherwise, if the atom is
(probably) a variable name, so look up the name and return the data it is
bound to in env. If no binding is found, signal an error.
- A special form:
-
If sexpr is of the form:
- (define symbol sexpr)
- Add a binding of symbol to
EVAL(sexpr, env) to the top frame of env.
Return symbol.
- (quote sexpr)
- Return sexpr.
- (cond (test1 e11..e1N) (test2 e21..e2N) ... (testM eM1..eMN))
- Evaluate the s-expressions
eQ1..eQN
,
and return the value of eQN
, where
testQ was the first test to
evaluate to non-false. If no test evaluated to non-false return
false.
- (lambda (arg1 arg2 ... argN) sexpr1 sexpr2 ... sexprN)
- A lambda-expression evaluates to a closure, which is just a coupling
of instructions (a function body) and a lexical environment (or
scope). Return the list
(CLOSURE ((arg1 arg2 ... argN) sexpr1 sexpr2 ... sexprN) env).
How this object is used will become clear when you see the
definition of APPLY.
- Otherwise:
- The S-expression must be a function application, which means it
must have the form, (sexpr1 sexpr2 ... sexprN). Apply the function as
follows: Let
Code
be the code for the function, which we
find by looking it up in the environment, by calling
EVAL(sexpr1, env)
. Let Values
be the list
that results from recursively evaluating the argument list
(sexpr2 ... sexprN)
(all of the s-expressions but the
first one in the list). Then call APPLY(Code, Values)
.
APPLY is defined below.
APPLY
A function call has the form (sexpr1 sexpr2 ... sexprN). EVAL
calls APPLY on EVAL(sexpr1, env) (which should return the
code for the function) and the list formed by the results of EVAL
called upon each of sexpr2 ... sexprN (which we call the
value-list). More formally:
APPLY(Code: s-expression, Value-list: s-expression) returns s-expression
Here is the definition for APPLY, again, more or less in English:
If Code equals:
- car
- Return first(first(Value-list)).
- cdr
- Return rest(first(Value-list)).
- cons
- Return the list formed by prepending
first(Value-list) onto second(Value-list).
- null?
- If first(Value-list) is the empty list return true else
return false.
- atom?
- If first(Value-list) is any atom (#t, #f, (), number, symbol)
return true, else return false.
- eq?
- If first(Value-list) is identical (underlying Scheme eq?) to
second(Value-list) return true, else return false.
- <numeric-relational-operator>
- The numeric relational operators are <, >, and =. These are
implemented directly by calling the analogous Scheme relational
operators on first(Value-list) and second(Value-list).
- <arithmetic-operator>
- The arithmetic operators supported are +, -, /, and *. These
are implemented directly by calling the analogous Scheme arithmetic
operators on first(Value-list) and second(Value-list).
- exit
- Exit the interpreter.
- (CLOSURE (formal-parm-list sexpr1 sexpr2 ... sexprN) env)
-
This is the application of a user defined function. Apply the function
as follows:
- Extend the environment by creating a new stack frame which
contains bindings between
positionally associated symbols from formal-parm-list
and values from Value-list.
- Push this stack frame onto
env
. We'll call this new
environment env'.
- Finally, for each statement,
sexprJ in the instruction part of the closure
call EVAL(sexprJ, env'). Return the result of the
evaluation of
sexprN
.
5. Implementation
Implementing the interpreter consists of implementing EVAL and APPLY,
building a data structure for managing the environment, and
implementing about a dozen primitive operations (CAR, CDR, CONS, EQ?, NULL?,
ATOM?, EXIT, =, +, -, /, *, <, >) and the 4 special forms (QUOTE,
LAMBDA, COND, and DEFINE). Luckily, you won't have to implement the
whole interpreter from scratch. We provide you with the environment
data structure and some basic operations on it. We also provide you
with a full implementation of EVAL and a partial implementation of
APPLY, as well as the implementation of two of the special forms
(DEFINE and LAMBDA) and 7 of the primitive functions (ATOM?, EQ?,
EXIT, +, -, =, and <). Your main challenge will be to implement the
application of user defined functions. The implementation we provide
you with has the functionality of a very brain dead desk
calculator.
Development Environment
I've made every effort to make the code "portable" between the various
Scheme environments out there. It's been tested on DrScheme, Guile,
and MIT Scheme. The preferred platform for the class is DrScheme (or
MzScheme, which is the same thing) and we will be grading on that
platform. It is your responsibility to check your work on that
environment before you turn it in. However, you're free to do your
development on other environments. We've provided some support for
these environments (see the starter code for information on using
other environments).
Solution Path
The following is a possible path to solution. While you are not
required to follow it, we think it will be helpful for many of you.
Before writing any code, read this document, in its entirety, several
times. Then read the
provided Scheme code
and be sure you have a good
feel for how it does what it does. Run the provided Scheme code and
experiment with the interpreter. Be sure to test your interpreter
thoroughly between each of the following steps. Note that while
this is a conceptually tough assignment, you should only have to
write about a page of Scheme code.
To help in your testing efforts (once you get the full interpreter
working) we've provided some test code. You
should be able to cut-and-paste the code into your interpreter.
Remember, just because it runs this code doesn't mean that everything
is A-OK. You should come up with your own test cases as well...
Hints
- Add the rest of the arithmetic and numeric relational operators
(/, *, >). This should give you a feel for extending the interpreter's
primitive function table. At this point, your interpreter
should have the full functionality of a desk calculator.
- Implement the special form QUOTE. When you have done this, you'll
be able use symbols and lists as literal data. When this is working
add the final S-expression predicate (NULL?) and the three
list operators (CAR, CDR, CONS). At this point the interpreter
should be able to handle simple operations on lists.
- Fix APPLY. The version of APPLY we give you only handles
primitive function application. To get APPLY working fully, it
has to handle user defined functions as well. This will probably
be the hardest part of the project. Be sure you understand the
description of APPLY (above), especially with regards to applying
user defined functions. When you get this finished, you're almost home:
You'll be able to apply your own functions and apply them.
- A language isn't interesting without control constructs. To this
end, you'll have to implement the special form COND. This is somewhat
tricky, so again be sure to read and understand the description of
COND given above. When you've finished this step, your interpreter
should be able to evaluate and properly execute all of the sample
code we've provided.
Deliverables
Please stay tuned for details. We'll probably expect some sort of
electronic turnin for this assignment.
Extra Credit
There are tons of options for extra credit on this assignment.
Before doing any extra credit, be sure your basic interpreter works well.
Also, because this course is graded on a curve, the course policy is to
reward extra credit with only minimal points. Extra credit is for your
own enrichment and fun. Below are some sample projects, roughly
in order from easy to hard:
- Change the implementation of the arithmetic operators so that
they work for an arbitrary number of operands (rather than exactly two).
- Add the logical operators AND, OR, and NOT to your
interpreter. AND and OR should be short-circuiting, so you should
implement them as special forms.
- Add the special forms LET and LET* to your interpreter.
- Learn how to use hash tables and then change the implementation of
tables we've provided to use hash tables rather than lists. This might
result in a faster implementation.
- Change the scoping model from lexical to dynamic.
- Add destructive operators. See the instructor if you want
to do this.
dugan@cs.washington.edu
(Last Update:
)