[ ^ CSE 341, Winter 2004 home page | Lectures index ]

CSE 341: Introduction to Scheme

History of Lisp

Lisp, like ML, is not so much a single language as a set of organizing ideas about language design, embodied in a large number of dialects. Fig. 1 depicts a family DAG with some important members of the Lisp family, along with a few important related languages. Languages inside the rounded box are Lisp dialects.

[Family DAG of Lisp, and related languages, with timeline]

Fig. 1: Family DAG of Lisp, and related languages

Notable events in LISP history:

Late 1950's: LISP (LISt Processor) invented by John McCarthy et al. at MIT for the purpose of AI research.
1960: landmark paper introducing LISP to the wider world: "Recursive Functions of Symbolic Expressions and Their Computation by Machine"
1975: Scheme invented by Guy Steele and Gerald Sussman.
Mid-1980's: Non-Scheme Lisp dialects (principally Maclisp and Interlisp) standardized into "Common Lisp".
Late-1980's: Common Lisp Object System adds OOP constructs to Common Lisp.
1998: R⁵RS (5th revised report on Scheme) defines the Scheme standard still in use today.

Since 1992, researchers (mostly (transitive) students of Matthias Felleisen) have defined many useful extensions to R⁵RS, and implemented them in the PLT Scheme dialect, which forms the heart of the DrScheme programming environment we'll be using in this course. There is something of a cultural divide between the "MIT Scheme" community and the "PLT Scheme" community, which is probably one reason that there are no current plans to add the MzScheme extensions to an "R⁶RS" standard.

In this class, we'll stay (mostly) inside R⁵RS.

Scheme philosophy

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.
-- Introduction to the R⁵RS

Like ML, Scheme is a mostly-functional language with strict evaluation. Unlike ML, Scheme is dynamically typed (a.k.a. latently typed) --- expressions do not have a (meaningful) static type. Instead, values carry a type at runtime, and ill-typed operations are only caught when they are evaluated.
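As a small illustration of dynamic typing, here is a sketch in which one function accepts values of different runtime types, and an ill-typed expression causes no error because it is never evaluated. The name describe is a hypothetical helper chosen for this example; define, cond, and if are covered later in these notes.

```scheme
;; describe is a hypothetical helper, not a standard procedure.
;; It inspects the runtime type of its argument and acts accordingly.
(define (describe v)
  (cond ((number? v) (+ v 1))
        ((string? v) (string-append v "!"))
        (else 'unknown)))

(describe 41)     ; => 42
(describe "hi")   ; => "hi!"
(describe #t)     ; => unknown

;; The second branch is ill-typed, but it is never evaluated,
;; so no error is reported.
(if #t 'ok (+ 1 "oops"))   ; => ok
```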

Big ideas in Scheme/Lisp

Many of the major good ideas we've seen in ML originated in dialects of Lisp: garbage collection, uniform by-reference data model, strong typing, lists as a pervasive data structure, heavy use of higher-order, anonymous, and recursive functions.
Dynamic typing: mentioned above; we'll talk about this later in some detail.
Programs are data: Lisp programs are themselves lists, and Lisp has special syntaxes for representing "unevaluated list data". It is therefore easy to represent Lisp programs as Lisp data, and so Lisp lends itself to metaprogramming (writing programs that manipulate other programs).
Simplicity and regularity above all: Scheme demonstrates how small a language can be while remaining powerful and expressive. Scheme has almost no syntax and hardly more semantics. Its reference manual fits in 50 printed pages.
Hygienic syntax macros: The Scheme syntax is extensible through a macro system that (unlike, for example, C macros) is relatively safe and has a clean semantics that fits in well with the rest of the language.
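To make the "programs are data" idea concrete, here is a minimal sketch that treats a quoted expression as an ordinary list and builds a new expression from it. The names expr and expr2 are illustrative only.

```scheme
;; Quoting with ' yields the list itself instead of calling +.
(define expr '(+ 1 2))

(car expr)      ; => +   (the symbol, not the addition function)
(length expr)   ; => 3

;; Build a new program fragment by swapping the operator.
(define expr2 (cons '* (cdr expr)))
expr2           ; => (* 1 2)
```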

Scheme syntax and semantics

Informally, Scheme's syntax and semantics can be stated in much less than a page. Its syntax is as follows:

program ::= S*
S ::= atom | list
atom ::= IDENTIFIER | NUMBER | SYMBOL | STRING | ...
list ::= ( S* )

A Scheme program is a list of s-expressions (the term is a contraction of "symbolic expression"). An s-expression is either an atom or a list. Atoms include variable names and literal values. A list is a sequence of whitespace-separated s-expressions inside round parens. (DrScheme also allows square brackets, for variety.)

The semantics are equally simple:

If an expression is an atom, return the value of the atom.
If an expression is a syntactic list L, then:
- If the first element of L is the name of a special form, then follow the rules for the special form.
- Otherwise, evaluate the first element of L, and apply its value as a function to a list L2 whose elements are the results of evaluating the remaining elements of L.

(We've left some advanced stuff out, including syntax macros, (quasi)quoted data, and continuations. In the meantime, one can get quite far with just this.)
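The evaluation rules above can be sketched as a toy evaluator written in Scheme itself. This is only an illustration under stated assumptions: tiny-eval and demo-env are hypothetical names, the environment is assumed to be an association list from symbols to values, and if is the only special form handled.

```scheme
;; tiny-eval: a hypothetical, deliberately incomplete evaluator.
;; env is assumed to be an association list of (symbol . value) pairs.
(define (tiny-eval expr env)
  (cond
    ;; Atoms that are literals evaluate to themselves.
    ((or (number? expr) (boolean? expr) (string? expr)) expr)
    ;; Atoms that are names are looked up in the environment.
    ((symbol? expr) (cdr (assq expr env)))
    ;; Special form: only the chosen branch is evaluated.
    ((eq? (car expr) 'if)
     (if (tiny-eval (cadr expr) env)
         (tiny-eval (caddr expr) env)
         (tiny-eval (cadddr expr) env)))
    ;; Function call: evaluate every element, then apply.
    (else
     (apply (tiny-eval (car expr) env)
            (map (lambda (e) (tiny-eval e env)) (cdr expr))))))

(define demo-env (list (cons '+ +) (cons '< <) (cons 'x 10)))

(tiny-eval '(if (< x 3) x (+ x 1 2)) demo-env)   ; => 13
```

The sketch does no error handling: an unbound name makes assq return #f, and the following cdr then fails.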

Core data types and special forms

Atomic expressions and function calls

3         ; The number 3
"hi"      ; The string "hi"
'a        ; The symbol a 
#t        ; The boolean true
#f        ; The boolean false

(+ 1 2)   ; sum of 1 and 2
(+ 1 2 3) ; sum of 1, 2, and 3; note variable argument count

'()            ; The empty list (must be quoted in R⁵RS)
(cons 1 '())   ; 1 consed onto nil
(list 1 2 3)   ; The list (cons 1 (cons 2 (cons 3 '())))

;; Lists can be heterogeneous
(list 1 "hi" 'a #f)
(list 0 (list 1 2))

;; An "improper" list: tail is not a list.  Lisp programmers
;; use cons cells where an ML programmer would use a 2-tuple.
(cons 1 2)

;; In fact, for Lisp programmers, a cons *is* a pair --- the
;; function for testing cons-ness is called pair?
(pair? (list 1 2 3 4 5))  ; returns #t

(car (list 1 2 3)) ; car is the equivalent of "hd"
(cdr (list 1 2 3)) ; cdr is the equivalent of "tl"

Special forms

"Special forms" are Scheme constructs that control evaluation differently from ordinary function calls. One might fairly ask: Why have special forms at all? Why aren't functions and atoms enough? The argument goes as follows ---

Note the semantics of function call (in Scheme or any strict language, e.g. ML):
1. The function expression and all the argument expressions are evaluated.
2. Then, the function value is applied to the argument values.
This is not appropriate for constructs whose proper operation requires that some parts of the construct not be evaluated as ordinary expressions.

Therefore, as we discuss each special form, we note what must not be evaluated normally.

`define`

define adds a binding to the environment, or updates the binding if it already exists; it has the syntax:

(define identifier expr)

define has no official return value. It takes an identifier and a value, and binds that identifier to the value in the current environment.

Identifiers in Scheme may contain almost any character:

(define hello? "hello?")
(define one+1 (+ 1 1))

In fact, symbols like + that are operators (or otherwise "special") in other languages are actually ordinary names in Scheme. This allows some natural naming conventions; for example, Scheme functions that test conditions end with a question mark:

(number? 5)     ; => #t
(string? 23.5)  ; => #f

What must not be evaluated in define: clearly, the identifier which is about to be bound must not be evaluated, because it is the very name we are trying to define.

`lambda`

The lambda special form is Scheme's syntax for anonymous function values:

(lambda (args) bodyExpr) ; like ML (fn args => bodyExpr)

For example:

(lambda (x) (+ x 1))

Function values are first-class in Scheme, as in ML; they can be applied to arguments, or bound to names, passed or returned, etc.

((lambda (x) (+ x 1)) 2)             ; applies function to 2, yielding 3
(define add (lambda (x y) (+ x y)))  ; binds add to function

The special form name lambda comes from Alonzo Church's lambda calculus, a mathematical formalism for reasoning about computable functions. The lambda calculus actually forms the theoretical foundation for all functional languages, including ML. We may return to it later.

There is a syntactic sugar for binding functions to names:

(define (fnName args) bodyExpr)

For example:

(define (f x) (+ x 1))  ; equiv. to (define f (lambda (x) (+ x 1)))

What must not be evaluated in lambda: Neither the argument list nor the body is evaluated when the lambda expression itself is evaluated. It makes no sense to evaluate the argument list, since it is a list of parameter names rather than expressions. The body, likewise, must be evaluated not when the function is defined, but each time it is applied.
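As a brief sketch of functions being passed and returned as values, consider the following. The names make-adder, twice, and add3 are hypothetical, chosen only for this example.

```scheme
;; make-adder returns a new function that adds n to its argument.
(define (make-adder n)
  (lambda (x) (+ x n)))

;; twice takes a function f and returns a function applying f two times.
(define (twice f)
  (lambda (x) (f (f x))))

(define add3 (make-adder 3))

(add3 4)            ; => 7
((twice add3) 4)    ; => 10
```

Note that the body (+ x n) runs only when add3 is applied, not when make-adder creates it.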

`if` and `cond`

Scheme has both an if form and a more general cond form for conditionals. if has the syntax

(if testExpr thenExpr elseExpr)

If testExpr evaluates to the boolean value #f, then elseExpr is evaluated and its value is returned; if testExpr evaluates to anything else (a number, a list, whatever), then thenExpr is evaluated and its value is returned.

Example:

                                     ; Roughly equal to ML:
(define (myMap f aList)              ; fun myMap (f, aList) =
  (if (null? aList)                  ;     if null aList then
      '()                            ;         nil
      (cons (f (car aList))          ;     else (f (hd aList))::
            (myMap f (cdr aList))))) ;         (myMap f (tl aList))

cond has the following syntax:

(cond (test1 bodyExpr1)
      (test2 bodyExpr2)
      ...
      (testN bodyExprN)
      [(else elseExpr)])

Each testI is evaluated in turn. If a test evaluates to a true value (anything other than #f), then its corresponding bodyExpr is evaluated and returned as the value of the whole expression. Otherwise, the next test is tried.

There is an optional else clause, which simply returns the elseExpr.

If none of the cases matches, and there is no else case, then the result of the cond expression is not defined by the standard. (Some Scheme implementations have a special #void value that is returned in this case; this is rather analogous to ML's unit value.)
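For instance, a three-way classification reads naturally with cond. This is a minimal sketch; the function name sign is an assumption made for illustration.

```scheme
;; sign is a hypothetical example function: it classifies a number.
(define (sign n)
  (cond ((< n 0) 'negative)
        ((= n 0) 'zero)
        (else 'positive)))

(sign -5)   ; => negative
(sign 0)    ; => zero
(sign 12)   ; => positive
```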

What must not be evaluated in conditional expressions: A conditional expression would be nearly useless if all branches were always evaluated. A conditional expression must not evaluate the branches not taken.
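The following sketch makes the difference visible. It assumes a hypothetical my-if written as an ordinary function, and a counter (count, bumped by the hypothetical bump!) that records whether a branch was evaluated; set! is described later under "Mutation".

```scheme
;; my-if is a hypothetical ordinary function, NOT a special form:
;; all three arguments are evaluated before it is called.
(define (my-if test then-val else-val)
  (if test then-val else-val))

(define count 0)
(define (bump!)
  (set! count (+ count 1))
  count)

(my-if #t 'yes (bump!))   ; => yes, but (bump!) was evaluated anyway
count                     ; => 1

(if #t 'yes (bump!))      ; => yes, and (bump!) was skipped
count                     ; => 1
```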

`let`, `let*`, and `letrec`

Just like ML, Scheme has let-expressions; unlike ML, Scheme has several varieties. The kind that most closely resembles ML's version is let*:

                               ; Roughly equal to ML:
(let* ((name1 expr1)           ; let val name1 = expr1
       (name2 expr2)           ;     val name2 = expr2
       ...                     ;     ...
       (nameN exprN))          ;     val nameN = exprN
       bodyExpr)               ; in bodyExpr end

let resembles let*, and has identical syntax and semantics except that each binding's expression is evaluated in the environment preceding all the let-bindings. In other words, rather than evaluating the bindings "in sequence", let does them "in parallel".

For example, consider the following:

(define a 1)

(let ((a 2)
      (b (+ a 4)))   ; this a is the outer a (1), so b is 5
     (+ a b))  ; returns 7

(let* ((a 2)
       (b (+ a 4)))  ; this a is the new a (2), so b is 6
     (+ a b))  ; returns 8
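One way to see the difference is to rewrite each form by hand: let behaves like applying a lambda to all the binding expressions at once, while let* behaves like nested lets. The sketch below shows these rewrites as an illustration of the scoping rules, not as a description of how any particular implementation works.

```scheme
(define a 1)

;; let: both binding expressions see only the outer a.
(let ((a 2) (b (+ a 4))) (+ a b))             ; => 7
((lambda (a b) (+ a b)) 2 (+ a 4))            ; => 7, same scoping

;; let*: each binding sees the ones before it.
(let* ((a 2) (b (+ a 4))) (+ a b))            ; => 8
(let ((a 2)) (let ((b (+ a 4))) (+ a b)))     ; => 8, same scoping
```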

letrec allows the bindings to be recursive amongst each other. This is useful for defining mutually recursive local functions; here's a silly example:

(define (mutualFac v)
    (letrec ((a (lambda (x)
                  (if (<= x 0)
                      1
                      (* x (b (- x 1))))))
             (b (lambda (x) (a x))))
       (b v)))

Here, a must be in scope for b's body, and b must also be in scope for a's body.
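A more conventional illustration of mutual recursion is the even/odd pair, sketched below. The names parity, my-even?, and my-odd? are hypothetical, and the function assumes a non-negative integer argument.

```scheme
;; parity is a hypothetical example; n is assumed to be a
;; non-negative integer.
(define (parity n)
  (letrec ((my-even? (lambda (k)
                       (if (= k 0) #t (my-odd? (- k 1)))))
           (my-odd?  (lambda (k)
                       (if (= k 0) #f (my-even? (- k 1))))))
    (if (my-even? n) 'even 'odd)))

(parity 10)   ; => even
(parity 7)    ; => odd
```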

What must not be evaluated in the let forms: All the names to be bound in the body expression must not be evaluated, because they are names that we are defining. Also, the body expression must be evaluated after all the bound expressions, and in the environment created by the bindings.

Some lawyerly/historical details

car? cdr?

Historical note: car and cdr come from the first Lisp implementation, written in 1959 for the IBM 704. A 704 machine word had an "address" field and a "decrement" field, which held the two halves of a cons cell: car stood for "contents of the address part of register", and cdr for "contents of the decrement part of register". The terminology is horrible but has stayed with us to this day.

Lisp actually has a whole family of convenience functions that use the "a = head" and "d = tail" convention to access "deeper" parts of lists-within-lists:

(cadr (list 1 2 3 4))  ; the car of the cdr = 2
(caddr (list 1 2 3 4)) ; (car (cdr (cdr (list 1 2 3 4)))) = 3

; the cdr of the car = (2) (i.e., the list containing only 2)
(cdar (list (list 1 2) (list 3 4)))

; (car (cdr (car (cdr (list ...))))) = 4
(cadadr (list (list 1 2) (list 3 4)))

For these functions, read off the a's and d's right-to-left from the "r".

Mutation

All defines implicitly bind their results to ref cells, and dereference is implicit. To mutate a binding, use set!. For example, type the following into your interpreter; the results will be hard to understand, unless you understand this fact:

                ; Rough ML equivalent:
(define x 5)    ; val x = ref 5
(define y x)    ; val y = ref (!x)
(set! x 4)      ; x := 4
x               ; !x;
y               ; !y;

More generally, almost all data structures in Scheme are mutable, including cons cells (which is nearly everything, since nearly everything in a Scheme program is a list):

> (define x (list 1 2))
> (set-cdr! x (list 3 4))
> x
(1 3 4)
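The two kinds of mutation interact differently with sharing. In the sketch below (names p, q, n, and m are illustrative), set-car! changes a cons cell that two variables both refer to, whereas set! only rebinds one variable. This assumes an R⁵RS-style implementation in which pairs are mutable.

```scheme
;; Mutating a shared cons cell: both names see the change.
(define p (list 1 2))
(define q p)          ; q refers to the same cons cell as p
(set-car! p 99)
q                     ; => (99 2)
(eq? p q)             ; => #t

;; Rebinding a variable: the other name is unaffected.
(define n 5)
(define m n)          ; m gets the value 5, not a link to n
(set! n 4)
m                     ; => 5
```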

What is a symbol? How does it differ from a string?

A symbol is (roughly) a "canonical, immutable string".

There can be many different string objects that have the same value --- in other words, two strings can have the same value but different identities. This could be accomplished, for example, by creating two different strings, then updating them (Scheme strings can be mutated via string-set! and other functions) to have the same value. However, two symbols can only have the same value if they have the same identity.

Another way of saying this is that when the Scheme runtime is about to create a symbol value, it always checks to see if there exists (somewhere in the runtime) another symbol with that same name. If such a symbol already exists, then Scheme returns a pointer to the old symbol --- the "canonical" value --- rather than creating a new symbol value.

Yet another way of saying this is: for any name, there is one symbol (now and forever) whose value is that name.
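The distinction between value and identity can be observed with equal? (which compares contents) and eq? (which compares identity). A small sketch, using string-copy to guarantee two freshly allocated strings; s1 and s2 are illustrative names:

```scheme
(define s1 (string-copy "abc"))
(define s2 (string-copy "abc"))

(equal? s1 s2)   ; => #t  (same contents)
(eq? s1 s2)      ; => #f  (two distinct string objects)

(eq? 'abc 'abc)  ; => #t  (one canonical symbol)

;; Converting equal strings to symbols yields the same symbol.
(eq? (string->symbol s1) (string->symbol s2))   ; => #t
```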

Symbols are generally used to represent variable names. We'll see examples of such use later, when we see how to represent programs as data in Scheme.

Functions with variable numbers of arguments

We noted before that Lisp functions can take variable numbers of arguments; Scheme has two syntaxes, inherited from previous Lisp dialects, for defining functions with variable numbers of arguments. The first is as follows:

(lambda argListName bodyExpr)

This binds the entire argument list to argListName in the body expression. Since the arguments to a function are always passed as a list, and lists can have any length, this allows the function to accept any number of arguments.

The second form is a little uglier:

(lambda (regularArgs . restArg) bodyExpr)

Here, regularArgs is a list of arguments arg1, ..., argN that is to be bound to the first N arguments to the function, and restArg will be bound to the remaining arguments.

Example: The following function multiplies elements 2..n of an n-element argument list by the first element of the list.

> (define allTimesFirst
    (lambda (first . rest)
      (map (lambda (x) (* first x)) rest)))

> (allTimesFirst 5 1 4 7 20)
(5 20 35 100)

Here is an alternate definition:

(define allTimesFirst2
  (lambda aList
    (map (lambda (x) (* (car aList) x)) (cdr aList))))
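Both forms also combine with the define shorthand for naming functions. The following is a sketch with hypothetical names (count-args and average); it relies on apply accepting leading arguments before its final list argument, as R⁵RS specifies.

```scheme
;; All arguments are collected into the list args.
(define (count-args . args)
  (length args))

;; At least one argument is required; the others land in rest.
(define (average first . rest)
  (/ (apply + first rest)
     (+ 1 (length rest))))

(count-args)            ; => 0
(count-args 'a 'b 'c)   ; => 3
(average 2 4 6)         ; => 4
```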

Suggested exercises

Write filter and foldl in Scheme.
Write a function that, given a list of (possibly heterogeneous) values, produces a list containing all permutations of that list.