CSE 341: Introduction to Scheme

History of Lisp

Lisp, like ML, is not so much a single language as a set of organizing ideas about language design, embodied in a large number of dialects. Fig. 1 depicts a family DAG with some important members of the Lisp family, along with a few important related languages. Languages inside the rounded box are Lisp dialects.

[Family DAG of Lisp, and related languages, with timeline]
Fig. 1: Family DAG of Lisp, and related languages

Notable events in LISP history:

Since 1992, researchers (mostly (transitive) students of Matthias Felleisen) have defined many useful extensions to R5RS, and implemented them in the PLT Scheme dialect, which forms the heart of the DrScheme programming environment we'll be using in this course. There is something of a cultural divide between the "MIT Scheme" community and the "PLT Scheme" community, which is probably one reason that there are no current plans to add the MzScheme extensions to an "R6RS" standard.

In this class, we'll stay (mostly) inside R5RS.

Scheme philosophy

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.

-- Introduction to the R5RS

Like ML, Scheme is a mostly-functional language with strict evaluation. Unlike ML, Scheme is dynamically typed (a.k.a. latently typed) --- expressions do not have a (meaningful) static type. Instead, values carry a type at runtime, and ill-typed operations are only caught when they are evaluated.
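As a small illustration of ill-typed operations being caught only when evaluated, consider the sketch below. The function f is hypothetical, and it uses define and if, which are described later in these notes. ML would reject the equivalent program at compile time; Scheme accepts the definition and complains only if the bad branch actually runs.

```scheme
;; Hypothetical function: the else branch adds a number to a string,
;; but nothing checks that when f is defined.
(define (f flag)
  (if flag
      1
      (+ 1 "hi")))

(f #t)      ; => 1; the ill-typed branch is never evaluated
;; (f #f)   ; would signal a runtime error when (+ 1 "hi") is evaluated
```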

Big ideas in Scheme/Lisp

Scheme syntax and semantics

Informally, Scheme's syntax and semantics can be stated in much less than a page. Its syntax is as follows:

program ::= S*
S ::= atom | list
atom ::= IDENTIFIER | NUMBER | SYMBOL | STRING | ...
list ::= ( S* )

A Scheme program is a list of s-expressions (the term is a contraction of "symbolic expression"). An s-expression is either an atom or a list. Atoms include variable names and literal values. A list is a sequence of whitespace-separated s-expressions inside round parens. (DrScheme also allows square brackets, for variety.)

The semantics are equally simple:

(We've left some advanced stuff out, including syntax macros, (quasi)quoted data, and continuations. In the meantime, one can get quite far with just this.)

Core data types and special forms

Atomic expressions and function calls

3         ; The number 3
"hi"      ; The string "hi"
'a        ; The symbol a 
#t        ; The boolean true
#f        ; The boolean false

(+ 1 2)   ; sum of 1 and 2
(+ 1 2 3) ; sum of 1, 2, and 3; note variable argument count

'()             ; The empty list (it must be quoted in R5RS)
(cons 1 '())    ; 1 consed onto the empty list
(list 1 2 3)    ; The list (cons 1 (cons 2 (cons 3 '())))

;; Lists can be heterogeneous
(list 1 "hi" 'a #f)
(list 0 (list 1 2))

;; An "improper" list: tail is not a list.  Lisp programmers
;; use cons cells where an ML programmer would use a 2-tuple.
(cons 1 2)

;; In fact, for Lisp programmers, a cons *is* a pair --- the
;; function for testing cons-ness is called pair?
(pair? (list 1 2 3 4 5))  ; returns #t

(car (list 1 2 3)) ; car is the equivalent of "hd"
(cdr (list 1 2 3)) ; cdr is the equivalent of "tl"

Special forms

"Special forms" are Scheme constructs that control evaluation differently from ordinary function calls. One might fairly ask: Why have special forms at all? Why aren't functions and atoms enough? The argument goes as follows ---

Therefore, as we discuss each special form, we note what must not be evaluated normally.

define

define adds a binding to the environment, or updates the binding if it already exists; it has the syntax:

(define identifier expr)

define has no official return value. It takes an identifier and a value, and binds that identifier to the value in the current environment.

Identifiers in Scheme may contain almost any character:

(define hello? "hello?")
(define one+1 (+ 1 1))

In fact, symbols like + that are operators (or otherwise "special") in other languages are actually ordinary names in Scheme. This allows some natural naming conventions; for example, Scheme functions that test conditions end with a question mark:

(number? 5)     ; => #t
(string? 23.5)  ; => #f
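Because + is an ordinary name bound to a function value, it can be rebound under another name or passed as an argument. A minimal sketch follows; the name plus is made up for illustration, while apply and map are standard R5RS procedures.

```scheme
(define plus +)                    ; bind a second name to the same function
(plus 1 2)                         ; => 3
(apply + (list 1 2 3))             ; => 6; + is passed like any other value
(map + (list 1 2) (list 10 20))    ; => (11 22)
```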

What must not be evaluated in define: clearly, the identifier which is about to be bound must not be evaluated, because it is the very name we are trying to define.

lambda

The lambda special form is Scheme's syntax for anonymous function values:

(lambda (args) bodyExpr) ; like ML (fn args => bodyExpr)

For example:

(lambda (x) (+ x 1))

Function values are first-class in Scheme, as in ML; they can be applied to arguments, or bound to names, passed or returned, etc.

((lambda (x) (+ x 1)) 2)             ; applies function to 2, giving 3
(define add (lambda (x y) (+ x y)))  ; binds add to function
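The "passed or returned" cases can be sketched as follows. The names make-adder, add5, and twice are hypothetical and chosen only for this example.

```scheme
;; Returning a function: make-adder builds a new function for each n.
(define make-adder
  (lambda (n)
    (lambda (x) (+ x n))))

(define add5 (make-adder 5))
(add5 10)                    ; => 15

;; Passing a function: twice applies its argument f two times.
(define twice
  (lambda (f x)
    (f (f x))))

(twice add5 1)               ; => 11
```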

The special form name lambda comes from Alonzo Church's lambda calculus, a mathematical formalism for reasoning about computable functions. The lambda calculus actually forms the theoretical foundation for all functional languages, including ML. We may return to it later.

There is a syntactic sugar for binding functions to names:

(define (fnName args) bodyExpr)

For example:

(define (f x) (+ x 1))  ; equiv. to (define f (lambda (x) (+ x 1)))

What must not be evaluated in lambda: The argument list and body should not be evaluated. It makes no sense to evaluate the argument list, since the argument list is a list of parameters. Also, function bodies should not be evaluated when a function is defined, but when it is applied.
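One way to observe that a body runs at application time rather than definition time is to give it a visible side effect. This sketch uses a hypothetical counter and the set! form, which is covered under Mutation later in these notes.

```scheme
(define count 0)

(define bump
  (lambda ()
    (set! count (+ count 1))   ; side effect happens only when bump is called
    count))

count     ; => 0; defining bump did not run its body
(bump)    ; => 1
(bump)    ; => 2
count     ; => 2
```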

if and cond

Scheme has both an if form and a more general cond form for conditionals. if has the syntax

(if testExpr thenExpr elseExpr)

If testExpr evaluates to the boolean value #f, then elseExpr is evaluated and its value is returned; if testExpr evaluates to anything else (a number, a list, whatever), then thenExpr is evaluated and its value is returned.
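A few quick checks of this rule, assuming an R5RS-conforming implementation, where #f is the only false value (some older Lisp dialects also treat the empty list as false):

```scheme
(if #f 'then 'else)        ; => else
(if 0 'then 'else)         ; => then; 0 is not #f
(if "" 'then 'else)        ; => then; neither is the empty string
(if (list) 'then 'else)    ; => then; in R5RS the empty list counts as true
```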

Example:

                                     ; Roughly equal to ML:
(define (myMap f aList)              ; fun myMap (f, aList) =
  (if (null? aList)                  ;     if null aList then
      '()                            ;         nil
      (cons (f (car aList))          ;     else (f (hd aList))::
            (myMap f (cdr aList))))) ;         (myMap f (tl aList))

cond has the following syntax:

(cond (test1 bodyExpr1)
      (test2 bodyExpr2)
      ...
      (testN bodyExprN)
      [(else elseExpr)])

Each testI is evaluated in turn. If a test evaluates to a true value (that is, anything other than #f), then its corresponding bodyExpr is evaluated and returned as the value of the whole expression. Otherwise, the next case is tried.

There is an optional else clause, which simply returns the elseExpr.

If none of the cases matches, and there is no else case, then the result of the cond expression is not defined by the standard. (Some Scheme implementations have a special #void value that is returned in this case; this is rather analogous to ML's unit value.)
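For instance, a three-way classification reads naturally as a cond. The function sign below is a hypothetical example, not part of the standard library.

```scheme
(define (sign n)
  (cond ((< n 0) 'negative)
        ((> n 0) 'positive)
        (else 'zero)))       ; reached only when both tests are false

(sign -3)   ; => negative
(sign 7)    ; => positive
(sign 0)    ; => zero
```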

What must not be evaluated in conditional expressions: A conditional expression would be nearly useless if all branches were always evaluated. A conditional expression must not evaluate the branches not taken.
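To see why if cannot be an ordinary function, compare it with a function that merely wraps it. In this sketch the names seen, note, and my-if are hypothetical; note records which branch expressions were evaluated. With a recursive function such as myMap, a my-if version would evaluate the recursive call even on the empty list and so never terminate normally.

```scheme
(define seen '())                 ; tags of the branches evaluated so far

(define (note tag value)
  (set! seen (cons tag seen))
  value)

;; my-if is an ordinary function, so both branch arguments are
;; evaluated before its body ever runs.
(define (my-if test then-value else-value)
  (if test then-value else-value))

(if #t (note 'then 1) (note 'else 2))      ; => 1
seen                                       ; => (then)

(set! seen '())
(my-if #t (note 'then 1) (note 'else 2))   ; => 1
(length seen)                              ; => 2; both branches were evaluated
```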

let, let*, and letrec

Just like ML, Scheme has let-expressions; unlike ML, Scheme has several varieties. The kind that most closely resembles ML's version is let*:

                               ; Roughly equal to ML:
(let* ((name1 expr1)           ; let val name1 = expr1
       (name2 expr2)           ;     val name2 = expr2
       ...                     ;     ...
       (nameN exprN))          ;     val nameN = exprN
       bodyExpr)               ; in bodyExpr end

let resembles let*, and has identical syntax and semantics except that each binding's expression is evaluated in the environment preceding all the let-bindings. In other words, rather than evaluating the bindings "in sequence", let does them "in parallel".

For example, consider the following:

(define a 1)

(let ((a 2)
      (b (+ a 4)))   ; a here is the outer a, so b = 1 + 4 = 5
     (+ a b))  ; returns 7

(let* ((a 2)
       (b (+ a 4)))  ; a here is the new a, so b = 2 + 4 = 6
     (+ a b))  ; returns 8
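It may help to see how these two forms can be expressed with constructs already introduced: R5RS defines let as a derived form equivalent to applying a lambda, and let* as a chain of nested lets. The sketch below restates the example that way; the names via-lambda and via-nested-let are hypothetical.

```scheme
(define a 1)

;; let: the argument expressions are evaluated outside the lambda,
;; so (+ a 4) sees the outer a, which is 1.
(define via-lambda
  ((lambda (a b) (+ a b))
   2
   (+ a 4)))
via-lambda        ; => 7

;; let*: each binding is nested inside the previous one,
;; so (+ a 4) sees the new a, which is 2.
(define via-nested-let
  (let ((a 2))
    (let ((b (+ a 4)))
      (+ a b))))
via-nested-let    ; => 8
```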

letrec allows the bindings to be recursive amongst each other. This is useful for defining mutually recursive local functions; here's a silly example:

(define (mutualFac v)
    (letrec ((a (lambda (x)
                  (if (<= x 0)
                      1
                      (* x (b (- x 1))))))
             (b (lambda (x) (a x))))
       (b v)))

Here, a must be in scope for b's body, and b must also be in scope for a's body.
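A more conventional illustration of mutual recursion is the even/odd pair. This is a sketch with hypothetical names (parity, my-even?, my-odd?), and it assumes n is a non-negative integer.

```scheme
(define (parity n)
  (letrec ((my-even? (lambda (k)
                       (if (= k 0) #t (my-odd? (- k 1)))))
           (my-odd?  (lambda (k)
                       (if (= k 0) #f (my-even? (- k 1))))))
    (if (my-even? n) 'even 'odd)))

(parity 10)   ; => even
(parity 7)    ; => odd
```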

What must not be evaluated in the let forms: All the names to be bound in the body expression must not be evaluated, because they are names that we are defining. Also, the body expression must be evaluated after all the bound expressions, and in the environment created by the bindings.

Some lawyerly/historical details

car? cdr?

Historical note: car and cdr come from the first Lisp implementation, in 1959 on the IBM 704. A 704 machine word had an "address" field and a "decrement" field, and a cons cell kept its two pointers in them: car stood for "contents of the address part of register", and cdr for "contents of the decrement part of register". The terminology is horrible but has stayed with us to this day.

Lisp actually has a whole family of convenience functions that use the "a = head" and "d = tail" convention to access "deeper" parts of lists-within-lists:

(cadr (list 1 2 3 4))  ; the car of the cdr = 2
(caddr (list 1 2 3 4)) ; (car (cdr (cdr (list 1 2 3 4)))) = 3

; the cdr of the car = (2) (i.e., the list containing only 2)
(cdar (list (list 1 2) (list 3 4)))

; (car (cdr (car (cdr (list ...))))) = 4
(cadadr (list (list 1 2) (list 3 4)))

For these functions, read off the a's and d's right-to-left from the "r".

Mutation

All defines implicitly bind their results to ref cells, and dereference is implicit. To mutate a binding, use set!. For example, type the following into your interpreter; the results will be hard to understand, unless you understand this fact:

                ; Rough ML equivalent:
(define x 5)    ; val x = ref 5
(define y x)    ; val y = ref (!x)
(set! x 4)      ; x := 4
x               ; !x;
y               ; !y;

More generally, almost all data structures in Scheme are mutable, including cons cells (which is nearly everything, since nearly everything in a Scheme program is a list):

> (define x (list 1 2))
> (set-cdr! x (list 3 4))
> x
(1 3 4)
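The difference between rebinding a name with set! and mutating a shared cons cell with set-cdr! can be sketched as follows. The names p and q are hypothetical, and the example assumes an R5RS implementation in which pairs built by list are mutable.

```scheme
(define p (list 1 2))
(define q p)                 ; q refers to the same cons cells as p

(set-cdr! p (list 3 4))      ; mutates the shared first cell
q                            ; => (1 3 4); the change is visible through q

(set! p (list 9))            ; rebinds the name p only
p                            ; => (9)
q                            ; => (1 3 4); q still refers to the old cells
```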

What is a symbol? How does it differ from a string?

A symbol is (roughly) a "canonical, immutable string".

There can be many different string objects that have the same value --- in other words, two strings can have the same value but different identities. This could be accomplished, for example, by creating two different strings, then updating them (Scheme strings can be mutated via string-set! and other functions) to have the same value. However, two symbols can only have the same value if they have the same identity.

Another way of saying this is that when the Scheme runtime is about to create a symbol value, it always checks to see if there exists (somewhere in the runtime) another symbol with that same name. If such a symbol already exists, then Scheme returns a pointer to the old symbol --- the "canonical" value --- rather than creating a new symbol value.

Yet another way of saying this is: for any name, there is one symbol (now and forever) whose value is that name.
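The distinction between value and identity can be checked with the standard predicates equal? (same contents) and eq? (same object). In this sketch, s1 and s2 are hypothetical names, and string-copy is used to make sure two distinct string objects are created.

```scheme
(define s1 (string-copy "abc"))
(define s2 (string-copy "abc"))

(equal? s1 s2)                       ; => #t; same contents
(eq? s1 s2)                          ; => #f; two distinct string objects

(eq? 'abc 'abc)                      ; => #t; one symbol per name
(eq? (string->symbol "abc") 'abc)    ; => #t; even when built from a string
```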

Symbols are generally used to represent variable names. We'll see examples of such use later, when we see how to represent programs as data in Scheme.

Functions with variable numbers of arguments

We noted before that Lisp functions can take variable numbers of arguments; Scheme has two syntaxes, inherited from previous Lisp dialects, for defining functions with variable numbers of arguments. The first is as follows:

(lambda argListName bodyExpr)

This binds the entire argument list to argListName in the body expression. Since the arguments to a function are always passed as a list, and lists can have any length, this allows the function to accept any number of arguments.
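A minimal sketch of this first form, using the hypothetical names count-args and sum-all:

```scheme
(define count-args
  (lambda args               ; args is bound to the whole argument list
    (length args)))

(count-args)                 ; => 0
(count-args 'a 'b 'c)        ; => 3

(define sum-all
  (lambda nums
    (apply + nums)))         ; hand the collected list back to +

(sum-all 1 2 3 4)            ; => 10
```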

The second form is a little uglier:

(lambda (regularArgs . restArg) bodyExpr)

Here, regularArgs is a list of arguments arg1, ..., argN that is to be bound to the first N arguments to the function, and restArg will be bound to the remaining arguments.

Example: The following function multiplies each of its arguments after the first by the first argument.

> (define allTimesFirst
    (lambda (first . rest)
      (map (lambda (x) (* first x)) rest)))

> (allTimesFirst 5 1 4 7 20)
(5 20 35 100)

Here is an alternate definition:

(define allTimesFirst2
  (lambda aList
    (map (lambda (x) (* (car aList) x)) (cdr aList))))

Suggested exercises