Lisp, like ML, is not so much a single language as a set of organizing ideas about language design, embodied in a large number of dialects. Fig. 1 depicts a family DAG with some important members of the Lisp family, along with a few important related languages. Languages inside the rounded box are Lisp dialects.
Notable events in LISP history:
Since 1992, researchers (mostly (transitive) students of Matthias Felleisen) have defined many useful extensions to R5RS, and implemented them in the PLT Scheme dialect, which forms the heart of the DrScheme programming environment we'll be using in this course. There is something of a cultural divide between the "MIT Scheme" community and the "PLT Scheme" community, which is probably one reason that there are no current plans to add the MzScheme extensions to an "R6RS" standard.
In this class, we'll stay (mostly) inside R5RS.
Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.
-- Introduction to the R5RS
Like ML, Scheme is a mostly-functional language with strict evaluation. Unlike ML, Scheme is dynamically typed (a.k.a. latently typed) --- expressions do not have a (meaningful) static type. Instead, values carry a type at runtime, and ill-typed operations are only caught when they are evaluated.
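A minimal sketch of what latent typing means in practice, assuming an R5RS implementation (if and lambda are described below; the name maybeAdd is hypothetical). The ill-typed call (+ 1 "two") is accepted when the function is defined, whereas ML would reject the equivalent code at compile time; Scheme only reports an error if that branch actually runs.

```scheme
;; maybeAdd is a hypothetical example function.
;; Its then-branch adds a number to a string, a type error that
;; Scheme detects only when that branch is actually evaluated.
(define maybeAdd
  (lambda (flag)
    (if flag
        (+ 1 "two")   ; ill-typed: fails at run time, and only if reached
        0)))

(maybeAdd #f)         ; => 0; the bad branch never runs
;; (maybeAdd #t)      ; would signal a run-time type error
```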
Informally, Scheme's syntax and semantics can be stated in much less than a page. Its syntax is as follows:
program ::= S*
S       ::= atom | list
atom    ::= IDENTIFIER | NUMBER | SYMBOL | STRING | ...
list    ::= ( S* )
A Scheme program is a list of s-expressions (the term is a contraction of "symbolic expression"). An s-expression is either an atom or a list. Atoms include variable names and literal values. A list is a sequence of whitespace-separated s-expressions inside round parens. (DrScheme also allows square brackets, for variety.)
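Because the grammar is recursive (an S is an atom or a parenthesized list of S's), code that walks an s-expression follows the same two cases. A small sketch with a hypothetical function countAtoms; it uses quoted data and cond, both covered only later, and assumes proper lists:

```scheme
;; countAtoms (hypothetical) counts the atoms in an s-expression,
;; following the grammar  S ::= atom | list,  list ::= ( S* ).
;; Assumes proper lists; the empty list () contains no atoms.
(define countAtoms
  (lambda (s)
    (cond ((null? s) 0)                        ; () : no atoms
          ((pair? s) (+ (countAtoms (car s))   ; the first S in the list
                        (countAtoms (cdr s)))) ; the rest of the S*
          (else 1))))                          ; anything else is an atom

(countAtoms '(+ 1 (* 2 3)))   ; => 5: the atoms +, 1, *, 2, 3
```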
The semantics are equally simple:
(We've left some advanced stuff out, including syntax macros, (quasi)quoted data, and continuations. In the meantime, one can get quite far with just this.)
3                 ; The number 3
"hi"              ; The string "hi"
'a                ; The symbol a
#t                ; The boolean true
#f                ; The boolean false
(+ 1 2)           ; sum of 1 and 2
(+ 1 2 3)         ; sum of 1, 2, and 3; note variable argument count
'()               ; The empty list
(cons 1 '())      ; 1 consed onto nil
(list 1 2 3)      ; The list (cons 1 (cons 2 (cons 3 '())))

;; Lists can be heterogeneous
(list 1 "hi" 'a #f)
(list 0 (list 1 2))

;; An "improper" list: tail is not a list. Lisp programmers
;; use cons cells where an ML programmer would use a 2-tuple.
(cons 1 2)

;; In fact, for Lisp programmers, a cons *is* a pair --- the
;; function for testing cons-ness is called pair?
(pair? (list 1 2 3 4 5))   ; returns #t
(car (list 1 2 3))         ; car is the equivalent of "hd"
(cdr (list 1 2 3))         ; cdr is the equivalent of "tl"
"Special forms" are Scheme constructs that control evaluation differently from ordinary function calls. One might fairly ask: Why have special forms at all? Why aren't functions and atoms enough? The argument goes as follows: an ordinary function call evaluates all of its argument expressions before the function runs. Some constructs must instead leave certain subexpressions unevaluated, or evaluate them later or in a new environment, so they cannot be written as ordinary functions.
Therefore, as we discuss each special form, we note what must not be evaluated normally.
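To see concretely why a conditional cannot be an ordinary function, the sketch below defines a hypothetical function myIf and a hypothetical helper note that records each argument expression as it is evaluated (it uses set!, described later). Because arguments are evaluated before a function is called, myIf evaluates both branches; the built-in if evaluates only one.

```scheme
;; evaluated collects a tag for every argument expression that runs.
(define evaluated '())

;; note (hypothetical helper): record tag, then return v unchanged.
(define note
  (lambda (tag v)
    (set! evaluated (cons tag evaluated))
    v))

;; myIf (hypothetical): an "if" written as an ordinary function.
(define myIf
  (lambda (test thenVal elseVal)
    (if test thenVal elseVal)))

(myIf #t (note 'then 1) (note 'else 2))  ; => 1, yet both tags are recorded
evaluated                                 ; contains both then and else

(set! evaluated '())
(if #t (note 'then 1) (note 'else 2))    ; => 1
evaluated                                 ; => (then): only one branch ran
```

Argument evaluation order is unspecified in Scheme, so the order of the two tags recorded by myIf may vary between implementations.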
define
define adds a binding to the environment, or updates the binding if it already exists; it has the syntax:
(define identifier expr)
define has no official return value. It evaluates expr and binds identifier to the resulting value in the current environment.
Identifiers in Scheme may contain almost any character:
(define hello? "hello?")
(define one+1 (+ 1 1))
In fact, names like + that are operators (or otherwise "special") in other languages are ordinary names in Scheme. This allows some natural naming conventions; for example, Scheme predicates (functions that test a condition) conventionally have names ending in a question mark:
(number? 5)      ; => #t
(string? 23.5)   ; => #f
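Following that convention, a user-defined predicate would usually end in ? too. The name shortList? below is a hypothetical example; the ? is simply part of the identifier. (It also uses and, one of Scheme's built-in special forms.)

```scheme
;; shortList? (hypothetical): is lst a list with fewer than 3 elements?
;; The trailing ? means nothing to Scheme itself; it is only a naming convention.
(define shortList?
  (lambda (lst)
    (and (list? lst)              ; and stops early, so length never sees a non-list
         (< (length lst) 3))))

(shortList? (list 1 2))           ; => #t
(shortList? (list 1 2 3))         ; => #f
(shortList? 5)                    ; => #f, since 5 is not a list
```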
What must not be evaluated in define: clearly, the identifier which is about to be bound must not be evaluated, because it is the very name we are trying to define.
lambda
The lambda special form is Scheme's syntax for anonymous function values:
(lambda (args) bodyExpr) ; like ML (fn args => bodyExpr)
For example:
(lambda (x) (+ x 1))
Function values are first-class in Scheme, as in ML; they can be applied to arguments, bound to names, passed to and returned from other functions, etc.
((lambda (x) (+ x 1)) 2)              ; applies function to 2, giving 3
(define add (lambda (x y) (+ x y)))   ; binds add to function
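Since functions are first-class values, a function can also build and return a new function. A short sketch, where makeAdder and add3 are hypothetical names:

```scheme
;; makeAdder (hypothetical): returns a function that adds n to its argument.
;; The returned lambda keeps the n from the call that created it.
(define makeAdder
  (lambda (n)
    (lambda (x) (+ x n))))

(define add3 (makeAdder 3))   ; bind the returned function to a name
(add3 4)                      ; => 7
((makeAdder 10) 5)            ; => 15, applied without naming it
```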
The special form name lambda comes from Alonzo Church's lambda calculus, a mathematical formalism for reasoning about computable functions. The lambda calculus forms the theoretical foundation for all functional languages, including ML. We may return to it later.
There is a syntactic sugar for binding functions to names:
(define (fnName args) bodyExpr)
For example:
(define (f x) (+ x 1)) ; equiv. to (define f (lambda (x) (+ x 1)))
What must not be evaluated in lambda: the argument list and the body. It makes no sense to evaluate the argument list, since it is a list of parameter names, not expressions. Also, a function body should be evaluated not when the function is defined, but each time it is applied.
if and cond
Scheme has both an if form and a more general cond form for conditionals. if has the syntax:
(if testExpr thenExpr elseExpr)
If testExpr evaluates to the boolean value #f, then elseExpr is evaluated and its value returned; if testExpr evaluates to anything else (a number, a list, whatever), then thenExpr is evaluated and its value returned.
Example:
; Roughly equal to ML:
(define (myMap f aList)              ; fun myMap (f, aList) =
  (if (null? aList)                  ;   if null aList
      '()                            ;   then nil
      (cons (f (car aList))          ;   else (f (hd aList)) ::
            (myMap f (cdr aList))))) ;        (myMap f (tl aList))
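Because only #f counts as false, values that some other languages treat as false (such as 0 in C) select the then-branch in Scheme. A quick check, assuming R5RS semantics, in which the empty list is also a true value:

```scheme
;; In R5RS Scheme only #f is false; every other value counts as true.
(if 0   'then 'else)    ; => then
(if '() 'then 'else)    ; => then
(if ""  'then 'else)    ; => then
(if #f  'then 'else)    ; => else
```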
cond has the following syntax:
(cond (test1 bodyExpr1)
      (test2 bodyExpr2)
      ...
      (testN bodyExprN)
      [(else elseExpr)])
Each testI is evaluated in turn. If a test evaluates to a true value (anything other than #f), then its corresponding bodyExpr is evaluated, and its value is returned as the value of the whole expression. Otherwise, the next test is evaluated.
There is an optional else clause, which simply returns the value of elseExpr when no earlier test succeeds.
If none of the cases matches, and there is no else case, then the result of the cond expression is not defined by the standard. (Some Scheme implementations have a special #void value that is returned in this case; this is rather analogous to ML's unit value.)
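As a small example of cond with an else clause, here is a hypothetical sign function. Since tests are tried in order, each later test may assume the earlier ones failed:

```scheme
;; sign (hypothetical): -1, 0, or 1 according to the sign of n.
(define (sign n)
  (cond ((< n 0) -1)
        ((= n 0) 0)     ; reached only if n >= 0
        (else 1)))      ; reached only if n > 0

(sign -5)   ; => -1
(sign 0)    ; => 0
(sign 42)   ; => 1
```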
What must not be evaluated in conditional expressions: A conditional expression would be nearly useless if all branches were always evaluated. A conditional expression must not evaluate the branches not taken.
let, let*, and letrec
Just like ML, Scheme has let-expressions; unlike ML, Scheme has several varieties. The kind that most closely resembles ML's version is let*:
; Roughly equal to ML:
(let* ((name1 expr1)     ; let val name1 = expr1
       (name2 expr2)     ;     val name2 = expr2
       ...               ;     ...
       (nameN exprN))    ;     val nameN = exprN
  bodyExpr)              ; in bodyExpr end
let resembles let*, and has identical syntax and semantics except that each binding's expression is evaluated in the environment preceding all the let-bindings. In other words, rather than evaluating the bindings "in sequence", let does them "in parallel".
For example, consider the following:
(define a 1)
(let ((a 2) (b (+ a 4))) (+ a b))    ; returns 7: b uses the outer a, so b = 5
(let* ((a 2) (b (+ a 4))) (+ a b))   ; returns 8: b uses the new a, so b = 6
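One way to think about the difference: let* behaves like a chain of nested single-binding lets, so each binding sees those before it, whereas let evaluates every right-hand side in the outer environment. A sketch checking this, with the definition of a repeated so the block runs on its own:

```scheme
(define a 1)

;; let*: (+ a 4) sees the new a = 2, so b = 6
(let* ((a 2) (b (+ a 4))) (+ a b))   ; => 8

;; the same computation written as nested lets
(let ((a 2))
  (let ((b (+ a 4)))                 ; inner let sees a = 2
    (+ a b)))                        ; => 8

;; plain let: (+ a 4) sees the outer a = 1, so b = 5
(let ((a 2) (b (+ a 4))) (+ a b))    ; => 7
```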
letrec allows the bindings to refer to each other recursively. This is useful for defining mutually recursive local functions; here's a silly example:
(define (mutualFac v)
  (letrec ((a (lambda (x)
                (if (<= x 0)
                    1
                    (* x (b (- x 1))))))
           (b (lambda (x) (a x))))
    (b v)))
Here, a must be in scope for b's body, and b must also be in scope for a's body.
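A less silly use of letrec is the classic pair of mutually recursive parity tests. In this sketch, parity, isEven?, and isOdd? are hypothetical names (chosen so the built-in even? and odd? are not shadowed), and the argument is assumed to be a non-negative integer:

```scheme
;; parity (hypothetical): returns 'even or 'odd for a non-negative integer n.
;; isEven? and isOdd? call each other, which letrec permits; with plain let,
;; neither name would be in scope inside the other's lambda.
(define (parity n)
  (letrec ((isEven? (lambda (k) (if (= k 0) #t (isOdd? (- k 1)))))
           (isOdd?  (lambda (k) (if (= k 0) #f (isEven? (- k 1))))))
    (if (isEven? n) 'even 'odd)))

(parity 10)   ; => even
(parity 7)    ; => odd
```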
What must not be evaluated in the let forms: the names being bound must not be evaluated, because they are the names we are defining. Also, the body expression must be evaluated after all the bound expressions, and in the environment created by the bindings.
Historical note: car and cdr come from the names of the assembly-language operations used to implement them in 1959 on the IBM 704: CAR stood for "Contents of the Address part of Register" and CDR for "Contents of the Decrement part of Register", the address and decrement being two fields of a 704 machine word. The terminology is horrible but has stayed with us to this day.
Lisp actually has a whole family of convenience functions that use the "a = head" and "d = tail" convention to access "deeper" parts of lists-within-lists:
(cadr (list 1 2 3 4))     ; the car of the cdr = 2
(caddr (list 1 2 3 4))    ; (car (cdr (cdr (list 1 2 3 4)))) = 3

; the cdr of the car = (2) (i.e., the list containing only 2)
(cdar (list (list 1 2) (list 3 4)))

; (car (cdr (car (cdr (list ...))))) = 4
(cadadr (list (list 1 2) (list 3 4)))
For these functions, read off the a's and d's right-to-left, starting from the "r": cadr applies cdr first, then car.
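The right-to-left reading rule can be made explicit with a small sketch. The function cxr is hypothetical (not a standard procedure): it takes the middle letters of a c...r name as a string and builds the corresponding accessor, using a named let to apply the letters from the rightmost one leftward.

```scheme
;; cxr (hypothetical): build a c...r accessor from its middle letters,
;; so (cxr "add") behaves like caddr. Assumes letters contains only a and d.
;; The letters are applied right-to-left: the last letter acts first.
(define (cxr letters)
  (lambda (x)
    (let loop ((ops (reverse (string->list letters)))  ; rightmost letter first
               (v x))
      (cond ((null? ops) v)
            ((char=? (car ops) #\a) (loop (cdr ops) (car v)))  ; a means car
            (else (loop (cdr ops) (cdr v)))))))               ; d means cdr

((cxr "ad") (list 1 2 3 4))                   ; => 2, like cadr
((cxr "add") (list 1 2 3 4))                  ; => 3, like caddr
((cxr "da") (list (list 1 2) (list 3 4)))     ; => (2), like cdar
((cxr "adad") (list (list 1 2) (list 3 4)))   ; => 4, like cadadr
```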
In ML terms, all defines implicitly bind their results to ref cells, and dereference is implicit. To mutate a binding, use set!. For example, type the following into your interpreter; the results will be hard to understand unless you keep this fact in mind:
; Rough ML equivalent:
(define x 5)    ; val x = ref 5
(define y x)    ; val y = ref (!x)
(set! x 4)      ; x := 4
x               ; !x;
y               ; !y;
More generally, almost all data structures in Scheme are mutable, including cons cells (and hence lists, which appear nearly everywhere in a Scheme program):
> (define x (list 1 2))
> (set-cdr! x (list 3 4))
> x
(1 3 4)
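Mutating a cons cell is different from rebinding a name with set!: if two names refer to the same pair, a change made through one is visible through the other. A sketch, assuming an R5RS implementation with mutable pairs (set-car! is the car-side counterpart of set-cdr!):

```scheme
(define x (list 1 2))
(define y x)          ; y names the same pair as x; nothing is copied

(set-car! y 99)       ; mutate the shared pair through y
x                     ; => (99 2): the change is visible through x

(set! y (list 7 8))   ; rebinding y does not touch the pair x names
x                     ; => (99 2)
y                     ; => (7 8)
```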
A symbol is (roughly) a "canonical, immutable string". There can be many different string objects that have the same value --- in other words, two strings can have the same value but different identities. This could be accomplished, for example, by creating two different strings, then updating them (Scheme strings can be mutated via string-set! and other functions) to have the same value. However, two symbols can only have the same value if they have the same identity.
Another way of saying this is that when the Scheme runtime is about to create a symbol value, it always checks to see if there exists (somewhere in the runtime) another symbol with that same name. If such a symbol already exists, then Scheme returns a pointer to the old symbol --- the "canonical" value --- rather than creating a new symbol value.
Yet another way of saying this is: for any name, there is one symbol (now and forever) whose value is that name.
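The difference can be observed with eq?, which compares identities, and equal?, which compares contents. A sketch, assuming R5RS semantics, where string returns a newly allocated string:

```scheme
;; Two freshly allocated strings with the same contents:
(define s1 (string #\h #\i))
(define s2 (string #\h #\i))
(equal? s1 s2)                    ; => #t: same value
(eq? s1 s2)                       ; => #f: different identities

;; Symbols with the same name are always one and the same object:
(eq? 'hi 'hi)                     ; => #t
(eq? (string->symbol "hi") 'hi)   ; => #t
```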
Symbols are generally used to represent variable names. We'll see examples of such use later, when we see how to represent programs as data in Scheme.
We noted before that Lisp functions can take variable numbers of arguments; Scheme has two syntaxes, inherited from previous Lisp dialects, for defining functions with variable numbers of arguments. The first is as follows:
(lambda argListName bodyExpr)
This binds the entire argument list to argListName in the body expression. Since the arguments to a function are always passed as a list, and lists can have any length, this allows a function to accept any number of arguments.
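For instance, with this first form, a hypothetical function countArgs reports how many arguments it received, and a hypothetical sumAll adds any number of them:

```scheme
;; countArgs and sumAll are hypothetical examples of the first form:
;; args is bound to the list of all arguments, however many there are.
(define countArgs (lambda args (length args)))
(define sumAll (lambda args (apply + args)))   ; apply calls + with the list's elements as arguments

(countArgs)            ; => 0
(countArgs 'a 'b 'c)   ; => 3
(sumAll 1 2 3 4)       ; => 10
```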
The second form is a little uglier:
(lambda (regularArgs . restArg) bodyExpr)
Here, regularArgs is a list of arguments arg1, ..., argN that is to be bound to the first N arguments to the function, and restArg will be bound to a list of the remaining arguments.
Example: The following function multiplies each of its remaining arguments by its first argument, returning the products as a list.
> (define allTimesFirst
    (lambda (first . rest)
      (map (lambda (x) (* first x)) rest)))
> (allTimesFirst 5 1 4 7 20)
(5 20 35 100)
Here is an alternate definition:
(define allTimesFirst2
  (lambda aList
    (map (lambda (x) (* (car aList) x))
         (cdr aList))))
filter and foldl in Scheme.