CSE 341 -- Scheme Basics

These notes cover the following topics:
  1. Scheme Introduction
  2. Primitive data types and operations.
  3. Introduce an old friend, EVAL, a function which forms the heart of any good Scheme interpreter.
  4. Basic control abstractions, including function definition, the if and cond special forms, identity, equality, and boolean logic
  5. Variables, Binding and Scope (Concept)
  6. Type Checking (Concept)

What is Scheme?

Scheme is sometimes called a "List Processing" language or a Symbol Processing language. It's a cousin (nephew?) to Lisp, the Platypus of languages.

Application areas:

Important attributes/features: Lisp was developed in the late 50s by John McCarthy. Many incompatible lisps emerged. In the 80s, a standardized version emerged, called Common Lisp (CL). CL is a kitchen sink language: many many features. Scheme was developed in the mid-70s as a reaction to the shortcomings and inconsistencies of Lisp.


Primitive Scheme data types and operations

Scheme provides us with the following primitive (atomic) data types: Case is generally not significant (except in characters or strings). Here are some of the basic operators that Scheme provides for the above datatypes. Some operators are predicates, that is, they are truth tests. In Scheme, they return #T or #F. Here are some useful examples:

Applying functions, procedures, operators

Ok, so we know the names of a bunch of operators, how do we use them. Scheme provides us with a uniform syntax for invoking functions:
  (function-name arg1 arg2 ... argN)
Examples:
  (+ 2 3)
  (abs -4)
  (- (- 2 3) 8)
  (* 5 (+ 7 10))


EVAL notation; the EVAL function

Users typically interact with Scheme though a read-eval-print loop (REPL). Scheme waits for the user to type an expression, reads it, evaluates it, and prints the return value. EVAL is a function which forms the heart of any Scheme interpreter. It takes a Scheme expression as an argument, and returns the Scheme exression which represents the value of the evaluated expression. Scheme expressions (sometimes called S-Expressions, for Symbolic Expressions) are either lists or atoms. Lists are composed of other S-Expressions (note the recursive definition). Lists are often used to represent function calls, where the list consists of a function name followed by its arguments. However, lists can also used to represent arbitrary collections of data. Atoms are any kind of atomic data (a number, character, string, etc.)
In these notes, we'll generally write:

<S-expression> => <return-value>

when we want to show an S-expression and the evaluation of that
S-expression.  For instance:

  (+ 2 3)      => 5
  (cons 1 ()) => (1)
Evaluation rules:
  1. Numbers, characters, and strings are "self-evaluating atoms", that is, they evaluate to themselves.
  2. Symbols are treated as variable names, and to evaluate them, their bindings are looked up in the environment.
  3. For lists, the first element specifies the function. The remaining elements of the list specify the arguments. Evaluate each of the arguments in the current environment, and call the function on these values. For instance:
      (+ 2 3)             => 5
      (+ (* 3 3) 10)      => 19
      (= 10 (+ 4 6))      =>  #T
    

The List Data Type

Perhaps the single most important built in data type in Scheme is the list. In Scheme, lists are unbounded, possibly heterogeneous collections of data. Examples:
  (X)
  (BILL HILLARY)
  (2 3 5 7 11)
  (2 3 X Y "Zoo" 2.9)
  ()
Box-and-arrow representation of lists:
                 _______________        ________________ 
                |       |       |      |        |       |
                |   o   |   ----|----->|    o   |   X   |
                |___|___|_______|      |____|___|_______|
                    |                       | 	    
                    |                       |	    
                   BILL                  HILLARY   

Notes:

Here are some important functions that operate on lists: Predicates for lists:

Using Symbols (Atoms) and Lists as Data

If we try evaluating (list bill clinton) we'll get an error. Why? Because Scheme will treat the atom bill as a variable name and try to look for it's binding, which it won't find. We therefore need to "quote" the names bill and clinton, which means that we want Scheme to treat them literally. Scheme provides syntax for doing this. The evaluation for quoted objects is that a quoted object evalutes to itself.
'x                    => x
(list bill clinton)   => error! bill is unbound symbol
(list 'bill 'clinton) => (bill clinton)
(bill clinton)        => error! bill is unknown function
'(bill clinton)       => (bill clinton)
(equal? (x) (x))      => error! x is unknown function
(equal? '(x) '(x))    => T
(cons 'x '(y z))      => (x y z)
(cons 'x ())         => (x)
(car '(1 2 3))        => 1
(cdr (cons 1 '(2 3))) => (2 3)
Note that there are (at least) 3 ways to make a list:
  1. '(x y z) => (x y z)
  2. (cons 'x (cons 'y (cons 'z nil))) => (x y z)
  3. (list 'x 'y 'z) => (x y z)

Syntactic Sugar

Internally, quoted symbols and lists are represented using the special function QUOTE. When the reader reads '(A B) it translates this into (QUOTE (A B)), which is then passed onto the evaluator. When EVAL sees an expression of the form (QUOTE S-expr) it just returns S-expr. QUOTE is sometimes called a "special form" because unlike most other Scheme operations, it doesn't evaluate its argument. The quote mark is an example of what is called "syntactic sugar": this is a bit of syntax added by the language designer/implementor to shorten a common expression. In this case, it allows us to not write (quote ...) every time we want to use a literal expression.
  'X          => X
  (QUOTE X)   => X

To quote or not to quote

Knowing when to use a quote is often confusing to beginning Schemers. If you're thinking of using a quote, you need to ask yourself: Does this expression evaluate to something? Literal numbers, characters, strings, and boolean values all evaluate to themselves, and hence don't need to be quoted. Lists and symbols are a different matter, however. Remember that if the interpreter sees a list, it thinks that it's a function call and will evaluate it as such. If you wish to provide a list of literal data, then you must quote it. Similarly, the interpreter treats symbols as if they are names (either for functions or for data). Hence, symbols must be quoted if they are to be used as simple symbols.

Quoting is allows us to represent lists of things literally. Can you imagine a way to do this in C or Java?

(define names '(bob bill jane ian sarah))
(define president 'george)

'names                   => ???
names                    => ???
(cons president names)   => ???
'(cons president names)  => ???
What is the final form in the above example?

Variables/Names/Environments

Variables in Scheme are in some ways similar to variables in Java or C. Variables are "names" for "things". In Scheme, we can define new names as follows:
 
(define foo 17)   ; this declares a variable called foo (if one doesn't exist)
                  ; and makes it refer to 17

foo          => 17

(* 2 foo) => 34
In Scheme, variables are said to by "typeless." You need never declare them to be of some type, and can arbitrarily bind them to objects of different types at any time. Names may be bound either to data or functions (but not both). We'll see an example of function definition later.

The Environment

Names in an environment refer to things. The rule for evaluating names: a symbol evaluates to the value of the variable it names. Scheme supports a style of programming known as functional programming. In this view, programs do not operate by modifying the contents of a variable, rather they compute functions on data, without ever storing the results anywhere. Hence, redefinition does not modify the old definition. We can redefine names like so:
(define foo 17)

foo                    => 17

(define foo 23)

foo                    => 23
Draw a picture of the above environment (in the functional view and in the C/C++ view). Note that in the functional view, the redefinition of foo shadows the old definition.

Scheme has local and global variables. We'll see examples of local variables later, but for now, all of our definitions take place in the global environment.


Functions

In Scheme, lambda is a special form that evaluates to an anonymous function. The general pattern is:
(lambda <formal parameter list> <body>)
Examples
(lambda (x) (* x 2))       => #<procedure>

; applying a function
((lambda (x) (* x 2)) 5)   => 10

; we can give the procedure a name
(define double (lambda (x) (* x 2)))

; applying our function
(double 10)

double          => ?  what do I get here? 
The rule for applying functions is: evaluate the actual parameters (left-to-right), bind the formal parameter names to the actual parameter values, and then evaluate the expressions in the body (in order). Return the value of the final expression evaluated.

We'll often use a little more syntactic sugar to clean up our function definitions:

(define (double x) 
  (* x 2))

(define (plus1 x)
  (+ x 1))

Let and Let*: creating temporary, local variables

We use the special form let to declare and bind local, temporary variables. Example:
(let ((name1 value1)    ;; general form of let
      (name2 value2)
	  ...
      (nameN valueN))
   expression1
   expression2
   ...
   expressionQ)

;; example

(let
  ((x 1)
   (y 2))
 (+ x y))   => 3
The one wrinkle with let is that while the bindings are being created, expressions cannot refer to bindings that have been made previously, within the same let block. (Why? Because Scheme doesn't guarantee an order of evaluation for the forms.) For example, this doesn't work, since x isn't known outside the body:
(let ((x 3)
      (y (+ x 1)))
    (+ x y))
To get around this problem, Scheme provides us with let*:
(let* ((x 3)          
      (y (+ x 1)))
    (+ x y))


Equality and Identity: EQUAL? and EQ?

Scheme provides two primitives for equality and identity testing:
  1. EQ? is pointer comparison. It returns #T iff its arguments literally refer to the same objects in memory. Symbols are unique ('FRED always evaluates to the same object). Two symbols that look the same are EQ?. Two variables that refer to the same object are EQ?. Two strings that look the same are not EQ?. Two characters that look the same are EQ?. Two integers that look the same are EQ. Two floats that look the same are not EQ.
  2. EQUAL?, roughly speaking, returns true if its arguments "look" the same. EQUAL? returns #T iff its arguments are EQ?, or its arguments are lists, whose corresponding elements are EQUAL?.
Two objects that are EQ? are also EQUAL?. Two objects that are EQUAL? are not necessarily EQ?. EQ? is sometimes called an identity comparison and EQUAL? is called an equality comparison. What does this mean?
(define foo '(1 2 3))
(define bar foo)         ; bar refers to the same list as foo

(eq? 'foo 'foo)              => #T
(eq? foo foo)                => #T
(eq? foo bar)                => #T
(eq? foo '(1 2 3))           => #F
(eq? '(1 2 3) '(1 2 3)       => #F
(eq? 10 10)                  => #T   ; (generally, but imp. dependent)
(eq? 10.0 10.0)              => #F   ; (generally, but imp. dependent)
(equal? foo '(1 2 3))        => #T
(equal? '(1 2 3) '(1 2 3))   => #T
Scheme provides = for comparing two numbers, and will coerce one type to another. For example, (equal? 0 0.0) returns #F, but (= 0 0.0) returns #T.


Logical operators

Scheme provides us with at least three useful logical operators: AND, OR, and NOT. The only value in Scheme that "means" false is #f. All other values are interpreted as true (including the empty list).
(AND EXPR1 EXPR2 ... EXPR-N)   
; return true if all the expr's are true
; ... or more precisely, return expr-n if all the expr's evaluate to
; something other than #f.  Otherwise return #f.

(AND (= 2 3) (= 2 2) #T)  => #F

(OR EXPR1 EXPR2 ... EXPR-N)   
; return true if at least one of the expr's is true
; ... or more precisely, return expr-j if expr-j is the first expr that
; evaluates to something other than #f.  Otherwise return nil.

(OR (= 2 3) (= 2 2) #T)  => #T

(OR (= 2 3) 'FRED (= 3 (/ 1 0)))  => 'FRED

(define (single-digit x)
  (and (> x 0) (< x 10)))

(NOT EXPR)   
; return #t if EXPR is false, returns #f otherwise

(NOT (= 10 20))     => #T

(Small) Concept: Short-circuit AND and OR vs evaluating AND and OR

The book calls the short circuit version of AND and OR a conditional; the book calls the evaluating version a boolean function. In Scheme the short-circuit version is the one that is provided, although one can easily write an evaluating version.

Other languages may provide both. For example, in Ada

example:
  1=1 OR 3=(1/0)
versus
  1=1 OR ELSE 3=(1/0)


IF special form

(IF A B C) if A evaluates to true, then the evalutation of B is returned, otherwise the evaluation of C is returned. IF is a special form, like QUOTE, because it doesn't automatically evaluate all of its arguments.
(IF (EQUAL 5 (+ 2 3)) 10 20)     => 10 
(IF (EQUAL 0 1) (/ 1 0) (+ 2 3)) => 5  ; note that the (/ 1 0) is not evaluated

(define (my-max x y))     
  (if (> x y) x y))   

(my-max 10 20)                   => 20 

(define (my-max3 x y z)
  (if (and (> x y) (> x z))
      x
      (if (> y z)
          y
          z)))

COND: a more general conditional

The general form of the COND special form is:
(COND (test1 expr1)
      (test2 expr2)
      ....
      (testn exprn))
As soon as we find a test that is true (not #f), then we evaluate the corresponding expr and return its value. The remaining tests are not evaluated, and all the other expr's are not evaluated. If we fall off the end of the COND without any test succeeding, then the value of the COND is not defined.

(define (weather f)
   (cond ((> f 80) 'too-hot)
         ((> f 60) 'nice)
         ((< f 35) 'too-cold)
         (#t 'typical-seattle)))
;; notice use of #t as a final catch-all test

Concept: Lexical Scoping (aka static scoping)

Scoping refers to the visibility of names in a program. By default, Scheme uses lexical scoping, which means that the names visible at any point of the program (in a given scope) are only the local names and any names visible in the lexically enclosing scope. The toplevel scope is the global scope. Function definitions, Let, and Let* define new static scopes.
(define z 100)

(define (squid foo)
  (clam foo z))              ; foo and z are visible here

(define (test x)
  (let ((y (* 2 x)))  ; x, y, and z are visible inside the body of let
    (+ x y z)))       ; foo is not visible inside here

(define (clam a)
  (+ a x))            ; x is not visible here, so this is an error
   
(define (snake bar)    ; what about z here?
  (let ((z 42))        ; we are making a new definition of z that shadows
    (+ z bar)))        ; the outer one...

Concept: Type Checking

There is no static type checking in Scheme; type checking is done at run time.

There is unfortunately little agreement about many of the terms which are associated with type checking: so I'll give you the definitions we'll use in class, while pointing out some of the other interpretations.

Type safe
Type safe means that the language guarantees that one type can't be incorrectly used in place of another type, in other words, that all expressions are guaranteed to be type consistent. This can be done checked statically, dynamically, or a mixture. Scheme, Ada, Smalltalk, and Prolog are examples of type safe languages. Fortran and C are examples of languages that aren't type safe.
Static/Dynamic type checking
This refers to when type checking occurs. Static type checking takes place during compilation, dynamic type checking takes place at run-time. This distinction makes no guarantee about the completeness of the type checking. Ada is a statically typed, type-safe language, which is to say that the necessary type checking takes place at compile time. (This was true at least of the old Ada, Ada 95 appears to support dynamic binding, so I'm not certain if it can do all checking statically.) C is also statically typed, but not type safe. Scheme is dynamically typed and type safe. (Some texts use statically typed to imply type safe, but I think this is confusing, because we should be separating out the issues of the completeness of the type checking (type safety) from the time when the type checking occurs (dynamic vs. static).)
Strongly typed
There are two common definitions for strongly typed (sorry) -- one is strongly typed = statically typed and type safe; the other is strongly typed = type safe. I will attempt to use the latter definition. Scheme and Ada are strongly typed by this definition.
Weakly typed
Weakly typed means "not type safe". Fortran and C are examples of weakly typed languages.


dugan@cs.washington.edu (Last Update: )