CSE 505 Lecture Notes:

Implementation of Functional Languages

Sept. 30, 1994


Introduction

There are two classical implementation techniques for functional languages:
  • Landin's SECD machine
  • combinators

    S = stack, E = environment, C = code vector, D = dump. The SECD machine is a little stack-based machine with a stack S. E is the current binding environment, C contains the code, and D holds older contexts (which are restored after we're done evaluating a function).

    We could also use the pure Lisp part of a traditional Lisp interpreter.
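    For concreteness, a minimal sketch of such an evaluator in Haskell (the toy Term/Value types and the name eval are assumptions made for this sketch, not anything from the notes):

        data Term  = Lit Int
                   | Var String
                   | Lam String Term
                   | App Term Term

        data Value = IntV Int
                   | Closure String Term Env

        type Env = [(String, Value)]

        eval :: Env -> Term -> Value
        eval _   (Lit n)      = IntV n
        eval env (Var x)      = maybe (error ("unbound variable " ++ x)) id (lookup x env)
        eval env (Lam x body) = Closure x body env       -- capture the current environment
        eval env (App f a)    =
          case eval env f of
            Closure x body cenv ->
              let arg = eval env a
              in arg `seq` eval ((x, arg) : cenv) body   -- argument forced first: applicative order
            IntV _ -> error "cannot apply a number"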

    The SECD machine uses applicative order evaluation. To get normal order evaluation, pass an anonymous function rather than a value as an argument. Evaluate the function whenever the parameter value is needed. (This is the same as call-by-name in Algol-60 -- in Algol-60 this anonymous function is called a thunk.) Improve efficiency by replacing the anonymous function call by its value after it is invoked -- this gives lazy evaluation.
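    A sketch of the thunk idea written out in Haskell (illustrative only: Haskell is already lazy, so the explicit wrapping below just makes the mechanism visible; Thunk, delay, force, and ifThunk are names invented for the sketch):

        type Thunk a = () -> a        -- a parameterless function: the "anonymous function" above

        delay :: a -> Thunk a
        delay x = \_ -> x             -- suspend the computation of x

        force :: Thunk a -> a
        force t = t ()                -- run it only when the value is actually needed

        -- Call-by-name "if": only the chosen branch is ever forced.
        ifThunk :: Bool -> Thunk a -> Thunk a -> a
        ifThunk c t e = if c then force t else force e

        -- ifThunk True (delay 1) (delay (error "never evaluated"))  ==>  1

    Replacing the thunk with its value after the first force gives lazy evaluation.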

    We can use lazy evaluation on an ad hoc basis (e.g. for "if"), or for all arguments. If for all arguments, we can improve efficiency using strictness analysis.


    Strictness

    A function is strict in an argument x if x must be evaluated whenever the function's value is needed.

        plus a b       is strict in both arguments
        const k x      is strict in k, but not in x
        if x y z       is strict in x, but not in y and z

    We can do some analysis and sometimes decide if a user-defined function is strict in some of its arguments.

    Examples:

        double x                                ;; strict in x
        squid n x = if n=0 then x+1 else x-n    ;; strict in n and x
        crab n x = if n=0 then x+1 else n       ;; strict in n but not x

    If a function is strict in an argument x, it is correct to pass x by value (even with normal order evaluation semantics).
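    The same examples transcribed into Haskell (a sketch; passing undefined for an argument shows operationally which arguments are demanded):

        squid :: Int -> Int -> Int
        squid n x = if n == 0 then x + 1 else x - n   -- the test needs n; both branches need x

        crab :: Int -> Int -> Int
        crab n x = if n == 0 then x + 1 else n        -- x is only needed when n == 0

        demoStrictness :: IO ()
        demoStrictness = do
          print (crab 1 undefined)    -- prints 1: crab is not strict in x
          print (squid 0 2)           -- prints 3
          -- print (crab undefined 2) would diverge: crab is strict in n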


    Combinators

    Turner's landmark 1979 paper ("A New Implementation Technique for Applicative Languages") describes an alternative implementation strategy, using combinator graphs.

    Loosely, a combinator is a function with no free variables or constants. (See, for example, Hindley and Seldin, "Introduction to Combinators and Lambda-Calculus", for a formal treatment.)

    Schönfinkel (1924) first described combinators. They provide a way of avoiding variables altogether in the lambda calculus. (Variables cause a lot of complications in describing the rewrite rules, principally because of the need to avoid accidental collisions of variable names.)

    Basic Combinators

    The S, K, I combinators (see Turner page 34) are a basic set:

        S f g x = f x (g x)
        K x y   = x
        I x     = x

    (In fact we only need S and K, since S K K = I.)

    We rewrite functions such as f x = e using the abstraction operation f = [x] e.
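    The basic combinators written as ordinary Haskell functions (lower-case names, since capitalized names are reserved for constructors):

        s :: (a -> b -> c) -> (a -> b) -> a -> c
        s f g x = f x (g x)

        k :: a -> b -> a
        k x y = x

        i :: a -> a
        i x = x

        -- S K K behaves like I:  s k k x  =  k x (k x)  =  x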

    Extensionality condition:

        ([x] E) x = E

    Notice that [x] E is similar to (lambda (x) E), but [x] E is a textual, compile-time operation.

    Rules for abstracting x:

        [x] (E1 E2) = S ([x] E1) ([x] E2)
        [x] x       = I
        [x] y       = K y,   where y is a constant or a variable not equal to x
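    The three rules transcribed directly into Haskell, over a small expression type invented for this sketch (the names Exp, Var, Con, App, and abstr are assumptions, not from the notes):

        data Exp = Var String        -- variables
                 | Con String        -- combinators and other constants
                 | App Exp Exp       -- application
                 deriving Show

        -- [x] E as a purely textual, compile-time transformation
        abstr :: String -> Exp -> Exp
        abstr x (App e1 e2)      = App (App (Con "S") (abstr x e1)) (abstr x e2)
        abstr x (Var y) | x == y = Con "I"
        abstr _ e                = App (Con "K") e   -- a constant, or a variable other than x

        -- e.g. for suc x = plus 1 x:
        --   abstr "x" (App (App (Con "plus") (Con "1")) (Var "x"))
        -- yields the representation of  S (S (K plus) (K 1)) I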

    Proofs that these rules are correct:

    Take the LHS of the first rule and apply it to x:

        ([x] (E1 E2)) x = E1 E2                                      (by extensionality)

    RHS of the first rule:

        S ([x] E1) ([x] E2) x = (([x] E1) x) (([x] E2) x) = E1 E2    (extensionality, twice)

    LHS of the second rule, applied to x:

        ([x] x) x = x                                                (extensionality)

    RHS of the second rule:

        I x = x                                                      (definition of I)

    LHS of the third rule:

        ([x] y) x = y                                                (extensionality)

    RHS of the third rule:

        K y x = y                                                    (definition of K)

    Show succ example from Turner paper.
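    One worked version of the succ example (using the definition suc x = 1+x from Example 1 below, written with a curried plus):

        suc x = plus 1 x

        suc = [x] (plus 1 x)
            = S ([x] (plus 1)) ([x] x)
            = S (S ([x] plus) ([x] 1)) ([x] x)
            = S (S (K plus) (K 1)) I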

    Add the additional combinators B and C to get more compact graphs; show the succ example with the new combinators.
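    A sketch of the standard definitions of B and C as Haskell functions (they are Haskell's (.) and flip), together with the optimization rules -- used implicitly in the derivation below -- that introduce them:

        b :: (a -> b) -> (c -> a) -> c -> b
        b f g x = f (g x)            -- B: composition

        c :: (a -> b -> r) -> b -> a -> r
        c f g x = f x g              -- C: swap the last two arguments

        -- Optimization rules applied after abstraction:
        --   S (K e1) (K e2)  =>  K (e1 e2)
        --   S (K e1) I       =>  e1
        --   S (K e1) e2      =>  B e1 e2
        --   S e1 (K e2)      =>  C e1 e2
        -- With these, suc = S (S (K plus) (K 1)) I simplifies to just (plus 1).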

    Another example:

    avg x y = (x+y)/2

    Replace + and / by curried versions:

        avg x y = divide (plus x y) 2

    Abstract y (treating x as a constant):

        avg x = [y] (divide (plus x y) 2)
              = S ([y] (divide (plus x y))) ([y] 2)
              = S (S ([y] divide) ([y] (plus x y))) ([y] 2)
              = S (S ([y] divide) ([y] (plus x y))) (K 2)
              = C (S ([y] divide) ([y] (plus x y))) 2
              = C (S (K divide) ([y] (plus x y))) 2
              = C (S (K divide) (S ([y] (plus x)) ([y] y))) 2
              = C (B divide (S ([y] (plus x)) ([y] y))) 2
              = C (B divide (S (S ([y] plus) ([y] x)) ([y] y))) 2
              = C (B divide (S (S (K plus) (K x)) I)) 2
              = C (B divide (S (K (plus x)) I)) 2
              = C (B divide (plus x)) 2

        avg = [x] (C (B divide (plus x)) 2)

    Now abstract x:

        avg = [x] (C (B divide (plus x)) 2)
            = S ([x] (C (B divide (plus x)))) ([x] 2)
            = S ([x] (C (B divide (plus x)))) (K 2)
            = C ([x] (C (B divide (plus x)))) 2
            = C (S ([x] C) ([x] (B divide (plus x)))) 2
            = C (S (K C) ([x] (B divide (plus x)))) 2
            = C (B C ([x] (B divide (plus x)))) 2
            = C (B C (S ([x] (B divide)) ([x] (plus x)))) 2
            = C (B C (S (S ([x] B) ([x] divide)) ([x] (plus x)))) 2
            = C (B C (S (S (K B) (K divide)) ([x] (plus x)))) 2
            = C (B C (S (K (B divide)) ([x] (plus x)))) 2
            = C (B C (B (B divide) ([x] (plus x)))) 2
            = C (B C (B (B divide) (S ([x] plus) ([x] x)))) 2
            = C (B C (B (B divide) (S (K plus) I))) 2
            = C (B C (B (B divide) plus)) 2

    ugh!

    Pattern Matching Combinators

    Turner also introduces combinators for pattern matching.

    Y combinator -- finds fixed points:

        Y f = f (Y f)

    Used in local recursions: E where x = ... x ...
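    In Haskell the same idea is usually written as fix (Y itself cannot be typed directly without a wrapper); a minimal sketch:

        fix :: (a -> a) -> a
        fix f = f (fix f)            -- Y f = f (Y f)

        -- A local recursion expressed through fix instead of direct recursion:
        factorial :: Integer -> Integer
        factorial = fix (\fac n -> if n == 0 then 1 else n * fac (n - 1))

        -- factorial 5 == 120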

    Example:

        ham = 1 : my_merge ham2 (my_merge ham3 ham5)
              where
              ham2 = map (*2) ham
              ham3 = map (*3) ham
              ham5 = map (*5) ham
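    A Haskell sketch of the same program; my_merge is assumed here to be an ordered merge that drops duplicates (the notes do not give its definition):

        my_merge :: [Integer] -> [Integer] -> [Integer]
        my_merge (a:x) (b:y)
          | a < b     = a : my_merge x (b:y)
          | a > b     = b : my_merge (a:x) y
          | otherwise = a : my_merge x y
        my_merge x [] = x
        my_merge [] y = y

        ham :: [Integer]
        ham = 1 : my_merge ham2 (my_merge ham3 ham5)
          where
            ham2 = map (*2) ham
            ham3 = map (*3) ham
            ham5 = map (*5) ham

        -- take 10 ham == [1,2,3,4,5,6,8,9,10,12]

    Under graph reduction, this kind of local recursion is what the Y combinator is used for.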

    S-K reduction machine

    The S-K reduction machine is a graph rewriting machine that interprets combinator code.

    Miranda uses normal order evaluation -- go down the left branch of the tree until a combinator is found, apply it to its arguments, and replace that node with the result.

    Example 1

    [show suc 2 where suc x = 1+x example]
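    A possible hand-worked version (using the code suc = S (S (K plus) (K 1)) I obtained from the abstraction rules above; at each step the leftmost combinator that has enough arguments is applied and its node is overwritten with the result):

        suc 2 = S (S (K plus) (K 1)) I 2
             => (S (K plus) (K 1) 2) (I 2)       (S rule)
             => (K plus 2) (K 1 2) (I 2)         (S rule)
             => plus (K 1 2) (I 2)               (K rule)
             => plus 1 (I 2)                     (K rule: plus demands its first argument)
             => plus 1 2                         (I rule: plus demands its second argument)
             => 3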

    Example 2: Self-optimizing code

    code is a built-in function that maps characters to numbers (ASCII codes), e.g. code '0' = 48.

        makedigit n = code n - code '0'

    After the first evaluation, the expression (code '0') will be REPLACED by 48.
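    A Haskell transcription (using Data.Char.ord in place of Miranda's code builtin); the self-optimizing replacement of (code '0') by 48 happens in the combinator graph at run time and is not visible in the source:

        import Data.Char (ord)

        makedigit :: Char -> Int
        makedigit n = ord n - ord '0'

        -- makedigit '7' == 7;  ord '0' == 48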

    Example 3

        foldr op r = f
                     where
                     f [] = r
                     f (a:x) = op a (f x)

        sum = foldr (+) 0

    After the first evaluation of sum, it will be rewritten to the equivalent of:

        sum [] = 0
        sum (a:x) = a + sum x

    Later developments

    Later developments:

      • lambda lifting
      • supercombinators (combinators are abstracted from the user's program)
      • the G-machine
      • strictness analysis
      • compilation to conventional single-processor architectures
      • compilation for conventional parallel hardware
      • special-purpose hardware for parallel graph reduction

    Lambda lifting: if we have a local function definition with free variables, we can move it to the top level by adding additional arguments that are then applied to the free variables (see the sketch below). Example: f x = e, where e contains a free variable y. Define a new function f' y x = e at the top level, and replace calls to f by f' y.

    Supercombinators: combinators are abstracted from the user's program (Johnsson et al., Chalmers University). This technique is used in, e.g., one of the Haskell implementations.
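    A small Haskell illustration of lambda lifting (the names f1, f2, and g' are made up for this sketch):

        -- Before lifting: the local function g has a free variable y.
        f1 :: Int -> [Int] -> [Int]
        f1 y xs = map g xs
          where g x = x + y

        -- After lifting: g' takes y as an extra argument and is a top-level definition;
        -- calls to g become calls to (g' y).
        g' :: Int -> Int -> Int
        g' y x = x + y

        f2 :: Int -> [Int] -> [Int]
        f2 y xs = map (g' y) xs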