More Lambda Calculus

Overview

In the previous lecture we saw how the lambda calculus with "Call-by-Value" (CBV) semantics is actually pretty expressive despite its small size and simple semantics. Here we'll show an even smaller semantics which is just as powerful, then show a bigger semantics which gives up on determinism but enables proving many more programs equivalent, and finally we'll dig deeper on substitution which is really the main engine of the lambda calculus.

Review

For reference, here's our CBV semantics from last time:

  e ::= x             // variable
     |  e e           // function application
     |  λ x . e       // function definition
     
  v ::= λ x . e       // values
  
             e[v/x] = e'
  ---------------------------------- call
         (λ x . e) v  -->   e'

             e1 --> e1'
  ---------------------------------- appL
          e1 e2 --> e1' e2

             e --> e'
  ---------------------------------- appR
           v e --> v e'

Notice how CBV requires grinding the left down to a value and then the right down to a value before we actually "perform a call" by carrying out the substitution.

One last bit we didn't mention is that you can encode let bindings in lambda calculus simply as:

  let x = e1 in e2   ==   (λ x . e2) e1

Alternate Semantics: Call-by-Name (CBN)

Consider this alternate take on semantics for the lambda calculus:

             e1[e2/x] = e1'
  ---------------------------------- subst
         (λ x . e1) e2  -->   e1'

             e1 --> e1'
  ---------------------------------- crunch
          e1 e2 --> e1' e2

Encoded in Lean, this looks like:

IMAGE

Can programs get stuck in this semantics? Yes, consider the program x.

Is it still deterministic? Yes, and we can prove it in Lean:

IMAGE

How does this semantics compare to CBV? It turns out that CBN diverges strictly less often than CBV, because in CBN we only evaluate the arguments to functions on demand. On the other hand, we may take more steps in CBN to the same result we would have in CBV because we re-evaluate arguments if they are copied into multiple places in the function body.
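
To make the comparison concrete, here is a minimal Python sketch of both step relations. The tuple encoding of terms, the helper names (`subst`, `step_cbv`, `step_cbn`, `run`) and the `fuel` bound are all assumptions of this sketch, not part of the lecture's Lean development. The substitution here only respects shadowing, which is enough because we only run closed programs.

```python
# Terms as tagged tuples: ("var", x) | ("app", e1, e2) | ("lam", x, body)
def var(x): return ("var", x)
def app(e1, e2): return ("app", e1, e2)
def lam(x, body): return ("lam", x, body)

def subst(body, x, arg):
    """body[arg/x], stopping at binders for x. Assumes arg is closed."""
    tag = body[0]
    if tag == "var":
        return arg if body[1] == x else body
    if tag == "app":
        return app(subst(body[1], x, arg), subst(body[2], x, arg))
    if body[1] == x:            # inner lambda rebinds x: leave it alone
        return body
    return lam(body[1], subst(body[2], x, arg))

def is_value(e):
    return e[0] == "lam"

def step_cbv(e):
    """One CBV step, or None if no rule applies."""
    if e[0] != "app":
        return None
    f, a = e[1], e[2]
    if not is_value(f):         # appL
        f2 = step_cbv(f)
        return None if f2 is None else app(f2, a)
    if not is_value(a):         # appR
        a2 = step_cbv(a)
        return None if a2 is None else app(f, a2)
    return subst(f[2], f[1], a)  # call

def step_cbn(e):
    """One CBN step, or None if no rule applies."""
    if e[0] != "app":
        return None
    f, a = e[1], e[2]
    if f[0] == "lam":           # subst: argument is passed unevaluated
        return subst(f[2], f[1], a)
    f2 = step_cbn(f)            # crunch
    return None if f2 is None else app(f2, a)

def run(step, e, fuel):
    """Step at most `fuel` times; return (term, steps taken, stopped early?)."""
    for n in range(fuel):
        nxt = step(e)
        if nxt is None:
            return e, n, True
        e = nxt
    return e, fuel, False

self_app = lam("x", app(var("x"), var("x")))
omega = app(self_app, self_app)          # steps to itself forever
ident = lam("z", var("z"))
prog = app(lam("y", ident), omega)       # (λy. λz. z) Ω

print(run(step_cbn, prog, 50)[1:])   # (1, True): the unused argument is never run
print(run(step_cbv, prog, 50)[1:])   # (50, False): still grinding on Ω
```

Each step function returns at most one successor, which is the determinism property in executable form.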

Aside: There is another strategy known as "call by need" or "lazy" evaluation where we only evaluate an argument the first time it is used. We then cache the result so that if it is ever used again we avoid re-evaluating it. This is how Haskell and other lazy programming languages work. The upside of this approach is that, for pure code, lazy evaluation is asymptotically no slower than CBV. The downside is that side effects like I/O get a little more tricky, which is why Haskellers are always making noise about monads.

As an instructive example, consider how these programs might work in CBV, CBN, and lazy semantics:

  let x = factorial 20 in
  x + x

  let ones = Y (λ x. 1 :: x) in
  take 10 ones

One nice advantage of CBN and lazy styles is that it's much easier to roll your own control flow constructs because you don't have to worry about thunking potentially unused arguments that might diverge / have side effects.
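
One way to get a feel for the three strategies is to simulate them in an eager language using explicit thunks. The sketch below is only an analogy: `expensive`, `Lazy`, `counter`, and the `double_*` / `ignore_*` helpers are hypothetical names invented for illustration. It counts how often the argument `factorial 20` is actually computed when the body uses it twice, and when the body ignores it.

```python
import math

counter = {"n": 0}   # how many times the argument has been evaluated

def expensive():
    counter["n"] += 1
    return math.factorial(20)

class Lazy:
    """Memoizing thunk: runs `compute` at most once (call-by-need)."""
    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value

# "let x = factorial 20 in x + x" under each strategy
def double_cbv(x):  return x + x                  # x is already a value
def double_cbn(x):  return x() + x()              # re-run the thunk at every use
def double_need(x): return x.force() + x.force()  # run once, then reuse

# A body that never uses its argument
def ignore_cbv(x): return 0
def ignore_cbn(x): return 0

def count_calls(program):
    """Run `program` and report (result, number of argument evaluations)."""
    counter["n"] = 0
    result = program()
    return result, counter["n"]

print(count_calls(lambda: double_cbv(expensive()))[1])        # 1
print(count_calls(lambda: double_cbn(expensive))[1])          # 2
print(count_calls(lambda: double_need(Lazy(expensive)))[1])   # 1
print(count_calls(lambda: ignore_cbv(expensive()))[1])        # 1
print(count_calls(lambda: ignore_cbn(expensive))[1])          # 0
```

The last two lines are the "roll your own control flow" point: under CBV the caller has to wrap the argument in a thunk by hand to keep an unused branch from running.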

Full Semantics and Church Rosser

Consider this extended semantics:

             e1[e2/x] = e1'
  ---------------------------------- subst
         (λ x . e1) e2  -->   e1'

             e1 --> e1'
  ---------------------------------- appL
          e1 e2 --> e1' e2

             e2 --> e2'
  ---------------------------------- appR
          e1 e2 --> e1 e2'

             e  -->  e'
  ---------------------------------- abs
        λ x . e --> λ x . e'

It allows all steps that you can take under CBV or CBN, but also permits additional steps. The order in which you evaluate expressions is called a "reduction strategy". We've thrown out order and given up on determinism though... why is that a bummer for a programming language?

This semantics does provide some advantages, though. In particular, the Church-Rosser Theorem gives us a pretty amazing fact about it:

  If e -->* e1 and e -->* e2, then
  there exists e3 such that e1  -->* e3 and e2 -->* e3.

This means that no strategy gets "painted into a corner". To actually get this proof through, we will also need alpha conversion and eta expansion as discussed below. If we have those plus the ability to run the rules "backward" (rewrite right side to left), we can show anything about a term's behavior that is true (this is called completeness).
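
As a sanity check on one small example (not a proof), the following Python sketch enumerates every term reachable under the full semantics and confirms that all reduction orders end up at the same normal form. The tuple encoding and the names `successors` and `reachable` are assumptions of this sketch. The example term uses distinct bound names, so a substitution that merely respects shadowing cannot capture anything here.

```python
# Terms: ("var", x) | ("app", e1, e2) | ("lam", x, body)

def subst(t, x, e):
    """t[e/x], stopping at binders for x. Not capture-avoiding in general."""
    tag = t[0]
    if tag == "var":
        return e if t[1] == x else t
    if tag == "app":
        return ("app", subst(t[1], x, e), subst(t[2], x, e))
    if t[1] == x:
        return t
    return ("lam", t[1], subst(t[2], x, e))

def successors(e):
    """Every term reachable from e in exactly one step of the full semantics."""
    out = []
    if e[0] == "app":
        f, a = e[1], e[2]
        if f[0] == "lam":                                       # subst
            out.append(subst(f[2], f[1], a))
        out += [("app", f2, a) for f2 in successors(f)]         # appL
        out += [("app", f, a2) for a2 in successors(a)]         # appR
    elif e[0] == "lam":
        out += [("lam", e[1], b2) for b2 in successors(e[2])]   # abs
    return out

def reachable(e, limit=1000):
    """All terms reachable from e in zero or more steps (bounded search)."""
    seen = {e}
    frontier = [e]
    while frontier and len(seen) < limit:
        t = frontier.pop()
        for s in successors(t):
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return seen

I = ("lam", "z", ("var", "z"))
a = ("app", ("lam", "y", ("var", "y")), I)                           # (λy. y) (λz. z)
e = ("app", ("lam", "x", ("app", ("var", "x"), ("var", "x"))), a)    # (λx. x x) a

print(len(successors(e)))    # 2: either call the outer function or reduce the argument
terms = reachable(e)
normal_forms = {t for t in terms if not successors(t)}
print(normal_forms == {I})   # True: every path can still reach λz. z
```

The `limit` bound matters in general: for a term like Ω the reachable set is small, but other terms grow without bound, so an unbounded search need not terminate.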

Cool result: under natural denotational semantics (i.e., treating lambdas as mathematical functions), e and e' have the same denotation ([[e]] = [[e']]) if and only if we can find a rewrite under the above rules that shows e -->* e'. This ensures our rules are sound, meaning they respect the semantics. This is particularly nice because the natural denotational semantics for lambda calculus are not very convenient or natural for many uses in compilers etc. (requires denoting to a set D that is isomorphic to D -> D).

Thus to decide if two lambda calculus programs are equivalent, we just need to search for a sequence of rewrites that turns one into the other. Is this an algorithm for deciding program equivalence? Unfortunately, no, because we never know when we can safely stop the search if we haven't yet found a rewrite.

Substitution

Substitution can be surprisingly tricky. Informally, e[e'/x] just means "replace each x in e with e'". For example:

  x[(λy. y) / x] = λy. y
  
  (λy. y x)[(λz. z) / x] = λy. y (λz. z)
  
  (x x)[(λx. x x)/x] = (λx. x x)(λx. x x)

Folks get substitution wrong. Let's try to formally define it (THIS IS WRONG):

  ---------------------------------- subst_var_same
             x[e/x] = e

             x =/= y
  ---------------------------------- subst_var_diff
            x[e/y] = x

     e1[e/x] = e1'    e2[e/x] = e2'
  ---------------------------------- subst_app
        (e1 e2)[e/x] = e1' e2'

            e1[e/x] = e1'
  ---------------------------------- subst_abs
        (λy. e1)[e/x] = λy. e1'

Essentially this is replacing every "leaf" occurrence of the variable x with e. Unfortunately, it is WRONG for nested functions when the inner function binds the same variable as the outer function (this is called shadowing). Consider:

IMAGE
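
One small instance of the shadowing bug, transcribed into Python. The tuple encoding and the name `subst_v1` are assumptions of this sketch, and the example term is chosen for illustration rather than taken from the figure above.

```python
# Terms: ("var", x) | ("app", e1, e2) | ("lam", x, body)

def subst_v1(t, x, e):
    """t[e/x] following attempt #1: descends under every binder."""
    tag = t[0]
    if tag == "var":
        return e if t[1] == x else t                  # subst_var_same / subst_var_diff
    if tag == "app":
        return ("app", subst_v1(t[1], x, e), subst_v1(t[2], x, e))   # subst_app
    return ("lam", t[1], subst_v1(t[2], x, e))        # subst_abs: ignores shadowing

identity = ("lam", "x", ("var", "x"))                 # λx. x

# (λx. x)[y/x] should leave the term alone: the x in the body belongs to the λ.
print(subst_v1(identity, "x", ("var", "y")))
# ('lam', 'x', ('var', 'y'))  i.e. λx. y, a constant function instead of the identity
```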

Attempt #2 (still wrong in general, but less so):

  ---------------------------------- subst_var_same
             x[e/x] = e

             x =/= y
  ---------------------------------- subst_var_diff
            x[e/y] = x

    e1[e/x] = e1'    e2[e/x] = e2'
  ---------------------------------- subst_app
        (e1 e2)[e/x] = e1' e2'

  
  ---------------------------------- subst_abs_same
        (λx. e1)[e/x] = λx. e1

      e1[e/x] = e1'   x =/= y
  ---------------------------------- subst_abs_diff
        (λy. e1)[e/x] = λy. e1'

This fixes the problem above because it respects shadowing: stop when you hit a binder for the variable you're substituting. However, it's still a bit limited: if the term e being substituted in mentions an "outer" (free) y, pushing it under a λy will capture that y.

IMAGE

(Note: this doesn't happen in CBV/CBN if there are no free variables, but can arise under full reduction.)
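
Here is a small Python transcription of attempt #2 that shows both behaviors: shadowing is now handled, but a free variable in the substituted term can still be captured. The tuple encoding and the name `subst_v2` are assumptions of this sketch, and the example terms are illustrative choices.

```python
# Terms: ("var", x) | ("app", e1, e2) | ("lam", x, body)

def subst_v2(t, x, e):
    """t[e/x] following attempt #2: respects shadowing, but can capture."""
    tag = t[0]
    if tag == "var":
        return e if t[1] == x else t
    if tag == "app":
        return ("app", subst_v2(t[1], x, e), subst_v2(t[2], x, e))
    if t[1] == x:
        return t                                      # subst_abs_same
    return ("lam", t[1], subst_v2(t[2], x, e))        # subst_abs_diff

y = ("var", "y")

# Shadowing is fixed: (λx. x)[y/x] stays λx. x.
print(subst_v2(("lam", "x", ("var", "x")), "x", y))   # ('lam', 'x', ('var', 'x'))

# Capture: (λy. x)[y/x] should be a constant function returning the outer y,
# but the free y we substitute in lands under the binder λy.
print(subst_v2(("lam", "y", ("var", "x")), "x", y))   # ('lam', 'y', ('var', 'y'))
```

The second result is the identity function, which behaves differently from the intended "always return the outer y".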

The problem is that we need to be careful with free variables, those variables occurring in a term which are not bound by ("under") a lambda. We can write a function to compute free variables as:

  FV(x)     = {x}
  FV(e1 e2) = FV(e1) U FV(e2)
  FV(λx. e) = FV(e) - {x}
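
The three equations translate almost line for line into code. A minimal Python sketch, assuming terms are encoded as tagged tuples (an encoding chosen here for illustration):

```python
# Terms: ("var", x) | ("app", e1, e2) | ("lam", x, body)

def free_vars(t):
    """FV(t) as a Python set of variable names."""
    tag = t[0]
    if tag == "var":
        return {t[1]}                                  # FV(x) = {x}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])       # FV(e1 e2) = FV(e1) U FV(e2)
    return free_vars(t[2]) - {t[1]}                    # FV(λx. e) = FV(e) - {x}

# λx. x y  has exactly one free variable, y
term = ("lam", "x", ("app", ("var", "x"), ("var", "y")))
print(free_vars(term))   # {'y'}
```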

In Lean, we've defined this as a predicate which tells us if a particular variable is free in a given term:

IMAGE

Given the definition of free variables, we can fix substitution a bit more.

Attempt #3 (better, but still not quite right):

  ---------------------------------- subst_var_same
             x[e/x] = e

             x =/= y
  ---------------------------------- subst_var_diff
            x[e/y] = x

    e1[e/x] = e1'    e2[e/x] = e2'
  ---------------------------------- subst_app
        (e1 e2)[e/x] = e1' e2'

  
  ---------------------------------- subst_abs_same
        (λx. e1)[e/x] = λx. e1

  e1[e/x] = e1'   x =/= y   y ∉ FV(e)
  ---------------------------------- subst_abs_diff
        (λy. e1)[e/x] = λy. e1'

This will indeed avoid variable capture, but now we can get stuck during substitution (no rule will apply). The solution is to allow implicit renaming. The above definition of substitution is only partial: it is undefined when the binder y "accidentally" also occurs free in e. Since the whole point of lambda calculus is the structure of the term, not which variable names happen to be used, we will allow ourselves to change them when needed. This requires renaming a bound variable and all of its occurrences. By renaming, the x =/= y precondition can always be satisfied, which means we can drop the shadowing rule.

In general, we never distinguish between terms based on how their variables happen to be named. This is a key design principle for lambda calculus, but it means that we need to treat different ASTs as equivalent.

Attempt #4 (finally, assuming systematic renaming, we have it):

  ---------------------------------- subst_var_same
             x[e/x] = e

             x =/= y
  ---------------------------------- subst_var_diff
            x[e/y] = x

    e1[e/x] = e1'    e2[e/x] = e2'
  ---------------------------------- subst_app
        (e1 e2)[e/x] = e1' e2'


  e1[e/x] = e1'   x =/= y   y ∉ FV(e)
  ---------------------------------- subst_abs_diff
        (λy. e1)[e/x] = λy. e1'

Getting substitution right is a notoriously annoying problem in PL. If you search for "capture avoiding substitution" you can see much wailing and gnashing of teeth.
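
For the curious, here is one way the "systematic renaming" could be made explicit in code. This is a sketch under assumptions, not the lecture's Lean definition: terms are tagged tuples, and `fresh` is a hypothetical helper that picks names of the form v0, v1, ... not in a given set. Whenever the side conditions of subst_abs_diff fail, the binder is renamed first so that the rule applies.

```python
import itertools

# Terms: ("var", x) | ("app", e1, e2) | ("lam", x, body)

def free_vars(t):
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}

def fresh(avoid):
    """Return the first name v0, v1, ... that is not in `avoid`."""
    for i in itertools.count():
        name = f"v{i}"
        if name not in avoid:
            return name

def subst(t, x, e):
    """t[e/x], renaming bound variables whenever needed to avoid capture."""
    tag = t[0]
    if tag == "var":
        return e if t[1] == x else t                  # subst_var_same / subst_var_diff
    if tag == "app":
        return ("app", subst(t[1], x, e), subst(t[2], x, e))   # subst_app
    y, body = t[1], t[2]
    if y == x or y in free_vars(e):
        # Side conditions fail: rename the binder y to a fresh z first.
        z = fresh(free_vars(e) | free_vars(body) | {x})
        body = subst(body, y, ("var", z))
        y = z
    return ("lam", y, subst(body, x, e))              # subst_abs_diff now applies

yv = ("var", "y")

# (λy. x)[y/x]: the binder is renamed, so the substituted y stays free.
print(subst(("lam", "y", ("var", "x")), "x", yv))     # ('lam', 'v0', ('var', 'y'))

# (λx. x)[y/x]: the binder is renamed, and the result is still an identity function.
print(subst(("lam", "x", ("var", "x")), "x", yv))     # ('lam', 'v0', ('var', 'v0'))
```

Note that results are only determined up to the choice of bound names, which is exactly why we agreed not to distinguish terms by how their variables are named.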

For some final jargon: