Functions (values of function type) are first-class values, which means that they have the same rights and privileges as values of any other type. In particular, functions may be passed as arguments and returned as results of other functions, and functions may be stored in and retrieved from data structures such as lists and trees. We will see that first-class functions are an important source of expressive power in ML.
Functions which take functions as arguments or yield functions as results are known as higher-order functions (or sometimes as functionals or operators). Higher-order functions arise frequently in mathematics. For example, the differential operator is the higher-order function that, when given a (differentiable) function on the real line, yields its first derivative as a function on the real line. We also encounter functionals mapping functions to real numbers, and real numbers to functions. An example of the former is provided by the definite integral viewed as a function of its integrand, and an example of the latter is the definite integral of a given function on the interval [a,x], viewed as a function of a, that yields the area under the curve from a to x as a function of x.
Higher-order functions are less familiar tools for many prgrammers since the best-known programming languages have only rudimentary mechanisms to support their use. In contrast higher-order functions play a prominent role in ML, with a variety of interesting applications. Their use may be classified into two broad categories:
Before discussing these programming techniques, we will review the
critically important concept of scope as it applies to function definitions.
Recall that Standard ML is a statically scoped language, meaning that
identifiers are resolved according to the static structure of the program. A use
of the variable x
is considered to be a reference to the nearest
lexically enclosing declaration of x
. We say "nearest"
because of the possibility of shadowing; if we re-declare a variable x
, then
subsequent uses of x
refer to the "most recent" (lexically!)
declaration of it; any "previous" declarations are temporarily shadowed by the
latest one.
This principle is easy to apply when considering sequences of
declarations. For example, it should be clear by now that the variable y
is bound to 32
after processing the following sequence of declarations:
val x = 2 (* x=2 *)
val y = x*x (* y=4 *)
val x = y*x (* x=8 *)
val y = x*y (* y=32 *)
In the presence of function definitions the situation is the same, but it can be a bit tricky to understand at first. Here's an example to test your grasp of the lexical scoping principle:
val x = 2
fun f y = x+y
val x = 3
val z = f 4
After processing these declarations the variable z
is bound
to 6
, not to 7
! The reason is that the occurrence of x
in the body of f
refers to the first declaration of x
since it is the nearest lexically enclosing declaration of the occurence, even though
it has been subsequently re-declared. This example illustrates three important
points:
Binding is not assignment! If we were to view the second binding
of x
as an assignment statement, then the value of z
would be 7
,
not 6
.
Scope resolution is lexical, not temporal. We sometimes refer to the "most recent" declaration of a variable, which has a temporal flavor, but we always mean "nearest lexically enclosing at the point of occurrence".
"Shadowed" bindings are not lost. The
"old" binding for x
is still available (through calls to f
),
even though a more recent binding has shadowed it.
One way to understand what's going on here is through the concept of a closure,
a technique for implementing higher-order functions. When a function expression is
evaluated, a copy of the dynamic environment is attached to the function.
Subsequently, all free variables of the function (i.e., those variables not
occurring as parameters) are resolved with respect to the environment attached to the
function; the function is therefore said to be "closed" with respect to the
attached environment. This is achieved at function application time by
"swapping" the attached environment of the function for the environment active
at the point of the call. The swapped environment is restored after the call is
complete. Returning to the example above, the environment associated with the
function f
contains the declaration val x = 2
to record the fact
that at the time the function was evaluated, the variable x
was bound to the
value 2
. The variable x
is subsequently re-bound to 3
,
but when f
is applied, we temporarily reinstate the binding of x
to 2
, add a binding of y
to 4
, then evaluate the
body of the function, yielding 6
. We then restore the binding of x
and drop the binding of y
before yielding the result.
While seemingly very simple, the principle of lexical scope is the source of considerable expressive power. We'll demonstrate this through a series of examples.
To warm up let's consider some simple examples of passing functions as
arguments and yielding functions as results. The standard example of passing a
function as argument is the map
' function, which applies a given function to
every element of a list. It is defined as follows:
fun map' (f, nil) = nil
| map' (f, h::t) = (f h) :: map' (f, t)
For example, the application
map' (fn x => x+1, [1,2,3,4])
evaluates to the list [2,3,4,5]
.
Functions may also yield functions as results. What is surprising is
that we can create new functions during execution, not just return functions that
have been previously defined. The most basic (and deceptively simple) example is the
function constantly
that creates constant functions: given a value k
,
the application constantly k
yields a function that yields k
whenever it is applied. Here's a definition of constantly
:
val constantly = fn k => (fn a => k)
The function constantly has type 'a -> ('b -> 'a)
.
We used the fn
notation for clarity, but the declaration of the
function constantly
may also be written using fun
notation as
follows:
fun constantly k a = k
Note well that a white space separates the two successive
arguments to constantly
! The meaning of this declaration is precisely
the same as the earlier definition using fn
notation.
The value of the application constantly 3
is the function
that is constantly 3
; i.e., it always yields 3
when
applied. Yet nowhere have we defined the function that always yields 3
.
The resulting function is "created" by the application of constantly
to the argument 3
, rather than merely "retrieved" off the shelf of
previously-defined functions. In implementation terms the result of the application constantly
3
is a closure consisting of the function fn a => k
with the
environment val k = 3
attached to it. The closure is a data structure
(a pair) that is created by each application of constantly
to an argument;
the closure is the representation of the "new" function yielded by the
application. Notice, however, that the only difference between any two
results of applying the function constantly
lies in the attached environment;
the underlying function is always fn a => k
. If we think of
the lambda as the "executable code" of the function, then this amounts to the
observation that no new code is created at run-time, just new instances
of existing code.
This discussion illustrates why functions in ML are not the same as code pointers in C. You may be familiar with the idea of passing a pointer to a C function to another C function as a means of passing functions as arguments or yielding functions as results. This may be considered to be a form of "higher-order" function in C, but it must be emphasized that code pointers are significantly less expressive than closures because in C there are only statically many possibilities for a code pointer (it must point to one of the functions defined in your code), whereas in ML we may generate dynamically many different instances of a function, differing in the bindings of the variables in its environment. The non-varying part of the closure, the code, is directly analogous to a function pointer in C, but there is no counterpart in C of the varying part of the closure, the dynamic environment.
The definition of the function map'
given above takes a
function and list as arguments, yielding a new list as result. Often it occurs that
we wish to map the same function across several different lists. It is inconvenient
(and a tad inefficient) to keep passing the same function to map'
, with the
list argument varying each time. Instead we would prefer to create a instance of map
specialized to the given function that can then be applied to many different lists.
This leads to the following (standard) definition of the function map
:
fun map f nil = nil
| map f (h::t) = (f h) :: (map f t)
The function map
so defined has type ('a->'b) ->
'a list -> 'b list
. It takes a function of type 'a -> 'b
as argument, and yields another function of type 'a list -> 'b list
as
result.
The passage from map'
to map
is called currying.
We have changed a two-argument function (more properly, a function taking a pair as
argument) into a function that takes two arguments in succession, yielding after the first
a function that takes the second as its sole argument. This passage can be codified
as follows:
fun curry f x y = f (x, y)
The type of curry
is ('a*'b->'c) -> ('a -> ('b
-> 'c))
. Given a two-argument function, curry returns another function
that, when applied to the first argument, yields a function that, when applied to the
second, applies the original two-argument function to the first and second arguments,
given separately.
Observe that map
may be alternately defined by the binding
fun map f l = curry map' f l
Applications are implicitly left-associated, so that this definition is equivalent to the more verbose declaration
fun map f l = ((curry map') f) l
We turn now to the idea of abstracting patterns of control. There is an obvious similarity between the following two functions, one to add up the numbers in a list, the other to multiply them.
fun add_em nil = 0
| add_em (h::t) = h + add_em t
fun mul_em nil = 1
| mul_em (h::t) = h * mul_em t
What precisely is the similarity? We will look at it from two points
of view. One is that in each case we have a binary operation and a unit element for
it. The result on the empty list is the unit element, and the result on a non-empty
list is the operation applied to the head of the list and the result on the tail.
This pattern can be abstracted as the function reduce
defined as follows:
fun reduce (unit, opn, nil) = unit
| reduce (unit, opn, h::t) = opn (h, reduce (unit, opn, t))
Here is the type of reduce
:
val reduce : 'b * ('a*'b->'b) * 'a list -> 'b
The first argument is the unit element, the second is the operation, and
the third is the list of values. Notice that the type of the operation admits the
possibility of the first argument having a different type from the second argument and
result. Using reduce, we may re-define add_em
and mul_em
as follows:
fun add_em l = reduce (0, op +, l)
fun mul_em l = reduce (1, op *, l)
To further check your understanding, consider the following declaration:
fun mystery l = reduce (nil, op ::, l)
(Recall that "op ::
" is the function of type 'a
* 'a list -> 'a list
that adds a given value to the front of a list.) What
function does mystery
compute?
Another perspective on the commonality between add_em
and mul_em
is that they are both defined by induction on the structure of the list argument, with a
base case for nil
, and an inductive case for h::t
, defined in
terms of its behavior on t
. But this is really just another way of
saying that they are defined in terms of a unit element and a binary operation! The
difference is one of perspective: whether we focus on the pattern part of the clauses (the
inductive decomposition) or the result part of the clauses (the unit and operation).
The recursive structure of add_em
and mul_em
is
abstracted by the reduce
functional, which is then specialized to yield add_em
and mul_em
. Said another way, reduce abstracts the pattern of
defining a function by induction on the structure of a list.
The definition of reduce
leaves something to be
desired. One thing to notice is that the arguments unit
and opn
are carried unchanged through the recursion; only the list parameter changes on recursive
calls. While this might seem like a minor overhead, it's important to remember that
multi-argument functions are really single-argument functions that take a tuple as
argument. This means that each time around the loop we are constructing a new tuple
whose first and second components remain fixed, but whose third component varies. Is
there a better way? Here's another definition that isolates the "inner
loop" as an auxiliary, tail-recursive function:
fun better_reduce (unit, opn, l) =
let
fun red nil = unit
| red (h::t) = opn (h, red t)
in
red l
end
Notice that each call to better_reduce
creates a new
function red
that uses the parameters unit
and opn
of the call to better_reduce
. This means that red
is bound
to a closure consisting of the code for the function together with the environment active
at the point of definition, which will provide bindings for unit
and opn
arising from the application of better_reduce
to its arguments.
Furthermore, the recursive calls to red
no longer carry bindings for unit
and opn
, saving the overhead of creating tuples on each iteration of the
loop.
An interesting variation on reduce
may be obtained by staging
the computation. The motivation is that unit
and opn
often
remain fixed for many different lists (e.g., we may wish to sum the elements of
many different lists). In this case unit
and opn
are said
to be "early" arguments and the list is said to be a "late" argument.
The idea of staging is to perform as much computation as possible on the basis of
the early arguments, yielding a function of the late arguments alone. In the present
case this amounts to building red
on the basis of unit
and opn
,
yielding it as a function that may be later applied to many different lists. Here's
the code:
fun staged_reduce (unit, opn) =
let
fun red nil = unit
| red (h::t) = opn (h, red t)
in
red
end
The definition of staged_reduce
bears a close resemblance to
the definition of better_reduce
; the only difference is that the creation of
the closure bound to red
occurs as soon as unit and opn are
known, rather than each time the list argument is supplied. Thus the overhead
of closure creation is "factored out" of multiple applications of the resulting
function to list arguments.
We could just as well have replaced the body of the let expression with the function
fn l => red l
but a moment's thought reveals that the meaning is precisely the same (apart from one additional function call in the latter case).
Note well that we would not obtain the effect of staging were we to use the following definition:
fun curried_reduce (unit, opn) nil = unit
| curried_reduce (unit, opn) (h::t) = opn (h, curried_reduce (unit, opn) t)
If we unravel the fun
notation, we see that while we are
taking two arguments in succession, we are not doing any useful work in between
the arrival of the first argument (a pair) and the second (a list). A curried
function does not take significant advantage of staging. Since staged_reduce
and curried_reduce
have the same iterated function type, namely
('b * ('a * 'b -> 'b)) -> 'a list -> 'b
the contrast between these two examples may be summarized by saying not
every function of iterated function type is curried. Some are, and some aren't.
The "interesting" examples (such as staged_reduce
) are the
ones that aren't curried. (This directly contradicts established
terminology, but I'm afraid it is necessary to avoid misapprehension.)
The time saved by staging the computation in the definition of staged_reduce
is admittedly minor. But consider the following definition of an append function for
lists that takes both arguments at once:
fun append (nil, l) = l
| append (h::t, l) = h :: append(t,l)
Suppose that we will have occasion to append many lists to the end of a given list. What we'd like is to build a specialized appender for the first list that, when applied to a second list, appends the second to the end of the first. Here's a naive solution that merely curries append:
fun curried_append nil l = l
| curried_append (h::t) l = h :: append t l
Unfortunately this solution doesn't exploit the fact that the first
argument is fixed for many second arguments. In particular, each application of the
result of applying curried_append
to a list results in the first list being
traversed so that the second can be appended to it. We can improve on this by
staging the computation as follows:
fun staged_append nil = fn l => l
| staged_append (h::t) =
let
val tail_appender = staged_append t
in
fn l => h :: tail_appender l
end
Notice that the first list is traversed once for all applications
to a second argument. When applied to a list [v1,
...,vn]
,
the function staged_append
yields a function that is equivalent to, but
not quite as efficient as, the function
fn l => v1 :: v2 :: ... :: vn :: l.
This still takes time proportional to n, but a substantial savings accrues from avoiding the pattern matching required to destructure the original list argument on each call.