A function in ML is written as follows:

`fn` *arg* `=>` *returnValue*

For example, the following function returns an integer that is one greater than its argument:

    - fn x => x + 1;
    val it = fn : int -> int

- SML/NJ gives the above function the type `int -> int`. In general, the type of a function is written *argType* `->` *returnType*, which is reminiscent of the way mathematicians describe the domain and codomain of functions. In the academic programming languages literature, function types are sometimes called "arrow types".
- Unlike atomic or simple compound types, function *values* are not echoed by SML/NJ; instead, SML/NJ simply writes `fn` as a placeholder for the value.
- The return value is the entire body of the function. Since ML proceeds by evaluation of expressions, this is a natural way to define functions: the body is an expression that gets evaluated. This design differs from imperative languages, where a function body is usually a block of code to be executed.

Function arguments (like all names in binding positions) and function bodies (like all expressions) can optionally be ascribed types:

    - fn x:int => x + 1; (* ascribing to the argument *)
    val it = fn : int -> int
    - fn x => (x + 1):int; (* ascribing to the body *)
    val it = fn : int -> int

This will sometimes be necessary when the body does not provide enough information to determine the exact type of an argument or return value. For example:

    - fn stringPair => #1(stringPair) ^ "!";
    stdIn:32.1-32.38 Error: unresolved flex record
       (can't tell what fields there are besides #1)

We know that this function's argument is a tuple --- in fact, the programmer probably intends a pair of strings. However, we can't tell how many elements the tuple has. ML needs to know this in order to assign a type to the function, so we must ascribe a type to the argument:

    - fn stringPair:(string * string) => #1(stringPair) ^ "!";
    val it = fn : string * string -> string

In order to construct examples where ascribing to the return value is necessary, we must wait until we see more of the ML type system.

Recall that all values in ML are *first-class*.
Functions are values. All values can be bound to names.
Therefore, functions can be bound to names, which evaluate to
their bound value exactly the same way that any other name
evaluates:

    - val addOne = fn x => x + 1;
    val addOne = fn : int -> int
    - addOne;
    val it = fn : int -> int

Since it is so common to bind function values to names, ML has syntactic sugar for function declarations:

    - fun addOne x = x + 1;
    val addOne = fn : int -> int

Notice that SML/NJ echoes the desugared `val` form of the declaration. The two *syntactic* forms
are *semantically* equivalent in every way.

ML's treatment of functions and naming contrasts strongly with languages like C (where functions may only occur at top level, and must always be named) or Java (where methods may not be defined independently of classes, and methods only occur as "values" in the sense that object values can have methods).

Functions are applied to arguments by writing the argument
after the function expression; parentheses around the argument
are strictly optional. All of the following apply the function
value bound to `addOne` to the integer 3:

    - addOne 3;
    val it = 4 : int
    - addOne(3);
    val it = 4 : int
    - (addOne 3);
    val it = 4 : int
    - (addOne)3;
    val it = 4 : int

In ML programming, we usually include the parentheses only where needed to enforce order of evaluation.

Unlike some other languages, functions do not need to be bound
to a name before they are applied; you may use the `fn` expression
(an `anonymous function`) directly:

    - (fn x => x + 1) 3;
    val it = 4 : int

This is yet another instance of ML's regularity. Functions are simply values. Evaluating a function application simply proceeds by three steps:

- Evaluate the function value.
- Evaluate the argument value.
- Apply the function to the argument.

It doesn't matter whether the function expression in the first step
is a variable expression (which looks up a function value bound to a
name) or an anonymous function expression. Both expressions evaluate
to function values. More generally, it doesn't matter *where* the
function expression comes from --- it may be obtained from the
return value of a function, or by accessing a component of a data
structure, or any of the other ways that a value may be obtained.
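As a small illustration of this point, the sketch below applies function values that come from a function's return value and from a tuple component. The names `makeAdder` and `ops` are hypothetical and chosen only for this example.

```sml
(* makeAdder is a hypothetical helper: it returns a new function value. *)
fun makeAdder n = fn x => x + n;

(* ops is a hypothetical pair whose components are both function values. *)
val ops = (fn x => x + 1, fn x => x * 2);

(* The function being applied comes from a return value. *)
val a = (makeAdder 10) 5;

(* The function being applied comes from a component of a tuple. *)
val b = (#2 ops) 7;
```

In both applications the same three steps occur: the function expression is evaluated to a function value, the argument is evaluated, and the function is applied.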

Function calls are typechecked in the obvious way: the actual argument must match the formal argument type. When it does not, you get an error:

    - addOne "hello";
    stdIn:22.1-22.15 Error: operator and operand don't agree [tycon mismatch]
      operator domain: int
      operand:         string
      in expression:
        addOne "hello"

Function application has quite high precedence, which can sometimes be confusing. Consider the following code fragment:

    - fun italic s = "<i>" ^ s ^ "</i>";
    val italic = fn : string -> string
    - fun italicGreeting name = italic "Hello, " ^ name;
    val italicGreeting = fn : string -> string
    - italicGreeting "Keunwoo";
    val it = "<i>Hello, </i>Keunwoo" : string

The `italic` function surrounds the input string in
the HTML markup for italic text. You might think that the string
concatenation expression `"Hello, " ^ name` gets
evaluated, and the result passed to `italic`, but
function application has higher precedence than string
concatenation (or, indeed, most other operators). The body is
therefore parsed as `(italic "Hello, ") ^ name`.
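A minimal sketch of the fix, assuming the intent was to italicize the whole greeting: parenthesize the concatenation so that it is evaluated before the application. The name `italicGreeting'` is hypothetical and is used only to keep both versions side by side.

```sml
fun italic s = "<i>" ^ s ^ "</i>";

(* As in the text: parsed as (italic "Hello, ") ^ name. *)
fun italicGreeting name = italic "Hello, " ^ name;

(* Hypothetical corrected variant: the parentheses force the
   concatenation to happen before italic is applied. *)
fun italicGreeting' name = italic ("Hello, " ^ name);

val unintended = italicGreeting "Keunwoo";   (* "<i>Hello, </i>Keunwoo" *)
val intended = italicGreeting' "Keunwoo";    (* "<i>Hello, Keunwoo</i>" *)
```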

Thought question: Suppose you're typing a list in the square-bracket syntax and you accidentally omit a comma:

[1, 2 3, 4];

What happens? Why?

Sometimes side effects are unavoidable. For now, we will
acknowledge one limited use for side effects: input and output.
The standard library function `print` must have a side
effect: printing to standard output changes the world. But what
should a function like this return? It might return a status
code, but often such functions have no natural return value.

Languages like Pascal solve this problem by dividing the
universe of control abstractions into two kinds: functions, which
return values, and procedures, which do not. Languages like C
solve this problem by having `void` functions ---
functions that return nothing. ML uses an approach similar, but
not identical, to the latter: it uses the `unit` type,
which has one value, written `()`:

    - print;
    val it = fn : string -> unit
    - print "hi\n";
    hi
    val it = () : unit

Functions that naturally take no parameters can accept `unit`:

    - val printHi = fn () => print "hi\n";
    val printHi = fn : unit -> unit
    - printHi();
    hi
    val it = () : unit

Because the `unit` value is written `()`, this is a
sort of "visual pun" on zero-argument function calls in other
languages.

Imperative languages express branching through conditional
*statements*; functional languages like ML, being
expression-oriented, express branching primarily through
conditional *expressions*.

`if` expressions

`if` conditional expressions in ML have the
following syntax:

`if` *booleanExpr* `then` *expr1* `else` *expr2*

These have the "obvious" semantics (similar to the
`?:` operator in C):

- First, *booleanExpr* is evaluated.
- If the test expression is true, then *expr1* is evaluated, and its value is returned.
- If the test expression is false, then *expr2* is evaluated, and its value is returned.

Here's a simple conditional expression:

- if 1 > 2 then

Like all expressions, `if` expressions are
first-class. The result of an `if` expression can be
used anywhere any other expression can be used. For example:

[1, 2, if x = 4 then 5 else 6 ];

if x = 4 then (if x > 10 then y else z, if x > 20 then a else b) else (17, 18)

Be careful --- the first branch in the outermost
`if` is a tuple (comma-separated values in parentheses), not
a sequence of two expressions.

Note that conditional expressions do not evaluate the un-taken
branch --- this is why `if` cannot be naively
implemented as an ordinary function call, which evaluates all its
arguments prior to invoking the function.

(Actually, we can implement a proper `if` function
using function parameters, but as we shall see this would be
rather more verbose to use given ML's anonymous function
syntax.)
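One way to sketch such an `if` function is to pass each branch as a function of type `unit -> 'a`, so that only the chosen branch is ever evaluated. The names `naiveIf` and `ifThenElse` are hypothetical, and the `handle` clause (exception handling, not covered in this section) is used only to make the eager failure visible.

```sml
(* naiveIf is a hypothetical name: both branch arguments are
   evaluated before the function body runs. *)
fun naiveIf (cond, thenValue, elseValue) =
    if cond then thenValue else elseValue;

(* ifThenElse is a hypothetical name: each branch is wrapped in a
   function, and only the chosen one is called. *)
fun ifThenElse (cond, thenBranch, elseBranch) =
    if cond then thenBranch () else elseBranch ();

val n = 0;

(* Only the first anonymous function is called, so 100 div n is never run. *)
val safe = ifThenElse (n = 0, fn () => 0, fn () => 100 div n);

(* Here 100 div n is evaluated as an argument and raises Div. *)
val eager = naiveIf (n = 0, 0, 100 div n) handle Div => ~1;
```

The call to `ifThenElse` shows the verbosity the text mentions: every branch has to be written as `fn () => ...`.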

A conditional expression may return either of its branches. What should be the type of the following expression?

if p then 27 else "hello"

In the ML type system, this expression has no sensible type ---
depending on the value of `p`, either branch may be
returned, so neither `int` nor `string`
describes the result value adequately.

In ML, branches of a conditional expression must have exactly the same type.

An expression sequence in ML is written as one or more
*semicolon*-separated expressions in round
parentheses.

Expression sequences have the following semantics:

- Evaluate each expression in left-to-right order.
- Return the value of the *last* expression evaluated as the value of the whole expression.

All results besides the last expression are discarded.
Expression sequences are primarily useful for side-effecting
expressions like `print` calls (in this class, you will
primarily use them for inserting debugging statements):

    - val x = (print "hi\n"; 3);
    hi
    val x = 3 : int

Thought question: What should the type checking rules for
expression sequences be, if any? Need there be any relationship
among the types of expressions in the sequence, as there is with
`if`? Why or why not?

`case`

The `if` expression essentially provides a way to
**match** a boolean value against true or false.
Another way to write this in ML is as follows:

`case` *booleanExpr* `of true =>` *expr1* `| false =>` *expr2*

The `case` construct takes a value and attempts to
match it against one or more `patterns` --- in this
case, the two boolean **constant patterns**,
`true` and `false`. If a pattern matches,
then its corresponding expression is evaluated and returned as the
value of the entire case expression. Matching is
**first-match**: the patterns are tried in
left-to-right order, and the *first* matching pattern's
expression is evaluated and returned.

As with `if` expressions, the body expressions of
all branches of a `case` expression must have the same
type. The reason for this restriction is the same as with `if`.

`case` would be overkill if we only had boolean
values; but `case` can be used with any type, not just
boolean. Let's try integers:

    - val x = 3;
    val x = 3 : int
    - case x of
    =     1 => "one"
    =   | 2 => "two"
    =   | 3 => "three";
    stdIn:40.1-43.15 Warning: match nonexhaustive
              1 => ...
              2 => ...
              3 => ...
    val it = "three" : string

We got the answer we expected, but why the warning? The answer
is that the cases are not **exhaustive**, which means
that the cases we gave do not cover the entire possible range of
the data type being tested --- in this case, `int`. We
have not enumerated all the possible integer values.

ML *does* have a well-defined behavior when a
`case` expression is applied to a value that no pattern covers --- it
raises a match failure exception:

    - case 25 of 1 => "one" | 2 => "two";
    stdIn:17.1-19.15 Warning: match nonexhaustive
              1 => ...
              2 => ...
    uncaught exception nonexhaustive match failure
      raised at: stdIn:19.10

But ML raises a warning because it's generally good programming style to cover all the cases. If you're a Java programmer, you might conclude that we need a way to provide a default case. Indeed, that is correct, but ML actually contains a better, more generally useful mechanism that solves this problem: it simply allows more general patterns, some of which can match more than one value.

The first of these is **wildcard** patterns, which
match *any* value:

    - case x of
    =     1 => "one"
    =   | _ => "anything else";
    val it = "anything else" : string

What if we reversed the order of cases?

    - case x of
    =     _ => "anything else"
    =   | 1 => "one";
    stdIn:53.1-55.13 Error: match redundant
              _ => ...
        -->   1 => ...

Oops. What's going on? Recall that ML is first-match --- the second case can never be reached, because the wildcard pattern will always match. More generally, ML will raise an error if you try to define any pattern case after some other case which subsumes it.
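As a small sketch of the usual idiom, the specific constant patterns come first and the wildcard comes last, where it plays the role of a default case. The function name `digitName` is hypothetical.

```sml
(* digitName is a hypothetical helper. The constant patterns are tried
   first; the wildcard catches every remaining integer, so the match
   is exhaustive and produces no warning. *)
fun digitName n =
    case n of
        0 => "zero"
      | 1 => "one"
      | 2 => "two"
      | _ => "many";

val small = digitName 1;
val large = digitName 25;
```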

The second interesting type of non-constant pattern is
**variable patterns**, which not only match any value
but bind that value to a variable name for later use:

    - case x of
    =     1 => "one"
    =   | y => "x is: " ^ Int.toString y;
    val it = "x is: 3" : string

This may seem a bit silly --- aren't we just naming a value
that we've either constructed, or already have a name for? --- but
variable patterns really come alive when we add the third kind of
pattern, **constructor patterns**.

When we discussed ML's built-in data types, we talked about
**constructors**, which were functions that produced
values of a given type. ML allows constructors to appear in
patterns. Wherever subexpressions would go in a constructor
expression, sub*patterns* appear in the constructor
*pattern*. For example:

    - val aPair = (1, 2);
    val aPair = (1,2) : int * int
    - case aPair of
          (0, 0) => "origin"
        | (1, _) => "first is one"
        | (2, snd) => "first is two; second is " ^ Int.toString snd
        | (a, b) => "other value: (" ^ Int.toString a ^ "," ^ Int.toString b ^ ")";
    val it = "first is one" : string

The value is a pair (2-tuple) of `int`s, so all
pattern cases must match pairs of `int`s. The first
case has two constant patterns for the two tuple members, and
therefore matches only the value `(0, 0)`. The second
case has a wildcard as its second value, and therefore matches any
pair with `1` as its first element. The third pattern
matches any pair with `2` as its first element, but
then saves and uses the second element in the expression body. The
last pattern matches any 2-tuple, binding both elements to names,
and uses them in the expression body.

*Any* of the constructors we have seen may appear in a
pattern. Here are some case expressions that use various
constructors we've seen:

    case foo of
        {x=0, y=0} => "origin"
      | {x=_, y=y} => "non-origin at y-coord " ^ Int.toString y;

    case bar of
        () => "unit has only one value.";

    case aStringList of
        nil => "empty"
      | hd::tl => "first list element is: " ^ hd;

The last of these --- matching against the `nil`
case of a list and then against the cons case --- will shortly
become quite familiar to you, because essentially all functions
that operate over lists do this.

Patterns are not restricted to use in `case`
expressions. They may appear wherever any name binding may appear,
including `val` declarations and function arguments.
In fact, *all name binding* in the ML core language is
really pattern matching. Here is a function that concatenates
the elements of a string pair:

    - fn (x, y) => x ^ y;
    val it = fn : string * string -> string

Note the use of a tuple pattern in the argument. This looks almost like a function definition in C or Java, where the parameters are separated by commas, but it's completely different. For example, the argument pattern can be a record rather than a tuple, or it can contain nested subpatterns with structure rather than simply names:

    - fn {first=firstName, last=lastName} => firstName ^ " " ^ lastName;
    val it = fn : {first:string, last:string} -> string
    - fn {x=_:int, y=(a:int, b:int), z=z:string} => Int.toString a ^ z ^ Int.toString b;
    val it = fn : {x:int, y:int * int, z:string} -> string

For the last of the above, note the use of type ascriptions inside the pattern, and the nested tuple subpattern.

Functions use `case` at top-level so often that ML
also has a special syntactic sugar which allows you to define a
function in multiple cases. The following two functions are
exactly equivalent:

    - fun emptyTest aList =
          case aList of
              nil => "empty!"
            | (x::xs) => "not empty; first elem: " ^ x;
    val emptyTest = fn : string list -> string
    - fun emptyTest nil = "empty!"
        | emptyTest (x::xs) = "not empty; first elem: " ^ x;
    val emptyTest = fn : string list -> string

Here is how we use a `val` binding to take apart the
elements of a record:

    - val aPoint = {x=1, y=2};
    val aPoint = {x=1,y=2} : {x:int, y:int}
    - val {x=x, y=y} = aPoint;
    val x = 1 : int
    val y = 2 : int

Notice that you can bind more than one name at a time. For
records, it is so common to bind field names to variables of the
same name that ML provides a syntactic sugar which allows you to
write each field name once, omitting the `=`*name* part:

    - val {x,y} = aPoint;
    val x = 1 : int
    val y = 2 : int

`val` bindings do not provide a way to handle multiple cases in a pattern, so they fail (by raising the `Bind` exception) if the value does not match the pattern.
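A small sketch of both outcomes follows. It assumes SML/NJ, which warns that a list pattern in a `val` binding is not exhaustive. The `let` and `handle` forms are not covered in this section and appear only so that the failure can be observed without aborting the session.

```sml
(* A tuple pattern always matches a tuple of the same shape. *)
val (first, second) = (1, "two");

(* A cons pattern matches only non-empty lists; here it succeeds. *)
val (x :: rest) = [1, 2, 3];

(* Matching the empty list against a cons pattern raises Bind,
   which is caught here and turned into ~1. *)
val failed =
    (let val (y :: _) = ([] : int list) in y end) handle Bind => ~1;
```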

This is the complete algorithm, in ML-like pseudocode, for determining whether a value matches a pattern:

    fun match(value, pattern) =
      case pattern of
          constant => if value equals the constant then true else false
        | wildcard => true
        | variable => bind value to variable name; true
        | constructor =>
            if value has same constructor then
              match subpatterns of pattern with corresponding parts of value
              if all parts match then true else false
            else false
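The pseudocode above can be approximated by a runnable sketch over a deliberately tiny universe of values. Everything here is an assumption made for illustration: the `value` and `pattern` datatypes, their constructor names, and the choice to return `SOME` of a list of bindings instead of `true` (and `NONE` instead of `false`). Only integers and tuples are modeled, and `datatype` declarations themselves are not introduced in this section.

```sml
(* Hypothetical miniature universe of values and patterns. *)
datatype value = IntV of int | TupleV of value list;
datatype pattern = ConstP of int
                 | WildP
                 | VarP of string
                 | TupleP of pattern list;

(* match returns SOME bindings on success and NONE on failure. *)
fun match (IntV n, ConstP c) = if n = c then SOME [] else NONE
  | match (_, ConstP _) = NONE
  | match (_, WildP) = SOME []
  | match (v, VarP name) = SOME [(name, v)]
  | match (TupleV vs, TupleP ps) = matchAll (vs, ps)
  | match (_, TupleP _) = NONE
(* matchAll matches subpatterns against corresponding parts of the value. *)
and matchAll (nil, nil) = SOME []
  | matchAll (v :: vs, p :: ps) =
      (case match (v, p) of
           NONE => NONE
         | SOME first =>
             (case matchAll (vs, ps) of
                  NONE => NONE
                | SOME rest => SOME (first @ rest)))
  | matchAll _ = NONE;

(* Corresponds to matching (2, 7) against the pattern (2, snd). *)
val m1 = match (TupleV [IntV 2, IntV 7], TupleP [ConstP 2, VarP "snd"]);

(* Corresponds to matching (1, 2) against the pattern (0, _). *)
val m2 = match (TupleV [IntV 1, IntV 2], TupleP [ConstP 0, WildP]);
```

Here `m1` carries the binding of `snd` to `IntV 7`, while `m2` is `NONE` because the first component does not equal the constant.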

Notice that this definition is recursive. Speaking of which...

Functions in ML may be recursive, but a recursive function must be bound to a name (Thought exercise: why can't ML anonymous functions be recursive?):

    - fun length nil = 0
    =   | length (x::xs) = 1 + length xs;
    val length = fn : 'a list -> int

Recursive functions, as this example shows, are ideal for handling recursive data structures like lists, trees, etc. Inductive recursive definitions, whether for data or for functions, are defined in cases:

- At least one base case, where the recursion "bottoms out"
- At least one inductive case, where the recursion continues.

For lists, the base *data* case is `nil`, and
the inductive *data* case is cons. The length
*function* likewise has two cases, one for the base case
and one for the inductive case.

More generally, to write almost any function over a recursive data type, you follow a simple formula:

- Look at the cases of the data type.
- For each data case, write one or more function cases:
  - For a base case, (usually) compute the appropriate result directly.
  - For an inductive case,
    - Compute a partial result for the parts directly available (e.g., the head of the list);
    - Call recursively on the recursive portion(s) of the data structure (e.g., the tail of the list).
    - If necessary, combine the result of the direct and recursive computations.

This recursive formula will occur again and again in your functional programming. Learn it well, and it will help you organize your thinking about recursive data structures even in non-functional languages.
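As one more instance of the formula, here is a minimal sketch of a function that adds up a list of integers. The name `sum` is chosen for this example; the comments map each clause back to the steps of the formula.

```sml
(* Base data case (nil): compute the result directly. *)
fun sum nil = 0
  (* Inductive data case (cons): x is the directly available part,
     sum xs is the recursive call on the tail, and + combines them. *)
  | sum (x :: xs) = x + sum xs;

val total = sum [1, 2, 3, 4];
```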

(Aside: what's this `'a list` type that ML infers
for the `length` function's argument? Well, if you
examine the body of `length`, there's actually no
indication as to the *element* type of this list. The elements
could be of any type --- and this makes perfect sense, since a
function that takes the length of a list never needs to know the
type of that list's elements. ML's type system allows this
function to be **polymorphic** over different types
of lists --- i.e., the same function can be applied to lists of
different element types. `'a` is a **type variable** --- it
stands for "any type". When the function is applied to an
argument, the type variable will be instantiated with
its argument's element type. We'll discuss type variables and
polymorphism in much more detail next week.)

In Java, you wouldn't write a recursive length function. You would use a loop:

    class Node { Object o; Node next; }
    ...
    int length = 0;
    for (Node i = List.firstNode; i != null; i = i.next) {
        length++;
    }

Observe, however, that a loop of this kind requires mutation:
the `i = i.next` changes what `i` points to,
and the `length` variable must be incremented. In functional
programming, you typically use recursion instead of iteration.
Functional programming advocates claim recursion is typically
clearer and less error-prone:

- It is "intuitively obvious" that the ML length function is correct (indeed, it is hard to imagine how to get it wrong), whereas iteration presents many opportunities for error because of the numerous assignments.
- The ML length function is "natural" because it parallels directly the inductive definition of the list data type.

On the other hand, naively implemented recursion often has greater overhead than naively implemented iteration:

- Time and space overhead for procedure call, stack allocation, and return.
- The "natural" inductive definitions of some algorithms are less efficient than iterative definitions.
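A standard illustration of the second point, not taken from these notes, is the Fibonacci function. The sketch below contrasts the natural inductive definition, which recomputes the same subproblems many times, with a version that counts down while carrying the two most recent values. The names `fib`, `fibIter`, and `loop` are hypothetical, both functions assume a non-negative argument, and the `let` form is not covered in this section.

```sml
(* Natural inductive definition: each call makes two recursive calls,
   so the number of calls grows exponentially with n. *)
fun fib 0 = 0
  | fib 1 = 1
  | fib n = fib (n - 1) + fib (n - 2);

(* Iteration-style definition: loop carries two consecutive Fibonacci
   numbers and makes n recursive calls in total. *)
fun fibIter n =
    let
        fun loop (0, a, _) = a
          | loop (k, a, b) = loop (k - 1, b, a + b)
    in
        loop (n, 0, 1)
    end;

val slow = fib 10;
val fast = fibIter 10;
```

Both definitions compute the same values; they differ only in how much work they repeat.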

The `length` function, as defined above, has one
important drawback. It must keep an activation record on the
procedure call stack for every recursive call.

But this is not true of all recursive functions, or of all
functions that call another function. In particular, consider the
case where a function returns directly the value of another
function --- this is called a **tail call**. A very
simple example:

fun f aList = length aList;

In this case, it is clear that once `f` passes
control to `length`, the compiler need not keep
the activation record for `f` around (including,
e.g., the space for the `aList` parameter), because
`f` does nothing after `length` returns.
The compiler can *reuse* that space on the call stack for
the activation record of the `length` call.

This space-saving optimization is called **tail call
elimination**, because the call is at the "tail" of the
function. This optimization plays a crucial role in functional
language implementation, because of the heavy use of recursion;
indeed, some functional languages (Scheme, for example) specify that
implementations *must* perform tail call elimination. Here are a couple of
tail-recursive functions:

    fun last nil = raise Empty
      | last (x::nil) = x
      | last (_::rest) = last rest;

    fun includes (aValue, nil) = false
      | includes (aValue, (x::xs)) =
          if aValue = x then true else includes (aValue, xs);

Every case of these functions either "bottoms out" or directly returns the result of a recursive call. Therefore, they are tail recursive.

So what prevents a function from being tail recursive? And is
there any way to *make* a function tail recursive when it
isn't to begin with? It is instructive to examine ordinary tail
calls first. Here is a function that resembles `f`,
but does *not* make a tail call:

fun g aList = 1 + length aList;

This function's body does not make a tail call, because the result of
the call is not returned directly --- `g` must do more
work (namely, adding one to the result) before returning. The
compiler must keep the activation record for `g` around
while it is waiting for `length` to return.

Well, what if we could "push down" that work into the callee,
so that `g` didn't have work remaining? That would be
great, but in general the caller has no way to modify what the
callee will do. On the other hand, in a recursive function, the
callee is the caller...

    fun helper (nil, lengthSoFar) = lengthSoFar
      | helper (x::xs, lengthSoFar) = helper (xs, lengthSoFar + 1);

    fun length aList = helper (aList, 0);

How these functions work:

- `helper` is tail-recursive, and has an extra parameter that keeps track of the "length so far".
- For `nil`, `helper` returns the length computed so far, because an empty list cannot add more length to the list.
- For cons, `helper` adds one to `lengthSoFar` and calls itself on the tail of the list.
- To complete the function, we add a "driver", `length`, that invokes the helper with the whole list and a length so far of zero.

This sort of conversion can be performed on any singly
recursive function. Simply add a helper function that keeps the
"results computed so far" as a parameter, and invoke it with a
suitable initial value. See Ullman 3.5.3 (background in 3.2,
3.3.1) for a discussion of `reverse` using this
trick.
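For a flavor of how the trick applies to `reverse`, here is a minimal sketch. It is not necessarily the code in Ullman, and the names `revHelper` and `soFar` are hypothetical; the structure mirrors `helper` and `length` above.

```sml
(* revHelper is tail-recursive: soFar holds the reversed prefix
   of the elements seen so far. *)
fun revHelper (nil, soFar) = soFar
  | revHelper (x :: xs, soFar) = revHelper (xs, x :: soFar);

(* The driver starts with an empty "result so far". *)
fun reverse aList = revHelper (aList, nil);

val reversed = reverse [1, 2, 3];
```

Each element is moved from the front of the input to the front of `soFar`, so the last element of the input ends up first.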