A function may bind more than one argument by using a pattern, rather than a variable, in the argument position. Function expressions may have the form
fn
pat=>
exp
where pat is a pattern and exp is an expression. Application of such a function proceeds much as before, except that the argument value is matched against the parameter pattern to determine the bindings of zero or more variables, which are then used during the evaluation of the body of the function.
For example, we may make the following definition of the Euclidean distance function:
val dist : real * real -> real = fn (x:real, y:real) => sqrt (x*x + y*y)
This function may then be applied to a pair (two-tuple!) of arguments to yield the
distance between them. For example, dist (2.0,3.0)
evaluates to
(approximately) 4.0
.
Using fun
notation, the distance function may be defined more concisely as
follows:
fun dist (x:real, y:real):real = sqrt (x*x + y*y)
The meaning is the same as the more verbose val
binding given earlier.
Keyword parameter passing is supported through the use of record patterns. For example, we may define the distance function using keyword parameters as follows:
fun dist' {x=x:real, y=y:real} = sqrt (x*x + y*y)
The expression dist' {x=2.0,y=3.0}
invokes this function with the
indicated x and y values.
Functions with multiple results may be thought of as functions yielding tuples (or records). For example, we might compute two different notions of distance between two points at once as follows:
fun dist2 (x:real, y:real):real*real = (sqrt (x*x+y*y), abs(x-y))
Notice that the result type is a pair, which may be thought of as two results.
These examples illustrate a pleasing regularity in the design of ML. Rather than introduce ad hoc notions such as multiple arguments, multiple results, or keyword parameters, we make use of the general mechanisms of tuples, records, and pattern matching.
It is sometimes useful to have a function to select a particular component from a tuple
or record (e.g., the third component or the component labeled url
).
Such functions may be easily defined using pattern matching. But since they
arise so frequently, they are pre-defined in ML using sharp notation. For
any record type typ1*
...*
typn,
and each i between 1 and n, there is a function #i
of type typ1*
...*
typn->
typi
defined as follows:
fun #i (_, ..., _, x, _, ..., _) = x
where x
occurs in the ith position of the tuple (and there are
underscores in the other n-1 positions). Thus we may refer to the second
field of a three-tuple val by writing #2
val. It is
bad style, however, to over-use the sharp notation; code is generally clearer and easier
to maintain if you use patterns wherever possible. Compare, for example, the
following definition of the Euclidean distance function written using sharp notation with
the original.
fun dist (p:real*real):real = sqrt((#1 p)*(#1 p)+(#2 p)*(#2 p))
You can easily see that this gets out of hand very quickly, leading to unreadable code. Use of the sharp notation is strongly discouraged!
A similar notation is provided for record field selection. The following function
#lab
selects the component of a record with label lab.
fun #lab {lab=x,...} = x
Notice the use of ellipsis! Bear in mind the disambiguation requirement: any use
of #lab
must be in a context sufficient to determine the full record type of
its argument.
Tuple types have the property that all values of that type have the same form (n-tuples,
for some n determined by the type); they are said to be homogeneous.
For example, all values of type int*real
are pairs whose first component is
an integer and whose second component is a real. Any type-correct pattern will match
any value of that type; there is no possibility of failure of pattern matching. The
pattern (x:int,y:real)
is of type int*real
and hence will match
any value of that type. On the other hand the pattern (x:int,y:real,z:string)
is of type int*real*string
and cannot be used to match against values of type
int*real
; attempting to do so fails at compile time.
Other types have values of more than one form; they are said to be heterogeneous types.
For example, a value of type int
might be 0
, 1
, ~1
,
... or a value of type char
might be #"a"
or #"z"
.
(Other examples of heterogeneous types will arise later on.) Corresponding to each
of the values of these types is a pattern that matches only that value. Attempting
to match any other value against that pattern fails at execution time.
For the time being we will think of match failure as a fatal run-time error, but later on
we will see that the extent of the failure can be controlled.
Here are some simple examples of pattern-matching against values of a heterogeneous type:
val 0 = 1-1
val (0,x) = (1-1,
34
)
val (0, #"0") = (2-1, #"0")
The first two bindings succeed, the third fails. In the case of the second, the
variable x
is bound to
after the match. No
variables are bound in the first or third examples.34
The importance of constant patterns becomes clearer once we consider how to define functions over heterogeneous types. This is achieved in ML using a clausal function definition. The general form of a function is
fn
pat1=>
exp1|
...|
patn=>
expn
where each pati is a pattern and each expi is
an expression involving the variables of the pattern pati. Each
component pat =>
exp is called a clause or rule;
the entire assembly of rules is called a match.
The typing rules for matches ensure consistency of the clauses. Specifically,
The type of a function whose body is a match satisfying these requirements is typ->
typ'.
Note that there is no requirement that typ and typ'
coincide!
Application of functions with multiple clauses to a value val proceeds by considering each clause in the order written. At stage i the argument value val is matched against the pattern pati; if the pattern match succeeds, evaluation continues with the evaluation of expression expi, with the variables replaced by the values determined by the pattern matching process. Otherwise we proceed to stage i+1. If no pattern matches (i.e., we reach stage n+1), then the application fails with an execution error. Here's an example.
val recip : int -> int = fn 0 => 0 | n:int => 1 div n
This defines an integer-valued reciprocal function on the integers, where the
reciprocal of 0
is arbitrarily defined to be 0
. The
function has two clauses, one for the argument 0
, the other for non-zero
arguments n
. (Note that n
is guaranteed to be non-zero
because the patterns are considered in order: we reach the pattern n:int
only
if the argument fails to match the pattern 0
.)
Using fun
notation we may define recip
as follows:
fun recip 0 = 0
| recip (n:int) = 1 div n
One annoying thing to watch out for is that the "fun
" form uses
an equal sign to separate the pattern from the expression in a clause, whereas the "fn
"
form uses an arrow.
Heterogeneous types abound. Perhaps the most fundamental one is the type bool
of booleans. Its values are true
and false
.
Functions may be defined on booleans using clausal definitions that dispatch on true
and false
. For example, the negation function is defined clausally as
follows:
fun not true = false
| not false = true
In fact, this function is pre-defined in ML.
Case analysis on the values of a heterogeneous type is performed by application of a clausally-defined function. The notation
case
expof
pat1=>
exp1|
...|
patn=>
expn
is short for the application
(fn
pat1=>
exp1|
...|
patn=>
expn)
exp.
Evaluation proceeds by first evaluating exp, then matching its value
successively against the patterns in the match until one succeeds, and continuing with
evaluation of the corresponding expression. The case
expression fails
if no pattern succeeds to match the value.
The conditional expression
if
expthen
exp1else
exp2
is short-hand for the case analysis
case
expof
true
=>
exp1|
false
=>
exp2
which is itself short-hand for the application
(
fn
true
=>
exp1|
false
=>
exp2) exp.
The "short-circuit" conjunction and disjunction operations are defined as
follows. The expression exp1 andalso
exp2
is short for if
exp1 then
exp2
else
false
and the expression exp1 orelse
exp2 is short for if
exp1 then
true
else
exp2. You should expand
these into case expressions and check that they behave as expected. Pay particular
attention to the evaluation order, and observe that the call-by-value principle is not
violated by these expressions.
Conceptually, equality and comparison operations on the types int
, char
,
and string
are defined by infinite (or, at any rate, enormously large)
matches, but in practice they are built into the language as primitives. (The
ordering on char
and string
are the lexicographic
orderings.) Thus we may write
fun is_alpha c:char =
(#"a" <= c andalso c <= #"z") orelse (#"A" <= c andalso c <= #"Z")
to test for alphabetic characters.
All this talk of success and failure of pattern matching brings up the issue of exhaustiveness and redundancy in a match. A clause in a match is redundant if any value matching its pattern must have matched the pattern of a preceding clause in the match. A redundant rule can never be reached during execution. It is always an error to have a redundant clause in a match. Redundant clauses often arise accidentally. For example, the second clause of the following function definition is redundant for annoyingly subtle reasons:
fun not True = false
| not false = true
The mistake is to have capitalized True
so that it is no longer the
boolean-typed constant pattern, but is rather a variable that matches any value of Boolean
type. Hence the second clause is redundant. Reversing the order of clauses can
also lead to redundancy, as in the following mistaken definition of recip
:
fun recip (n:int) = 1 div n
| recip 0 = 0
The second clause is redundant because the first clause will always match any
integer value, including 0
.
A match (as a whole) is exhaustive if every possible value of the domain type
of the match must match some clause of that match. In other words, pattern matching
against an exhaustive pattern cannot fail at run-time. The clauses in the (original)
definition of recip
are exhaustive because they cover every possible integer
value. The match comprising the body of the following function is not exhaustive:
fun is_numeric #"0" = true
| is_numeric #"1" = true
| is_numeric #"2" = true
| is_numeric #"3" = true
| is_numeric #"4" = true
| is_numeric #"5" = true
| is_numeric #"6" = true
| is_numeric #"7" = true
| is_numeric #"8" = true
| is_numeric #"9" = true
When applied to, say, #"a"
, this function fails.
It is often, but not always, an error to have an inexhaustive match. The
reason is that the type system cannot record many invariants (such as the property that is_numeric
is only called with a character representing a decimal digit), and hence the
compiler will issue a warning about inexhaustive matches. It is a good idea to check
each such warning to ensure that you have not accidentally omitted a clause from the
match.
Any match can be made exhaustive by the inclusion of a catch-all clause of the form
_ =>
exp
where exp is an expression of the appropriate type. It is a bad idea to simply stick such a clause at the end of every match in order to eliminate "inexhaustive pattern" warnings. By doing so you give up the possibility that the compiler may warn you of a legitimate error (due to a forgotten case) in your program. The compiler is your friend! Use it to your advantage!