In languages like Java (or C/C++), every object reference (or pointer) may be null, and by default can be mutated.
In ML, as we have seen, all references to data are immutable (unchangeable) by default, and they always point to some value; an int * string tuple must contain an integer value and a string value. This simplifies reasoning considerably.
However, sometimes you need mutation (the ability to update a value) and optional-ness. ML's standard library provides data types for both. Optional data is fairly straightforward. Mutable data, however, represents a significant departure from what we've covered previously.
Usually, when you have a data type that requires an "empty" case, you will define a customized constructor for that data type --- for example, our polymorphic tree:
datatype 'a Tree = Empty | Node of 'a * 'a Tree * 'a Tree
However, sometimes it's annoying to define a new data type whenever something is optional. What if you want to define a find function over lists that only optionally returns a value? You could define a new datatype:
datatype 'a FindResult = NotFound | Found of 'a
and then find could have type
(('a -> bool) * 'a list) -> 'a FindResult
But this is overkill, and you would have to do it for every function that might optionally return an empty value. So ML provides a standard polymorphic library datatype, option:
datatype 'a option = NONE | SOME of 'a
This is used the same way that any other datatype is used:
- val v = SOME 5;
val v = SOME 5 : int option
- fun find f nil = NONE
    | find f (x::xs) = if f x then SOME x else find f xs;
val find = fn : ('a -> bool) -> 'a list -> 'a option
- case find (fn x => x > 0.0) [~2.5, 0.0, ~4.4, 30.0, ~15.0] of
      NONE => "No value"
    | SOME v => "Found: " ^ (Real.toString v);
val it = "Found: 30.0" : string
There's also a standard function valOf, which behaves as if it were defined as follows:
fun valOf NONE = raise Option
  | valOf (SOME s) = s;
You, as a user, can choose whether to pattern-match over both cases, or to use valOf and accept an Option exception in the NONE case. There's also a getOpt function that lets you provide a default value to be returned in the NONE case:
- getOpt (NONE, ~1);
val it = ~1 : int
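The three ways of consuming an option can be compared side by side in a short sketch. It redefines the curried find from above so the block stands alone; describeFirst, firstOrDie, and firstOrZero are hypothetical names chosen for illustration.

```sml
(* find, as defined above: SOME of the first element satisfying f, else NONE *)
fun find f nil = NONE
  | find f (x :: xs) = if f x then SOME x else find f xs

(* Style 1: pattern-match on both cases explicitly *)
fun describeFirst (xs : int list) : string =
  case find (fn x => x > 0) xs of
      NONE => "no positive element"
    | SOME v => "first positive: " ^ Int.toString v

(* Style 2: valOf, which raises the Option exception on NONE *)
fun firstOrDie (xs : int list) : int = valOf (find (fn x => x > 0) xs)

(* Style 3: getOpt, which substitutes a default value for NONE *)
fun firstOrZero (xs : int list) : int = getOpt (find (fn x => x > 0) xs, 0)

val r1 = describeFirst [~3, 4, 5]   (* "first positive: 4" *)
val r2 = firstOrZero [~3, ~4]       (* 0 *)
val r3 = (firstOrDie [~3, ~4]; "no exception")
         handle Option => "raised Option"
```

Pattern matching makes the NONE case impossible to forget; valOf trades that safety for brevity, and getOpt is a middle ground when a sensible default exists.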
Note that we could have used option instead of defining multiple cases for our tree data:
datatype 'a Tree = Node of ('a * 'a Tree * 'a Tree) option;
In this representation, the argument of Node is optional; an empty tree is represented as follows:
- Node NONE;
val it = Node NONE : 'a Tree
A non-empty tree is represented using SOME:
- Node (SOME (10, Node NONE, Node NONE));
val it = Node (SOME (10,Node NONE,Node NONE)) : int Tree
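Every function over this encoding has to look inside the option carried by each Node. As a minimal sketch (size and insert are hypothetical helpers, and insert assumes an int binary search tree that ignores duplicates):

```sml
datatype 'a Tree = Node of ('a * 'a Tree * 'a Tree) option

(* Count the elements: NONE is the empty tree, SOME carries a node's data *)
fun size (Node NONE) = 0
  | size (Node (SOME (_, left, right))) = 1 + size left + size right

(* Insert into an int binary search tree, leaving duplicates out *)
fun insert (n : int, Node NONE) = Node (SOME (n, Node NONE, Node NONE))
  | insert (n, t as Node (SOME (v, left, right))) =
      if n < v then Node (SOME (v, insert (n, left), right))
      else if n > v then Node (SOME (v, left, insert (n, right)))
      else t

val t = foldl insert (Node NONE) [10, 5, 15, 5]
val n = size t   (* 3: the duplicate 5 is ignored *)
```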
This is obviously more cumbersome. But it is actually how many languages, such as Java and C, typically encode data types with an "empty" case, because in those languages any pointer can be null. Consider this Java tree node class:
public class Node {
    final Object val;
    final Node left, right;

    public Node(Object val, Node left, Node right) {
        this.val = val;
        this.left = left;
        this.right = right;
    }
}
What is an empty tree? It is a null Node reference:
final Node n = null;
A tree with two empty children uses two null pointers:
final Node m = new Node("hi", null, null);
Therefore, in Java-like languages, every reference to a type T is really a reference to a type "T option". This means that the programmer always has to consider whether some value might be null and lead to a null pointer exception.
Mutable data is handled in ML primarily using the 'a ref polymorphic datatype, which has a single constructor, ref:
- ref;
val it = fn : 'a -> 'a ref
- val x = ref 5;
val x = ref 5 : int ref
ref allocates a fresh mutable (alterable/assignable) reference, which can be read or changed (the reference is sometimes called a ref cell). For any value v of type T ref, you can perform two operations:
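First, you can read the contents of v with the dereference operator ! (exclamation point), producing a value of type T. A T ref cannot be used directly where a T is expected: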
- op !;
val it = fn : 'a ref -> 'a
- !x;
val it = 5 : int
- val i:int = x;
stdIn:18.1-18.14 Error: pattern and expression in val dec don't agree [tycon mismatch]
  pattern:    int
  expression:    int ref
  in declaration:
    i : int = x
Second, you can update the contents of v with the assignment operator :=, as follows:
- op :=;
val it = fn : 'a ref * 'a -> unit
- x := 10;
val it = () : unit
- !x;
val it = 10 : int
Note that this does not alter the binding --- bindings are immutable. The binding continues to point to the same ref cell; it is only the contents of the cell that are updated.
Fig. 1 shows how allocation and updating work. The ref constructor allocates a cell and stores an initial value in it. The := operation updates the value in the cell, making it point to a different integer.
The fact that x still points to the same ref cell should become clear when we create an alias to that cell (another pointer to the same location):
- val y = x;
val y = ref 10 : int ref
- x := 20;
val it = () : unit
- y := 30;
val it = () : unit
- !x;
val it = 30 : int
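Aliasing also shows up in equality. In Standard ML, = on ref values compares cell identity rather than contents, so an alias is equal to the original, while a separately allocated cell holding the same value is not. A small sketch (the names are arbitrary):

```sml
val x = ref 10
val y = x              (* y is an alias: it names the same cell as x *)
val () = y := 30       (* update the cell through the alias *)
val viaX = !x          (* 30: the change is visible through x *)

val z = ref 30         (* a separate cell that holds the same contents *)
val sameCell = (x = y)        (* true: = on refs compares cell identity *)
val otherCell = (x = z)       (* false, even though !x = !z *)
val sameContents = (!x = !z)  (* true *)
```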
ref values are first-class: they can be parts of any value, in the usual way:
- val name = {first=ref "Keunwoo", last=ref "Lee"};
val name = {first=ref "Keunwoo",last=ref "Lee"} : {first:string ref, last:string ref}
- #last(name);
val it = ref "Lee" : string ref
- #last(name) := "Kim";
val it = () : unit
- name;
val it = {first=ref "Keunwoo",last=ref "Kim"} : {first:string ref, last:string ref}
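Because refs are first-class, a function can also capture a ref cell in its closure and keep private mutable state there. In this sketch, makeCounter is a hypothetical name; each call allocates a fresh cell, so two counters do not interfere:

```sml
(* Each call to makeCounter allocates its own ref cell and returns a
   closure that increments that cell and returns the new count *)
fun makeCounter () =
  let
    val count = ref 0
  in
    fn () => (count := !count + 1; !count)
  end

val c1 = makeCounter ()
val c2 = makeCounter ()
val a = c1 ()   (* 1 *)
val b = c1 ()   (* 2 *)
val c = c2 ()   (* 1: c2 has a separate cell *)
```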
In languages like Java or C, essentially all bindings, including object fields, local variables, and class variables, are actually bound to refs, because they can be updated. In fact, a non-final Java variable of object type behaves like an ML T option ref: it names an updatable location that may hold null.
(Thought question: what is the difference between an int option ref and an int ref option?)
This is another example of ML's clean design and orthogonality --- you do not get "more than you asked for" in a type, but you can freely combine properties like mutability or optional-ness when you want them.
Suppose you wanted to write an iterative sumList function instead of a recursive one. Now that we have assignment, we can do so; it looks like this:
fun sumList aList =
  let
    val sum = ref 0
    val current = ref aList
  in
    (while not (null (!current)) do
       (sum := hd (!current) + !sum;
        current := tl (!current));
     !sum)
  end;
Note our use of the (expr; ... ; expr) expression sequence syntax. Even setting aside the clutter of all the explicit dereferences ML requires, I claim this is clearly uglier than the recursive version, even taking into account the tail-recursion conversion.
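For comparison, here is a sketch of the recursive and tail-recursive versions. The names sumListRec and sumListTail are ours, chosen so they do not clash with the iterative sumList; in the tail-recursive version the accumulator argument plays the role of the sum ref cell.

```sml
(* Plain recursive version: not tail-recursive, the addition happens
   after the recursive call returns *)
fun sumListRec nil = 0
  | sumListRec (x :: xs) = x + sumListRec xs

(* Tail-recursive version: acc carries the running sum, so the
   recursive call is the last thing each clause does *)
fun sumListTail aList =
  let
    fun loop (acc, nil) = acc
      | loop (acc, x :: xs) = loop (acc + x, xs)
  in
    loop (0, aList)
  end

val total = sumListTail [1, 2, 3, 4]   (* 10 *)
```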
Suggested exercise: try to write map, filter, and foldl using iteration. Which do you prefer, the iterative or recursive formulations of these functions?
Mutable data brings us to an interesting and rather subtle type system problem. Suppose we could have a value of type 'a ref (note: the following is not legal ML code, for reasons we'll discuss shortly):
val x:'a list ref = ref [];
This seems to make perfect sense: [] has type 'a list (it's a polymorphic value), so we should be able to allocate a ref cell and assign that to a binding of type 'a list ref. But now suppose we have the following code:
fun f y = x := y;
f [17];
Since x has the type 'a list ref, the function f ought to have the type 'a list -> unit, and the body of f ought to typecheck: we're updating the contents of an 'a list ref with a value of type 'a list.
We should then be able to apply f to the value [17] by instantiating f's type to int list -> unit. Evaluating f [17] stores the list value [17] in x's ref cell.
Now, suppose we do this:
fun g () = !x;
val y:bool list = g();
if hd(y) then "hi" else "bye";
(Pretend you don't know about f and f [17], because the typechecker doesn't.) This code ought to typecheck as well! Consider the body of g: it dereferences x, which has type 'a list ref. Therefore, g should get type unit -> 'a list (the return type is the result type from dereferencing an 'a list ref).
Now, when we bind the result of g(), which has type 'a list, to a bool list binding, we simply instantiate 'a with bool, so that binding is well-typed.
Finally, we take the head of y and use it as a boolean value. But if we executed f [17] as we did above, the head of y will not be a boolean value; it will be an integer. We have just violated type safety. This is known as the "polymorphic ref problem", and it comes up wherever we have mutation and polymorphism together.
Where did we go wrong?
ML's answer is that we should not allow the type 'a list ref for a val binding, because it could be instantiated later with two different types for 'a, which, as we've shown, can lead to writing the ref cell at one type and reading it at another.
More generally, ML strongly restricts the introduction of polymorphic types for val bindings. For a binding
val name = expr
name is given a polymorphic type only if expr is a syntactic value.
Recall that a value is an expression that is "done" evaluating; a syntactic value is a syntactic representation of an immutable value. Syntactic values include only constants, variables, function expressions (fn ... => ...), and tuples, records, and constructor applications (other than ref) whose components are themselves syntactic values. Note that function calls are not included. This rule is called the value restriction. It suffices to ensure that you're not creating mutable locations, either directly (by constructing a mutable location) or indirectly (e.g., by calling a function that constructs a ref cell).
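One practical consequence: a partial application such as map f is a function call, so a val binding to it is not generalized. A sketch of the two usual fixes, with hypothetical names pairInts and pairAll: give the binding a monomorphic type annotation, or eta-expand it into a fun so that what gets bound is a function expression.

```sml
(* A partial application is a function call, not a syntactic value, so
   val pairAny = map (fn x => (x, x)) would not be generalized.
   Fix 1: annotate the binding with a monomorphic type. *)
val pairInts : int list -> (int * int) list = map (fn x => (x, x))

(* Fix 2: eta-expand into a fun; a fun declaration binds a fn expression,
   which is a syntactic value, so pairAll gets a polymorphic type *)
fun pairAll xs = map (fn x => (x, x)) xs

val ints = pairAll [1, 2]    (* [(1,1),(2,2)] *)
val strs = pairAll ["a"]     (* [("a","a")] *)
val more = pairInts [3]      (* [(3,3)] *)
```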
When you get a polymorphic type from an expression that is not a syntactic value and attempt to bind it to a name at top level, SML/NJ instantiates the polymorphic type variables with dummy types. This is why SML/NJ gives a warning when you write:
- val x = ref NONE;
stdIn:46.1-46.17 Warning: type vars not generalized because of
   value restriction are instantiated to dummy types (X1,X2,...)
val x = ref NONE : ?.X1 option ref
Recall that NONE has polymorphic type 'a option. Naively, ref NONE therefore has type 'a option ref; but it is not a syntactic value, so the 'a, rather than being "passed through" to the type of x, is instantiated with a fresh, non-polymorphic dummy type that SML/NJ prints as ?.X1.
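In practice the fix is usually to say which type you want, or to allocate the cell inside a function. A sketch, assuming an int option cell is what is wanted (newCell is a hypothetical helper):

```sml
(* Annotating the binding gives the cell a monomorphic type up front,
   so no dummy type is needed *)
val x : int option ref = ref NONE
val () = x := SOME 42
val current = getOpt (!x, 0)   (* 42 *)

(* newCell is bound by fun, i.e. to a syntactic value, so its type
   unit -> 'a option ref stays polymorphic. This is safe because every
   call allocates a brand-new cell, which is then used at one type. *)
fun newCell () = ref NONE
val intCell : int option ref = newCell ()
val strCell : string option ref = newCell ()
val () = intCell := SOME 1
val () = strCell := SOME "one"
```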
ML has other updatable data structures, including arrays, which work similarly to refs. Array functions are found in the Array structure (we haven't covered structures, but for now think of a structure as something like a Java package or a C++ namespace):
- Array.array;
val it = fn : int * 'a -> 'a array
- val a = Array.array(10, 0);
val a = [|0,0,0,0,0,0,0,0,0,0|] : int array
- val b = Array.fromList [1, 2, 3];
val b = [|1,2,3|] : int array
- Array.update(a, 0, 1);
val it = () : unit
- a;
val it = [|1,0,0,0,0,0,0,0,0,0|] : int array
- Array.sub(a, 0);
val it = 1 : int
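A typical imperative use of these functions fills an array in a loop and then reads it back. A minimal sketch using Array.array, Array.length, Array.update, Array.sub, and Array.foldl from the Basis Library; the squares example itself is just an illustration:

```sml
(* An int array of length 10, every element initially 0 *)
val squares = Array.array (10, 0)

(* Imperative loop: i is a ref cell serving as the loop counter *)
val i = ref 0
val () =
  while !i < Array.length squares do
    (Array.update (squares, !i, !i * !i);
     i := !i + 1)

val total = Array.foldl (op +) 0 squares   (* 0 + 1 + 4 + ... + 81 = 285 *)
val fifth = Array.sub (squares, 5)         (* 25 *)
```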
ML also has an immutable array type, called vector. You might wonder: if you have vector and ref, why do you need arrays? Couldn't you just have a vector of refs? The answer is yes:
- Vector.fromList [1, 2, 3];
val it = #[1,2,3] : int vector
- val c = Vector.fromList [ref 1, ref 2, ref 3];
val c = #[ref 1,ref 2,ref 3] : int ref vector
- Vector.sub(c, 0) := 4;
val it = () : unit
- !(Vector.sub (c, 0));
val it = 4 : int
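To make the comparison concrete, here is the same "multiply every element by 10" update done both ways: on a vector of ref cells with Vector.app, and on an array with Array.modify. Both functions are in the Basis Library; the data is illustrative.

```sml
(* Vector of refs: the vector is immutable, but each cell it holds is not *)
val cells = Vector.fromList [ref 1, ref 2, ref 3]
val () = Vector.app (fn r => r := !r * 10) cells
val fromCells = Vector.foldr (fn (r, acc) => !r :: acc) [] cells  (* [10,20,30] *)

(* Array: the elements themselves are the mutable locations *)
val arr = Array.fromList [1, 2, 3]
val () = Array.modify (fn n => n * 10) arr
val fromArray = Array.foldr (op ::) [] arr                         (* [10,20,30] *)
```

Both produce the same contents; the difference lies in the representation, which is where the cost issue below comes from.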
The problem with this is that a ref cell has some overhead compared to an ordinary value: in the naive implementation of a vector of ref cells, shown in Fig. 2, each element of the vector is a pointer to a separately allocated cell, so every access pays an extra indirection. It is quite challenging to remove this overhead in the general case.
Because programs that use arrays (for example, numerical programs) typically require high time and space performance in array operations, this cost was considered prohibitive. ML chose to compromise its "purity" and offer an Array data type that represents a direct array of mutable locations.