All sensible languages provide the user with some mechanisms to define data types customized to the needs of their application. ML provides two primary mechanisms: type synonyms and datatypes.
The ML syntax for type synonyms is as follows:
type [vars] name = type
where [vars] is an optional list of type variables, and name is the name of the type.
Here is an example of a type synonym:
type ISPair = int * string;
Type synonyms are "user-defined syntactic sugar"; a type synonym expands in-place to exactly its type. Therefore, for example:
- val x = (123, "hi");
val x = (123,"hi") : int * string
- val y:ISPair = x;
val y = (123,"hi") : ISPair
- val z:(int * string) = y;
val z = (123,"hi") : (int * string)
Every value or expression of type ISPair is also a value of type int * string, and vice versa. When printing types interactively, the SML/NJ interpreter will pick a type based on what it thinks you want to see, but it doesn't really matter, because the types are exactly equivalent.
Type synonyms can be used wherever other types are, including inside other type synonyms:
type Name = {first:string, last:string};
type Date = {month:string, day:int, year:int};
type Person = {name:Name, birthdate:Date};
A type synonym can be polymorphic --- that is, it can incorporate type variables in the body --- provided those type variables are declared in the variable list:
- type 'a myRecord = {x:'a, y:int};
type 'a myRecord = {x:'a, y:int}
- type 'a sequence = 'a list;
type 'a sequence = 'a list
Polymorphic type synonyms can be instantiated just like the polymorphic types we've seen in the past:
- val p:int sequence = [1, 2, 3];
val p = [1,2,3] : int sequence
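As a small illustration of polymorphic synonyms in use, the sketch below writes one function over 'a myRecord and applies it at two different instantiations. The names getX, r1, r2, and s are made up for this example and are not part of the notes.

```sml
type 'a myRecord = {x:'a, y:int};
type 'a sequence = 'a list;

(* getX (hypothetical helper) works for every instantiation of 'a *)
fun getX ({x, y} : 'a myRecord) : 'a = x;

val r1 : string myRecord = {x = "hi", y = 1};
val r2 : real myRecord = {x = 2.5, y = 2};

(* A string sequence is just a string list, so list syntax and
   ordinary list functions apply to it directly. *)
val s : string sequence = [getX r1, "there"];
val n = length s;   (* 2 *)
```

Because the synonym expands in place, nothing special is needed to pass a string sequence to length, which is declared over 'a list.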
Because of the richness of ML's built-in types, you can get pretty far using only types and type synonyms. However, they do not allow you to express some important idioms.
Consider ML's built-in list type: it has two constructors, nil and cons. Suppose we want to define a binary tree data type --- this must also have two constructors: some base case for empty trees, and an inductive case for the nodes.
You might imagine a language that builds in a special type for binary trees as well as lists. But then what about more general trees, or the myriad other kinds of data that need more than one case? Clearly, we would like a way to declare user-defined types with multiple cases; and ideally, our built-in list type would be only a special case of this one construct.
The solution in ML is the datatype construct:
datatype SBinaryTree = SEmpty | SNode of string * SBinaryTree * SBinaryTree;
This datatype declaration says that SBinaryTree is a type with two constructors:
SEmpty, the empty tree
SNode, which stores a tuple of a string and two binary (sub)trees.
To construct a value of this type, we apply a constructor to an argument of the appropriate type (or, for a nullary constructor such as SEmpty, simply write its name):
- val a = SEmpty;
val a = SEmpty : SBinaryTree
- val b = SNode ("hi", SEmpty, SEmpty);
val b = SNode ("hi",SEmpty,SEmpty) : SBinaryTree
- val c = SNode ("bye", a, b);
val c = SNode ("bye",SEmpty,SNode ("hi",SEmpty,SEmpty)) : SBinaryTree
- val d = SNode ("bonjour", c, SNode ("au revoir", SEmpty, SEmpty));
val d = SNode ("bonjour",SNode ("bye",SEmpty,SNode #),SNode ("au revoir",SEmpty,SEmpty)) : SBinaryTree
(Notice that, as usual, ML truncates the echoed printout for deeply nested values.)
We have called SEmpty and SNode constructors for good reason --- you can treat them just like the built-in constructors, including using them for pattern matching:
case d of
    SEmpty => "()"
  | SNode (s, left, right) => "node: " ^ s;
Processing just one level of a recursive data structure isn't that interesting. Let's process the whole data structure using a recursive function and pattern matching:
fun flattenTree SEmpty = ""
  | flattenTree (SNode (s, left, right)) =
        "(" ^ (flattenTree left) ^ "," ^ s ^ "," ^ (flattenTree right) ^ ")";
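Just as with the list functions you've learned so far, the function has one clause for the base case (SEmpty) and one clause for the recursive case (SNode).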
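As a quick check of how flattenTree behaves, the sketch below repeats the definitions and applies the function to two small trees. The helper countNodes is an invented companion, included only to show a second recursive function with the same clause structure.

```sml
datatype SBinaryTree = SEmpty
                     | SNode of string * SBinaryTree * SBinaryTree;

fun flattenTree SEmpty = ""
  | flattenTree (SNode (s, left, right)) =
      "(" ^ flattenTree left ^ "," ^ s ^ "," ^ flattenTree right ^ ")";

(* countNodes is a hypothetical helper, not from the notes:
   it counts the SNode cells in a tree. *)
fun countNodes SEmpty = 0
  | countNodes (SNode (_, left, right)) =
      1 + countNodes left + countNodes right;

val b = SNode ("hi", SEmpty, SEmpty);
val c = SNode ("bye", SEmpty, b);

val flatB = flattenTree b;   (* "(,hi,)" *)
val flatC = flattenTree c;   (* "(,bye,(,hi,))" *)
val sizeC = countNodes c;    (* 2 *)
```

Empty subtrees flatten to the empty string, which is why a leaf prints with nothing on either side of its commas.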
In the previous section we leapt right into several features of datatypes. In this section we'll build datatypes from the ground up, and go into more sophisticated features, including (perhaps most importantly) polymorphic datatypes.
The simplest possible datatype has one case and only a nullary (no-argument) constructor:
datatype Nothing = Nada;
This is pretty useless. A slightly more interesting use of datatypes has multiple cases, but only nullary constructors:
datatype Color = Red | Blue | Green;
This is an "enumerated list of values", much like an enum type in C/C++ (Java gained a proper enumerated type only in Java 5; before that, you would typically declare many public static final int constants instead).
Datatypes with multiple cases are called union or sum types; unlike unions in C, however, ML unions are type-safe, because you must use a case expression (a "type case") to access any particular case:
val c = Red;
val colorChar = case c of
                    Red => "r"
                  | Blue => "b"
                  | Green => "g";
Datatype cases may optionally take an argument, which is the "data" stored in the value when it is "packed":
datatype StringOrInt = String of string | Int of int;
Finally, datatypes may be recursive, as we have already seen.
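To make the "packing" and "unpacking" of data-carrying cases concrete, here is a minimal sketch that stores both kinds of StringOrInt value in one list and recovers the payload by pattern matching. The function show and the value names are invented for this example.

```sml
datatype StringOrInt = String of string | Int of int;

(* show is a hypothetical helper: each clause unpacks one case.
   The constructors String and Int do not clash with the Basis
   structures of the same names, which live in a separate namespace. *)
fun show (String s) = "string:" ^ s
  | show (Int i)    = "int:" ^ Int.toString i;

(* Both cases have the same type, so they can share a list. *)
val mixed = [Int 1, String "two", Int 3];
val shown = map show mixed;   (* ["int:1","string:two","int:3"] *)
```

The only way to reach the string or the int inside a value is through a pattern that names its constructor, which is what makes the union type-safe.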
Unlike type synonyms, each occurrence of a datatype declaration stands for a fresh type. Therefore, for example, the following two data types are not equivalent, and attempts to unify them will fail:
- datatype Dollar = Dol of int;
datatype Dollar = Dol of int
- datatype Euro = Eu of int;
datatype Euro = Eu of int
- val d:Dollar = Eu 45;
stdIn:80.1-80.21 Error: pattern and expression in val dec don't agree [tycon mismatch]
  pattern:    Dollar
  expression:    Euro
  in declaration:
    d : Dollar = Eu 45
This property is quite useful, because it allows us to use the type system to prevent accidental clashes of types that are meant to be distinct. In fact, even occurrences of identical data types are not compatible:
- datatype Dollar = Dol of int;
datatype Dollar = Dol of int
- val tenDollars = Dol 10;
val tenDollars = Dol 10 : Dollar
- datatype Dollar = Dol of int;
datatype Dollar = Dol of int
- val tenDollars':Dollar = tenDollars;
stdIn:89.1-89.36 Error: pattern and expression in val dec don't agree [tycon mismatch]
  pattern:    Dollar
  expression:    ?.Dollar
  in declaration:
    tenDollars' : Dollar = tenDollars
We therefore say that datatypes are "generative" (each static declaration in the source text "generates" a new type), or that the type is freshly "branded" by the declaration.
In ML, the programmer therefore has a choice between generative datatypes, and non-generative type synonyms. Note that classes in Java are always generative --- each class declaration defines a fresh type, regardless of the presence of other classes with the same definition. C has both generative types (although only for structs) and type synonyms (typedef, which is more restricted than ML type synonyms).
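The practical difference between the two choices can be sketched as follows. The synonym Cents and the function addDollars are invented for this illustration; the line that would mix currencies is left in a comment because the type checker rejects it.

```sml
datatype Dollar = Dol of int;
datatype Euro   = Eu of int;
type Cents = int;   (* hypothetical synonym: interchangeable with int *)

(* addDollars (hypothetical) only accepts values branded as Dollar. *)
fun addDollars (Dol a, Dol b) = Dol (a + b);

val total = addDollars (Dol 10, Dol 5);   (* Dol 15 *)

(* A synonym adds no protection: any int is a Cents and vice versa. *)
val c : Cents = 3 + 4;
val n : int = c;

(* Rejected by the type checker, since Euro is not Dollar:
   val bad = addDollars (Dol 10, Eu 5);
*)
```

A datatype costs a constructor application at each use but buys a compile-time guarantee; a synonym is free to use and guarantees nothing beyond the underlying type.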
Note that constructors are applied much like functions --- by writing the argument after the constructor name. Hmmm, maybe they are functions?
- SNode;
val it = fn : string * SBinaryTree * SBinaryTree -> SBinaryTree
These can be treated just like any other function value; consider our dollar datatype constructor:
- val f = Dol;
val f = fn : int -> Dollar
- map Dol [12, 24, 36];
val it = [Dol 12,Dol 24,Dol 36] : Dollar list
This is one feature of Standard ML that is not found in all ML dialects --- some ML dialects do not make constructors first-class. Thought question: if you wanted to do something like the above use of map, but in a language where constructors were not first-class, how would you do it?
The tree type we've seen so far is limited in usefulness, because it can only hold string elements. What if we wanted a binary tree with any kind of element at each node? We can do this by using polymorphic datatypes, whose syntax parallels that of polymorphic type synonyms:
datatype 'a BinaryTree = Empty | Node of ('a * 'a BinaryTree * 'a BinaryTree);
This datatype can then be instantiated implicitly, or explicitly with a type synonym or ascription:
- val stringNode = Node ("hi", Empty, Empty);
val stringNode = Node ("hi",Empty,Empty) : string BinaryTree
- fun leaf i = Node (i, Empty, Empty);
val leaf = fn : 'a -> 'a BinaryTree
- type IntStrBinTree = (int * string) BinaryTree;
type IntStrBinTree = (int * string) BinaryTree
- val i:int BinaryTree = Node (10, leaf 20, leaf 30);
val i = Node (10,Node (20,Empty,Empty),Node (30,Empty,Empty)) : int BinaryTree
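A function over a polymorphic datatype is itself polymorphic when it never inspects the elements. As a sketch, the invented helper inorder below collects the elements of any 'a BinaryTree into a list and is used at two element types.

```sml
datatype 'a BinaryTree = Empty
                       | Node of 'a * 'a BinaryTree * 'a BinaryTree;

fun leaf x = Node (x, Empty, Empty);

(* inorder is a hypothetical helper with type
   'a BinaryTree -> 'a list: left subtree, then root, then right. *)
fun inorder Empty = []
  | inorder (Node (x, left, right)) = inorder left @ (x :: inorder right);

val ints = Node (10, leaf 20, leaf 30);
val strs = Node ("b", leaf "a", Empty);

val l1 = inorder ints;   (* [20,10,30] *)
val l2 = inorder strs;   (* ["a","b"] *)
```

The same definition serves int trees and string trees because the type variable 'a is instantiated separately at each call.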
We can define a substitute for the built-in list polymorphic type as follows:
- datatype 'a List = Nil | Cons of 'a * 'a List;
datatype 'a List = Cons of 'a * 'a List | Nil
- val p = Nil;
val p = Nil : 'a List
- val q = Cons (3, Cons(4, Cons(5, Nil)));
val q = Cons (3,Cons (4,Cons #)) : int List
In fact, the built-in list data type is really just a plain datatype 'a list with some syntactic sugar for the cons constructor:
datatype 'a list = nil | :: of 'a * 'a list;
Thought question: why is it a bad idea to type the above definition into the ML interpreter yourself? Hint: Consider the standard library functions and datatype generativity.
Recursive datatypes and recursive functions naturally go together, as we have seen with lists and again with binary trees above. We can apply all the lessons we've learned from lists --- for example, consider the following function that maps each element of a string tree to a different string:
- fun reallyExcited Empty = Empty
    | reallyExcited (Node (s, left, right)) =
          Node (s ^ "!", reallyExcited left, reallyExcited right);
val reallyExcited = fn : string BinaryTree -> string BinaryTree
This is simply an instance of mapping over trees --- so we might want to write a more general function:
- fun treeMap _ Empty = Empty
    | treeMap f (Node (elem, left, right)) =
          Node (f elem, (treeMap f left), (treeMap f right));
val treeMap = fn : ('a -> 'b) -> 'a BinaryTree -> 'b BinaryTree
- fun reallyExcited' aTree = treeMap (fn s => s ^ "!") aTree;
val reallyExcited' = fn : string BinaryTree -> string BinaryTree
Thought exercise: Implement a "tree reduce" function, which takes a "base case" value that applies to the empty case, and a function that combines results at interior nodes. Implement treeMap as an application of this tree reduction function.
A binary tree with elements in random order is not very useful --- usually you want the elements to be sorted for efficient search. However, to write functions that, e.g., insert an element in sorted order into a binary tree, we need an element type for which some comparison function (which enables us to compare the element values) is defined.
Unfortunately, there's no way in the ML core language to state directly that a type variable can only be instantiated with types that have a comparison function. An 'a BinaryTree is instantiable with any type substituted for 'a, not just "comparable" types.
However, we do have first-class function values; so we can store a comparison function in the tree data structure for the type that we're interested in. Where will we get this function? Well, the client must provide this function when creating the tree. Observe:
datatype 'a BTNode = Empty | Node of 'a * 'a BTNode * 'a BTNode;
type 'a comparisonFn = ('a * 'a) -> bool;
datatype 'a BTree = Tree of {greaterThan:'a comparisonFn, root:'a BTNode};

fun leaf x = Node (x, Empty, Empty);

fun insert (Tree {greaterThan, root}, x) =
    let
        fun insertHelper Empty = leaf x
          | insertHelper (Node (y, left, right)) =
                if greaterThan(x, y)
                then Node (y, left, (insertHelper right))
                else Node (y, (insertHelper left), right)
    in
        Tree {greaterThan=greaterThan, root=insertHelper(root)}
    end;
When the client creates a tree instance, the client will be responsible for providing a comparison function, which will be stored in the tree data structure and used for all subsequent operations on the tree. For example:
- val aStringTree = Tree {greaterThan=op >, root=Node("hi", Empty, Empty)};
val aStringTree = Tree {greaterThan=fn,root=Node ("hi",Empty,Empty)} : string BTree
- type Point = {x:real, y:real};
type Point = {x:real, y:real}
- fun magnitude {x:real, y:real} = Math.sqrt(x*x + y*y);
val magnitude = fn : {x:real, y:real} -> real
- val aPointTree = Tree {greaterThan= (fn (p1, p2) => (magnitude p1) > (magnitude p2)),
                         root=Node({x=1.0, y=2.0}, Empty, Empty)};
val aPointTree = Tree {greaterThan=fn,root=Node ({x=#,y=#},Empty,Empty)} : {x:real, y:real} BTree
- val anotherPT = insert (aPointTree, {x=4.0, y=5.0});
val anotherPT = Tree {greaterThan=fn,root=Node ({x=#,y=#},Empty,Node #)} : {x:real, y:real} BTree
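Since the comparison function travels with the tree, a client can build a whole tree by repeated insertion without mentioning the function again. The sketch below repeats the definitions above and adds three invented helpers (emptyTree, fromList, rootOf) to show this with integers.

```sml
datatype 'a BTNode = Empty | Node of 'a * 'a BTNode * 'a BTNode;
type 'a comparisonFn = ('a * 'a) -> bool;
datatype 'a BTree = Tree of {greaterThan:'a comparisonFn, root:'a BTNode};

fun leaf x = Node (x, Empty, Empty);

fun insert (Tree {greaterThan, root}, x) =
  let
    fun insertHelper Empty = leaf x
      | insertHelper (Node (y, left, right)) =
          if greaterThan (x, y)
          then Node (y, left, insertHelper right)
          else Node (y, insertHelper left, right)
  in
    Tree {greaterThan = greaterThan, root = insertHelper root}
  end;

(* Hypothetical helpers, not from the notes. *)
fun emptyTree gt = Tree {greaterThan = gt, root = Empty};
fun fromList gt xs = foldl (fn (x, t) => insert (t, x)) (emptyTree gt) xs;
fun rootOf (Tree {root, ...}) = root;

(* The comparison is supplied once, at creation time. *)
val t = fromList (op > : int * int -> bool) [5, 2, 8];
val shape = rootOf t;
(* Node (5, Node (2,Empty,Empty), Node (8,Empty,Empty)) *)
```

Inserting 5, then 2, then 8 puts 2 to the left of the root and 8 to the right, as the stored greaterThan dictates.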
Exercise 1: Write some functions that print out parts of this search tree, to verify that the insertion functions work.
Exercise 2: Notice that the insertion function for these trees does not eliminate duplicates. How would you update the definition of BTree and insert so that it is possible to insert only non-duplicate values?
Exercise 3: Rather than storing the function in the tree structure, we could ask the client to pass the comparison function to every invocation of insert; in other words, insert could have the type:
('a BTree * 'a * 'a comparisonFn) -> 'a BTree
Why would this generally be an inferior interface? What programming errors could arise if the library were implemented this way? Explain by implementing this version of the insertion function and showing an example.
More generally, you can have a type that potentially stores many functions over some of its components. If you store several functions in a record, along with an element type, the type begins to look rather like an object:
type 'a Foo = { this:'a, hashCode:'a -> int, toString:'a -> string, ... }
The correspondence (in this translation, at least) is not precise, because the "this" argument has to be passed manually by the client to the member functions. We'll return to contrasting these records-of-functions and objects in later lectures.
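A minimal concrete version of this idea might look like the sketch below. The type name Obj, the value intObj, and the helpers describe and hashOf are all invented here, and the hash function is an arbitrary placeholder.

```sml
(* 'a Obj is a hypothetical name; the fields follow the sketch above. *)
type 'a Obj = {this:'a, hashCode:'a -> int, toString:'a -> string};

val intObj : int Obj =
  {this = 42,
   hashCode = fn i => i mod 17,   (* placeholder hash, an assumption *)
   toString = Int.toString};

(* The caller must hand "this" back to each member function by hand. *)
fun describe (obj : 'a Obj) : string = #toString obj (#this obj);
fun hashOf (obj : 'a Obj) : int = #hashCode obj (#this obj);

val s = describe intObj;   (* "42" *)
val h = hashOf intObj;     (* 8 *)
```

The helpers describe and hashOf show the manual plumbing: each one selects a function field and applies it to the this field of the same record.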
So far, we have not learned how to write mutually recursive functions or data types. Perhaps the most trivial example of a mutually recursive datatype is odd/even lists: suppose we wanted two types of lists, one with an odd number of elements and one with an even number of elements. It is not possible to construct an odd-list without an even-list, because the list of length zero (nil) is an even-list; conversely, it is impossible to construct a non-empty even-list without an odd-list.
Mutually recursive declarations in ML must be declared together, using the and keyword:
datatype 'a EvenList = Empty | Even of 'a * 'a OddList
     and 'a OddList = Odd of 'a * 'a EvenList;
Here are values of int EvenList and int OddList type respectively:
val anEven = Even(6, Odd(7, Empty));
val anOdd = Odd(10, Even(11, Odd(12, anEven)));
Mutually recursive datatypes must generally be processed using mutually recursive functions:
fun evenLength (Empty) = 0
  | evenLength (Even(_, odd)) = 1 + oddLength(odd)
and oddLength (Odd(_, even)) = 1 + evenLength(even);
We'll see a less trivial example of a mutually recursive declaration when we discuss interpreters.
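To see the odd/even list functions run, the sketch below repeats the declarations and adds a second mutually recursive pair, evenToList and oddToList, which are invented for this example and convert either kind of list to an ordinary list.

```sml
datatype 'a EvenList = Empty | Even of 'a * 'a OddList
     and 'a OddList  = Odd of 'a * 'a EvenList;

fun evenLength Empty = 0
  | evenLength (Even (_, odd)) = 1 + oddLength odd
and oddLength (Odd (_, even)) = 1 + evenLength even;

(* Hypothetical helpers: flatten either kind to a plain list. *)
fun evenToList Empty = []
  | evenToList (Even (x, odd)) = x :: oddToList odd
and oddToList (Odd (x, even)) = x :: evenToList even;

val anEven = Even (6, Odd (7, Empty));
val anOdd  = Odd (10, Even (11, Odd (12, anEven)));

val n1 = evenLength anEven;   (* 2 *)
val n2 = oddLength anOdd;     (* 5 *)
val xs = oddToList anOdd;     (* [10,11,12,6,7] *)
```

Each function in a pair handles one datatype and calls its partner on the tail, mirroring the way the two datatypes refer to each other.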
From the previous sections, we have noted that ML datatypes actually join several distinct concepts: union types (multiple cases), recursive types, and branding (generativity).
You could imagine a language that separated these three features. In such a language, you'd have a union type construct, a recursive type construct, and a branding type construct. Each of these could then be independently and orthogonally applied, in much the same way that you can combine records and lists. For example, let's just make up some syntax for such a language, and wave our hands about the semantics:
Let union types be written
type1 or type2 or ... or typeN
To "pack" a union type, simply use the pack[N] construct, which packs a value V into a union with V's type at position N:
val iOrS:(int or string) = if p then pack[1](123) else pack[2]("hi")
To unpack the type, use a type case, which for clarity we will invent a new syntax for:
unpack iOrS at
    1(i) => print (Int.toString i)
  | 2(s) => print (s)
The cases of an unpack expression must have the same type, of course (in this case, unit).
Let recursive types be written
rec name = type
where name refers to the recursive type itself inside the body, type.
Thought exercise: why aren't recursive types very useful without union types?
Let branded types be written
brand name = type
so that a reference to the type name is distinct from any other occurrences of type.
Therefore, for example, given the "branded" type
brand BISPair = (int * string)
a value of type BISPair would not be interchangeable with values of type int * string.
To construct a fresh value of branded type, apply the type name to a value of the branded type's underlying type:
BISPair (123, "hi")
is a fresh value of type BISPair, not of type int * string.
OK, we've got some syntax; but getting the semantics to work out the way we'd like it to turns out to be rather tricky. For example, consider how you might write the type of binary trees of strings as follows (parens added to clarify order):
brand BTree = (rec BT = (string or (BT * BT)))
But this doesn't work --- you cannot construct the value you want for the node case of BTree, because the BT occurrences inside the body of the rec do not refer to BTree --- recall that branded types are distinct from the underlying type. Consider the following code:
val a = BTree pack[1]("hi");
val b = BTree pack[1]("bye");
val c = BTree pack[2](a, b)
The bindings for a and b are values of type BTree. But the binding for c can't typecheck, because pack[2] in BTree pack[2](a, b) expects an argument of type BT * BT, not BTree * BTree --- and these are not the same type, because branding distinguishes BT from BTree!
(Actually, it's really hard to make branding both correct and useful when it's a separate construct. The consensus seems to be that it's best integrated as a property of some other type construct. True language aficionados might want to read the section "How the Types Got Their Identity" in the book Systems Programming with Modula-3, edited by Greg Nelson.)
Because of complications like this, and because recursive datatypes are (basically) always used in combination with union types, ML made the language engineering decision to merge all these concepts into one language construct. This makes the language (arguably) less orthogonal, but much easier to use.
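For comparison, here is how the merged construct expresses the tree of strings from the made-up example. This is only a sketch: the constructor names Leaf and Branch and the helper leaves are invented here.

```sml
(* One declaration supplies the union (two cases), the recursion
   (BTree appears in its own body), and the brand (a fresh type).
   Leaf and Branch are hypothetical constructor names. *)
datatype BTree = Leaf of string | Branch of BTree * BTree;

val a = Leaf "hi";
val b = Leaf "bye";
val c = Branch (a, b);   (* typechecks: the recursive occurrences are BTree itself *)

(* leaves (hypothetical) collects the strings from left to right. *)
fun leaves (Leaf s) = [s]
  | leaves (Branch (l, r)) = leaves l @ leaves r;

val ls = leaves c;   (* ["hi","bye"] *)
```

Because the recursive references name the branded type directly, the mismatch between BT and BTree from the hypothetical language cannot arise.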