CSE 341: User-defined types and type synonyms

All sensible languages provide the user with some mechanisms to define data types customized to the needs of their application. ML provides two primary mechanisms: type synonyms and datatypes.

Type synonyms

The ML syntax for type synonyms is as follows:

type [vars] name = type

where [vars] is an optional list of type variables, and name is the name of the type. Here is an example of a type synonym:

type ISPair = int * string;

Type synonyms are "user-defined syntactic sugar"; a type synonym expands in-place to exactly its type. Therefore, for example:

- val x = (123, "hi");
val x = (123,"hi") : int * string
- val y:ISPair = x;
val y = (123,"hi") : ISPair
- val z:(int * string) = y;
val z = (123,"hi") : int * string

Every value or expression of type ISPair is also a value of type int * string, and vice versa. When printing types interactively, the SML/NJ interpreter will pick a type based on what it thinks you want to see, but it doesn't really matter, because the types are exactly equivalent.

Type synonyms can be used wherever other types are, including inside other type synonyms:

type Name = {first:string, last:string};
type Date = {month:string, day:int, year:int};
type Person = {name:Name, birthdate:Date};
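As a small sketch of how these nested synonyms are used, the following builds a Person value and reads fields back out of it. The binding ada, the helper fullName, and the sample data are illustrative names, not part of the original notes.

```sml
(* Synonyms repeated from the text so the sketch is self-contained. *)
type Name = {first:string, last:string};
type Date = {month:string, day:int, year:int};
type Person = {name:Name, birthdate:Date};

(* A Person is just a nested record; the sample value is illustrative. *)
val ada : Person =
    {name = {first = "Ada", last = "Lovelace"},
     birthdate = {month = "December", day = 10, year = 1815}};

(* Field selectors work through the synonym, because Person expands
   in place to the underlying record type. *)
fun fullName (p : Person) = #first (#name p) ^ " " ^ #last (#name p);

val adaName = fullName ada;
```

Because Person is only a synonym, fullName would accept any record with the same shape, whether or not it was annotated as a Person.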

A type synonym can be polymorphic --- that is, it can incorporate type variables in the body --- provided those type variables are declared in the variable list:

- type 'a myRecord = {x:'a, y:int};
type 'a myRecord = {x:'a, y:int}
- type 'a sequence = 'a list;
type 'a sequence = 'a list

Polymorphic type synonyms can be instantiated just like the polymorphic types we've seen in the past:

- val p:int sequence = [1, 2, 3];
val p = [1,2,3] : int sequence

Datatype basics

Because of the richness of ML's built-in types, you can get pretty far using only built-in types and type synonyms. However, they do not allow you to express some important idioms.

Consider ML's built-in list type: it has two constructors, nil and cons. Suppose we want to define a binary tree data type --- this must also have two constructors: some base case for empty trees, and an inductive case for the nodes.

You might imagine a language that builds in a special type for binary trees as well as lists. But then what about more general trees, or the myriad other kinds of data that need more than one case? Clearly, we would like a way to declare user-defined types with multiple cases; and ideally, our built-in list type would be only a special case of this one construct.

The solution in ML is the datatype construct:

datatype SBinaryTree =
    SEmpty
  | SNode of string * SBinaryTree * SBinaryTree;

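This datatype declaration says that SBinaryTree is a type with two constructors: SEmpty, which takes no argument, and SNode, which takes a tuple of type string * SBinaryTree * SBinaryTree.

To construct a value of this type, we write a constructor, applying it to an argument of the appropriate type if it takes one: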

- val a = SEmpty;
val a = SEmpty : SBinaryTree
- val b = SNode ("hi", SEmpty, SEmpty);
val b = SNode ("hi",SEmpty,SEmpty) : SBinaryTree
- val c = SNode ("bye", a, b);
val c =
  SNode ("bye",SEmpty, SNode ("hi",SEmpty,SEmpty))
  : SBinaryTree
- val d = SNode ("bonjour", c, SNode ("au revoir", SEmpty, SEmpty));
val d =
  SNode
    ("bonjour",
     SNode ("bye",SEmpty,SNode #),
     SNode ("au revoir",SEmpty,SEmpty))
  : SBinaryTree

(Notice that, as usual, ML truncates the echoed printout for deeply nested values.)

We have called SEmpty and SNode constructors for good reason --- you can treat them just like the built-in constructors, including using them for pattern matching:

case d of
   SEmpty => "()"
 | SNode (s, left, right) => "node: " ^ s;

Processing just one level of a recursive data structure isn't that interesting. Let's process the whole data structure using a recursive function and pattern matching:

fun flattenTree SEmpty = ""
  | flattenTree (SNode (s, left, right)) =
    "("
    ^ (flattenTree left)
    ^ "," ^ s ^ ","
    ^ (flattenTree right)
    ^ ")";

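Just as with the list functions you've learned so far, the function has one clause per constructor: a base case for SEmpty and a recursive case for SNode.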
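The same one-clause-per-constructor shape carries over to other traversals. The sketch below is illustrative only: size and contains are hypothetical helper names that do not appear in the original notes.

```sml
(* Datatype repeated from the text so the sketch is self-contained. *)
datatype SBinaryTree =
    SEmpty
  | SNode of string * SBinaryTree * SBinaryTree;

(* Number of nodes: 0 for the empty tree, otherwise this node plus
   the sizes of both subtrees. *)
fun size SEmpty = 0
  | size (SNode (_, left, right)) = 1 + size left + size right;

(* Does the tree contain the given string anywhere? *)
fun contains (SEmpty, _) = false
  | contains (SNode (s, left, right), target) =
    s = target
    orelse contains (left, target)
    orelse contains (right, target);

val t = SNode ("bye", SEmpty, SNode ("hi", SEmpty, SEmpty));
val n = size t;
val found = contains (t, "hi");
```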

Datatypes from the ground up

In the previous section we leapt right into several features of datatypes. In this section we'll build datatypes from the ground up, and go into more sophisticated features, including (perhaps most importantly) polymorphic datatypes.

The simplest possible datatype has one case and only a nullary (no-argument) constructor:

datatype Nothing = Nada;

This is pretty useless. A slightly more interesting use of datatypes has multiple cases, but only nullary constructors:

datatype Color = Red | Blue | Green;

This is an "enumerated list of values", much like an enum type in C/C++ (Java has had a proper enum type since Java 5; before that, Java programmers typically declared many public static final int constants instead).

Datatypes with multiple cases are called union or sum types; unlike unions in C, however, ML unions are type-safe, because you must use a case expression (or another pattern match) to access any particular case:

val c = Red;
val colorChar =
    case c of
        Red => "r"
      | Blue => "b"
      | Green => "g";

Datatype cases may optionally take an argument, which is the "data" stored in the value when it is "packed":

datatype StringOrInt =
     String of string
   | Int of int;
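To get the data back out of a "packed" value, you pattern match on the constructor. A minimal sketch follows; the function describe and the list items are illustrative names, not part of the original notes.

```sml
(* Datatype repeated from the text so the sketch is self-contained. *)
datatype StringOrInt =
     String of string
   | Int of int;

(* Each clause unpacks the argument carried by one constructor.
   Int.toString refers to the Basis structure Int; structures and
   constructors live in separate namespaces, so there is no clash. *)
fun describe (String s) = "string: " ^ s
  | describe (Int i) = "int: " ^ Int.toString i;

(* Both cases have the same type, so they can share a list. *)
val items = [String "hi", Int 42];
val descriptions = map describe items;
```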

Finally, datatypes may be recursive, as we have already seen.

Datatypes are generative (branding)

Unlike type synonyms, each occurrence of a datatype declaration stands for a fresh type. Therefore, for example, the following two data types are not equivalent, and attempts to unify them will fail:

- datatype Dollar = Dol of int;
datatype Dollar = Dol of int
- datatype Euro = Eu of int;
datatype Euro = Eu of int
- val d:Dollar = Eu 45;
stdIn:80.1-80.21 Error: pattern and expression
    in val dec don't agree [tycon mismatch]
  pattern:    Dollar
  expression:    Euro
  in declaration:
    d : Dollar = Eu 45

This property is quite useful, because it allows us to use the type system to prevent accidental clashes of types that are meant to be distinct. In fact, even occurrences of identical data types are not compatible:

- datatype Dollar = Dol of int;
datatype Dollar = Dol of int
- val tenDollars = Dol 10;
val tenDollars = Dol 10 : Dollar
- datatype Dollar = Dol of int;
datatype Dollar = Dol of int
- val tenDollars':Dollar = tenDollars;
stdIn:89.1-89.36 Error: pattern and expression
    in val dec don't agree [tycon mismatch]
  pattern:    Dollar
  expression:    ?.Dollar
  in declaration:
    tenDollars' : Dollar = tenDollars

We say therefore that datatypes are "generative" (each static declaration in the source text "generates" a new type), or that the type is freshly "branded" by the declaration.

In ML, the programmer therefore has a choice between generative datatypes, and non-generative type synonyms. Note that classes in Java are always generative --- each class declaration defines a fresh type, regardless of the presence of other classes with the same definition. C has both generative types (although only for structs) and type synonyms (typedef, which is more restricted than ML type synonyms).
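As a sketch of how branding pays off in practice, the functions below can only combine amounts in the same currency, and any conversion has to be written out explicitly. The names addDollars and toDollars, and the integer rate parameter, are hypothetical and chosen only for illustration.

```sml
datatype Dollar = Dol of int;
datatype Euro = Eu of int;

(* Only accepts two Dollar values; the pattern unpacks both. *)
fun addDollars (Dol a, Dol b) = Dol (a + b);

(* Conversion must be explicit. The rate (whole dollars per euro)
   is a made-up parameter, not a real exchange rate. *)
fun toDollars rate (Eu e) = Dol (e * rate);

val total = addDollars (Dol 10, toDollars 2 (Eu 45));

(* addDollars (Dol 10, Eu 45) would be rejected by the type checker
   with a tycon mismatch, just like the val d:Dollar = Eu 45 example. *)
```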

Constructors are first-class

Note that constructors are applied much like functions --- by writing the argument after the constructor name. Hmmm, maybe they are functions?

- SNode;
val it =
    fn : string * SBinaryTree * SBinaryTree -> SBinaryTree

These can be treated just like any other function value; consider our dollar datatype constructor:

- val f = Dol;
val f = fn : int -> Dollar
- map Dol [12, 24, 36];
val it = [Dol 12,Dol 24,Dol 36] : Dollar list

This is one feature of Standard ML that is not found in all ML dialects --- some ML dialects do not make constructors first-class. Thought question: if you wanted to do something like the above use of map, but in a language where constructors were not first-class, how would you do it?
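One possible answer to the thought question, written here in SML only for illustration: if a constructor could appear only fully applied, you could wrap it in an ordinary anonymous function and pass that instead.

```sml
datatype Dollar = Dol of int;

(* The constructor appears only in applied position, inside a lambda. *)
val wrapped = map (fn n => Dol n) [12, 24, 36];

(* In Standard ML the wrapper is unnecessary, since Dol is itself a
   function value. *)
val direct = map Dol [12, 24, 36];
```

Both bindings produce the same Dollar list; the wrapper just makes explicit what first-class constructors give you for free.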

Polymorphic datatypes

The tree type we've seen so far is limited in usefulness, because it only applies to string elements. What if we wanted a binary tree with any kind of element at each node? We can do this by using polymorphic datatypes, whose syntax parallels that of polymorphic type synonyms:

datatype 'a BinaryTree =
         Empty
       | Node of ('a * 'a BinaryTree * 'a BinaryTree);

This datatype can then be instantiated implicitly, or explicitly with a type synonym or ascription:

- val stringNode = Node ("hi", Empty, Empty);
val stringNode = Node ("hi",Empty,Empty) : string BinaryTree

- fun leaf i = Node (i, Empty, Empty);
val leaf = fn : 'a -> 'a BinaryTree

- type IntStrBinTree = (int * string) BinaryTree;
type IntStrBinTree = (int * string) BinaryTree

- val i:int BinaryTree = Node (10, leaf 20, leaf 30);
val i =
  Node (10,Node (20,Empty,Empty),Node (30,Empty,Empty))
  : int BinaryTree
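Functions over a polymorphic datatype can themselves be polymorphic. As a sketch, the in-order traversal below works for any element type; the name inorder is illustrative and does not appear in the original notes.

```sml
(* Datatype and leaf repeated from the text so the sketch is
   self-contained. *)
datatype 'a BinaryTree =
         Empty
       | Node of ('a * 'a BinaryTree * 'a BinaryTree);

fun leaf x = Node (x, Empty, Empty);

(* Left subtree, then this element, then right subtree.
   Inferred type: 'a BinaryTree -> 'a list *)
fun inorder Empty = []
  | inorder (Node (x, left, right)) =
    inorder left @ (x :: inorder right);

(* The same function is used at two different element types. *)
val ints = inorder (Node (10, leaf 20, leaf 30));
val strs = inorder (Node ("hi", Empty, leaf "bye"));
```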

Lists are polymorphic datatypes

We can define a substitute for the built-in list polymorphic type as follows:

- datatype 'a List = Nil | Cons of 'a * 'a List;
datatype 'a List = Cons of 'a * 'a List | Nil
- val p = Nil;
val p = Nil : 'a List
- val q = Cons (3, Cons(4, Cons(5, Nil)));
val q = Cons (3,Cons (4,Cons #)) : int List

In fact, the built-in list data type is really just a plain datatype 'a list with some syntactic sugar for the cons constructor:

datatype 'a list = nil | :: of 'a * 'a list;

Thought question: why is it a bad idea to type the above definition into the ML interpreter yourself? Hint: Consider the standard library functions and datatype generativity.
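Functions over the user-defined List type look just like their built-in counterparts, with Nil and Cons in place of nil and ::. The helper names listLength and toBuiltin below are hypothetical, chosen for this sketch.

```sml
(* Datatype repeated from the text so the sketch is self-contained. *)
datatype 'a List = Nil | Cons of 'a * 'a List;

(* Same recursion scheme as length on built-in lists. *)
fun listLength Nil = 0
  | listLength (Cons (_, rest)) = 1 + listLength rest;

(* Convert to the built-in list type, one constructor at a time. *)
fun toBuiltin Nil = []
  | toBuiltin (Cons (x, rest)) = x :: toBuiltin rest;

val q = Cons (3, Cons (4, Cons (5, Nil)));
val n = listLength q;
val xs = toBuiltin q;
```

The need for an explicit toBuiltin conversion is itself a consequence of generativity: 'a List and 'a list are distinct types even though their declarations have the same shape.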

Putting it together: datatypes, functions, polymorphism

Tree map

Recursive datatypes and recursive functions naturally go together, as we have seen with lists and again with binary trees above. We can apply all the lessons we've learned from lists --- for example, consider the following function that maps each element of a string tree to a different string:

- fun reallyExcited Empty = Empty
  | reallyExcited (Node (s, left, right)) =
    Node (s ^ "!", reallyExcited left, reallyExcited right);
val reallyExcited = fn : string BinaryTree -> string BinaryTree

This is simply an instance of mapping over trees --- so we might want to write a more general function:

- fun treeMap _ Empty = Empty
  | treeMap f (Node (elem, left, right)) =
    Node (f elem, (treeMap f left), (treeMap f right));
val treeMap = fn : ('a -> 'b) -> 'a BinaryTree -> 'b BinaryTree
- fun reallyExcited' aTree =
    treeMap (fn s => s ^ "!") aTree;
val reallyExcited' = fn : string BinaryTree -> string BinaryTree

Thought exercise: Implement a "tree reduce" function, which takes a "base case" function that applies to the empty case, and a function that combines the results at interior nodes. Implement treeMap as an application of this tree reduction function.
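One possible shape for such a reduction is sketched below. It is not the only answer, and it simplifies the exercise by taking a base value rather than a base-case function; the names treeReduce and treeSum and the argument order are assumptions made for this sketch.

```sml
(* Datatype repeated from the text so the sketch is self-contained. *)
datatype 'a BinaryTree =
         Empty
       | Node of ('a * 'a BinaryTree * 'a BinaryTree);

(* base is the result for Empty; combine receives the element and
   the already-reduced left and right subtrees.
   Inferred type: 'b -> ('a * 'b * 'b -> 'b) -> 'a BinaryTree -> 'b *)
fun treeReduce base _ Empty = base
  | treeReduce base combine (Node (elem, left, right)) =
    combine (elem,
             treeReduce base combine left,
             treeReduce base combine right);

(* treeMap rebuilds the tree, applying f to each element. *)
fun treeMap f aTree =
    treeReduce Empty (fn (elem, l, r) => Node (f elem, l, r)) aTree;

(* Another instance: the sum of an int tree. *)
fun treeSum aTree = treeReduce 0 (fn (x, l, r) => x + l + r) aTree;

val t = Node (1, Node (2, Empty, Empty), Node (3, Empty, Empty));
val total = treeSum t;
val scaled = treeMap (fn x => x * 10) t;
```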

Search trees

A binary tree with elements in random order is not very useful --- usually you want the elements to be sorted for efficient search. However, to write functions that, e.g., insert an element in sorted order into a binary tree, we need an element type for which some comparison function (which enables us to compare the element values) is defined.

Unfortunately, there's no way in the ML core language to state directly that a type variable can only be instantiated with types that have a comparison function. An 'a BinaryTree is instantiable with any type substituted for 'a, not just "comparable" types.

However, we do have first-class function values; so we can store a comparison function in the tree data structure for the type that we're interested in. Where will we get this function? Well, the client must provide this function when creating the tree. Observe:

datatype 'a BTNode =
         Empty
       | Node of 'a * 'a BTNode * 'a BTNode;

type 'a comparisonFn = ('a * 'a) -> bool;

datatype 'a BTree =
         Tree of {greaterThan:'a comparisonFn, root:'a BTNode};

fun leaf x = Node (x, Empty, Empty);

fun insert (Tree {greaterThan, root}, x) =
    let
        fun insertHelper Empty = leaf x
          | insertHelper (Node (y, left, right)) =
            if greaterThan(x, y) then
                Node (y, left, (insertHelper right))
            else
                Node (y, (insertHelper left), right)
    in
        Tree {greaterThan=greaterThan, root=insertHelper(root)}
    end;

When the client creates a tree instance, the client will be responsible for providing a comparison function, which will be stored in the tree data structure and used for all subsequent operations on the tree. For example:

- val aStringTree =
    Tree {greaterThan=op >,
          root=Node("hi", Empty, Empty)};
val aStringTree =
  Tree {greaterThan=fn,root=Node ("hi",Empty,Empty)}
  : string BTree

- type Point = {x:real, y:real};
type Point = {x:real, y:real}

- fun magnitude {x:real, y:real} = Math.sqrt(x*x + y*y);
val magnitude = fn : {x:real, y:real} -> real

- val aPointTree = 
    Tree {greaterThan=
           (fn (p1, p2) => (magnitude p1) > (magnitude p2)),
          root=Node({x=1.0, y=2.0}, Empty, Empty)};
val aPointTree =
  Tree {greaterThan=fn,root=Node ({x=#,y=#},Empty,Empty)}
  : {x:real, y:real} BTree

- val anotherPT = insert (aPointTree, {x=4.0, y=5.0});
val anotherPT =
  Tree {greaterThan=fn,root=Node ({x=#,y=#},Empty,Node #)}
  : {x:real, y:real} BTree
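A quick way to check that insertion keeps elements in order is to flatten the tree with an in-order walk. The sketch below assumes the definitions from this section; toSortedList and emptyInts are hypothetical names, and an int tree is used so the result is easy to inspect.

```sml
(* Definitions repeated from the text so the sketch is self-contained. *)
datatype 'a BTNode =
         Empty
       | Node of 'a * 'a BTNode * 'a BTNode;

type 'a comparisonFn = ('a * 'a) -> bool;

datatype 'a BTree =
         Tree of {greaterThan:'a comparisonFn, root:'a BTNode};

fun leaf x = Node (x, Empty, Empty);

fun insert (Tree {greaterThan, root}, x) =
    let
        fun insertHelper Empty = leaf x
          | insertHelper (Node (y, left, right)) =
            if greaterThan (x, y) then
                Node (y, left, insertHelper right)
            else
                Node (y, insertHelper left, right)
    in
        Tree {greaterThan=greaterThan, root=insertHelper root}
    end;

(* In-order walk: left subtree, element, right subtree. *)
fun toSortedList (Tree {root, ...}) =
    let
        fun walk Empty = []
          | walk (Node (y, left, right)) = walk left @ (y :: walk right)
    in
        walk root
    end;

(* Start from an empty int tree and insert several values. *)
val emptyInts : int BTree = Tree {greaterThan = op >, root = Empty};
val t = foldl (fn (x, tree) => insert (tree, x)) emptyInts
              [5, 2, 8, 1, 9, 3];
val sorted = toSortedList t;
```

If insert is correct, the flattened list comes out in ascending order regardless of the order in which the values were inserted.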

Suggested exercises

Exercise 1: Write some functions that print out parts of this search tree, to verify that the insertion functions work.

Exercise 2: Notice that the insertion function for these trees does not eliminate duplicates. How would you update the definition of BTree and insert so that it is possible to insert only non-duplicate values?

Exercise 3: Rather than storing the function in the tree structure, we could ask the client to pass the comparison function to every invocation of insert; in other words, insert could have the type:

('a BTree * 'a * 'a comparisonFn) -> 'a BTree

Why would this generally be an inferior interface? What programming errors could arise if the library were implemented this way? Explain by implementing this version of the insertion function and showing an example.

Records of functions vs. objects?

More generally, you can have a type that potentially stores many functions over some of its components. If you store several functions in a record, along with an element type, the type begins to look rather like an object:

type 'a Foo = {
  this:'a,
  hashCode:'a -> int,
  toString:'a -> string,
  ...
}

The correspondence (in this translation, at least) is not precise, because the "this" argument has to be passed manually by the client to the member functions. We'll return to contrasting these records-of-functions and objects in later lectures.
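To make the record-of-functions idea concrete, here is a minimal sketch using only the three fields named above, with the type variable declared in the variable list as polymorphic synonyms require. The value intObj, the helper show, and the toy hash function are illustrative assumptions.

```sml
type 'a Foo = {
  this : 'a,
  hashCode : 'a -> int,
  toString : 'a -> string
};

(* An "object" wrapping an int. The hash function is an arbitrary
   toy choice, not a recommended hash. *)
val intObj : int Foo = {
  this = 42,
  hashCode = fn n => n mod 17,
  toString = Int.toString
};

(* The client passes "this" to the member function by hand. *)
fun show (obj : 'a Foo) = #toString obj (#this obj);

val s = show intObj;
val h = #hashCode intObj (#this intObj);
```

The helper show hides the manual passing of "this" for one method, which is roughly the bookkeeping that an object system performs automatically on every method call.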

Mutual recursion

So far, we have not learned how to write mutually recursive functions or data types. Perhaps the most trivial example of a mutually recursive datatype is odd/even lists: suppose we wanted two types of lists, one with an odd number of elements and one with an even number of elements. It is not possible to construct an odd-list without an even-list, because the list of length zero (nil) is an even-list; conversely, it is impossible to construct a non-empty even-list without an odd-list.

Mutually recursive declarations in ML must be declared together, using the and keyword:

datatype 'a EvenList = Empty
                     | Even of 'a * 'a OddList
and 'a OddList = Odd of 'a * 'a EvenList;

Here are values of int EvenList and int OddList type respectively:

val anEven = Even(6, Odd(7, Empty));
val anOdd = Odd(10, Even(11, Odd(12, anEven)));

Mutually recursive datatypes must generally be processed using mutually recursive functions:

fun evenLength (Empty) = 0
  | evenLength (Even(_, odd)) = 1 + oddLength(odd)
and oddLength (Odd(_, even)) = 1 + evenLength(even);
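As a usage sketch, the definitions above can be applied to the example values, and a second mutually recursive pair can flatten either kind of list into a built-in list. The names evenToList and oddToList are hypothetical additions for illustration.

```sml
(* Definitions repeated from the text so the sketch is self-contained. *)
datatype 'a EvenList = Empty
                     | Even of 'a * 'a OddList
and 'a OddList = Odd of 'a * 'a EvenList;

fun evenLength Empty = 0
  | evenLength (Even (_, odd)) = 1 + oddLength odd
and oddLength (Odd (_, even)) = 1 + evenLength even;

(* Another mutually recursive pair: each function handles its own
   type and hands the tail to its partner. *)
fun evenToList Empty = []
  | evenToList (Even (x, odd)) = x :: oddToList odd
and oddToList (Odd (x, even)) = x :: evenToList even;

val anEven = Even (6, Odd (7, Empty));
val anOdd = Odd (10, Even (11, Odd (12, anEven)));

val evenCount = evenLength anEven;
val oddCount = oddLength anOdd;
val flattened = oddToList anOdd;
```

Note that oddLength needs no clause for an empty list: the type 'a OddList has no empty case, so the type system rules it out.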

We'll see a less trivial example of a mutually recursive declaration when we discuss interpreters.

Separating the features of datatypes

From the previous sections, we have noted that ML datatypes actually join several distinct concepts: union types (multiple cases), recursive types, and branding (generativity).

You could imagine a language that separated these three features. In such a language, you'd have a union type construct, a recursive type construct, and a branding type construct. Each of these could then be independently and orthogonally applied, in much the same way that you can combine records and lists. For example, let's just make up some syntax for such a language, and wave our hands about the semantics:

OK, we've got some syntax; but getting the semantics to work out the way we'd like it to turns out to be rather tricky. For example, consider how you might write the type of binary trees of strings as follows (parens added to clarify order):

brand BTree = (rec BT = (string or (BT * BT)))

But this doesn't work --- you cannot construct the value you want for the node case of BTree, because the BT occurrences inside the body of the rec do not refer to BTree --- recall that branded types are distinct from the underlying type. Consider the following code:

val a = BTree pack[1]("hi");
val b = BTree pack[1]("bye");
val c = BTree pack[2](a, b)

The bindings for a and b are values of type BTree. But the binding for c can't typecheck, because BTree pack[2] expects an argument of type BT * BT, not BTree * BTree --- and these are not the same type, because branding distinguishes BT from BTree!

(Actually, it's really hard to make branding both correct and useful when it's a separate construct. The consensus seems to be that it's best integrated as a property of some other type construct. True language aficionados might want to read the section "How the Types Got Their Identity" in the book Systems Programming with Modula-3, edited by Greg Nelson.)

Because of complications like this, and because recursive datatypes are (basically) always used in combination with union types, ML made the language engineering decision to merge all these concepts into one language construct. This makes the language (arguably) less orthogonal, but much easier to use.