In the preceding chapter we considered the following signature of dictionaries with an arbitrary key type:
signature DICT = sig
type key
val lt : key * key -> bool
val eq : key * key -> bool
type 'a dict
val empty : 'a dict
val insert : 'a dict * key * 'a -> 'a dict
val lookup : 'a dict * key -> 'a
end
The signatures of dictionaries with particular choices of key type were defined using
the "where type
" construct. For example, the signature
declarations
signature STRING_DICT = DICT where type key=string
signature INT_DICT = DICT where type key=int
define the signatures of dictionaries with string and integer keys, respectively.
The motivation for introducing these specialized instances of the DICT
signature is that we typically wish to hold the implementation type, 'a dict
,
of dictionaries abstract, but leave the type of keys concrete, as described earlier.
The signature DICT
is a bit unsatisfactory because it mixes two different
notions in one interface, namely the type, key
, of keys and its associated
comparison operations, lt
and eq
, and the type 'a dict
of dictionaries and its associated operations empty
, insert
, and
lookup
. It would be cleaner to separate these two aspects of the
interface, especially since we shall soon consider the key component to be
"generic", with the rest being "specific", to the abstraction.
The way to do this in ML is with a substructure, as follows:
signature DICT = sig
structure Key : ORDERED
type 'a dict
val empty : 'a dict
val insert : 'a dict * Key.t * 'a -> 'a dict
val lookup : 'a dict * Key.t -> 'a
end
The type of keys and the operation on it are segregated into a substructure of
the dictionary structure, a component of a structure that is itself a structure.
Correspondingly, uses of the type key
are replaced by references to the t
component of the substructure Key
. This leads to a hierarchical
organization in which we consider the key structure to be subservient to the dictionary
operations.
Specialized versions of the signature DICT are build essentially as before, except that we use a long identifier to specify the type of keys:
signature STRING_DICT = DICT where type Key.t=string
signature INT_DICT = DICT where type Key.t=int
Specific implementations of these specialized instances may be defined as follows:
structure StringDict :> STRING_DICT = struct
structure Key : ORDERED = StringLT
type 'a dict = Key.t BinTree.tree
val empty = BinTree.empty
val insert = ...insert into a BST using Key.lt and Key.eq...
val lookup = ...lookup in a BST using Key.lt and Key.eq...
end
structure IntDict :> INT_DICT = struct
structure Key : ORDERED = IntLT
type 'a dict = Key.t BinTree.tree
val empty = BinTree.empty
val insert = ...insert into a BST using Key.lt and Key.eq...
val lookup = ...lookup in a BST using Key.lt and Key.eq...
end
The difficulty, of course, is that we are repeating the code for dictionaries in each implementation; the elided parts of both structures would be identical. The only difference between the two dictionary structures lies in the implementation of keys; in one case we choose string operations and in the other we choose integer operations. Since the bulk of the code is the same, it is a pity to have to repeat it for each choice of key type.
Fortunately, ML provides a convenient means of avoiding such redundancy, called a functor. A functor is a parameterized module, or a generic structure, that is defined in terms of zero or more argument structures with a specified signature. A functor may be applied, or instaniated, with any structures matching the argument signatures. A functor is therefore a kind of function taking zero or more structures as arguments and yielding a structure as result.
In the case of dictionaries we may define a generic implementation that is parameterized by the type of keys and associated comparison operations. This is achieved by introducing a functor.
functor Dict (structure K : ORDERED) :> DICT where type Key.t=K.t =
struct
structure Key : ORDERED = K
type 'a dict = Key.t BinTree.tree
val empty = BinTree.empty
val insert = ...insert into a BST using Key.lt and Key.eq...
val lookup = ...lookup in a BST using Key.lt and Key.eq...
end
This declaration introduces a functor named Dict
that takes as argument
any structure implementing the signature ORDERED
, and yields a structure
implementing the instance of the signature DICT
determined by taking the key
type of the dictionary to be the type component of its argument, leaving the type of
dictionaries abstract. The type checker ensures that the body of the functor matches
the specified result signature, under the assumption that the argument has the stated
signature. In the case of the Dict
functor the type checker ensures
that the principal signature of the body of the functor (the part between struct
and end
) matches the signature
DICT where type Key.t=K.t,
assuming that the structure K
has signature ORDERED
.
The Dict
functor encapsulates the implementation of dictionaries as a
generic structure that is independent of the specific choice of keys. One advantage
of this encapsulation is that should we wish to modify the implementation of dictionaries,
say to fix an error or to improve performance, we need only modify the Dict
functor, rather than change every occurrence of the dictionary code spread throughout a
large system. This is obviously advantageous for both the original author of the
code, and anyone who must maintain it in the future. In fact common data structures
such as dictionaries are typically provided as part of a "shrink wrapped"
library, and hence are shared among many different programs, thereby increasing code reuse
and reducing redundancy.
The Dict
functor provides a generic implementation of dictionaries.
Dictionaries with specific key types may be built by instantiating the Dict
functor as follows:
structure IntDict = Dict (structure K = IntLT)
structure StringDict = Dict (structure K = StringLT)
Notice that functor application uses keyword parameter passing --- the parameter is explicitly bound to a structure using a structure binding. In practice the right-hand sides of such bindings are always (long) identifiers; if not, the compiler implicitly inserts bindings to ensure that this is the case. In our discussions we will tacitly assume that the right-hand side of all such bindings are (long) identifiers.
What are the signatures of the structure variables IntDict
and StringDict
?
Since no signature is ascribed to these bindings, the principal signature of the
corresponding right-hand side of the binding is assigned to each variable, in keeping with
our previous policies. Since the right-hand side in these examples is a functor
application, we must answer the question: what is the principal signature of a functor
application? If --- as here --- the result signature of the functor is opaque, the
principal signature is precisely the asribed signature of the functor, but with the
structure parameter replaced by its binding (which must be, by our assumption, another
structure identifier). Thus the signature assigned to IntDict
is
DICT where type Key.t=IntLT.t
which is equivalent to the signature
DICT where type Key.t=int
since IntLt.t
= int
. Similarly, the signature assigned
to StringDict
is
DICT where type Key.t=StringLT.t
which is equivalent to the signature
DICT where type Key.t=string
What if the functor has no result signature, or its result signature is transparently ascribed? In that case we assign the intermediate signature of the match as the result signature of the functor, and use that signature as the implied result signature of the functor.
Dictionaries illustrate the use of the ML module system to build generic
implementations of abstract types. A generic implementation of priority queues
(which support a remove_min
operation that dequeues the "least"
element of the queue relative to a specified ordering) may be built in an exactly
analogous manner. Here's a suitable signature of priority queues:
signature PRIO_QUEUE = sig
structure Elt : ORDERED
type prio_queue
exception Empty
val empty : prio_queue
val insert : Elt.t * prio_queue -> prio_queue
val remove : prio_queue -> Elt.t * prio_queue
end
Notice that prio_queue
is a type, and not a type constructor, as it was in
the case of "plain" queues. This is a reflection of the fact that the
operations on a priority queue are not independent of the type of elements (as they are
with plain queues), but rely on the comparison operations that are provided with the Elt
structure.
A generic implementation of priority queues is a functor taking as argument a structure containing the element type together with its associated operations:
functor PrioQueue
(structure E : ORDERED) :> PRIO_QUEUE where type Elt.t=E.t =
struct
structure Elt : ORDERED = E
type prio_queue = ...a heap based on the ordering Elt.lt...
exception Empty
val empty = ...the empty heap...
val insert = ...sift a new element into the heap...
val remove = ...remove the least element and adjust the heap...
end
Specific instances of priority queues may be built as follows:
structure IntPQ = PrioQueue (structure E = IntLT)
structure StringPQ = PrioQueue (structure E = StringLT)
with signatures
PRIO_QUEUE where type Elt.t=int
and
PRIO_QUEUE where type Elt.t=string
respectively.
The situation becomes more interesting when we wish to combine two or more abstract types to form a third. Suppose we are to implement a (hypothetical) abstract type that employs an ordered type of values that occur both as keys of a dictionary and elements of a priority queue. The signature of this abstract type might look like this
signature ADT = sig
structure Val : ORDERED
type adt
...operations...
end
The implementation should be generic in the type of values, and also in the
implementation of dictionaries and priority queues; we don't want to build the
implementation of these auxiliary data structures into the implementation of ADT
's.
There are two approaches to building an Adt
functor, each with its
advantages and disadvantages. Here's the first approach:
functor Adt
(structure V : ORDERED) :> ADT where type Val.t=V.t =
struct
structure Val : ORDERED = V
structure D = Dict (structure K = V)
structure Q = PrioQueue (structure E = V)
type adt = ...
...
end
The functor Adt
instantiates the Dict
and PrioQueue
functors to the structure of values specified as argument to the Adt
functor.
This ensures that the type equation
D.Key.t = Q.Elt.t = V.t
holds inside the body of the functor, so that expressions such as
D.insert (Q.remove_min ..., ...)
are well-typed. (The structures D
and Q
are not visible
outside of the functor since they do not appear in the result signature; they are local
auxiliaries used within the functor.)
This approach works well, but if the Dict
or PrioQueue
functors are changed, the Adt
functor must be recompiled to pick up the new
versions. An alternative, which avoids this dependency of the implementation of Adt
on the implementations of the Dict
and PrioQueue
functors, is to
treat the dictionary and priority queue structures as additional parameters to the Adt
functor. This leads to the following setup:
functor Adt'
(structure V : ORDERED and D : DICT and Q : PRIO_QUEUE) :>
ADT where type Val.t=V.t =
struct
structure Val = V
type adt = ...implementation type...
...implementation of operations...
end
To build an instance of the Adt'
functor we must first built appropriate
instances of the Dict
and PrioQueue
functors and pass these to Adt'
:
structure IntDict = Dict (structure K=IntLT)
structure IntPQ = Dict (structure K=IntLT)
structure A = Adt' (structure V=IntLt and D=IntDict and Q=IntPQ)
There is a problem, however, with this setup: the functor Adt'
is
ill-typed! It is no longer true within the body of Adt
that the type
equation
D.Key.t = Q.Elt.t = V.t
holds in the body of Adt'
, even though the equation
IntDict.Key.t = IntPQ.Elt.t = IntLT.t = int
does hold of the arguments, for we might well choose arguments for which the required equation is invalid. In short, the functor is "too generic", and consequently the body is not type correct.
What to do? The solution is to restrict the parameters to the Adt'
functor so that the only possible instances are those that satisfy the required equation.
There are two methods for doing this, both equivalent. The first is to
explicitly require that the dictionary and priority queue arguments agree on the value
type passed as parameter:
functor Adt'
(structure V : ORDERED
and D : DICT where type Key.t=V.t
and Q : PRIO_QUEUE where type Elt.t=V.t) :>
ADT where type Val.t=V.t =
struct
...
end
The body of Adt'
is now type correct since the required type equations
hold as a result of our additional assumptions on the arguments.
An alternative is to impose the equational requirement on types in a post hoc manner using a sharing specification:
functor Adt'
(structure V : ORDERED and D : DICT and Q : PRIO_QUEUE
sharing type D.Key.t = Q.Elt.t = V.t) :>
ADT where type Val.t=V.t =
struct
...
end
The sharing specification stipulates that the given equation must hold of any instance
of this functor. Any attempt to instantiate Adt'
with structures V
,
D
, and Q
not satisfying the sharing specification is rejected as
ill-formed.
An advantage of sharing specifications is that they provide a direct, symmetric
specification of the required type equation without forcing the programmer to explicitly
"thread" the common type through the various signatures. In fact sharing
specifications encourage concision since they do not require that the common component be
"factored out" as it is in the foregoing example. Here is a more concise
formulation of the Adt'
functor in which we drop the first argument entirely,
relying only on a sharing
specification to constraint the dictionary and
priority queue structures appropriately.
functor Adt'
(structure D : DICT and Q : PRIO_QUEUE
sharing type D.Key.t = Q.Elt.t) :>
ADT where type Val.t=D.Key.t =
struct
...
end
Notice that the result signature changes slightly to extract the common type from one
of the parameters, the choice of which being arbitrary in the presence of the sharing
specification.