It is good practice to ascribe a signature to every structure binding in a program to ensure that the signature of the bound structure variable is apparent from the binding. In the preceding chapter we described the elaboration and evaluation of a structure binding with an explicit signature ascription. First the ascribed signature is used to determine a view of the principal signature of the right-hand side of the binding, then the view is checked to ensure that it verifies the type sharing requirements of the ascribed signature. If both steps succeed, we assign a signature to the bound structure variable according to whether it is a transparent or opaque ascription --- if it is transparent, we assign the view to the variable, otherwise the ascription. Thus transparent ascription is used to form views of a structure, and opaque ascription is used to form abstractions in which critical type information is hidden from the rest of the program.
The formation of a view also has significance at run-time: a new structure is built consisting of only those components of the right-hand side of the binding mentioned in the ascribed signature, perhaps augmented by zero or more type components to ensure that the signature of the view is well-formed. (For example, if we attempt to extract only the constructors of a datatype, and not the datatype itself, the compiler will implicitly extract the datatype to ensure that the types of the constructors are expressible in the signature. Any type implicitly included in the view is marked as "hidden" to indicate that it was implicitly included as a consequence of the explicit inclusion of some other components of the structure.) Moreover, the types of polymorphic value components may be specialized in the view, corresponding to a form of polymorphic instantiation during signature matching. The result is a structure whose shape is fully determined by the view; no "junk" remains after the ascription. This ensures that access to the components of a structure is efficient (constant-time), and that there are no "space leaks" stemming from the presence of components of a structure that are not mentioned in its signature.
In this chapter we discuss the trade-off's between using views and abstraction in ML by offering some guidelines and examples of their use in practice. How does one decide whether to use transparent or opaque ascription? Generally speaking, transparent ascription is appropriate if the signature is not intended to be exhaustive, but is rather just a specification of some minimum requirements that a module must satisfy. Opaque ascription is appropriate if the signature is intended to be exhaustive, specifying precisely the operations that are available on the type.
Here's a common example of the use of transparent ascription in a program. When
defining a module it is often convenient to introduce a number of auxiliary bindings,
especially of "helper functions" that are used internally to the code of the
"public" operations. Since these auxiliaries are not intended to be used
by clients of the module, it is good practice to localize them to the implementation of
the public operations. This can be achieved by using the local
construct, as previously discussed in these notes. An alternative is to define the
auxiliaries as components of the stucture, relying on transparent ascription to drop the
auxiliary components before exporting the public components to clients of the
module. Thus we might write something like this:
structure IntListOrd : ORDERED =
struct
type t = int list
fun aux l = ...
val lt (l1, l2) = ... aux ...
val eq (l1, l2) = ... aux ...
end
The effect of the signature ascription is to drop the auxiliary component aux
from the structure during signature matching so that afterwards the binding of IntListOrd
contains only the components in the signature ORDERED
. An added bonus
of this style of programming is that during debugging and testing we may gain access to
the auxiliary by simply "commenting out" the ascription by writing instead
structure IntListOrd (* : ORDERED *) =
struct
type t = int list
fun aux l = ...
val lt (l1, l2) = ... aux ...
val eq (l1, l2) = ... aux ...
end
Since the ascription has been suppressed, the auxiliary component IntListOrd.aux
is accessible for testing. (It would be useful to have a compiler switch that
"turns off" signature ascription, rather than having to manually comment out
each ascription in the program, but no current compilers support such a feature.)
Now let us consider uses of opaque ascription by reconsidering the implementation of
persistent queues using pairs of lists. Here it makes sense to use opaque ascription
since the operations specified in the signature are intended to be exhaustive --- the only
way to create and manipulate queues is to use the operations empty
, insert
,
and remove
. By using opaque signature matching in the declaration of
the Queue
structure, we ensure that the type Queue.queue
is
hidden from the client. Consequently an expression such as Queue.insert (1,
([],[]))
is ill-typed, even though queues are "really" pairs of lists,
because the type 'a list * 'a list
is not equivalent to 'a Queue.queue
.
Were we to use transparent ascription this equation would hold, which means that
the client would not be constrained to using only the "official" queue
operations on values of type 'a Queue.queue
. This violates the
principle of data abstraction, which states that an abstract type should be
completely defined by the operations that may be performed on it.
Why impose such a restriction? One reason is that it ensures that the client of an abstraction is insensitive to changes in the implementation of the abstraction. Should the client's behavior change as a result of a change of implementation of an abstract type, we know right where to look for the error: it can only be because of an error in the implementation of the operations of the type. Were abstraction not enforced, the client might (accidentally or deliberately) rely on the implementation details of the abstraction, and would therefore need to be modified whenever the implementation of the abstraction changes. Whenever such coupling can be avoided, it is desirable to do so, since it allows components of a program to be managed independently of one another.
A closely related reason to employ data abstraction is that it enables us to enforce
representation invariants on a data structure. More precisely, it enables us to
isolate any violations of a representation invariant to the implementation of the
abstraction itself. No client code can disrupt the invariant if abstraction is
enforced. For example, suppose that we are implementing a dictionary package using a
binary search tree. The implementation might be defined in terms of a library of
operations for manipulating generic binary trees called BinTree
. The
implementation of the dictionary might look like this:
structure Dict :> STRING_DICT =
struct
(* Rep Invariant: binary search tree *)
type t = string BinTree.tree
fun insert (k, t) = ...
fun lookup k = ...
end
Had we used transparent, rather than opaque, ascription of the STRING_DICT
signature to the Dict
structure, the type Dict.t
would be known
to clients to be string BinTree.tree
. But then one could call Dict.lookup
with any value of type string BinTree.tree
, not just one that satisfies the
representation invariant governing binary search trees (namely, that the strings at the
nodes descending from the left child of a node are smaller than those at the node, and
those at nodes descending from the right child are larger than those at the node).
By using opaque ascription we are isolating the implementation type to the Dict
package, which means that the only possible violations of the representation invariant are
those that arise from errors in the Dict
package itself; the invariant cannot
be disrupted by any other means. The operations themselves may assume that
the representation invariant holds whenever the function is called, and are obliged to ensure
that the representation invariant holds whenever a value of the representation type is
returned. Therefore any combination of calls to these operations yielding a value of
type Dict.t
must satisfy the invariant.
You might wonder whether we could equally well use run-time checks to enforce representation invariants. The idea would be to introduce a "debug flag" that, when set, causes the operations of the dictionary to check that the representation invariant holds of their arguments and results. In the case of a binary search tree this is surely possible, but at considerable expense since the time required to check the binary search tree invariant is proportional to the size of the binary search tree itself, whereas an insert (for example) can be performed in logarithmic time. But wouldn't we turn off the debug flag before shipping the production copy of the code? Yes, indeed, but then the benefits of checking are lost for the code we care about most! (To paraphrase Tony Hoare, it's as if we used our life jackets while learning to sail on a pond, then tossed them away when we set out to sea.) By using the type system to enforce abstraction, we can confine the possible violations of the representation invariant to the dictionary package itself, and, moreover, we need not turn off the check for production code because there is no run-time penalty for doing so.
A more subtle point is that it may not always be possible to enforce data abstraction at run-time. Efficiency considerations aside, you might think that we can always replace static localization of representation errors by dynamic checks for violations of them. But this is false! One reason is that the representation invariant might not be computable. As an example, consider an abstract type of total functions on the integers, those that are guaranteed to terminate when called, without performing any I/O or having any other computational effect. It is a theorem of recursion theory that no run-time check can be defined that ensures that a given integer-valued function is total. Yet we can define an abstract type of total functions that, while not admitting ever possible total function on the integers as values, provides a useful set of such functions as elements of a structure. By using these specified operations to create a total function, we are in effect encoding a proof of totality in the code itself.
Here's a sketch of such a package:
signature TIF = sig
type tif
val apply : tif -> (int -> int)
val id : tif
val compose : tif * tif -> tif
val double : tif
...
end
structure Tif :> TIF = struct
type tif = int->int
fun apply t n = t n
fun id x = x
fun compose (f, g) = f o g
fun double x = 2 * x
...
end
Should the application of such some value of type Tif.tif
fail to
terminate, we know where to look for the error. No run-time check can assure us that
an arbitrary integer function is in fact total.
Another reason why a run-time checkto enforce data abstraction is impossible is that it
may not be possible to tell from looking at a given value whether or not it is a
legitimate value of the abstact type. Here's an example. In many operating
systems processes are "named" by integer-value process identifiers. Using
the process identifier we may send messages to the process, cause it to terminate, or
perform any number of other operations on it. The thing to notice here is that any
integer at all is a possible process identifier; we cannot tell by looking at the integer
whether it is indeed valid. No run-time check on the value will reveal whether a
given integer is a "real" or "bogus" process identifier.
The only way to know is to consider the "history" of how that integer came into
being, and what operations were performed on it. Using the abstraction mechanisms
just described, we can enforce the requirement that a value of type pid
,
whose underlying representation is int
, is indeed a process identifier.
You are invited to imagine how this might be achieved in ML.
Transparency and opacity may seem, at first glance, to be fundamentally opposed to one another. But in fact transparency is special case of opacity! By using type definitions in signatures, we may always express explicitly the propagation of type information that is conveyed implicitly by transparent ascription. For example, rather than write
structure IntLT : ORDERED = struct type t=int ... end
we may instead write
structure IntLT :> INT_ORDERED = struct type t=int ... end
at the expense of introducing a specialized version of the signature ORDERED
with the type t
defined to be int
. This syntactic
inconvenience can be ameliorated by using the "where type
"
construct, writing
structure IntLT :> ORDERED where type t=int = struct ... end
The signature expression "ORDERED where type t=int
" is
equivalent to the signature INT_ORDERED
defined above.
Thus transparency is a form of opacity in which we happen to publicize the identity of the underlying types in the ascribed signature. This observation is more important than one might think at first glance. The reason is that it is often the case that we must use a combination of opacity and transparency in a given situation. Here's an example. Suppose that we wished to implement several dictionary packages that differ in the type of keys. The "generic" signature of a dictionary might look like this:
signature DICT = sig
type key
val lt : key * key -> bool
val eq : key * key -> bool
type 'a dict
val empty : 'a dict
val insert : 'a dict * key * 'a -> 'a dict
val lookup : 'a dict * key -> 'a
end
Notice that we include a type component for the keys, together with operations for comparing them, along with the type of dictionaries itself and the operations on it. Now consider the definition of an integer dictionary module, one whose keys are integers ordered in the usual manner. We might use a declaration like this:
structure IntDict :> DICT = struct
type key = int
val lt : key * key -> bool = (op <)
val eq : key * key -> bool = (op =)
datatype 'a dict = Empty | Node of 'a dict * 'a * 'a dict
val empty = Empty
fun insert (d, k, e) = ...
fun lookup (d, k) = ...
end
But this doesn't work as intended! The reason is that the opaque ascription, which is intended to hide the implementation type of the abstraction, also obscures the type of keys. Since the only operations on keys in the signature are the comparison functions, we can never insert an element into the dictionary!
What is necessary is to introduce a specialized version of the DICT
signature in which we publicize the identity of the key
type, as follows:
signature INT_DICT = DICT where type key = int
structure IntDict :> INT_DICT = struct
type key = int
val lt : key * key -> bool = (op <)
val eq : key * key -> bool = (op =)
datatype 'a dict = Empty | Node of 'a dict * 'a * 'a dict
val empty = Empty
fun insert (d, k, e) = ...
fun lookup (d, k) = ...
end
With this declaration the type 'a IntDict.dict
is abstract, but the type IntDict.key
is equivalent to int
. Thus we may correctly write IntDict.insert
(IntDict.empty, 1, "1")
to insert the value "1"
into the empty dictionary with key 1
. To build a dictionary whose keys
are strings, we proceed similarly:
signature STRING_DICT = DICT where type key = string
structure StringDict :> STRING_DICT = struct
type key = string
val lt : key * key -> bool = (op <)
val eq : key * key -> bool = (op =)
datatype 'a dict = Empty | Node of 'a dict * 'a * 'a dict
val empty = Empty
fun insert (d, k, e) = ...
fun lookup (d, k) = ...
end
In the next two chapters we will discuss how to build a generic implementation of dictionaries that may be instantiated for many different choices of key type.