341 : 14 Feb 2002 : ML wrapup + project

Principles of module systems

In all of history, human beings have devised only one way of managing the complexity of the systems they construct: divide and conquer. The construction of software is no different. As the amount of code in a system grows, so does the need to break the system into small and manageable pieces that can be considered independently of the others.

To some extent, functions fill this decompositional role. When writing a function, the implementor does not need to think about the client; and when calling a function, the client does not need to worry about its implementation. Fundamentally, functions divide labor between the implementor and client. However, functions are too fine-grained to serve as the only structural unit of large-scale software.

Most reasonable languages therefore have a module system. Module systems, ideally, play several roles at the language level. Roughly in order of increasing complexity, these include the following:

Namespace management: The simplest kind of module system merely provides a disciplined way of separating declared names (data types, functions, global values) into logical groups.
Information hiding: Once you can split up names into distinct spaces, it's not a huge step to hide some of the names in a namespace from "outside view" by other modules.
Interface checking: Statically typed languages usually provide mechanisms for ensuring that modules interact with each other through published "contracts". The most common form of contract is an interface type: a module's published interface can include not only names but type information about those names. Some languages have more complex notions of "contract" as well, e.g. you could declare that something takes "only a non-null pointer to X" as a parameter.
Code reuse: If a module system is flexible enough, then the code in a a module can be reused in many different contexts. Some languages have higher-order modules: such modules construct other modules based on user-specified parameters, facilitating the reuse of code in many different type contexts. These "generic" modules have various names, and varying degrees of flexibility and power: ML has functors, Ada has generics, C++ has templates. Java and C# will both have support for some form of generics in the near future.

(Note: dynamically typed languages usually are flexible enough for the programmer to write higher-order modules manually. For example, in Scheme it's quite trivial to write code that manipulates other code in much the same way that a functor does; the difference is that this code cannot be statically type checked.)

Modules also have several implications at the implementation level. For example, they could be used to facilitate separate compilation, or dynamic loading of code (e.g., plugins/applets).

Modules: ML vs. the world

ML's modules provide all of the above benefits:

Structures partition the namespace, providing namespace management.
When combined with signatures, structures enable information hiding. Only names declared in a structure's signature are visible to clients of that structure.
The signature of a structure is its type, and includes type information about all its members. Clients of a structure are forced to obey standard ML typing rules when using structure members.
Functors are a form of higher-order module: they take other modules (structures) as arguments, and construct modules dynamically. Functors can also be type checked independently of their clients.

Many languages, by contrast, have module systems that are much weaker:

Perl and Python's module systems have only namespace management, and none of the other features (not even information hiding!).
The Squeak dialect of Smalltalk, which we'll be studying in this course, has no module system to speak of. Instead, Squeak relies on the programming environment to help the programmer manage complexity.
C++ uses classes to play some of the roles of a module---i.e., interface checking and information hiding. However, classes are too fine-grained to serve as proper modules, so C++ has a separate namespace construct for namespace management. C++ namespaces have no disciplined form of information hiding---it's based on "whatever happens to be visible in the header file you include". Also, C++'s system for generics (templates) is quite primitive and does not allow templates to be typechecked independently from their clients.
In Java, classes play the same role in modularity as in C++, but again classes are too fine-grained to serve as the sole mechanism for modularity. Therefore, Java has a notion of "packages", which are modules that perform only namespace management and a limited form of information hiding.

One feature that the Standard ML module system does not have is inheritance. This form of code reuse is quite different from functors: whereas a functor allows a module to be parameterized by other modules, inheritance allows a module to inherit code from a parent module. They may seem totally different, but both involve the notion of an "incomplete" module that relies upon another module for part of its implementation.

For example, a functor can be viewed as an "incomplete" module that is "completed" by its application to its module parameters. Similarly, a class can be viewed as an "incomplete" module that is "completed" by the addition of its subclass/superclass code. Both mechanisms facilitate code reuse---the functors and classes can be "completed" by many other pieces of code, and hence reused---but in quite different ways.

A few final features of ML

Mutual recursion

(* Two mutually recursive datatypes.  We use the 'and' keyword instead
   of redeclaring 'datatype'.  These are the lists with an even number of
   elements and odd number of elements, respectively. *)
datatype 'a EvenList = Empty
                     | Even of 'a * 'a OddList
     and 'a OddList = Odd of 'a * 'a EvenList;

(* Here's the result of entering some data of the above types into the
   SML interpreter... notice that we can't construct an EvenList of
   length 1! *)

- Empty;
val it = Empty : 'a EvenList

- Even(1, Odd(0, Empty));
val it = Even (1,Odd (0,Empty)) : int EvenList

- Even(2, Empty);
stdIn:22.1-22.15 Error: operator and operand don't agree [tycon mismatch]
  operator domain: int * int OddList
  operand:         int * 'Z EvenList
  in expression:
    Even (2,Empty)


(* To write well-behaved functions over these types, we need mutually
   recursive functions.  Again, we use the 'and' keyword instead of
   'fun'. *)
fun evenLength(Empty) = 0
  | evenLength(Even(_, rest)) = oddLength(rest) + 1
and oddLength(Odd(_, rest)) = evenLength(rest) + 1;

(* A use... *)
- evenLength(Even(1,Odd(2,Even(3,Odd(4,Empty)))));
val it = 4 : int

Currying

Named after the mathematician Haskell Curry, "currying" is a way of defining functions of multiple arguments in terms of higher-order functions, each of which takes a single argument and returns a higher order function.

The key insight of currying: Any function of type

a * b -> c

can also be phrased as a function of type

a -> b -> c

Consider normal addition:

(* Ordinary add *)
val add = fn (x, y) => x + y;
val foo = add(3, 4);

(* Manually curried add, and its application *)
val handCurriedAdd = fn (x) => fn(y) => x + y;
val addFive   = handCurriedAdd 5;       (* adds 5 to any int *)
val eleven    = addFive 6;              (* applying addFive *)
val seventeen = (handCurriedAdd 8) 9;   (* applying both at once *)
val twenty    = handCurriedAdd 10 10;   (* application is left-associative *)

(* ML provides a nice syntax for currying: instead of having a tuple
   of arguments, have a space-delimited list.  This is automatically
   curried addition: *)
fun curriedAdd x y = x + y;

Why curry? Well, it's nice to give your clients the freedom to partially apply your function and "save the result for later". The SML standard library uses currying heavily.

cse341-webmaster@cs.washington.edu

Last modified: Wed Feb 13 20:13:51 PST 2002