[ ^ CSE 341, Winter 2004 home page | Lectures index ]

CSE 341: Connections between OOP and FP

Anonymous inner classes in Java

In Scheme, anonymous functions can be used to implement objects. Conversely, in Smalltalk, we found that a BlockContext object was used to implement anonymous functions.

Likewise, Java has no anonymous functions, but it does have objects. It turns out that you can use anonymous inner classes to do some of the same things that anonymous functions are traditionally used for --- which should not be surprising, since functions and objects both bundle state (local variables/parameters) and behavior (code).

The syntax for Java anonymous inner classes is as follows:

new ClassOrInterfaceName() { classBody }

This is an expression that constructs an instance of an anonymous class that subclasses/subtypes ClassOrInterfaceName.

For example, Java defines the built-in Iterator interface, which abstracts sequential iteration over a series of values:

public interface Iterator {
    public boolean hasNext();
    public Object next();
    public void remove();
}

To define an anonymous inner class that meets the Iterator type, and provides iteration over the integers from 0 to 9, you could write the following:

new Iterator() {
    private int i = 0;
    public boolean hasNext() { return i < 10; }
    public Object next() {
        Integer retval = new Integer(i);
        ++i;
        return retval;
    }
    public void remove() {
      throw new UnsupportedOperationException();
    }
}

Since anonymous inner classes are expressions, you can do whatever you do with any other expressions, e.g. assign one to local variables:

Iterator anIterator = new Iterator() {
    private int i;
    public boolean hasNext() { return i < 10; }
    ... /* as above */
};
while (anIterator.hasNext()) {
    System.out.println(anIterator.next());
}
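
For reference, here is a complete, compilable version of the iterator example. It is only a sketch, and it assumes a newer Java compiler than these notes were written for (it uses generics and Integer.valueOf). The class name AnonymousIteratorDemo and the helpers zeroToNine and drain are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class AnonymousIteratorDemo {
    /** Builds an iterator over 0..9 from an anonymous class (hypothetical helper). */
    static Iterator<Integer> zeroToNine() {
        return new Iterator<Integer>() {
            private int i = 0;
            public boolean hasNext() { return i < 10; }
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                Integer retval = Integer.valueOf(i);
                ++i;
                return retval;   // the value from before the increment
            }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Collects everything the iterator yields into a list (hypothetical helper). */
    static List<Integer> drain(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<Integer>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(zeroToNine()));
    }
}
```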

Like anonymous functions in "good" languages, Java anonymous classes are lexically scoped --- their bodies have access to names in the enclosing scope, including:

class variables
instance variables
the enclosing method's final local variables and parameters.

The restriction in the third set of names --- the fact that you can only access final locals --- is due to quirks in the Java language design which we won't discuss further here.
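
The three kinds of names can be seen side by side in a small sketch. ScopeDemo, adder, and the two counters are hypothetical names chosen for this illustration; Runnable is the standard java.lang interface.

```java
class ScopeDemo {
    static int classTotal = 0;   // class variable
    int instanceTotal = 0;       // instance variable

    /** Returns a task whose body reads names from all three enclosing scopes. */
    Runnable adder(final int amount) {            // final parameter
        final String label = "added " + amount;   // final local
        return new Runnable() {
            public void run() {
                classTotal += amount;       // class variable of ScopeDemo
                instanceTotal += amount;    // instance variable of the enclosing object
                System.out.println(label);  // final local of the enclosing method
            }
        };
    }

    public static void main(String[] args) {
        ScopeDemo d = new ScopeDemo();
        Runnable r = d.adder(5);
        r.run();
        r.run();
        System.out.println(d.instanceTotal + " " + classTotal);
    }
}
```

Note that the anonymous object keeps working after adder has returned, which is why it behaves like a closure.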

The Iterator example here is contrived, but there are more useful examples...

Callbacks with anonymous inner classes

Callbacks are a fundamental programming idiom. A callback interface is one in which the client of a library passes a pointer to some code, which the library will "call back" when some event occurs.

For example:

Many GUIs provide a button class; clients might register a callback specifying code to execute when the button is clicked (whenever that is).
The standard C library has the atexit callback; clients can register code to be executed right before the current process terminates (whenever that is).
Libraries based on the SAX XML standard provide traversal for trees of XML data; the client registers code to be called when the SAX parser encounters element nodes, text nodes, etc.

Callbacks are inherently higher-order: the interface to register callbacks has to accept a function, i.e., the code to be called back. Anonymous functions and callbacks go especially well together; in Java, you commonly use anonymous inner classes instead.
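
The shape of a callback interface can be sketched without any GUI at all. Everything in the following example is hypothetical (ClickListener, FakeButton, simulateClick); it only models the registration and call-back steps described above, not a real library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface: code to run when a click happens. */
interface ClickListener {
    void clicked(String source);
}

/** Hypothetical stand-in for a GUI button; it only stores and invokes callbacks. */
class FakeButton {
    private final String name;
    private final List<ClickListener> listeners = new ArrayList<ClickListener>();

    FakeButton(String name) { this.name = name; }

    /** Registration is higher-order: the argument carries code to call later. */
    void addClickListener(ClickListener l) { listeners.add(l); }

    /** The "library" side: call every registered callback back. */
    void simulateClick() {
        for (ClickListener l : listeners) l.clicked(name);
    }
}

class CallbackDemo {
    public static void main(String[] args) {
        final List<String> log = new ArrayList<String>();
        FakeButton ok = new FakeButton("OK");
        ok.addClickListener(new ClickListener() {
            public void clicked(String source) { log.add("clicked " + source); }
        });
        ok.simulateClick();
        ok.simulateClick();
        System.out.println(log);
    }
}
```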

In Java's AWT and Swing GUI libraries, the java.awt.event.MouseListener interface describes "any object that can listen to mouse click events":

public interface MouseListener {
    void mouseClicked(MouseEvent e);
    void mouseEntered(MouseEvent e);
    void mouseExited(MouseEvent e);
    void mousePressed(MouseEvent e);
    void mouseReleased(MouseEvent e);
}

The java.awt.event.MouseAdapter class defines a default implementation of this interface that does nothing:

public abstract class MouseAdapter implements MouseListener {
    public void mouseClicked(MouseEvent e) {}
    public void mouseEntered(MouseEvent e) {}
    public void mouseExited(MouseEvent e) {}
    public void mousePressed(MouseEvent e) {}
    public void mouseReleased(MouseEvent e) {}
}

Now, in order to register interest in a component's mouse events, you can use an anonymous inner class:

final String buttonLabel = "A button";
JComponent c = new JButton(buttonLabel);
c.addMouseListener(new MouseAdapter() {
    public void mouseEntered(MouseEvent e) {
        System.out.println("Mouse entry event: " + e
                           + " on button: " + buttonLabel);
    }
});

This produces an instance of an anonymous class that inherits from MouseAdapter, overriding the mouseEntered(MouseEvent) method. This instance is then registered using the addMouseListener callback interface. The system will call this object back with a MouseEvent object whenever the mouse enters the button's screen area.

Notice that the body of the anonymous inner class uses the name buttonLabel from the surrounding lexical context.

Functional programming with Java collections and anonymous classes

The Java libraries were not designed to encourage functional programming, although they could have been, and it is easy enough to add that support ourselves. Java has a built-in interface for collections, java.util.Collection:

public interface Collection {
    boolean add(Object o); // Adds an object; true if the collection changed
    void clear();        // Empties the collection
    Iterator iterator(); // Returns iterator for the collection
    int size();          // Returns number of objects in collection
    ... // other methods
}

This interface is OK, but it doesn't encourage higher-order programming: there are no equivalents of do or filter in the style of functional languages (by contrast, Smalltalk collections have do: and collect:, which take blocks that operate on the elements). So, how could we design a collections library to support a "higher-order" style of programming?

First, define an interface for "function objects":

public interface Function {
    Object apply(Object argument);
}

You can implement Function using named classes...

public class AddPeriod implements Function {
    public Object apply(Object argument) { return ((String)argument) + "."; }
}

...or anonymous classes:

String hi = "hello";
String hiWorld = (String) (new Function() {
    public Object apply(Object argument) { return ((String)argument) + "."; }
}).apply(hi);

(Aside: notice that we have to cast the argument, because we only know that it's of type Object. For now, we won't worry about accurate static typing for arguments or results --- all functions take and return Object. We'll need bounded parametric polymorphism, in the style of Pizza, to fix this.)

Now, define the HigherOrderCollection interface:

public interface HigherOrderCollection {
    /** Add an object to this collection */
    void add(Object o);

    /** Apply f to each element of this collection. */
    void doEach(Function f);

    /** Apply f to each element in this collection, and add the result
        to target. */
    void map(HigherOrderCollection target, Function f);

    /** Add all elements satisfying pred into the target collection. */
    void filter(HigherOrderCollection target, Function pred);

    ... // other operations
}

Aside: There are some minor differences between this and the higher-order collection functions we've seen before --- most importantly, the use of a target parameter to map and filter. Our reasoning for this design decision is as follows:

ML map and filter return a fresh list because they only operate over lists. However, HigherOrderCollection is an interface for many kinds of collections --- e.g., lists, trees, and sets.
Smalltalk methods like collect: solve this problem by returning a collection that is a clone of the original collection. This isn't appropriate for map, which may return elements of different kinds, necessitating different properties for the target collection than the source collection. For example, mapping a binary search tree of integers to a binary search tree of strings will generally require a different comparison criterion.
Therefore, having the user pass the target collection as an argument allows greater flexibility: the user gets to choose the class and instance of the result collection.
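
A small sketch of that flexibility follows. It uses cut-down copies of Function and HigherOrderCollection (only add and map) so that it compiles on its own; BackedCollection is a hypothetical implementation that delegates to any java.util.Collection. The caller maps a list of integers into a sorted set of strings, so the result differs from the source in both collection class and element type.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.TreeSet;

interface Function { Object apply(Object argument); }

/** Cut-down version of the interface above: just add and map. */
interface HigherOrderCollection {
    void add(Object o);
    void map(HigherOrderCollection target, Function f);
}

/** Hypothetical implementation backed by any java.util.Collection. */
class BackedCollection implements HigherOrderCollection {
    final Collection<Object> items;
    BackedCollection(Collection<Object> items) { this.items = items; }

    public void add(Object o) { items.add(o); }

    public void map(HigherOrderCollection target, Function f) {
        for (Object o : items) target.add(f.apply(o));
    }
}

class TargetDemo {
    public static void main(String[] args) {
        // Source: a list of integers, duplicates allowed.
        HigherOrderCollection numbers = new BackedCollection(new ArrayList<Object>());
        numbers.add(Integer.valueOf(3));
        numbers.add(Integer.valueOf(12));
        numbers.add(Integer.valueOf(3));

        // Target: a sorted set of strings; the caller picks its class and ordering.
        BackedCollection labels = new BackedCollection(new TreeSet<Object>());
        numbers.map(labels, new Function() {
            public Object apply(Object o) { return "n" + o; }
        });
        System.out.println(labels.items);
    }
}
```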

The following defines the HOList class, which meets the above interface:

public class HOList implements HigherOrderCollection {

   /** Helper class for list nodes */
   private class Link {
      Object value;
      Link next;
      Link(Object value, Link next)
          { this.value = value; this.next = next; }
   }

   private Link head;

   public HOList() { this.head = null; }

   public void add(Object o) { this.head = new Link(o, this.head); }

   public void doEach(final Function f) {
      for (Link current = head; current != null; current = current.next) {
          f.apply(current.value);
      }
   }

   public void map(final HigherOrderCollection target, final Function f) {
      this.doEach(new Function() {
         public Object apply(Object o) {
            target.add(f.apply(o));
            return null;
         }
      });
   }

   public void filter(final HigherOrderCollection target, final Function pred) {
      this.doEach(new Function() {
         public Object apply(Object o) {
            Boolean predVal = (Boolean)pred.apply(o);
            if (predVal.booleanValue()) {
               target.add(o);
            }
            return null;
         }
      });
   }

   // ... other operations
}

Here's an example of a client of HOList:

HOList greetings = new HOList();
greetings.add("hi");
greetings.add("bonjour");
greetings.add("hola");

HOList helloWorlds = new HOList();
greetings.map(helloWorlds, new Function() {
    public Object apply(Object o) {
        return ((String)o) + ", world!";
    }
});

helloWorlds.doEach(new Function() {
    public Object apply(Object o) {
        System.out.println(o);
        return null;
    }
});

Argh! Stop the madness!

The above is cool in a way, but I bet it doesn't strike you as very clean. Why is it that something that seems so clean in ML (and other languages we've studied) becomes ugly in Java? I claim that the answer has two parts:

Anonymous inner classes are much more verbose than anonymous functions in ML, Scheme, or Smalltalk. As a result, we end up writing lots more curly braces and other text that we don't really want to write.

Now, this verbosity does buy us something: the flexibility to specify many different methods is useful for things like MouseListener implementations, which must specify different functions to call for different events. But for simple things like Function it's overkill.
Java has only subtype polymorphism; it doesn't have ML-style parametric polymorphism. As a result, the argument to apply has type Object, and we must nearly always use a cast in apply's body.

What we really want is to have Function represent a family of interfaces, whose instances take and return different types --- in much the same way that the ML function type 'a -> 'b gets instantiated to different specific types depending on the function definition.
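
To make the idea of a "family of interfaces" concrete, here is a sketch written with the generics that later versions of Java (5 and up) provide. TypedFunction and the two constants are hypothetical names; the point is only that each instantiation fixes its own argument and result types, so the casts disappear.

```java
/** Hypothetical typed variant of Function: A is the argument type, B the result type. */
interface TypedFunction<A, B> {
    B apply(A argument);
}

class TypedFunctionDemo {
    // Instantiated at String -> String.
    static final TypedFunction<String, String> ADD_PERIOD =
        new TypedFunction<String, String>() {
            public String apply(String s) { return s + "."; }   // no cast needed
        };

    // Instantiated at String -> Integer.
    static final TypedFunction<String, Integer> LENGTH =
        new TypedFunction<String, Integer>() {
            public Integer apply(String s) { return Integer.valueOf(s.length()); }
        };

    public static void main(String[] args) {
        String hi = ADD_PERIOD.apply("hello");
        int n = LENGTH.apply(hi);
        System.out.println(hi + " " + n);
    }
}
```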

As it happens, Pizza provides solutions to these problems.

Pizza: Functional extensions for Java

Pizza is an extension of Java developed by Martin Odersky and Philip Wadler in 1997. Pizza is backwards-compatible with Java: every legal Java program is also a legal Pizza program that has the same meaning, and Pizza compiles to standard Java bytecodes. Pizza augments Java with three ideas from the functional language community:

Bounded parametric polymorphism
Syntactically lightweight anonymous functions
Algebraic datatypes

We won't discuss Pizza's algebraic datatypes in these notes. They're cool, but the motivation for adding them to Java is arguably weaker than for the first two features.

Bounded parametric polymorphism

Consider the type of the ML map function:

('a -> 'b) -> 'a list -> 'b list

This type uses parametric polymorphism: there are two type parameters (type variables), 'a and 'b, which are automatically instantiated to specific types when map is applied:

map (fn x => x ^ ", world!") ["hi", "bonjour", "hola"];

In the above expression, (fn x => x ^ ", world!") is of type string -> string, and map's type is automatically adapted to this type.

It would be useful to add the analogous power to Java. This isn't an artifact of our weird HigherOrderCollection type; it's useful in general, especially for the built-in standard Java collections. Consider the following code:

Collection c = new LinkedList();
c.add("hi");
c.add("bonjour");
c.add("hola");

Collection c2 = new LinkedList();
for (Iterator i = c.iterator(); i.hasNext(); ) {
    String s = (String)i.next();  // XXX
    c2.add(s + ", world!");
}

The cast on line XXX is not very satisfying. For one thing, it may fail at runtime. For another, it's a really poor way to document that c holds Strings rather than arbitrary Objects. What we'd really like to do is say that

Collection is (to borrow ML's type syntax) a 'a Collection;
a 'a Collection's iterator() method returns a 'a Iterator; and
a 'a Iterator's next() method returns a 'a, not just an Object.

But Java gives us no way of saying this.

Parameterized types in Pizza

In Pizza, you can declare that a type (interface or class type) has type parameters by writing the type parameters in angle brackets. Pizza's Collection interface looks like this:

public interface Collection<T> {
    void add(T value);
    Iterator<T> iterator();
    ... // other methods
}

The T here is a type variable: it plays a role similar to 'a in an ML datatype declaration.

Unlike ML, Java has no real type inference. To maintain harmony with Java, and for some other good reasons, Pizza doesn't infer instantiations of parameterized types. Therefore, when you declare a reference to an instantiation of Collection<T>, you have to provide the type parameter explicitly:

Collection<String> c = new LinkedList<String>();
c.add("hi");
c.add("bonjour");
c.add("hola");

Collection<String> c2 = new LinkedList<String>();
for (Iterator<String> i = c.iterator(); i.hasNext(); ) {
    String s = i.next();    // XXX
    c2.add(s + ", world!");
}

At first glance, this seems like a greater burden than before --- where before we only had to write the cast to String on line XXX, we must now fill in type parameters on many of our type declarations. However, this is still superior for at least the following reasons:

We now get static checking that the collections c and c2 only contain strings; this checking extends to all method calls on the collection. For example, if we tried to add an Integer to one of the collections, we'd get a static error:
```
c.add(new Integer(3));   // compile-time error
```
Having the element type in the collection is a useful form of documentation about the contents of the collection. Since the type is statically checked, this documentation is guaranteed to stay in sync with the actual code.
Because our static checking is sound, we never need to worry about a runtime error at line XXX due to a dynamic cast failure.
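
The same example can be tried in plain Java, because Java 5 and later accept this angle-bracket notation for the standard collections. The sketch below is ordinary Java rather than Pizza, and TypedCollectionDemo and greetWorld are hypothetical names.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;

class TypedCollectionDemo {
    /** Appends ", world!" to every greeting; no casts are needed anywhere. */
    static Collection<String> greetWorld(Collection<String> greetings) {
        Collection<String> result = new LinkedList<String>();
        for (Iterator<String> i = greetings.iterator(); i.hasNext(); ) {
            String s = i.next();            // statically known to be a String
            result.add(s + ", world!");
        }
        return result;
    }

    public static void main(String[] args) {
        Collection<String> c = new LinkedList<String>();
        c.add("hi");
        c.add("bonjour");
        c.add("hola");
        // c.add(Integer.valueOf(3));       // would be rejected at compile time
        System.out.println(greetWorld(c));
    }
}
```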

To implement a Collection<T>, we can define a linked list class that also has a type parameter:

public class LinkedList<T>
    implements Collection<T> {

    private class Link<T> {
        T value;
        Link<T> next;
        Link<T>(T value, Link<T> next)
            { this.value = value; this.next = next; }
    }

    private Link<T> head;

    public LinkedList<T>() { this.head = null; }

    public void add(T value) { head = new Link<T>(value, head); }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            Link<T> current = head;
            public boolean hasNext() { return current != null; }
            public T next() {
                T retval = current.value;
                current = current.next;
                return retval;
            }
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}

Notice that wherever we have a type name, we consistently insert a parameter to indicate the element type.

Bounded parametricity

ML has parametric polymorphism, which Java does not. But Java has subtype polymorphism, which ML does not. It so happens that when you have both subtyping and parametric polymorphism in your language, it becomes natural to extend parametric polymorphism with bounds on the type. Pizza supports this feature, which is called bounded parametric polymorphism.

For example, we might want to define a class of "printable lists", whose elements must define a print method. First, we define a Printable interface:

public interface Printable {
    String print();
}

Now, we can define an interface for printable collections:

public interface PrintableCollection<T implements Printable>
    extends Collection<T> {
    String printAll();
}

Notice the implements clause on the type parameter. This says that the type variable can only be instantiated with types that implement the Printable interface. Therefore, for example, suppose we have classes Foo and Bar:

class Foo implements Printable { public String print() { return "foo"; } }
class Bar {} // does not implement Printable

PrintableCollection<Foo> foos = ...; // OK
PrintableCollection<Bar> bars = ...; // Static error: Bar not Printable

In the implementation of a PrintableCollection, the class body is allowed to assume that print is defined on the element type:

public class PrintableLinkedList<T implements Printable>
    extends LinkedList<T>
    implements PrintableCollection<T> {
    public String printAll() {
        String retval = "";
        for (Iterator<T> i = this.iterator(); i.hasNext(); ) {
            retval = retval + i.next().print(); // XXX
        }
        return retval;
    }
}

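Notice that on line XXX we call print() directly on the value returned by i.next(). No cast is needed, because the bound on T guarantees statically that every element is Printable.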
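
For comparison, here is a sketch of the same idea in plain Java generics (Java 5 and later), where a bound is always written with extends, even when the bound is an interface. PrintableList and BoundedDemo are hypothetical names, and the list is backed by java.util.ArrayList to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

interface Printable {
    String print();
}

class Foo implements Printable {
    public String print() { return "foo"; }
}

/** Hypothetical list whose element type T is bounded by Printable. */
class PrintableList<T extends Printable> {
    private final List<T> items = new ArrayList<T>();

    void add(T value) { items.add(value); }

    /** The bound lets the body call print() on every element without a cast. */
    String printAll() {
        StringBuilder sb = new StringBuilder();
        for (T item : items) sb.append(item.print());
        return sb.toString();
    }
}

class BoundedDemo {
    public static void main(String[] args) {
        PrintableList<Foo> foos = new PrintableList<Foo>();
        foos.add(new Foo());
        foos.add(new Foo());
        System.out.println(foos.printAll());
        // PrintableList<String> bad;   // compile-time error: String is not Printable
    }
}
```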

Bounded polymorphism can get pretty fancy --- it turns out that you really want recursive bounds to express certain typing patterns. You can read the Pizza paper for details.

The important lessons to take away from Pizza's bounded polymorphism:

The benefits of parametric polymorphism apply even in non-functional languages (see the list of reasons above).
When you combine subtyping with parametric polymorphism, you really want bounded parametric polymorphism, so that you can place constraints on the instantiation of the type parameter.

Note on terminology: sometimes you will hear bounded parametric polymorphism called generics or generic types. Variations on generics have appeared in Ada, Modula-3, and C++ (templates), among other languages.

Syntactic sugar for functions

Java's anonymous inner classes are too verbose for simple uses of anonymous functions. Pizza adds a simpler syntax for anonymous functions:

fun (argType1 argName1, ..., argTypeN argNameN) -> returnType stmt

Here's the identity function:

fun (Object x) -> Object { return x; }

Functions can be applied using normal Java function call syntax, so here's the identity function applied to a string:

(fun (Object x) -> Object { return x; })("hello")

In this world, it is simple for our LinkedList class to support a map function:

class LinkedList<T>
    implements Collection<T>
{
    ...
    <T2> LinkedList<T2> map((T)->T2 f) {
        LinkedList<T2> retval = new LinkedList<T2>();
        for (Iterator<T> i = this.iterator(); i.hasNext(); ) {
            retval.add(f(i.next()));
        }
        return retval;
    }
}

The user of this function must write much less than with anonymous inner classes:

LinkedList<String> greetings = new LinkedList<String>();
greetings.add("hello");
greetings.add("bonjour");
greetings.add("hola");

LinkedList<String> helloWorlds =
    greetings.map(fun (String s) -> String { return s + ", world!"; });

Notice that parametric polymorphism and anonymous functions have synergistic effects. Each makes the other more powerful and elegant.
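
Pizza itself is not needed to experiment with this combination: Java 8 and later have lambda expressions and the standard java.util.function.Function interface. The sketch below is plain Java, not Pizza syntax, and LambdaMapDemo and its static map helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class LambdaMapDemo {
    /** Generic map: applies f to each element of source and returns a fresh list. */
    static <T, T2> List<T2> map(List<T> source, Function<T, T2> f) {
        List<T2> result = new ArrayList<T2>();
        for (T item : source) result.add(f.apply(item));
        return result;
    }

    public static void main(String[] args) {
        List<String> greetings = Arrays.asList("hello", "bonjour", "hola");
        // The type parameters T and T2 are instantiated per call.
        List<String> helloWorlds = map(greetings, s -> s + ", world!");
        List<Integer> lengths = map(greetings, s -> s.length());
        System.out.println(helloWorlds + " " + lengths);
    }
}
```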

MultiJava: Multiple dispatch for Java

MultiJava is an extension of Java originally developed by Craig Chambers and Todd Millstein (of UW) and Curtis Clifton and Gary Leavens (of Iowa State University). MultiJava is in semi-active development.

Like Pizza, MultiJava is backwards compatible --- every legal Java program is also a legal MultiJava program that has the same meaning, and MultiJava programs compile to standard Java bytecodes. MultiJava augments Java with two key ideas from the (object-oriented) research language Cecil:

Multiple dispatch: the ability to dispatch on multiple arguments, not just the receiver
Open classes: the ability to add methods to a class outside its original declaration.

The MultiJava compiler is freely available at multijava.org. I have used it daily for several months as the implementation language for some of my own projects; it is a relatively stable, high-quality tool, and I strongly recommend you try it out.

Overriding vs. static overloading

Recall the Shape and Rectangle classes from our lecture on OO static typing:

class Shape extends Object {
   boolean overlaps(Shape other) { ... }     // AAA
}

class Rectangle extends Shape {
   boolean overlaps(Rectangle other) { ... } // BBB
}

Rectangle r = new Rectangle(...);
Shape s = new Rectangle(...);
boolean b = r.overlaps(s);        // XXX

The methods at lines AAA and BBB are statically overloaded, not overridden. At line XXX, the method is chosen by static overload resolution, not dynamic dispatch. Because the static type of the argument s is Shape, the call resolves to the method at line AAA, even though s refers to a Rectangle at run time.
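
This behavior is easy to observe. In the following sketch the two methods return a string naming themselves instead of a boolean; that change, and the class OverloadDemo, are made up for the illustration.

```java
class Shape {
    String overlaps(Shape other) { return "Shape.overlaps(Shape)"; }             // AAA
}

class Rectangle extends Shape {
    String overlaps(Rectangle other) { return "Rectangle.overlaps(Rectangle)"; } // BBB
}

class OverloadDemo {
    public static void main(String[] args) {
        Rectangle r = new Rectangle();
        Shape s = new Rectangle();          // static type Shape, dynamic type Rectangle
        System.out.println(r.overlaps(s));  // overload picked from the static type of s
        System.out.println(r.overlaps(r));  // static type Rectangle selects BBB
    }
}
```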

Clearly, static overloading is not what we want here: the Rectangle-specific code should run whenever both shapes are rectangles at run time. This is known as the binary method problem. In Java, you can only implement the "right" behavior for binary methods as follows:

class Rectangle extends Shape {
    boolean overlaps(Shape other) {
        if (other instanceof Rectangle) {
            Rectangle otherRect = (Rectangle)other;
            ... // code to compare with otherRect
        } else {
            return super.overlaps(other);
        }
    }
}

This is not satisfying. We must manually test for Rectangle and use a cast; it's easy to make an error, and it's tedious to write and maintain this code, which amounts to a manual implementation of dynamic dispatch.

Languages with multiple dispatch enable the programmer to specify directly that a method should dynamically dispatch based on multiple arguments --- i.e., based on the runtime type of arguments in addition to the receiver.

In MultiJava, you can make overlaps dispatch dynamically when the argument is a Rectangle as follows:

class Rectangle extends Shape {
    boolean overlaps(Shape@Rectangle other) { ... }
}

Notice that the declared argument type is now Shape@Rectangle. This means that

we are overriding the method overlaps(Shape) inherited from Shape;
but only when the dynamic class of the argument other is Rectangle.

As a result, we simply get the "right" thing. The MultiJava compiler will automatically generate dispatch code.
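
Without MultiJava, one way to approximate this in plain Java, besides the instanceof test shown earlier, is the double dispatch idiom: the first call dispatches on the receiver, and a second call dispatches on the argument. The sketch below illustrates that idiom only; it is not a claim about the code the MultiJava compiler generates. The overlapsWith... method names and the string results are hypothetical.

```java
class Shape {
    /** Entry point: re-dispatches on the argument so both dynamic types matter. */
    String overlaps(Shape other) { return other.overlapsWithShape(this); }

    // Second-stage methods, selected by the dynamic type of the original argument.
    String overlapsWithShape(Shape first) { return "generic shape test"; }
    String overlapsWithRectangle(Rectangle first) { return overlapsWithShape(first); }
}

class Rectangle extends Shape {
    /** The receiver is known to be a Rectangle here, so say so in the second call. */
    @Override
    String overlaps(Shape other) { return other.overlapsWithRectangle(this); }

    /** Reached only when both the receiver and the argument are Rectangles. */
    @Override
    String overlapsWithRectangle(Rectangle first) { return "rectangle-rectangle test"; }
}

class DoubleDispatchDemo {
    public static void main(String[] args) {
        Rectangle r = new Rectangle();
        Shape s = new Rectangle();
        System.out.println(r.overlaps(s));
        System.out.println(r.overlaps(new Shape()));
    }
}
```

The cost is that Shape must name Rectangle in its own interface, which is exactly the kind of boilerplate that language-level multiple dispatch avoids.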

Multiple dispatch has uses besides binary methods: event handling, extensible data structure traversals, and many more.

EML: Extensible datatypes

EML (Extensible ML) is a language developed by Todd Millstein and Craig Chambers (two of the designers of MultiJava) in order to explore certain connections between functional and object-oriented languages.

In the functional universe, we have pattern matching and datatypes:

datatypes divide a single type into several cases;
pattern matching uses the cases of a datatype to choose one of a function's cases when that function is called

In the object-oriented universe, we have classes and dynamic dispatch:

classes divide their supertypes into several cases;
dynamic dispatch uses the class of the receiver to choose one of a function's cases when the function is called

You should have recognized this similarity when you implemented the Smalltalk binary tree after implementing the ML binary tree. Smalltalk binary tree nodes might be represented with the classes:

Object subclass: #TreeNode ...
TreeNode subclass: #EmptyNode ...
TreeNode subclass: #ValueNode instanceVariableNames: 'v left right' ...

ML binary tree nodes might be represented with the datatype:

datatype 'a TreeNode =
    EmptyNode
  | ValueNode of {v:'a, left:'a TreeNode, right:'a TreeNode}

(The above examples have been modified slightly from the homeworks, in order to make the parallel clearer.)

EML develops this observation further by unifying pattern matching with multiple dispatch, and unifying ML-style data types with object-oriented classes.

Consider the following ML-style datatype declaration of points:

datatype Point =
    CartPoint of {x:real, y:real}
  | PolarPoint of {rho:real, theta:real}

fun getX (CartPoint {x, y}) = x
  | getX (PolarPoint {rho, theta}) = rho * Math.cos(theta)

In ML, it would not be possible to add a third case, CartPoint3D, with a z field, without altering this original source code. On the other hand, it is easy to add a new function, such as getY, without touching the datatype or getX. This illustrates a general tradeoff:

In functional languages, it is easy to add new functions, because the cases of a function are grouped together, and are not "tied" to the data type.
In object-oriented languages, it is easy to add new data types, because the functions in a class are grouped together, and are not "tied" to the functions of other classes.
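
The object-oriented half of this tradeoff can be sketched in Java. The classes below mirror the ML Point datatype; the Java rendering and the PointDemo class are assumptions made for this illustration, and getY for polar points uses the usual rho * sin(theta).

```java
/** OO style: each class groups its own cases of getX and getY. */
interface Point {
    double getX();
    double getY();
}

class CartPoint implements Point {
    final double x, y;
    CartPoint(double x, double y) { this.x = x; this.y = y; }
    public double getX() { return x; }
    public double getY() { return y; }
}

class PolarPoint implements Point {
    final double rho, theta;
    PolarPoint(double rho, double theta) { this.rho = rho; this.theta = theta; }
    public double getX() { return rho * Math.cos(theta); }
    public double getY() { return rho * Math.sin(theta); }
}

/** New data type: added without editing Point, CartPoint, or PolarPoint. */
class CartPoint3D extends CartPoint {
    final double z;
    CartPoint3D(double x, double y, double z) { super(x, y); this.z = z; }
}

class PointDemo {
    public static void main(String[] args) {
        Point[] points = { new CartPoint(1.0, 2.0), new PolarPoint(2.0, 0.0),
                           new CartPoint3D(3.0, 4.0, 5.0) };
        for (Point p : points) System.out.println(p.getX());
        // Adding a new operation to Point would typically mean editing
        // the interface and each class that needs its own case.
    }
}
```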

The duality between these two forms of extensibility is sometimes called the horizontal-vertical extensibility problem, because if you arrange data types and functions in a table, then object-oriented programming gives you "horizontal" extension and functional programming gives you "vertical" extension:

        CartPoint         PolarPoint         ...
getX    getX(CartPoint)   getX(PolarPoint)   ...
getY    getY(CartPoint)   getY(PolarPoint)   ...
...     ...               ...                ...

In EML, both data types and functions are extensible (with some restrictions to ensure that typechecking can be performed separately on each module), thereby solving the horizontal-vertical extensibility problem.

In EML, the above datatype declaration for Point is actually syntactic sugar for the following class declarations:

abstract class Point() of {}

class CartPoint(x:real, y:real) extends Point()
  of {x:real = x, y:real = y}

class PolarPoint(rho:real, theta:real) extends Point()
  of {rho:real = rho, theta:real = theta}

Class names serve as constructors, just as with ML datatype constructor names:

val aCartPoint = CartPoint {x=1.0, y=2.0};

You can extend this data type straightforwardly, in the OO style:

class CartPoint3D(x:real, y:real, z:real) extends CartPoint(x, y)
  of {z:real = z}

Notice that classes only declare data members. Functions are still specified separately from classes, and use pattern-matching syntax:

fun getX (CartPoint {x, y}) = x
  | getX (PolarPoint {rho, theta}) = rho * Math.cos(theta)

fun plus (CartPoint {x=x1, y=y1}, CartPoint {x=x2, y=y2}) =
    CartPoint {x=x1+x2, y=y1+y2}
  | plus (PolarPoint {rho=r1, theta=t1}, PolarPoint {rho=r2, theta=t2}) =
    ...

With the ability to extend datatypes, one must have the ability to extend functions for datatypes; and, indeed, one can, using the extend fun construct:

extend fun getX (CartPoint3D{x, y, z}) = x;

extend fun plus(CartPoint3D {x=x1, y=y1, z=z1},
                CartPoint3D {x=x2, y=y2, z=z2}) =
           CartPoint3D {x=x1+x2, y=y1+y2, z=z1+z2};

extend fun plus(CartPoint{x=x1, y=y1}, CartPoint3D{x=x2, y=y2, z=z2}) =
           CartPoint3D {x=x1+x2, y=y1+y2, z=z2};

... (* extend funs for other cases *)

One can pattern match on the types of all the arguments, as in ML. However, unlike in ML, the order of the cases doesn't matter --- the function always dispatches to the case with the most specific matching argument pattern.
Everything in a class is "public" by default. Access protections, interfaces, and encapsulation are handled by a separate module system composed of structures and signatures, in a fashion similar to ML.

Postscript: EML ~= MultiJava

It turns out that, if you dig down under the syntax, EML and MultiJava are actually based on the same underlying ideas:

Both languages support roughly the same kinds of extensibility --- a combination of functional extensibility plus object-oriented extensibility.
Both languages support multiple dispatch on function arguments. EML is slightly more expressive, because it allows pattern matching on parts of data values rather than just the type, but both EML and MultiJava have the same general idea that a function's case is chosen by examining all the arguments at once.