Object-oriented generics

Life without typed generics

abstract class Dog {
  protected String name;
  protected Dog(String name) { this.name = name; }
  abstract public void bark();
}

class Labrador extends Dog {
  public Labrador(String name) { super(name); }
  public void bark() { System.out.println(name + ": woof!"); }
}

class Dalmatian extends Dog {
  public Dalmatian(String name) { super(name); }
  public void bark() { System.out.println(name + ": weee-arrr-ooof!"); }
}

/** Standard singly linked list cell */
class ListCell {
  private final Object value;
  private final ListCell next;

  public ListCell(Object value, ListCell next) {
    this.value = value;
    this.next = next;
  }
  public Object getValue() { return value; }
  public ListCell next() { return next; }
}

Suppose we wanted to keep a collection of dogs:

public class Kennel {
  private ListCell dogs; // The head of the list
  public Kennel(ListCell dogs) { this.dogs = dogs; }

  public void rollCall() {
    for (ListCell cell = dogs; cell != null; cell = cell.next()) {
      Dog d = (Dog)cell.getValue();
      d.bark();
    }
  }

  public static void main(String[] args) {
    Dog rover = new Labrador("rover");
    Dog spots = new Dalmatian("spot");
    ListCell dogs = new ListCell(rover, new ListCell(spots, null));
    Kennel k = new Kennel(dogs);
    k.rollCall();
  }
}

This code is dissatisfying for at least two reasons:

First, the (Dog) cast inside the loop of rollCall() incurs a dynamic check. Dynamic checks are undesirable because they may raise an error (ClassCastException) at runtime if they fail. Less importantly, dynamic checks have some runtime performance cost.
More generally, there's something dissatisfying about the type ListCell: we know that the kennel should contain only dogs. We also know that the value field of every cell in this list should contain the same type: Dog (or a subtype thereof). We would like the compiler to verify these facts for us statically.

In this case, there's only one cast, because the code is simple. However, as the complexity of the code grows, the pervasive use of casts and imprecise types grows increasingly troublesome. Furthermore, the annoyance increases as you write more reusable code: the more code you write dealing with Object, the more casts back from Object your clients have to use.

ML polymorphism

Life does not have to be this way. Consider the similar ML code (to make it closer to the Java code, I define my own ListCell type, and I use manual recursion instead of higher-order functions):

datatype DogType = Labrador of string | Dalmatian of string;

datatype 'a ListCell = Null | Cons of 'a * 'a ListCell;

fun bark(Labrador(name))  = print (name ^ ": woof!")
  | bark(Dalmatian(name)) = print (name ^ ": weee-arrr-ooof!");

fun rollCall(Null) = ()
  | rollCall(Cons(dog::dogs)) = (bark(dog); print "\n"; rollCall(dogs));

val rover = Dog("rover");
val spots = Dalmatian("spots");
rollCall([rover, spots]);

In this code, the ListCell data type uses ML-style polymorphism, which solves several problems:

Casting goes away, because the application of bark in rollCall causes the polymorphic type 'a ListCell to be instantiated as Dog ListCell.
More generally, the type instantiation Dog ListCell prevents us from using any list of Dogs improperly. For example, if we try to cons a integer list cell onto to a list of dogs, the type checker will raise an error statically. We can therefore enforce the homogeneity of the list (note that if we want a heterogeneous list, we can simply use a list of Object).

ML-style polymorphism, a.k.a. "parametric" or "Hindley-Milner" polymorphism, is a good thing. Why can't we have something similar in Java?

Generics for Java

GJ is a proposal by Bracha/Odersky/Stoutamire/Wadler to add parametric polymorphism (in the OO context, parametric polymorphism is often called "generics") to Java. GJ is based on Pizza, an earlier Java extension that added generics, first-class anonymous functions, and algebraic data types (types with "cases", like ML datatype constructors) to Java.

A GJ class definition can be parameterized by an element type:

class ListCell<Elem> {
  private final Elem value;
  private final ListCell<Elem> next;

  public ListCell(Elem value, ListCell<Elem> next) {
    this.value = value;
    this.next = next;
  }
  public Elem getValue() { return value; }
  public ListCell<Elem> next() { return next; }
}

This class is parameterized by its the type Elem; this is indicated by <Elem> type parameter after the class name. To instantiate this class, the programmer must explicitly provide a type parameter to all references and constructors. Values with a type instantiation will be typechecked in the way you would expect:

// OK
ListCell<String> strList = new ListCell<String>("aString", null);

// Also OK: Dalmatian is a subtype of dog
Dalmatian boo = new Dalmatian("boo");
ListCell<Dog> dogList = new ListCell<Dog>(boo, null);

// Raises a type error, because the second argument to the 
// ListCell<Dog> constructor should be a ListCell<Dog>
Dog spike = new Labrador("spike");
ListCell<Dog> badList = new ListCell<Dog>(spike, strList);

// Implementing Kennel
class Kennel {
  private ListCell<Dog> dogs; // The head of the list
  public Kennel(ListCell<Dog> dogs) { this.dogs = dogs; }

  public void rollCall() {
    for (ListCell<Dog> cell = dogs; cell != null; cell = cell.next()) {
      Dog d = cell.getValue();
      d.bark();
    }
  }

  public static void main(String[] args) {
    Dog rover = new Labrador("rover");
    Dog spots = new Dalmatian("spot");
    ListCell<Dog> dogs =
      new ListCell<Dog>(rover, new ListCell<Dog>(spots, null));
    Kennel k = new Kennel(dogs);
    k.rollCall();
  }
}

You might complain that, in this particular case, the generic code for Kennel is more verbose than the original code. This is true. However, we get rid of the (Dog) cast, and its runtime check---we can statically prove that it's safe to assign the result of cell.getValue() to a Dog reference. And we also have the assurance that the list is homogeneous, i.e. consists only of dogs.

Translating parametric polymorphism

Since this is not a language implementation course, we won't discuss the translation strategies in detail. However, the most important point is that GJ is defined via translation to Java, and runs on an unmodified Java Virual Machine. This has benefits (backwards compatibility) and drawbacks (the JVM protects Java's type system, not GJ's; therefore, the translation must use casts in the translated code. Also, there are certain complexities with the handling of arrays and primitive types).

See the Pizza paper for details, if you're curious. In the terminology of the Pizza authors, GJ uses only the "homogeneous translation".

Generics vs. ML-style polymorphism

When comparing the GJ code above with our original ML, we observe that the GJ code is much more verbose. Why, one might ask, can't GJ simply infer all the types, leaving us to write only the code that implements the logic?

Java has something that ML does not: namely, object-oriented subtyping. Most reasonable schemes for joining parametric polymorphism with object-oriented subtyping make ML-like levels of support for type inference undecidable. On the other hand, type checking, combined with more limited forms of type inference, remains tractable.

The requirement that constructors, references, and argument and return types be explicitly parameterized is one reason that GJ is more verbose than ML. Another reason is that it is built upon a more verbose substrate---non-generic Java itself (like most non-Smalltalk-derived OO languages) is verbose already.

Java collections and iterators

Java's collections are defined using the following interfaces:

public interface Collection {
  public abstract void add(Object o);
  public abstract int size();
  public abstract Iterator iterator();
  // ...etc.
}

public interface Iterator {
  public abstract boolean hasNext();
  public abstract Object next();
  // ...etc.
}

Iteration over a collection---for example, an ArrayList---occurs as follows:

List aList = new ArrayList();
aList.add("hi");
aList.add("bye");

for (Iterator i = aList.iterator(); i.hasNext(); ) {
  String s = (String)i.next();
  ... s.length() ... // do something with s as a string
}

In a generic context, we can use parameterized types to preserve precise type information:

public interface Collection<Elem> {
  public abstract void add(Elem e);
  public abstract int size();
  public abstract Iterator<Elem> iterator();
  // ...etc.
}

public interface Iterator<Elem> {
  public abstract boolean hasNext();
  public abstract Elem next();
  // ...etc.
}

List<String> aList = new ArrayList<String>();
aList.add("hi");
aList.add("bye");

for (Iterator<String> i = aList.iterator(); i.hasNext(); ) {
  String s = i.next();
  ... s.length() ... // do something with s as a string
}

Bounded type parameters

Simple parametric polymorphism, like the parameterized ListCell, isn't as expressve as we would like. For example, consider the following extension of lists:

public class NamedListCell
    extends ListCell
{
  public void printAll() {
    this.getValue().print();
    this.next().printAll();
  }
}

This does not type check, because the only thing we know about Elem is that its type, when instantiated, will be of Object type. Since Object is not known to have a print() method, it is not safe to call print on the result of getValue().

In order to be able to do anything "interesting" with types named in type parameters, we need bounded quantification:

public interface Printable { public abstract void print(); }

class NamedListCell<Elem extends Printable>
    extends ListCell<Elem>
{
  public NamedListCell(Elem value, NamedListCell<Elem> next) {
      super(value, next);
  }
  public NamedListCell<Elem> nextNamed() {
    return (NamedListCell<Elem>)next();
  }
  public void printAll() {
    this.getValue().print();
    this.nextNamed().printAll();
  }
}

The extends clause in the type parameter places a bound the types with which NamedListCell can be instantiated.

Why do we need to implement nextNamed(), and why do we still need a cast there? The inherited method next() returns ListCell<Elem>, whereas we need it to return NamedListCell<Elem>. Furthermore, since Java uses equivariant return types, we cannot override next() with an appropriate return type.

This is certainly dissatisfying, but in many contexts this sort of reimplementation will not be necessary.

Recursive bounds

You may be saying to yourself: there must be a way to make the return type of next() narrow automatically. Indeed, there is, using recursively bounded quantification---I'll just throw this out there, since I don't expect you to absorb it fully:

class ListCell<Elem, MyType extends ListCell<Elem, MyType>> {
  private final Elem value;
  private final MyType next;

  public ListCell(Elem value, MyType next) {
    this.value = value;
    this.next = next;
  }
  public Elem getValue() { return value; }
  public MyType next() { return next; }
}

class NamedListCell<Elem extends Printable>
    extends ListCell<Elem, NamedListCell<Elem>>
{
  public NamedListCell(Elem value, NamedListCell<Elem> next) {
      super(value, next);
  }
  public void printAll() {
    this.getValue().print();
    this.next().printAll();
  }
}

However, this doesn't really make our lives easier, since we can never actually refer to an instantiation of the ListCell type. Exercises for the truly curious: (1) Why not? (2) Can this be fixed, while retaining the ability to treat NamedListCell as a subtype of ListCell?

Footnote: Polymorphism in Java arrays

Java arrays have an odd property: you can upcast the reference type in an unsound way.

class A           { void foo() {} }
class B extends A { void bar() {} }

B[] arrB = new B[5]; // Create an array of B refs
arrB[0] = new B();   // OK assignment

A[] arrA = arrB;     // (1) Weird assignment, because...
A foo = arrA[0];     // (2) ...even though this is OK...
arrA[1] = new A();   // (3) ...this is NOT...
arrB[1].bar();       // (4) ...because this might happen later...

We can see from the above that the assignment at the line marked (1) should, intuitively, be statically illegal. However, Java makes it legal, and instead performs a runtime check on all assignments into arrays of object references. Hence, Java will raise a runtime error at line (3) instead.

Why did the Java designers make such a strange decision? Well, consider the following method:

  void doFoo(A[] arrA) {
    for (int i=0; i<arrA.length; i++) {
      arrA[i].foo();
    }
  }

Because of Java's array typing rule, you can call doFoo freely over all arrays with element type A, and all arrays with element types that are subtypes of A. This is useful, to a degree: you can pass an array of B to doFoo, and everything's fine. However, the Java designers did not originally want to include full, sound parametric polymorphism in the language, so instead they imbued arrays with a simpler "broken" version.

Final note: it may occur to you that the way to fix the above is to define read-only arrays, and a disciplined way of casting up from read-write arrays to read-only arrays. This is true; however, doing this cleanly requires some type system machinery that is beyond the scope of this course.

cse341-webmaster@cs.washington.edu

Last modified: Tue Mar 12 23:11:37 PST 2002