abstract class Dog { protected String name; protected Dog(String name) { this.name = name; } abstract public void bark(); } class Labrador extends Dog { public Labrador(String name) { super(name); } public void bark() { System.out.println(name + ": woof!"); } } class Dalmatian extends Dog { public Dalmatian(String name) { super(name); } public void bark() { System.out.println(name + ": weee-arrr-ooof!"); } } /** Standard singly linked list cell */ class ListCell { private final Object value; private final ListCell next; public ListCell(Object value, ListCell next) { this.value = value; this.next = next; } public Object getValue() { return value; } public ListCell next() { return next; } }
Suppose we wanted to keep a collection of dogs:
public class Kennel { private ListCell dogs; // The head of the list public Kennel(ListCell dogs) { this.dogs = dogs; } public void rollCall() { for (ListCell cell = dogs; cell != null; cell = cell.next()) { Dog d = (Dog)cell.getValue(); d.bark(); } } public static void main(String[] args) { Dog rover = new Labrador("rover"); Dog spots = new Dalmatian("spot"); ListCell dogs = new ListCell(rover, new ListCell(spots, null)); Kennel k = new Kennel(dogs); k.rollCall(); } }
This code is dissatisfying for at least two reasons:
(Dog)
cast inside the loop of
rollCall()
incurs a dynamic check. Dynamic checks
are undesirable because they may raise an error
(ClassCastException
) at runtime if they fail. Less
importantly, dynamic checks have some runtime performance
cost.ListCell
: we know that the kennel should
contain only dogs. We also know that the value
field
of every cell in this list should contain the same type:
Dog
(or a subtype thereof). We would like the
compiler to verify these facts for us statically.In this case, there's only one cast, because the code is simple.
However, as the complexity of the code grows, the pervasive use of
casts and imprecise types grows increasingly troublesome.
Furthermore, the annoyance increases as you write more reusable
code: the more code you write dealing with Object
, the
more casts back from Object
your clients have to
use.
Life does not have to be this way. Consider the similar ML code (to make it closer to the Java code, I define my own ListCell type, and I use manual recursion instead of higher-order functions):
datatype DogType = Labrador of string | Dalmatian of string; datatype 'a ListCell = Null | Cons of 'a * 'a ListCell; fun bark(Labrador(name)) = print (name ^ ": woof!") | bark(Dalmatian(name)) = print (name ^ ": weee-arrr-ooof!"); fun rollCall(Null) = () | rollCall(Cons(dog::dogs)) = (bark(dog); print "\n"; rollCall(dogs)); val rover = Dog("rover"); val spots = Dalmatian("spots"); rollCall([rover, spots]);
In this code, the ListCell
data type uses ML-style
polymorphism, which solves several problems:
bark
in rollCall
causes the polymorphic
type 'a ListCell
to be instantiated as Dog
ListCell
.Dog
ListCell
prevents us from using any list of
Dog
s improperly. For example, if we try to cons a
integer list cell onto to a list of dogs, the type checker will
raise an error statically. We can therefore enforce the
homogeneity of the list (note that if we want a heterogeneous
list, we can simply use a list of Object
).ML-style polymorphism, a.k.a. "parametric" or "Hindley-Milner" polymorphism, is a good thing. Why can't we have something similar in Java?
GJ is a proposal by Bracha/Odersky/Stoutamire/Wadler to add
parametric polymorphism (in the OO context, parametric polymorphism
is often called "generics") to Java. GJ is based on Pizza, an
earlier Java extension that added generics, first-class anonymous
functions, and algebraic data types (types with "cases", like ML
datatype
constructors) to Java.
A GJ class definition can be parameterized by an element type:
class ListCell<Elem> { private final Elem value; private final ListCell<Elem> next; public ListCell(Elem value, ListCell<Elem> next) { this.value = value; this.next = next; } public Elem getValue() { return value; } public ListCell<Elem> next() { return next; } }
This class is parameterized by its the type Elem
;
this is indicated by <Elem>
type parameter after
the class name. To instantiate this class, the programmer must
explicitly provide a type parameter to all references and
constructors. Values with a type instantiation will be typechecked
in the way you would expect:
// OK ListCell<String> strList = new ListCell<String>("aString", null); // Also OK: Dalmatian is a subtype of dog Dalmatian boo = new Dalmatian("boo"); ListCell<Dog> dogList = new ListCell<Dog>(boo, null); // Raises a type error, because the second argument to the // ListCell<Dog> constructor should be a ListCell<Dog> Dog spike = new Labrador("spike"); ListCell<Dog> badList = new ListCell<Dog>(spike, strList); // Implementing Kennel class Kennel { private ListCell<Dog> dogs; // The head of the list public Kennel(ListCell<Dog> dogs) { this.dogs = dogs; } public void rollCall() { for (ListCell<Dog> cell = dogs; cell != null; cell = cell.next()) { Dog d = cell.getValue(); d.bark(); } } public static void main(String[] args) { Dog rover = new Labrador("rover"); Dog spots = new Dalmatian("spot"); ListCell<Dog> dogs = new ListCell<Dog>(rover, new ListCell<Dog>(spots, null)); Kennel k = new Kennel(dogs); k.rollCall(); } }
You might complain that, in this particular case, the generic
code for Kennel
is more verbose than the original code.
This is true. However, we get rid of the (Dog)
cast,
and its runtime check---we can statically prove that it's safe to
assign the result of cell.getValue()
to a
Dog
reference. And we also have the assurance that the
list is homogeneous, i.e. consists only of dogs.
Since this is not a language implementation course, we won't discuss the translation strategies in detail. However, the most important point is that GJ is defined via translation to Java, and runs on an unmodified Java Virual Machine. This has benefits (backwards compatibility) and drawbacks (the JVM protects Java's type system, not GJ's; therefore, the translation must use casts in the translated code. Also, there are certain complexities with the handling of arrays and primitive types).
See the Pizza paper for details, if you're curious. In the terminology of the Pizza authors, GJ uses only the "homogeneous translation".
When comparing the GJ code above with our original ML, we observe that the GJ code is much more verbose. Why, one might ask, can't GJ simply infer all the types, leaving us to write only the code that implements the logic?
Java has something that ML does not: namely, object-oriented subtyping. Most reasonable schemes for joining parametric polymorphism with object-oriented subtyping make ML-like levels of support for type inference undecidable. On the other hand, type checking, combined with more limited forms of type inference, remains tractable.
The requirement that constructors, references, and argument and return types be explicitly parameterized is one reason that GJ is more verbose than ML. Another reason is that it is built upon a more verbose substrate---non-generic Java itself (like most non-Smalltalk-derived OO languages) is verbose already.
Java's collections are defined using the following interfaces:
public interface Collection { public abstract void add(Object o); public abstract int size(); public abstract Iterator iterator(); // ...etc. } public interface Iterator { public abstract boolean hasNext(); public abstract Object next(); // ...etc. }
Iteration over a collection---for example, an ArrayList---occurs as follows:
List aList = new ArrayList(); aList.add("hi"); aList.add("bye"); for (Iterator i = aList.iterator(); i.hasNext(); ) { String s = (String)i.next(); ... s.length() ... // do something with s as a string }
In a generic context, we can use parameterized types to preserve precise type information:
public interface Collection<Elem> { public abstract void add(Elem e); public abstract int size(); public abstract Iterator<Elem> iterator(); // ...etc. } public interface Iterator<Elem> { public abstract boolean hasNext(); public abstract Elem next(); // ...etc. } List<String> aList = new ArrayList<String>(); aList.add("hi"); aList.add("bye"); for (Iterator<String> i = aList.iterator(); i.hasNext(); ) { String s = i.next(); ... s.length() ... // do something with s as a string }
Simple parametric polymorphism, like the parameterized
ListCell
, isn't as expressve as we would like. For
example, consider the following extension of lists:
public class NamedListCellextends ListCell { public void printAll() { this.getValue().print(); this.next().printAll(); } }
This does not type check, because the only thing we know about
Elem
is that its type, when instantiated, will be of
Object
type. Since Object
is not known to
have a print()
method, it is not safe to call
print
on the result of getValue()
.
In order to be able to do anything "interesting" with types named in type parameters, we need bounded quantification:
public interface Printable { public abstract void print(); } class NamedListCell<Elem extends Printable> extends ListCell<Elem> { public NamedListCell(Elem value, NamedListCell<Elem> next) { super(value, next); } public NamedListCell<Elem> nextNamed() { return (NamedListCell<Elem>)next(); } public void printAll() { this.getValue().print(); this.nextNamed().printAll(); } }
The extends clause in the type parameter places a bound
the types with which NamedListCell
can be
instantiated.
Why do we need to implement nextNamed()
, and why do
we still need a cast there? The inherited method
next()
returns ListCell<Elem>
,
whereas we need it to return NamedListCell<Elem>
.
Furthermore, since Java uses equivariant return types, we cannot
override next()
with an appropriate return type.
This is certainly dissatisfying, but in many contexts this sort of reimplementation will not be necessary.
You may be saying to yourself: there must be a way to make the
return type of next()
narrow automatically. Indeed,
there is, using recursively bounded quantification---I'll just throw
this out there, since I don't expect you to absorb it fully:
class ListCell<Elem, MyType extends ListCell<Elem, MyType>> { private final Elem value; private final MyType next; public ListCell(Elem value, MyType next) { this.value = value; this.next = next; } public Elem getValue() { return value; } public MyType next() { return next; } } class NamedListCell<Elem extends Printable> extends ListCell<Elem, NamedListCell<Elem>> { public NamedListCell(Elem value, NamedListCell<Elem> next) { super(value, next); } public void printAll() { this.getValue().print(); this.next().printAll(); } }
However, this doesn't really make our lives easier, since we can
never actually refer to an instantiation of the
ListCell
type. Exercises for the truly curious: (1)
Why not? (2) Can this be fixed, while retaining the ability to
treat NamedListCell
as a subtype of
ListCell
?
Java arrays have an odd property: you can upcast the reference type in an unsound way.
class A { void foo() {} } class B extends A { void bar() {} } B[] arrB = new B[5]; // Create an array of B refs arrB[0] = new B(); // OK assignment A[] arrA = arrB; // (1) Weird assignment, because... A foo = arrA[0]; // (2) ...even though this is OK... arrA[1] = new A(); // (3) ...this is NOT... arrB[1].bar(); // (4) ...because this might happen later...
We can see from the above that the assignment at the line marked (1) should, intuitively, be statically illegal. However, Java makes it legal, and instead performs a runtime check on all assignments into arrays of object references. Hence, Java will raise a runtime error at line (3) instead.
Why did the Java designers make such a strange decision? Well, consider the following method:
void doFoo(A[] arrA) { for (int i=0; i<arrA.length; i++) { arrA[i].foo(); } }
Because of Java's array typing rule, you can call
doFoo
freely over all arrays with element type
A
, and all arrays with element types that are
subtypes of A
. This is useful, to a degree: you can
pass an array of B
to doFoo
, and
everything's fine. However, the Java designers did not originally
want to include full, sound parametric polymorphism in the language,
so instead they imbued arrays with a simpler "broken" version.
Final note: it may occur to you that the way to fix the above is to define read-only arrays, and a disciplined way of casting up from read-write arrays to read-only arrays. This is true; however, doing this cleanly requires some type system machinery that is beyond the scope of this course.