Contents:
Generics is a static (compile-time) feature of Java that allows for more powerful type checking, and hopefully better, safer code.
To write a generic class, you denote in angle brackets one or more (comma-separated) parameter types. It is recommended that these names be single capital letters (e.g. T or E) with some meaning (e.g. T stands for "type" and E in the Java collection classes stands for "element"). For example, here is a rough approximation of java.util.Map.Entry, which represents a (key, value) pair in a map.
/** * Entry is a mutable class that represents a (key, value) pair in a map. * * @specfield key : the key from which this entry maps * @specfield value : the value to which this entry maps */ public class Entry<K, V> { /* * AF(e): * key = e.key * value = v.value */ private K key; // notice key's type is K private V value; // notice value's type is V /** * @return a new Entry with key = k, value = v */ public Entry(K k, V v) { value = v; key = k; } /** * @modifies value * @effects value_post = newValue * @return value_pre */ public V setValue(V newValue) { V oldValue = value; value = newValue; return oldValue; } }
Notice that the parameter types K and V can be referred to anywhere in the class, including as parameters and return types of methods.
An invocation of a generic class assigns actual types to the parameter types. For example, the code:
Entry<Integer, String> myEntry = new Entry<Integer, String>(2, "two");
assigns Integer and String to K and V, respectively.
Declaring a reference to and instantiating a generic class without parametrizing it (with < >) is referred to as using a raw type. In general, proper generics code should not use raw types (exceptions include type casting).
In addition to declaring parameter types for a class, you can make a particular method generic as well. To assign parameter types to a method, you use a similar notation. Suppose you have a class that contains a number of static utility functions, and one of them takes a List and returns a new one that is a reversed copy of the original List. Ideally, you would like the returned List to have the same parameter type as the input. You can accomplish this by writing:
public class Utils { /** * @requires lst!=null * @modifies nothing * @return a List r equal to the sequence < lst[lst.length-1], lst[lst.length-2], ..., lst[0] > */ public static <E> List<E> makeReversedList(List<E> lst) { List<E> revList = new LinkedList<E>(); for (E e: lst) { revList.add(0,e); // insert at the front } return revList; } ... }
The <E> appearing before the return type declares that this method has a parameter type E. The <E> appearing in List<E> in the argument list tells the compiler what to map E to when the method is invoked. Thus, an invocation:
List<String> stringList = new LinkedList<String>(); stringList.add("blah"); stringList.add("foo"); List<String> rev = Utils.makeReversedList(lst);
will assign the parameter type E to an actual type of String.
The type hierarchy of parameterized types is based on the base type and not on the type of the parameters.
What about Collection<Integer> intLst = new ArrayList<Integer>()? Okay.
What about Collection<Object> intLst = new ArrayList<Integer>()? Error.
A common mistake made when making code generics-compliant is to blindly parametrize argument types to Object. For example, suppose we wish to write another utility method printList() that prints a List with elements of any type. Since unlike in makeReversedList(), we don't need to worry about the return type of this method being properly parametrized (the method is declared as void), one might naively conclude that it'd be easiest just to explicitly set the argument list's parameter type as Object, as in the following:
/** * @requires lst!=null * @modifies nothing * @effects prints the elements of lst in order to standard output */ public static void printList(List<Object> lst) { for (Object e: lst) { System.out.println(e); } }
What is the solution to this? We can just make the method generic by adding <E> as we did for makeReversedList(), but since we don't really care about the element type of the List in this method, we can take another approach. Java allows the ? character to be used as a wildcard for matching parameter types. Thus, if we write:
public static void printList2(List<?> lst) { for (Object e: lst) { System.out.println(e); } }
The <?> in the argument type can be used match any parameter type, and the rest of the method does not care what is actually matched. But since we know that whatever was used to match ? (e.g. String) must be a Java subtype of Object, the compiler allows you to set elements from the List as Objects without error.
Sometimes, you want <?> to, instead of matching any type, match only types that are the same as or Java subtypes of a given type Foo. You can accomplish this by writing <? extends Foo> instead of just <?>. Similarly, you can require that only Foo or its supertypes are matched with <? super Foo>.
Let's apply this to an example. Suppose we want to write a utility function max() that returns a maximum element in the list, assuming that the elements in the list all implement Comparable and are in fact mutually Comparable to one another. Here are the constraints we need to make on the parameter type <E> of the argument List<E> lst.
With these observations, we can arrive at the following method signature and implementation of max(). Note that we can call e.compareTo() without compile errors because e's type E was required to match a subtype of Comparable.
/** * @requires lst!=null && no element in lst is null * @modifies nothing * @return an element lst[i] in lst such that * for all 0<=j<lst.length | (lst[i].compareTo(lst[j]) >= 0) */ public static <E extends Comparable<? super E>> E max(List<E> lst) { E max = null; // the currently known maximum // iterate over lst, overwriting max if necessary for (E e : lst) { if (max==null) { // initial case max = e; // set max to the first element of list } else if (e.compareTo(max)>0) { max = e; // set max to e if e is bigger } } return max; }
Since generics is solely a compile-time feature in Java,
parameter types are "erased" after compilation
(they disappear and all generic types become raw types). Thus,
runtime instanceof calls and type casts (in general) can not
work with parameterized types. Some up- or down-casts from
parameterized types are exceptions; for example, you can cast from
List<T>
to LinkedList<T>
because
the compiler knows that List
's parameter T constrains
LinkedList
as well. On the other hand, you cannot cast
Object
to LinkedList<T>
without a
compiler error, because the parameterization of the
LinkedList
has been lost.
In 331 we will be concerned with two notions of equivalence for objects: behavioral equivalence and observational equivalence.
We exclude the == operator from the set of operations we can use to distinguish objects.
One recommendation (from Barbara Liskov) is to always use the equals method to provide a notion of behavioral equivalence for an ADT, while using an additional similar method to provide the ADT's notion of observational equivalence. In general, the Java API does not follow this guideline for equals and there is no similar method defined in Object.
Each ADT should provide its own notion of what it means for two instances of that ADT to be equivalent. In general, you should never use the == operator to compare two object references, unless you are explictly trying to see if they are the exact same object in memory.
(Note that Java's primitive types are not object references; you
should always use == on them, as there is no way to call an
equals()
method.)
The equals() method specified in Object should be used to test for observational equivalence. Each ADT you specify should override the equals() method, if needed, to provide a notion of equivalence for that ADT.
public boolean equals(Object obj)
The equals() method has the following specified requirements for non-null references x, y, and z:
x.equals(null)
is always false
.
x.equals(x)
is always true
.
x.equals(y)
it must be that y.equals(x)
x.equals(y)
and y.equals(z)
then
it must be that x.equals(z)
x.equals(y)
return the same
value unless one of the objects is mutated.
Note that this is quite an intentionally incomplete definition, and so the following meets the spec of being an equivalence relationship in Java, even though it does not seem particularly useful:
public boolean equals (Object obj) { return (obj instanceof ThisClass); }
public boolean equals(Object o)
the invocation equals(null) can do one of the following things:
Why does equals(null) return false?
Option 1: Return true
If equals(null) were to return true, then intuitively, this would not make any sense because a non-null object would be considered equal to something that is not even an object. This is inconsistent with our idea of equality.
Option 2: Throw an Exception
Because equals() is a method of Object, it is a fundamental method in Java that is used often, and in a variety of ways. Thus, it would be nconvenient if it threw an exception when passed a null reference when there is a more reasonable response, namely:
Option 3: Return false
This consistent with our idea of equality because it considers objects and non-objects unequal, so this appears to be the most reasonable result for equals(null).
(Note however that returning false is asymmetric. If a is null, and b is not, then b.equals(a) returns false, while a.equals(b) throws a NullPointerException. This illustrates why the condition of an equivalence relation for equals() only applies for non-null input.)
Consider a class Point2D and a class Point3D that extends it:
public class Point2D { protected int x; protected int y; public Point2D(int x, int y) { this.x = x; this.y = y; } } public class Point3D extends Point2D { private int z; public Point2D(int x, int y, int z) { super(x, y); this.z = z; } }
Now let's explore some implementations of equals():
// Point2D public boolean equals(Object obj) { if (!(obj instanceof Point2D)) return false; Point2D p = (Point2D)obj; return this.x == p.x && this.y == p.y; } // Point3D public boolean equals(Object obj) { if (!(obj instanceof Point3D)) return false; Point3D p3 = (Point3D)obj; return this.x == p3.x && this.y == p3.y && this.z == p3.z; }
equals()
contract
does this violate? Point2D p2 = new Point2D(2, 71); Point3d p3 = new Point3D(2, 71, 82); p2.equals(p3); // true p3.equals(p2); // false
// Point3D public boolean equals(Object obj) { if (!(obj instanceof Point3D)) return super.equals(obj); Point3D p = (Point3D)o; return this.x == p.x && this.y == p.y && this.z == p.z; }
equals()
contract
does this violate? Point2D p3Da = new Point3D(0, 6, 170); Point2D p3Db = new Point3D(0, 6, 171); Point2D p2D = new Point2D(0, 6); p3Da.equals(p2D); //true p2D.equals(p3Db); //true p3Da.equals(p3Db); //false: violates transitivity
// Point2D public boolean equals(Object obj) { if (obj == null || !this.getClass().equals(obj.getClass())) { return false; } Point2D p = (Point2D)o; return this.x == p.x && this.y == p.y; } // Point3D public boolean equals(Object obj) { if (obj == null || !this.getClass().equals(obj.getClass())) { return false; } Point3D p3 = (Point3D)o; return this.x == p3.x && this.y == p3.y && this.z == p3.z; }
public class Point2DPolar extends Point2D { public double angle() { //returns the polar angle of this point } public double radius() { //returns the polar radius of this point } }
This leads to the following problem:
Point2D p = new Point2D(4, 20); Point2D pp = new Point2DPolar(4, 20); p.getX() == pp.getX(); // true p.getY() == pp.getY(); // true p.equals(pp); // false
Thus, two objects that appear to be the same are not considered equal when compared with the equals() method. (Note that this problem would also exist if we had used Point3D in place of Point2DPolar in this example). Even though PointPolar is not fundamentally different than its superclass, the implementation of equals() in Point2D makes the two types incomparable. Thus, this is not an ideal solution to our problem.
public class Point3D { private Point2D p; private int z; public Point3D(int x, int y, int z) { p = new Point2D(x, y); this.z = z; } }
Note that the client of Point3D need not know that Point3D uses Point2D.
Because Point2D and Point3D do not extend one another, there are no problems involving inheriting a broken equals() method. However, we do lose something: previously, we could add both Point2D and Point3D objects to a Point2D[ ] array. Because Point3D is no longer a subclass, we can no longer do this. The solution is to create an interface that both classes implement:
public interface Point { public int getX(); public int getY(); }
Thus, we could create a Point[] array to which we could add both Point2D and Point3D objects. Note that great care must be exercised when using the equals() method of objects that are referred to through their interface, as an interface rarely specifies how equals() must be implemented. For example, if we have the following:
Point p1, p2; // suppose these were initialized outside this block of code Set<Point> set = new HashSet<Point>(); p1.getX() == p2.getX(); // true p1.getY() == p2.getY(); // true set.add(p1); set.add(p2); System.out.println(set.size());
What gets printed out? From this code, there is no way to know because it is unclear what happens when add(p2) is invoked since set tests if p1.equals(tp2). Thus, be conscious of the use of equals() in scenarios like this.
Hash-codes are used by data structures such as hash tables (called HashMap in Java).
The goal of hash tables is to let us store objects in such a way that we can perform (expected) constant time lookups later on. We start with an array of buckets. Note that it takes constant time to access any arbitrary bucket. A hash function is used to mathematically map an object to a bucket. The hash-function is a mapping from an object (such as a String) to a bucket number. When an object is being stored, it is mapped to the appropriate bucket and stored there. When we are looking up an object, we only need to look at the bucket identified by that object's hash. It is normal for more than one object to hash to the same bucket. This can be solved by having a linked list for each bucket in the hash-table (there are also other solutions). If a good enough hash function is chosen these lists will be small for each bucket and the time to look up an object will be roughly constant.
Java avoids the theoretical issues associated with forming well-balanced hash-tables. The hashCode() method defined in Object lets any hash-table implementation map an object to a number. The hash-table code can then use an internal mapping to create the actual hash (i.e. bucket index) from this number.
Java really just has one main requirement for hashCode() implementations: whenever two objects are equal() they must have the same hashCode(). In general, whenever you override equals(), you will also need to override hashCode().
Let's try to design a hash code for the Duration class we have seen in lecture:
public class Duration { private int min; private int sec; public Duration(int min, int sec) { this.min = min; this.sec = sec; } }
Consider the following implementations for Duration.hashCode():
return 1;
This is a terrible hash code because it would make everything hash to the same bucket, making the hash code function as a linked list (eliminating the benefit of constant-time-lookup that a hash table should provide).
return min;
This is a better hash code; however, it will still result in a lot of collisions because every Duration with the same value for min will end up in the same bucket.
return min + sec;
Again, this is better, but there will still be collisions for a number of different durations. For example, when (min,sec) = (3,4) and (min,sec) = (4,3), both will hash to the same bucket.
return sec << 16 + min;
This is a bigger improvement because now only equal durations have equal hash codes (assuming min and seconds take on values between 0 and 60). However, if only the low order bits are used in the hash (which happens in techniques such as extensible hashing) then there will still be collisions when durations have the same minute. Note that this shows that just because you have unique hash codes for unique values, this does not guarantee a good hash.
int result = 17; result = 37 * result + sec; result = 37 * result + min; return result;
"Whoah -- where did this come from?" you ask. Well, the truth is that number theory people have discovered that multiplying by prime numbers helps spread out hash code values. In fact, they have discovered a lot of neat things about how to produce better hash codes, but they are often quite complicated, so it's probably easiest to look at the guidelines that Bloch has in Item 8 and apply them to your code rather than devise your own algorithms to achieve good hashing (unless you're into that sort of thing).
Actually, there was nothing wrong with Try #5, but if the hashCode() method is going to be invoked often, and is always going to return the same number, then it will help to cache the hash code and to declare the hashCode() method final so that the cost of method lookup and computation will be reduced. For a large number of elements in a hashed data structure (like nodes in the graph of Boston), this will have a noticeable impact on performance.
Even if Duration were a mutable object, we could still cache the hash code by updating the cached value every time a Duration were mutated. Alternatively, we could set a flag whenever it is mutated, and then check that flag before returning the cached value: if it has been mutated, then we consider the cached value stale and then recalculate and cache the hash before returning it.
The number of optimizations like this that you want to encode depends on the nature of the problem — if hashCode() is inexpensive to compute, and is not accessed often, then these mechanisms will just make your code more difficult to maintain without providing any real benefit to the client. The important thing is to think carefully about how likely hashCode() is to cause a collision, and to make sure that it is in agreement with equals().
Unfortunately, Java does not follow a consistent notion of equivalence
for mutable ADT's, beyond the specifications in Object. Some
mutable ADT's defined by the Java API use behavioral equivalence
(e.g. StringBuilder
), other mutable ADT's use
observational equivalence(e.g. Date
). In general, you should
look at the specs of the class you are using before relying on either
type of behaviour.
List<String> arrayList = new ArrayList<String>(); List<String> linkedList = new LinkedList<String>(); arrayList.add("a"); linkedList.add("A".toLowerCase()); arrayList.add("b"); linkedList.add("B".toLowerCase()); System.out.println(arrayList.equals(linkedList)); System.out.println(linkedList.equals(arrayList));
true true
The classes in the Java collections framework consistently use the
notion of observational equivalence. This is why the
LinkedList above is equal to the
ArrayList. Furthermore, note that the references in the two
lists are not ==
to each other.
The Java implementations of the List interface can hold mutable objects without any complications. However, Sets are different. According to the specifications, the behaviour of Set implementations can be completely arbitrary if an element contained in the set is mutated.
Set<List<String>> set = new HashSet<List<String>>(); List<String> list = new ArrayList<String>(); list.add("6.170"); list.add("6.004"); list.add("6.003"); set.add(list); set.contains(list); // what does this return? list.add("6.033"); set.contains(list); // what does this return? set.add(list); System.out.println("set has " + set.size() + " elements"); for (List<String> l : set) { System.out.println(list); }
set has 2 elements [6.170 6.004 6.003 6.033] [6.170 6.004 6.003 6.033]
Why exactly does this fail? You need to know some details of the implementations of both ArrayList and HashSet to figure it out:
It is the changed hashCode() value that is the problem. If we had added a StringBuilder to the set instead, mutated it, and added it again, the set would have worked fine, since StringBuilder uses Object.hashCode().