CSE 331 / Section 4

Contents:

Generics
Object Equality
Hash tables
- Creating a Good Hash Code
- Notions of Equivalence in the Java API

Generics

Generics is a static (compile-time) feature of Java that allows for more powerful type checking, and hopefully better, safer code.

Generic Classes

To write a generic class, you denote in angle brackets one or more (comma-separated) parameter types. It is recommended that these names be single capital letters (e.g. T or E) with some meaning (e.g. T stands for "type" and E in the Java collection classes stands for "element"). For example, here is a rough approximation of java.util.Map.Entry, which represents a (key, value) pair in a map.

/**
 * Entry is a mutable class that represents a (key, value) pair in a map.
 * 
 * @specfield key : the key from which this entry maps
 * @specfield value : the value to which this entry maps
 */
public class Entry<K, V>  {    
    /*
     * AF(e): 
     * key = e.key
     * value = v.value
     */
    private K key;  // notice key's type is K
    private V value;  // notice value's type is V            
    
    /**
     * @return a new Entry with key = k, value = v
     */
    public Entry(K k, V v) {
        value = v;
        key = k;        
    }    
    
    /**
     * @modifies value
     * @effects value_post = newValue
     * @return value_pre
     */
    public V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
}

Notice that the parameter types K and V can be referred to anywhere in the class, including as parameters and return types of methods.

An invocation of a generic class assigns actual types to the parameter types. For example, the code:

Entry<Integer, String> myEntry = new Entry<Integer, String>(2, "two");

assigns Integer and String to K and V, respectively.

Declaring a reference to and instantiating a generic class without parametrizing it (with < >) is referred to as using a raw type. In general, proper generics code should not use raw types (exceptions include type casting).

Generic Methods

In addition to declaring parameter types for a class, you can make a particular method generic as well. To assign parameter types to a method, you use a similar notation. Suppose you have a class that contains a number of static utility functions, and one of them takes a List and returns a new one that is a reversed copy of the original List. Ideally, you would like the returned List to have the same parameter type as the input. You can accomplish this by writing:

public class Utils {
    /**
     * @requires lst!=null
     * @modifies nothing
     * @return a List r equal to the sequence < lst[lst.length-1], lst[lst.length-2], ..., lst[0] >
     */
    public static <E> List<E> makeReversedList(List<E> lst) {
        List<E> revList = new LinkedList<E>();
        for (E e: lst) {
            revList.add(0,e);  // insert at the front
        }
        return revList;
    }
    ...
}

The <E> appearing before the return type declares that this method has a parameter type E. The <E> appearing in List<E> in the argument list tells the compiler what to map E to when the method is invoked. Thus, an invocation:

List<String> stringList = new LinkedList<String>();
stringList.add("blah");
stringList.add("foo");
List<String> rev = Utils.makeReversedList(lst);

will assign the parameter type E to an actual type of String.

Generics and subtypes

The type hierarchy of parameterized types is based on the base type and not on the type of the parameters.

Question:: What happens when we try to evaluate List<Integer> intLst = new ArrayList<Number>()?
Answer:: We will get a compile time error. The type List<Integer> is NOT a subtype (Java or behavioral) of List<Number>, nor is there a subtyping relationship in the opposite direction.

List<Integer> cannot be substituted for List<Number> because a client expecting the latter can try to add elements of type Number to a List<Integer> and receive a compile error.
List<Number> cannot be substituted for List<Integer> because a client expecting the latter can try to get() an element, expecting an Integer, but List<Number> only guarantees that a Number is returned.

What about Collection<Integer> intLst = new ArrayList<Integer>()? Okay.

What about Collection<Object> intLst = new ArrayList<Integer>()? Error.

Wildcards

A common mistake made when making code generics-compliant is to blindly parametrize argument types to Object. For example, suppose we wish to write another utility method printList() that prints a List with elements of any type. Since unlike in makeReversedList(), we don't need to worry about the return type of this method being properly parametrized (the method is declared as void), one might naively conclude that it'd be easiest just to explicitly set the argument list's parameter type as Object, as in the following:

/**
 * @requires lst!=null
 * @modifies nothing
 * @effects prints the elements of lst in order to standard output
 */
public static void printList(List<Object> lst) {
    for (Object e: lst) {
        System.out.println(e);
    }
}

Question:: What is the problem with this?
Answer:: Suppose we call printList() with the List<String> stringList we initialized in the earlier example. This would give a compile error! The reason is that List<String> is not a subtype of List<Object>, even though String is a subtype of Object.

What is the solution to this? We can just make the method generic by adding <E> as we did for makeReversedList(), but since we don't really care about the element type of the List in this method, we can take another approach. Java allows the ? character to be used as a wildcard for matching parameter types. Thus, if we write:

public static void printList2(List<?> lst) {
    for (Object e: lst) {
        System.out.println(e);
    }
}

The <?> in the argument type can be used match any parameter type, and the rest of the method does not care what is actually matched. But since we know that whatever was used to match ? (e.g. String) must be a Java subtype of Object, the compiler allows you to set elements from the List as Objects without error.

Bounded Wildcards

Sometimes, you want <?> to, instead of matching any type, match only types that are the same as or Java subtypes of a given type Foo. You can accomplish this by writing <? extends Foo> instead of just <?>. Similarly, you can require that only Foo or its supertypes are matched with <? super Foo>.

Let's apply this to an example. Suppose we want to write a utility function max() that returns a maximum element in the list, assuming that the elements in the list all implement Comparable and are in fact mutually Comparable to one another. Here are the constraints we need to make on the parameter type <E> of the argument List<E> lst.

E must implement Comparable, which means E is a Java subtype of Comparable.
Comparable itself is generic and its parameter type denotes the type of objects that can be passed into its compareTo() method. In order for our comparisons to work, compareTo() must allow at least inputs of type E.
Furthermore, it is actually acceptable for Comparable to be parametrized to a supertype of E (this just means that its compareTo() method requires less, which is fine).

With these observations, we can arrive at the following method signature and implementation of max(). Note that we can call e.compareTo() without compile errors because e's type E was required to match a subtype of Comparable.

/**
 * @requires lst!=null && no element in lst is null
 * @modifies nothing
 * @return an element lst[i] in lst such that  
 *         for all 0<=j<lst.length | (lst[i].compareTo(lst[j]) >= 0)
 */
public static <E extends Comparable<? super E>> E max(List<E> lst) {
    E max = null; // the currently known maximum        
    // iterate over lst, overwriting max if necessary
    for (E e : lst) {
        if (max==null) { // initial case
            max = e; // set max to the first element of list
        } else if (e.compareTo(max)>0) {
            max = e; // set max to e if e is bigger
        }
    }
    return max;
}

Casts

Since generics is solely a compile-time feature in Java, parameter types are "erased" after compilation (they disappear and all generic types become raw types). Thus, runtime instanceof calls and type casts (in general) can not work with parameterized types. Some up- or down-casts from parameterized types are exceptions; for example, you can cast from List<T> to LinkedList<T> because the compiler knows that List's parameter T constrains LinkedList as well. On the other hand, you cannot cast Object to LinkedList<T> without a compiler error, because the parameterization of the LinkedList has been lost.

Object Equality

Behavioral versus Observational Equivalence

In 331 we will be concerned with two notions of equivalence for objects: behavioral equivalence and observational equivalence.

Two objects are behaviorally equivalent if no sequence of operations (both mutators and observers) can distinguish between them. Behavioral equivalence ensures that if two objects are equal or unequal at one point, they continue to be equal or unequal forever after; there is nothing you can do to two unequal objects that will make them become equal.
Two objects are observationally equivalent if no series of observer operations (we exclude mutators) can distinguish between them.

We exclude the == operator from the set of operations we can use to distinguish objects.

Question:: Which type of equivalence does Object.equals() implement?
Answer:: Behavioral.
Question:: Which type of equivalence does String.equals() implement?
Answer:: Both.
Question:: When are these two types of equivalence the same?
Answer:: For immutable types, both observational and behavioral equivalence are the same, since there are no mutators. Consequently, only when we are implementing a mutable ADT do we have to choose between these two notions of equivalence.

One recommendation (from Barbara Liskov) is to always use the equals method to provide a notion of behavioral equivalence for an ADT, while using an additional similar method to provide the ADT's notion of observational equivalence. In general, the Java API does not follow this guideline for equals and there is no similar method defined in Object.

Object contract

Each ADT should provide its own notion of what it means for two instances of that ADT to be equivalent. In general, you should never use the == operator to compare two object references, unless you are explictly trying to see if they are the exact same object in memory.

Question:

When would you do that?

Answer:

When there is only one copy of the object. We only know this will be true in the case of classes that guarantee one copy exist (such as enum classes)
When you are concerned with behavioral equality in mutable objects. More on this later.

(Note that Java's primitive types are not object references; you should always use == on them, as there is no way to call an equals() method.)

The equals() method specified in Object should be used to test for observational equivalence. Each ADT you specify should override the equals() method, if needed, to provide a notion of equivalence for that ADT.

public boolean equals(Object obj)

The equals() method has the following specified requirements for non-null references x, y, and z:

Never equal to null: x.equals(null) is always false.
Reflexive: x.equals(x) is always true.
Symmetric: If x.equals(y) it must be that y.equals(x)
Transitive: If x.equals(y) and y.equals(z) then it must be that x.equals(z)
Consistent: Multiple invocations of x.equals(y) return the same value unless one of the objects is mutated.

Note that this is quite an intentionally incomplete definition, and so the following meets the spec of being an equivalence relationship in Java, even though it does not seem particularly useful:

public boolean equals (Object obj) {
      return (obj instanceof ThisClass);
}

Question:

Given that the method signature is:

public boolean equals(Object o)

the invocation equals(null) can do one of the following things:

return true
throw an exception
return false

Why does equals(null) return false?

Answer:

Option 1: Return true

If equals(null) were to return true, then intuitively, this would not make any sense because a non-null object would be considered equal to something that is not even an object. This is inconsistent with our idea of equality.
Option 2: Throw an Exception

Because equals() is a method of Object, it is a fundamental method in Java that is used often, and in a variety of ways. Thus, it would be nconvenient if it threw an exception when passed a null reference when there is a more reasonable response, namely:
Option 3: Return false

This consistent with our idea of equality because it considers objects and non-objects unequal, so this appears to be the most reasonable result for equals(null).

(Note however that returning false is asymmetric. If a is null, and b is not, then b.equals(a) returns false, while a.equals(b) throws a NullPointerException. This illustrates why the condition of an equivalence relation for equals() only applies for non-null input.)

Equality and subtypes

Consider a class Point2D and a class Point3D that extends it:

public class Point2D {

  protected int x;
  protected int y;

  public Point2D(int x, int y) {
    this.x = x; this.y = y;
  }
}

public class Point3D extends Point2D {

  private int z;

  public Point2D(int x, int y, int z) {
    super(x, y);
    this.z = z;
  }
  
}

Now let's explore some implementations of equals():

Try #1

// Point2D
public boolean equals(Object obj) {
  if (!(obj instanceof Point2D)) return false;
  Point2D p = (Point2D)obj;
  return this.x == p.x && this.y == p.y;
}

// Point3D
public boolean equals(Object obj) {
  if (!(obj instanceof Point3D)) return false;
  Point3D p3 = (Point3D)obj;
  return this.x == p3.x && this.y == p3.y && this.z == p3.z;
}

Question:

What part of the equals() contract does this violate?

Answer:

This violates the symmetry requirement of the equals() contract:

Point2D p2 = new Point2D(2, 71);
Point3d p3 = new Point3D(2, 71, 82);

p2.equals(p3); // true 
p3.equals(p2); // false

Try #2

// Point3D
public boolean equals(Object obj) {
  if (!(obj instanceof Point3D))
    return super.equals(obj);
  Point3D p = (Point3D)o;
  return this.x == p.x && this.y == p.y && this.z == p.z;
}

Question:

What part of the equals() contract does this violate?

Answer:

This seems like it might be an improvement; however, it now violates transitivity.

Point2D p3Da = new Point3D(0, 6, 170);
Point2D p3Db = new Point3D(0, 6, 171);
Point2D p2D = new Point2D(0, 6);
p3Da.equals(p2D); //true
p2D.equals(p3Db); //true
p3Da.equals(p3Db); //false: violates transitivity

Try #3

// Point2D
public boolean equals(Object obj) {
  if (obj == null || !this.getClass().equals(obj.getClass())) {
    return false;
  }
  Point2D p = (Point2D)o;
  return this.x == p.x && this.y == p.y;
}

// Point3D
public boolean equals(Object obj) {
  if (obj == null || !this.getClass().equals(obj.getClass())) {
    return false;
  }
  Point3D p3 = (Point3D)o;
  return this.x == p3.x && this.y == p3.y && this.z == p3.z; 
}

Question:

What is good about this solution? What is bad?

Answer:

This is a better solution because it satisfies the equals() contract. Unfortunately, it is impossible to compare a Point2D with a Point3D under this implementation; however, it is difficult to define how such a comparision should be made, anyway. The larger problem with this solution is that "harmless" subclasses cannot be compared. For example, consider the following:

public class Point2DPolar extends Point2D {

  public double angle() {
      //returns the polar angle of this point
  }

  public double radius() {
      //returns the polar radius of this point
  }

}

This leads to the following problem:

Point2D p = new Point2D(4, 20);
Point2D pp = new Point2DPolar(4, 20);
p.getX() == pp.getX(); // true 
p.getY() == pp.getY(); // true 
p.equals(pp);          // false

Thus, two objects that appear to be the same are not considered equal when compared with the equals() method. (Note that this problem would also exist if we had used Point3D in place of Point2DPolar in this example). Even though PointPolar is not fundamentally different than its superclass, the implementation of equals() in Point2D makes the two types incomparable. Thus, this is not an ideal solution to our problem.

Question:

How can we avoid this problem entirely?

Answer:

Instead of extending Point2D, Point3D can reuse Point2D by composing it as a field, and delegating calls to it internally, as necessary:

 
public class Point3D {

  private Point2D p;
  private int z;

  public Point3D(int x, int y, int z) {
    p = new Point2D(x, y);
    this.z = z;
  }

}

Note that the client of Point3D need not know that Point3D uses Point2D.

Because Point2D and Point3D do not extend one another, there are no problems involving inheriting a broken equals() method. However, we do lose something: previously, we could add both Point2D and Point3D objects to a Point2D[ ] array. Because Point3D is no longer a subclass, we can no longer do this. The solution is to create an interface that both classes implement:

public interface Point {

  public int getX();

  public int getY();

}

Thus, we could create a Point[] array to which we could add both Point2D and Point3D objects. Note that great care must be exercised when using the equals() method of objects that are referred to through their interface, as an interface rarely specifies how equals() must be implemented. For example, if we have the following:

Point p1, p2; // suppose these were initialized outside this block of code 
Set<Point> set = new HashSet<Point>();
p1.getX() == p2.getX(); // true 
p1.getY() == p2.getY(); // true 
set.add(p1);
set.add(p2);
System.out.println(set.size());

What gets printed out? From this code, there is no way to know because it is unclear what happens when add(p2) is invoked since set tests if p1.equals(tp2). Thus, be conscious of the use of equals() in scenarios like this.

Hash tables

Hash-codes are used by data structures such as hash tables (called HashMap in Java).

The goal of hash tables is to let us store objects in such a way that we can perform (expected) constant time lookups later on. We start with an array of buckets. Note that it takes constant time to access any arbitrary bucket. A hash function is used to mathematically map an object to a bucket. The hash-function is a mapping from an object (such as a String) to a bucket number. When an object is being stored, it is mapped to the appropriate bucket and stored there. When we are looking up an object, we only need to look at the bucket identified by that object's hash. It is normal for more than one object to hash to the same bucket. This can be solved by having a linked list for each bucket in the hash-table (there are also other solutions). If a good enough hash function is chosen these lists will be small for each bucket and the time to look up an object will be roughly constant.

Java avoids the theoretical issues associated with forming well-balanced hash-tables. The hashCode() method defined in Object lets any hash-table implementation map an object to a number. The hash-table code can then use an internal mapping to create the actual hash (i.e. bucket index) from this number.

Java really just has one main requirement for hashCode() implementations: whenever two objects are equal() they must have the same hashCode(). In general, whenever you override equals(), you will also need to override hashCode().

Question:: What would happen if this requirement was not maintained?
Answer:: Two "equal" objects could hash to different buckets, allowing duplicates to unintentially be stored and breaking methods such as contains.
Question:: What about mutable objects and hashing?
Answer:: If the hashCode() method of the object depends on a mutable field, bad things can happen. If you mutate the object after storing it you won't be able to find it anymore because it will be in the wrong bucket.

Creating a Good Hash Code

Let's try to design a hash code for the Duration class we have seen in lecture:

public class Duration {
  private int min;
  private int sec;

  public Duration(int min, int sec) {
     this.min = min;
     this.sec = sec;
  }
}

Consider the following implementations for Duration.hashCode():

Try #1

return 1;

This is a terrible hash code because it would make everything hash to the same bucket, making the hash code function as a linked list (eliminating the benefit of constant-time-lookup that a hash table should provide).

Try #2

return min;

This is a better hash code; however, it will still result in a lot of collisions because every Duration with the same value for min will end up in the same bucket.

Try #3

return min + sec;

Again, this is better, but there will still be collisions for a number of different durations. For example, when (min,sec) = (3,4) and (min,sec) = (4,3), both will hash to the same bucket.

Try #4

return sec << 16 + min;

This is a bigger improvement because now only equal durations have equal hash codes (assuming min and seconds take on values between 0 and 60). However, if only the low order bits are used in the hash (which happens in techniques such as extensible hashing) then there will still be collisions when durations have the same minute. Note that this shows that just because you have unique hash codes for unique values, this does not guarantee a good hash.

Try #5

int result = 17;
result = 37 * result + sec;
result = 37 * result + min;
return result;

"Whoah -- where did this come from?" you ask. Well, the truth is that number theory people have discovered that multiplying by prime numbers helps spread out hash code values. In fact, they have discovered a lot of neat things about how to produce better hash codes, but they are often quite complicated, so it's probably easiest to look at the guidelines that Bloch has in Item 8 and apply them to your code rather than devise your own algorithms to achieve good hashing (unless you're into that sort of thing).

Try #6

Actually, there was nothing wrong with Try #5, but if the hashCode() method is going to be invoked often, and is always going to return the same number, then it will help to cache the hash code and to declare the hashCode() method final so that the cost of method lookup and computation will be reduced. For a large number of elements in a hashed data structure (like nodes in the graph of Boston), this will have a noticeable impact on performance.

Even if Duration were a mutable object, we could still cache the hash code by updating the cached value every time a Duration were mutated. Alternatively, we could set a flag whenever it is mutated, and then check that flag before returning the cached value: if it has been mutated, then we consider the cached value stale and then recalculate and cache the hash before returning it.

The number of optimizations like this that you want to encode depends on the nature of the problem — if hashCode() is inexpensive to compute, and is not accessed often, then these mechanisms will just make your code more difficult to maintain without providing any real benefit to the client. The important thing is to think carefully about how likely hashCode() is to cause a collision, and to make sure that it is in agreement with equals().

Notions of Equivalence in the Java API

Unfortunately, Java does not follow a consistent notion of equivalence for mutable ADT's, beyond the specifications in Object. Some mutable ADT's defined by the Java API use behavioral equivalence (e.g. StringBuilder), other mutable ADT's use observational equivalence(e.g. Date). In general, you should look at the specs of the class you are using before relying on either type of behaviour.

Question:

What does the following print:

List<String> arrayList = new ArrayList<String>();
List<String> linkedList = new LinkedList<String>();
arrayList.add("a");
linkedList.add("A".toLowerCase());
arrayList.add("b");
linkedList.add("B".toLowerCase());

System.out.println(arrayList.equals(linkedList));
System.out.println(linkedList.equals(arrayList));

Answer:

true
true

The classes in the Java collections framework consistently use the notion of observational equivalence. This is why the LinkedList above is equal to the ArrayList. Furthermore, note that the references in the two lists are not == to each other.

The Java implementations of the List interface can hold mutable objects without any complications. However, Sets are different. According to the specifications, the behaviour of Set implementations can be completely arbitrary if an element contained in the set is mutated.

Question:

Can you guess what the following does?

Set<List<String>> set = new HashSet<List<String>>();
List<String> list = new ArrayList<String>();
list.add("6.170");
list.add("6.004");
list.add("6.003");
set.add(list);
set.contains(list); // what does this return?

list.add("6.033");
set.contains(list); // what does this return?

set.add(list);

System.out.println("set has " + set.size() + " elements");
for (List<String> l : set) {
	System.out.println(list);
}

Answer:

The first contains() returns true, the second returns false and the code prints out the following:

set has 2 elements
[6.170 6.004 6.003 6.033]
[6.170 6.004 6.003 6.033]

Why exactly does this fail? You need to know some details of the implementations of both ArrayList and HashSet to figure it out:

ArrayList.hashCode():: The returned hash-code is based on the hashCode()'s of all the elements in the list. The implementation in AbstractList actually iterates over the entire list adding up the hashcodes.
HashSet.contains(elt):: Uses elt.hashCode() to map to a bucket in its internal hashtable. If it sees an object equal to obj, this method returns true, otherwise it returns false. Doesn't look anywhere else in the data structure.
HashSet.add(elt):: Maps the object to a bucket and adds it to the bucket if it isn't already in that bucket.

It is the changed hashCode() value that is the problem. If we had added a StringBuilder to the set instead, mutated it, and added it again, the set would have worked fine, since StringBuilder uses Object.hashCode().

Section 4: Generics, Equality, and Hashing

Generics

Generic Classes

Generic Methods

Generics and subtypes

Wildcards

Bounded Wildcards

Casts

Object Equality

Behavioral versus Observational Equivalence

Object contract

Equality and subtypes

Try #1

Try #2

Try #3

Hash tables

Creating a Good Hash Code

Try #1

Try #2

Try #3

Try #4

Try #5

Try #6

Notions of Equivalence in the Java API