14. Introduction to Collections

Key concepts

  1. Collections
  2. Ordered collections: ArrayList
  3. Casting
  4. Unordered collections: HashSet and HashMap

Introduction

The concept of a collection is deeply rooted in our everyday experience. Think about the following concepts:
  1. dictionary
  2. class list
  3. deck of cards
  4. music collection
  5. library
  6. bookbag
A dictionary is a collection of words and their definitions; a music collection is an assortment of compact discs, cassette tapes, and so on; a bookbag is a container for items we might use in school. All of these concepts share an important quality: they are all collections of things.

Some collections are inherently ordered. A deck of cards, for instance, has a first card, a last card, a seventh card, and so on. The books in a bookbag or backpack are not really ordered, on the other hand. What is the first book in the bookbag? You could answer this question a variety of ways: the alphabetically first book; or perhaps the first book you put into the bag. In any case, it seems that the answer to this question is unclear. Hence, we might say that a bookbag is an example of an unordered collection.

Because collections are so fundamental to the way we think about the world, every programming language provides some means to represent a collection. Java provides us with a particularly expressive set of collection classes, which we can use in our programs to build powerful abstractions. In this lesson, we'll take a look at just three examples, Arraylist, HashSet, and HashMap.

An Ordered Collection: ArrayList

ArrayLists is a standard Java collection class that provides indexed access. That is, they we can ask for the seventh (or in general, the i-th) element of the collection. The first element of an ArrayList is always located at index 0, rather than index 1. The final element of an ArrayList is found at index size-1, where size is the number of items in the collection. Starting to count at "zero" may seem strange or awkward to a beginning programmer because most humans start counting at "one". Let's look at a few of the important messages understood by ArrayLists:
public class ArrayList {

  /** Answer the size of the collection  */
  public int size();

  /** Answer the object at the given index */
  public Object get(int index);

  /** Add the given object to the end of the collection */
  public void add(Object o);

  /** 
    Remove the object located at the given index from the collection,
    and shift any subsequent elements to the left. 
    @return the element that was removed from the list 
  */
  public Object remove(int index);
}
Now let's look at a simple example of using an ArrayList (in the interpreter):
prompt> ArrayList names = new ArrayList();
prompt> names.add("Ahab");
prompt> names.add("Quequeg");
prompt> names
ArrayList["Ahab", "Quequeg"]
prompt> names.get(0)
"Ahab"
prompt> names.size()
2
In the first line in the above example, we create a new ArrayList object and name it names. We then perform some operations on this object, namely, we add two objects to the collection -- the Strings "Ahab" and "Quequeg". We then ask the ArrayList to show itself, and it shows us that it contains the two Strings that we just added. Notice that they occupy the order in which they were added. Next, we ask the ArrayList to access the element at the zero-th (first) position. This should be the object we added first, namely "Ahab", and we see that indeed it is. Finally, we ask the ArrayList to tell us its size, and it informs us that it contains two elements, as we would expect.

Rounding out the Music domain: The CompactDisc class

Now let's revisit the Music domain we considered a few lessons ago. In that lesson, we built classes to represent individual Songs, but stopped short of implementing a class to represent the CompactDiscs that contain those songs. Recall some of the properties of a CompactDisc:
CompactDisc has a collection of Songs.
CompactDisc has a running time.
CompactDisc has a name.
CompactDisc has an artist.
Let's think also about the interface that a CompactDisc object might support, by thinking about the kinds of things that we might do with real-world CompactDiscs:
  1. create new discs
  2. get the title of a disc
  3. get the artist of a disc
  4. add a song to a disc
  5. get the total running time of a disc
  6. get the ith song of the disc
This simple list of uses hints at the interface a CompactDisc object might want to support. We can sketch it out like so:
public class CompactDisc {

  // A constructor for making new CompactDisc objects

  // Get the title

  // Get the artist

  // Get the total running time

  // Add a song

  // Get the ith song

}
Now we extend our implementation by giving it a representation, guided by the property analysis we performed above:
public class CompactDisc {
  private ArrayList  songs;          // a collection of songs
  private int        runningTime;    // total running time
  private String     artistName;     // artist name
  private String     title;          // disc title

  // A constructor for making new CompactDisc objects

  // Get the title

  // Get the artist

  // Get the total running time

  // Add a song

  // Get the ith song

}
Now we can go ahead and add a constructor and implement the five methods quite easily:
public class CompactDisc {
  private ArrayList  songs;
  private int        runningTime;
  private String     artistName;
  private String     title;

  /** 
    Create a new, empty compact disc.  
    @param artistName the name of the recording artist
    @param title the title of the CD
  */
  public CompactDisc(String artistName, String title) {
    this.artistName = artistName;
    this.title = title;
    this.songs = new ArrayList();
    this.runningTime = 0;
  }

  /** Answer the title of the CD. */
  public String getTitle() {
    return this.title;
  }

  /** Answer the artist of the CD. */
  public String getArtist() {
    return this.artistName;
  }

  /** Answer the running time of the CD. */
  public int getRunningTime() {
    return this.runningTime;
  }

  /** Add the given song to the end of CD. */
  public void addSong(Song aSong) {
    this.songs.add(aSong);
    this.runningTime = this.runningTime + aSong.getLength();
  }

  /** Answer the ith song.
      @param i the index of the song to retrieve 
      @return the Song at the ith position 
  */
  public Song getSong(int i) {
    return (Song)this.songs.get(i);
  }

}
The constructor just initialized the parts of the CD to reasonable values: it sets the artist name and disc title; it creates a new, empty ArrayList to hold the songs; and finally it sets the running time to be zero. The first three methods are just simple accessor methods that let the client of this class get access to some of the basic qualities of a CompactDisc, such as artist name.

The addSong method is a little trickier, but not totally far out. It just adds the Song object (given as a parameter to the method) to the collection of songs. Finally, it increments the running time. Notice how this operation keeps the total running time up to date: every time a new song is added to the CD, it is queried for its running time, and this value is accumulated in the total running time of the CD.

The final method, getting a song from the disc, does indeed look somewhat strange. Can you see what is different about it? It looks the way it does because of a few special rules that the Java language imposes upon us. We'll talk about these rules in the following section.

Casting

Note that the return type of the get method for ArrayLists is Object. For now, think of the type Object as Java's way of expressing any kind of object defined by a class in the language. This includes "built in" classes like String and classes that we define, such as Songs. It does not include, however, primitive built in types such as int, char, double, or boolean. Every class we define has an "is-a" relationship to the class Object. In other words, we can say that every Song "is an" Object. Notice, however, that the reverse is not true: not every Object "is a" Song. This is true of the real world too. For instance, every car is an object, but not every object is a car. There could be objects that are not cars, but instead are chairs, or stoves, or some other kind of object. This rule means that a binding such as the following is perfectly legal:
Object someObject = new Song(...);
The type of the expression on the right is Song. And since every Song is an Object, the above does not result in a type mismatch. Again, the reverse is not true. In other words, it is not the case that every Object is a Song. Let's look at the following fragment:
ArrayList someSongs = new ArrayList();
someSongs.add(new Song(...));
...
Song aSong = someSongs.get(0);              // type mismatch!!
The return type of the method get() is Object, and the compiler will complain of a type mismatch. Why? Because type of the thing being returned from the get() method is Object and in the final statement, we are saying that it is a Song. In other words, we are claiming that an Object is a Song, which is not, in general, true.

This should help explain the strange-looking method body of getSong() in the class CompactDisc. We know that our ArrayList songs contains only Song objects. A client of our class expects that when they get the i-th song of a CompactDisc, they should receive a Song, not an Object. The problem is that the get() method on ArrayList returns an Object. We are faced with a seemingly impossible situation now -- as implementors of the class CompactDisc, we know that the ArrayList contains only Songs, but the compiler will insist it contains only Objects. How can we make the compiler happy? The answer is to cast the type of the object that is returned by get() to the appropriate type. The pattern for casting is as follows:

(<type-name>)<expression>
A cast is a promise to the compiler that the expression is actually of the stated type. The cast expression does not actually change the type of the value of the expression -- if the type of the expression is not as we promise, an error will be reported at runtime. Let's look at another example:
ArrayList things = new ArrayList();
things.add("Hello");
Song aSong = (Song)things.get(0);      // runtime error!!
The above fragment will compile beautifully, but will fail spectacularly when run. Why? Because we have in essence told a great lie. We have made the claim that the object returned by the get() method is actually a Song, when it really is a String. Since it is not the case that a String is a Song, the program cannot run correctly. Since the compiler cannot, in general, catch errors like these when it compiles your program, checking these kinds of errors must be deferred until the program is actually being run.

Casting is a practice that is subject to terrible abuse, and hence must be used carefully. As we know, Java is very strict about using and abusing types, and does its best to catch abuses of types at compile time. Many beginning programmers use casting to get the compiler to shut up about type errors, only to pay the price when their programs are actually run. When using a cast, ask yourself the question, "Is it really the case that this expression is of this type?" If you cannot answer in the affirmative, then you should not be using a cast.

In the getSong method, we can ask ourselves that question and indeed answer in the affirmative. Why? Because we are the only people who can add things to the ArrayList, and we know (because of the interface of the addSong method), that we'll only ever add Song objects to our collection. Hence, when we access an object in that collection, we know that it must be a Song. Therefore, we may use a cast to make this promise to the compiler, and support the desired interface to the getSong method, namely that it should return an object of type Song, not Object.

Notice that the CompactDisc class is a relatively "thin" class. Its primary job is to maintain a collection of objects (Songs) and provide a client a nice interface for adding and accessing songs as well as getting general information about that collection. It performs these operations by building on pre-existing classes (such as String and ArrayList). The CompactDisc class is an example of a facade class. It provides a client with a clean, uniform interface to a set of sub-objects that together implement some sort of object. We'll see another example of this style of programming in the next section.

An Unordered Collection: HashSet

A set is a collection that does not contain duplicates and does not have a specific order. In mathematics, the set {1, 2, 3} is the same as the set {3, 2, 1}. Furthermore, sets support operations such as intersection (the intersection of two sets is the set that contains elements that exist in both sets) and union (the union of two sets is the set that contains elements that exist in either or both sets). Java provides us with a class that we can use to conveniently represent sets in our programs, called the HashSet. HashSets are used less frequently than ArrayLists, but are important for understanding our next collection, the HashMap.

An Unordered Collection: HashMap

Suppose we want to further extend our music collection example. So far, we have written classes to represent songs and compact discs. What about the music collection itself? As the name suggests, it is clearly some kind of collection. Primarily, we wish to access the items in this collection by the title (or perhaps artist). Java provides us with a kind of collection that makes this easy -- the HashMap. Rather than wade through a lengthy analysis and development of the MusicCollection class, let's jump right to an example implementation.
public class MusicCollection {
  private HashMap discs;
  private String name;

  /** Create a new collection with the given name. */
  public MusicCollection(String collectionName) {
    this.name = collectionName;
    this.discs = new HashMap();
  }
 
  /** Add the given disc to the collection. */
  public void add(CompactDisc disc) {
    this.discs.put(disc.getName(), disc);
  }

  /** Answer the disc with the given title. */
  public CompactDisc get(String title) {
    return (CompactDisc)this.discs.get(title);
  }

  /** Remove the disc with the given title. */
  public void remove(String title) {
    this.discs.remove(title);
  }
}
A MusicCollection object really just has two fields, a name and a collection of CompactDiscs. To represent the collection, we use the HashMap class. Here's a peek at its interface:
public class HashMap {
  
  /** Answer the value associated with the given key. */
  public Object get(Object key);

  /** Associate the given key with the given value. */
  public void put(Object key, Object value);

  /** Remove the key and its corresponding value from the table. 
      @return the value if it exists, null otherwise */
  public Object remove(Object key);
}
A HashMap is a collection that maps keys to values. Let's look at a fragment that uses a HashMap:
HashMap songs = new HashMap();
CompactDisc disc1 = new CompactDisc("Banana Album", "Velvet Underground");
CompactDisc disc2 = new CompactDisc("Taking Tiger Mountain", "Brian Eno");
songs.put(disc1.getName(), disc1);
songs.put(disc2.getName(), disc2);
The above fragment associates the name "Banana Album" with the first CompactDisc object we created and the name "Taking Tiger Mountain" with the second object we created. We can visualize the contents of the table as follows:

We can retrieve items from our collection as follows:

CompactDisc aDisc = (CompactDisc)songs.get("Banana Album");
Note that we must perform a cast again, because a HashMap knows how to map key Objects to value Objects. HashMaps can map many keys to the same value, but cannot map a single key to many values. For instance:
HashMap songs = new HashMap();
CompactDisc disc1 = new CompactDisc("Banana Album", "Velvet Underground");
CompactDisc disc2 = new CompactDisc("Taking Tiger Mountain", "Brian Eno");
songs.put("Banana Album", disc1);
songs.put("Great Album!", disc1);
The above example maps the both the keys "Banana Album" and "Great Album!" to the same CompactDisc object (disc1). Extending our example, however:
HashMap songs = new HashMap();
CompactDisc disc1 = new CompactDisc("Banana Album", "Velvet Underground");
CompactDisc disc2 = new CompactDisc("Taking Tiger Mountain", "Brian Eno");
songs.put("Banana Album", disc1);
songs.put("Great Album!", disc1);
songs.put("Great Album!", disc2);
The final line re-maps the key "Great Album!" to disc2. A HashMap cannot by itself represent a one-to-many mapping of the sort we desire. Can you imagine combining a HashMap with an ArrayList to make this sort of mapping work?

What about the following example?

HashMap songs = new HashMap();
CompactDisc disc1 = new CompactDisc("Banana Album", "Velvet Underground");
CompactDisc disc2 = new CompactDisc("Taking Tiger Mountain", "Brian Eno");
songs.put(disc1.getName(), disc1);
songs.put(disc2.getName(), disc2);
CompactDisc anotherDisc = (CompactDisc)songs.get("Revolver");
What value is anotherDisc bound to? Remember that the songs contains only two mappings at this point, and neither of them has a key called "Revolver". Conceptually, the table should tell us that no such mapping exists at this point, and that is exactly what happens. There is a special value null in Java that is used to represent nothingness, and this value is returned by the get() method in this case.

We can exploit this fact to check if we have a given CompactDisc in our collection:

String title = input.readString("Title of the disc? ");
CompactDisc theDisc = (CompactDisc)songs.get(title);
if (theDisc != null) {
  output.println("We have it in stock!");
}
else {
  output.println("Sorry, please try later...");
}
Of course, by examining the interface to HashMap we see that this operation is supported directly, via the containsKey() method:
String title = input.readString("Title of the disc? ");
if (discs.containsKey(title)) {
  output.println("We have it in stock!");
}
else {
  output.println("Sorry, please try later...");
}

Ben Dugan & UW-CSE, Copyright (c) 2001.