CSE143 Notes for Wednesday, 1/11/06

I passed out a new version of the IntList class and spent some time discussing it. As I've said several times, we are using the IntList class as a way to understand the ArrayList class that is part of the Java class libraries. To underscore that point, I used the computer to examine Sun's documentation for the ArrayList class and I wrote some client code that uses an ArrayList.

In section we discussed the idea of having methods throw exceptions if preconditions are not met. I pointed out that the ArrayList documentation indicates that this occurs. I also mentioned that there are some interesting subtleties that come up. For example, for the add method that takes an index, should it be legal to add at "size"? Normally the legal indexes range from 0 to (size - 1). But for add, it makes sense to add at position size because that would append something to the end of the list. Some people don't think it should work this way, but I pointed out that by reading the documentation for ArrayList, you can see that it is done that way. The new version of IntList throws exceptions whenever a precondition is not met, just as ArrayList does.
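
The add-at-size case can be checked directly against ArrayList. The following is a small sketch rather than code from lecture; the class name AddAtSizeDemo and the helper tryAdd are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the handout.
class AddAtSizeDemo {
    // Returns true if list.add(index, value) succeeds and false if it
    // throws IndexOutOfBoundsException.
    static boolean tryAdd(List<String> list, int index, String value) {
        try {
            list.add(index, value);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("a");
        list.add("b");  // size is now 2, so get accepts only indexes 0 and 1

        System.out.println(tryAdd(list, 2, "c"));  // index == size: appends, prints true
        System.out.println(tryAdd(list, 5, "x"));  // index > size: rejected, prints false
        System.out.println(list);                  // prints [a, b, c]
    }
}
```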

We spent some time discussing this paragraph from the ArrayList documentation:

"Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost."
Our IntList has a capacity as well that we specify when we construct the object. But obviously the ArrayList is more powerful in that it grows if necessary. The new version of IntList that I passed out has this same property. The idea is that whenever the array becomes full, it allocates a new array and copies everything from the old array to the new one. We talked about how to make that efficient. The description above says that it has "constant amortized time cost".

Constant time? How do they get constant time? For example, if the array has 10 thousand elements in it, then making a new bigger array is going to require copying all 10 thousand elements from the old array to the new one. The key word here is "amortized". For example, in the IntList class, I accomplish this by doubling the size of the array. So if you have a capacity of 10 thousand and you run out of room, then at that point in time you allocate a new array that is 20 thousand long. While it's true that it takes a fair amount of work to set up that new array, it's also true that you won't have to expand again for a while. How long? Since we doubled the capacity, it has enough room to allow you to add 10 thousand items before expanding again. So the idea is that even though that one add is expensive (the one that does the doubling), it will be followed by 9,999 other adds that won't need this extra work. So averaged over all of those adds (that's what "amortized" means), the cost is relatively low.
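
To make the averaging argument concrete, here is a minimal sketch of a doubling list that counts how many element copies its resizes cause. It is a hypothetical stand-in, not the IntList handout: the class name GrowableIntList, its field names, and the copies counter are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical stand-in for the IntList handout, which is not reproduced here.
class GrowableIntList {
    private int[] elementData;
    private int size;
    private long copies;  // total elements copied because of resizing

    GrowableIntList(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity: " + capacity);
        }
        elementData = new int[capacity];
    }

    // Appends value, doubling the array first if it is full.
    void add(int value) {
        if (size == elementData.length) {
            copies += size;  // every existing element moves to the new array
            elementData = Arrays.copyOf(elementData, 2 * elementData.length);
        }
        elementData[size] = value;
        size++;
    }

    int size() { return size; }
    int capacity() { return elementData.length; }
    long copies() { return copies; }

    public static void main(String[] args) {
        GrowableIntList list = new GrowableIntList(10);
        for (int i = 0; i < 100000; i++) {
            list.add(i);
        }
        System.out.println("size = " + list.size());
        System.out.println("capacity = " + list.capacity());
        System.out.println("copies = " + list.copies());
    }
}
```

With a starting capacity of 10 and 100,000 adds, the resizes copy 10 + 20 + 40 + ... + 81,920 = 163,830 elements in total. That is fewer than two copies per add on average, even though a single add occasionally copies tens of thousands of elements.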

This documentation says that the details of the "growth policy" are not specified, but I looked it up to see how it's implemented. The current version of ArrayList does something very similar to the doubling. It increases the capacity by 50% each time (multiplying by 1.5 instead of multiplying by 2). This is a more conservative approach than doubling, but it still gives a good average time because each increase is spread out over many other adds that follow.
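
The two growth factors can be compared with a small simulation. This is only a sketch: the class GrowthPolicyDemo is made up for this example, and the formula capacity + capacity / 2 is an assumed way of growing by roughly 50%, not a copy of ArrayList's actual source.

```java
// Hypothetical simulation; the 1.5x formula is an assumption, not ArrayList's code.
class GrowthPolicyDemo {
    // Simulates n appends and returns {number of resizes, total elements copied}.
    static long[] simulate(int n, int initialCapacity, boolean doubling) {
        int capacity = initialCapacity;
        long resizes = 0;
        long copies = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;  // a resize copies every existing element
                resizes++;
                int grown = doubling ? capacity * 2 : capacity + capacity / 2;
                capacity = Math.max(grown, capacity + 1);  // always grow by at least 1
            }
        }
        return new long[] {resizes, copies};
    }

    public static void main(String[] args) {
        long[] twice = simulate(100000, 10, true);
        long[] half = simulate(100000, 10, false);
        System.out.println("2x:   resizes = " + twice[0] + ", copies = " + twice[1]);
        System.out.println("1.5x: resizes = " + half[0] + ", copies = " + half[1]);
    }
}
```

Growing by 50% resizes more often than doubling, but in both cases the total number of copies stays within a small constant multiple of the number of adds, which is what keeps the amortized cost constant.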

I pointed out that the ArrayList class is actually defined as ArrayList<E>. This is known as a "generic" class. Generics are new as of Java 5. The idea is that you can have an ArrayList of any given type. It's more common to use the letter "T" for "Type", but in the collections classes, Sun often uses the letter "E" which is short for "Element type".

I switched over to TextPad to show how to write code that uses an ArrayList. If it weren't a generic class we would construct one by saying:

        ArrayList list = new ArrayList(20);
Because this is a generic class, we shouldn't use a simple name like ArrayList (Java still accepts that as a "raw type", but you lose the compiler's type checking). In our case, we wrote code for an ArrayList of String values, so the type becomes ArrayList<String>. Generics are easier to use if you just remember the simple idea that its type includes the "<String>" part. For example, in the line of code above, we construct the object by mentioning its type twice:

        ArrayList list = new ArrayList(20);
        ~~~~~~~~~            ~~~~~~~~~
          type                 type
Now the type is ArrayList<String>, so the line of code we write is as follows:

        ArrayList<String> list = new ArrayList<String>();
        ~~~~~~~~~~~~~~~~~            ~~~~~~~~~~~~~~~~~
              type                         type
This line of code can be confusing, but when you remember that the "<String>" is part of the type, it makes sense.

After constructing the list, we can add some values to it. Obviously, since it is an ArrayList<String>, we can add only String values:

        list.add("how are you??");
        list.add("boring class");
        list.add("really, I though it was keen!");
        list.add("keen???  who says that anymore");
        list.add("keen is the new cool way to say cool!");
        list.add("hah!");
After the calls on add, we included a println statement to see what is in the list:

        System.out.println("initially list = " + list);
This involves string concatenation and whenever Java concatenates a string and an object, it calls the toString method of the object. So this line of code is really equivalent to:

        System.out.println("initially list = " + list.toString());
It produced the following output:

initially list = [how are you??, boring class, really, I though it was keen!, keen???  who says that anymore, keen is the new cool way to say cool!, hah!]
Then I talked about the idea of an iterator. If you knew that for the rest of your life, you'd always be working with arrays, then you'd have little use for iterators. With an array-based structure like ArrayList you can call a "get" method that can quickly access any element. But many of the other data structures we will be looking at don't have this kind of quick access and for those structures, iterators will make a lot more sense. So iterators will seem a bit silly when tied to an array, but we'll eventually see much more interesting examples of iterators.

In general, we think of an iterator as having three basic operations:

        test whether there are more values to examine
        get the next value
        advance to the next value

Sun adopted the convention early on that the second and third steps would be combined into one operation known as "next" that does two different things: it returns the next value and it advances the iterator to the next value. So in Java there are two fundamental operations:

        hasNext: tests whether there are more values to examine
        next: returns the next value and advances the iterator

We saw this in the Sun documentation. The ArrayList class has a method called iterator() that returns something of type Iterator<E>. And we looked at the Iterator<E> documentation and saw that it has the two methods mentioned above plus one extra one. There is an optional "remove" method that is supposed to remove the last value returned by the "next" method.

So we added this line of code to ask our ArrayList for an iterator that we kept track of with a variable called "i":

        Iterator<String> i = list.iterator();
We don't have to call "new" because the ArrayList class itself constructs the iterator object and returns a reference to it. Then we wrote some simple code to use the iterator to print out the different Strings and their lengths:

        while (i.hasNext()) {
            String s = i.next();
            System.out.println("next string = " + s);
            System.out.println("length = " + s.length());
        }
This produced the following output:

        next string = how are you??
        length = 13
        next string = boring class
        length = 12
        next string = really, I though it was keen!
        length = 29
        next string = keen???  who says that anymore
        length = 30
        next string = keen is the new cool way to say cool!
        length = 37
        next string = hah!
        length = 4
Then we spent a little time talking about how this iterator works. What kind of information would it need to know in order to do its work? Someone mentioned that it needs to know a current position in the list. That's absolutely right. The iterator would want to keep track of where it is in the ArrayList. Initially it is looking at the first value (at index 0). As you make calls on next, this would change.

I asked what other kind of state information it would need. This is the kind of question you always need to ask when you implement an object. What does this object need to keep track of? With a current index, it would be able to keep track of how much of the list it has examined. But how does it know when there isn't anything left to look at? In other words, how can we write the hasNext method? Someone mentioned that we'd need to know the size of the list and that's true. But we need to know even more. Someone said we'd need to keep track of the list itself. Otherwise we can't get individual values out of the list. So the second bit of state information that the iterator needs is a reference to the list it is iterating over.

I pointed out that the remove method is implemented for ArrayList, so we added some code to our while loop to remove strings of even length:

        while (i.hasNext()) {
            String s = i.next();
            System.out.println("next string = " + s);
            System.out.println("length = " + s.length());
            if (s.length() % 2 == 0) {
                i.remove();
            }
        }
We also added a println after the loop to see what is in the list and we saw that the strings of even length had been removed by this code.
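
For reference, the pieces above can be assembled into one self-contained program. This is a sketch: the class name RemoveEvenLengths, the helper method, and the "after loop list = " label are assumptions for this example, since the exact println from lecture isn't recorded here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper around the loop shown above.
class RemoveEvenLengths {
    // Removes every string of even length, using the iterator's own remove method.
    static void removeEvenLengths(List<String> list) {
        Iterator<String> i = list.iterator();
        while (i.hasNext()) {
            String s = i.next();
            if (s.length() % 2 == 0) {
                i.remove();  // removes the value just returned by next
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("how are you??");
        list.add("boring class");
        list.add("really, I though it was keen!");
        list.add("keen???  who says that anymore");
        list.add("keen is the new cool way to say cool!");
        list.add("hah!");

        removeEvenLengths(list);
        System.out.println("after loop list = " + list);
    }
}
```

Only the strings of length 13, 29 and 37 survive, so this prints:

        after loop list = [how are you??, really, I though it was keen!, keen is the new cool way to say cool!]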

Then I asked people about "tricky" cases for remove. What could cause it to fail? Someone mentioned that removing something twice might be a problem. We modified our code as follows:

        while (i.hasNext()) {
            String s = i.next();
            System.out.println("next string = " + s);
            System.out.println("length = " + s.length());
            if (s.length() % 2 == 0) {
                i.remove();
                i.remove();
            }
        }
When we ran this code, we got an IllegalStateException thrown for the second call on remove. I pointed out that when exceptions are thrown, Java gives you information about the exact line of code where the error occurred and if you have followed the instructions in handout #2 to turn on line numbers in TextPad, you can see exactly what line of code is causing the problem. Not surprisingly, it was the second call on remove that we added.

I asked if there were other tricky cases. Someone mentioned removing before next has been called and I agreed that was something that should generate an exception.
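
This case is easy to try with a real ArrayList. The sketch below (the class and method names are made up for illustration) calls remove on a fresh iterator and reports whether an IllegalStateException comes back.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not from lecture.
class RemoveTooEarly {
    // Returns true if calling remove before any call on next throws
    // IllegalStateException.
    static boolean removeBeforeNextThrows() {
        List<String> list = new ArrayList<String>();
        list.add("hah!");
        Iterator<String> i = list.iterator();
        try {
            i.remove();  // no call on next yet, so there is nothing to remove
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeBeforeNextThrows());  // prints true
    }
}
```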

Then I asked people to think about how this would be implemented. Someone said we could use a boolean variable. I asked what the variable would keep track of. "Whether we're good to go." That's a nice way of thinking of it. The iterator keeps track of whether or not the remove method can be called right now. Initially it would not be okay to call remove. Once you call next, then remove becomes okay. But then when you call remove, okay goes back to false, to prevent two calls on remove in a row. I pointed people to my implementation of the IntListIterator and mentioned that this is exactly how it is implemented.
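
The state described so far (a position, a reference to the list, and a boolean for whether remove is allowed) fits in a short class. The sketch below is not the IntListIterator from the handout: the class name SketchIterator and its field names are assumptions, and it wraps a List<String> so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical iterator sketch; the handout's IntListIterator may differ.
class SketchIterator implements Iterator<String> {
    private final List<String> list;  // the list being iterated over
    private int position;             // index of the next value to return
    private boolean removeOK;         // whether remove may be called right now

    SketchIterator(List<String> list) {
        this.list = list;
        position = 0;
        removeOK = false;  // remove is not legal until next has been called
    }

    public boolean hasNext() {
        return position < list.size();
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = list.get(position);
        position++;
        removeOK = true;
        return result;
    }

    public void remove() {
        if (!removeOK) {
            throw new IllegalStateException();
        }
        list.remove(position - 1);  // the value most recently returned by next
        position--;                 // later values have shifted left by one
        removeOK = false;           // prevents two removes in a row
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("ab");
        list.add("abc");
        list.add("abcd");
        SketchIterator i = new SketchIterator(list);
        while (i.hasNext()) {
            if (i.next().length() % 2 == 0) {
                i.remove();
            }
        }
        System.out.println(list);  // prints [abc]
    }
}
```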

I asked people if there were any other special cases. Someone said that you should not call next if hasNext is not true. We tested this by putting an extra call on next after the loop:

        while (i.hasNext()) {
            ...
        }
        String s = i.next();
This threw a NoSuchElementException. I pointed out that I have a test for exactly this case in my version of the iterator.

I asked if there were any other special cases. Someone said that a person might remove a value from the list that the iterator is trying to look at. We explored this by rewriting the loop as follows:

        while (i.hasNext()) {
            String s = i.next();
            list.remove(0);

            ...
        }
So we call next with the iterator to get the next thing in the list and then we tell the list to remove the value at index 0 (the front of the list). When we ran this code, we found that it generated an exception as well. It was called a ConcurrentModificationException. Java iterators have been written with a "fail fast" behavior. The idea is that you should not be modifying the list while you are iterating over it. So if the iterator determines that the list has been modified, it throws an exception (it fails).
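
A complete version of this experiment might look like the sketch below (the class name FailFastDemo and the three-element list are assumptions for the example). Note that the exception comes from the iterator's following call on next, not from list.remove itself.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not from lecture.
class FailFastDemo {
    // Modifies the list directly while iterating and reports what happens.
    static String run() {
        List<String> list = new ArrayList<String>();
        list.add("a");
        list.add("b");
        list.add("c");
        Iterator<String> i = list.iterator();
        try {
            while (i.hasNext()) {
                String s = i.next();  // the second call detects the change and throws
                list.remove(0);       // modifies the list behind the iterator's back
            }
            return "no exception";
        } catch (ConcurrentModificationException e) {
            return "ConcurrentModificationException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints ConcurrentModificationException
    }
}
```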

I didn't build this functionality into my IntList class. It's a bit of a pain to do. This behavior is implemented using a field called modCount that keeps track of how many modifications have been made to the list. The iterator makes sure that the current value of modCount is the same as its initial value. If the iterator encounters a difference, it throws the exception.
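
To show roughly what the modCount bookkeeping involves, here is a stripped-down sketch. The names TinyIntList and TinyIterator are made up, the iterator has no remove method, and the real ArrayList code is more involved than this.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.NoSuchElementException;

// Hypothetical list sketch showing only the modCount idea.
class TinyIntList {
    private int[] data = new int[10];
    private int size;
    private int modCount;  // incremented on every change to the list

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * size);
        }
        data[size] = value;
        size++;
        modCount++;
    }

    void remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        for (int i = index; i < size - 1; i++) {
            data[i] = data[i + 1];  // shift later values left
        }
        size--;
        modCount++;
    }

    TinyIterator iterator() {
        return new TinyIterator();
    }

    class TinyIterator {
        private int position;
        // Snapshot of the list's modCount when the iterator was created.
        private final int expectedModCount = modCount;

        boolean hasNext() {
            return position < size;
        }

        int next() {
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();  // fail fast
            }
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            int result = data[position];
            position++;
            return result;
        }
    }
}
```

A client that calls iterator(), then next(), then remove(0) on the list, will get a ConcurrentModificationException from its following call on next().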

I then showed people what is known as the "enhanced for loop". In other languages this is known as the "foreach" loop. This is something that is new as of Java 5. It allows you to use an iterator in a much simpler manner. For example, to print each of the Strings, you can say:

        for (String s: list) {
            System.out.println("next string = " + s);
        }
This works because the ArrayList class has an iterator method (it implements the Iterable<E> interface). Before the end of the quarter, we'll see how to add this functionality to our IntList as well. In the meantime, it's nice to know that this other for loop exists. It also works for arrays. So if you have a variable called "data" of type int[], you can say:

        for (int n: data) {
            // do something with n
        }
In the case of an array, it iterates over each array value from first to last. In this case the variable is of type int because the array is composed of individual int values.

The foreach loop is a nice alternative and will probably end up making iterators more popular, but it isn't quite as powerful as the general iterator. For example, there is no way to call the remove method with the foreach loop. For that, you'd need the kind of loop we wrote above.
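
One way to see the relationship is to write the same traversal both ways. In the sketch below (the class and method names are made up for illustration), the foreach version is shorthand for the explicit iterator version, and only the second leaves a variable "i" on which remove could be called.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not from lecture.
class ForEachEquivalence {
    // Collects the length of each string using the foreach loop.
    static List<Integer> lengthsWithForEach(List<String> list) {
        List<Integer> result = new ArrayList<Integer>();
        for (String s : list) {
            result.add(s.length());
        }
        return result;
    }

    // The same traversal written with an explicit iterator.
    static List<Integer> lengthsWithIterator(List<String> list) {
        List<Integer> result = new ArrayList<Integer>();
        Iterator<String> i = list.iterator();
        while (i.hasNext()) {
            String s = i.next();
            result.add(s.length());
            // i.remove() could be called here; the foreach version has no "i"
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("boring class");
        list.add("hah!");
        System.out.println(lengthsWithForEach(list));   // prints [12, 4]
        System.out.println(lengthsWithIterator(list));  // prints [12, 4]
    }
}
```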


Stuart Reges
Last modified: Fri Jan 13 16:48:47 PST 2006