CSE143X Notes for Wednesday, 11/26/19

I returned to our case study using the IntList class that we've been discussing all quarter. I reminded people that I'm discussing these classes as a way to understand the ArrayList<E> and LinkedList<E> classes and the List<E> interface that are part of the collections framework in the java.util package. Our versions use simple ints, but they have the same kind of methods and are implemented in a very similar manner to the others.

I first gave a recap of what we've seen. We've been looking at an array-based class called ArrayIntList that has a number of operations for manipulating a list of integers. We discussed a variation of ArrayIntList called LinkedIntList that uses a linked list instead of an array to store the data. We saw that we can capture the "int list" abstraction by defining an IntList interface that both classes implement. So we end up with a generic IntList interface and we have two specific implementations: ArrayIntList and LinkedIntList.

I reminded people that the basic structure of an ArrayIntList is an array of values along with a size variable:

        public class ArrayIntList {
            private int[] elementData;
            private int size;
        
            ...
        }
First I discussed the growth policy of the class. Java's ArrayList class increases the size of the underlying array when the client exceeds the capacity of the array. The new version of ArrayIntList has this functionality built into it. It has an ensureCapacity method that constructs a new array if necessary.

Obviously you don't want to construct a new array too often. For example, suppose you had space for 1000 values and found you needed space for one more. You could allocate a new array of length 1001 and copy the 1000 values over. Then if you find you need space for one more, you could make an array that is 1002 in length and copy the 1001 old values over. This kind of growth policy would be very expensive.

Instead, we do something like doubling the size of the array when we run out of space. So if we have filled up an array of length 1000, we double its size to 2000 when the client adds something more. That makes that particular add expensive in that it has to copy 1000 values from the old array to the new array. But it means we won't need to copy again for a while. How long? We can add another 999 times before we'd need extra space. As a result, we think of the expense as being spread out or "amortized" over all 1000 adds. Spread out over 1000 adds, the cost is fairly low (a constant).
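
To make the difference between the two policies concrete, here is a small sketch that counts how many values get copied under each one. The names GrowthDemo and copiesNeeded are made up for this illustration and are not part of ArrayIntList, and the sketch assumes the array starts with a capacity of 1.

```java
// Hypothetical helper, not part of ArrayIntList: counts the element copies
// performed while adding n values to an array that starts with capacity 1.
class GrowthDemo {
    static long copiesNeeded(int n, boolean doubling) {
        int capacity = 1;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                // the array is full: every existing value moves to the new array
                copies += size;
                capacity = doubling ? capacity * 2 : capacity + 1;
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println("grow by one: " + copiesNeeded(1000, false));
        System.out.println("doubling:    " + copiesNeeded(1000, true));
    }
}
```

For 1000 adds, growing by one copies 499,500 values in total, while doubling copies 1,023, which is roughly one copy per add.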

You will find that the built-in ArrayList class does something similar. The documentation is a little coy about this, saying, "The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost." If you look at the actual code, you'll find that it increases the capacity by 50% each time (a multiplier of 1.5).
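
Returning to our own class, an ensureCapacity method with a doubling policy might look roughly like the following. This is only a sketch under assumptions: GrowableIntList is a simplified stand-in rather than the actual ArrayIntList, and the initial capacity of 4 and the capacity method are there just to make the growth visible.

```java
import java.util.Arrays;

// Simplified stand-in for ArrayIntList; the class name and the initial
// capacity of 4 are assumptions made for this example.
class GrowableIntList {
    private int[] elementData = new int[4];
    private int size = 0;

    // Makes sure the array can hold at least the given number of values,
    // at least doubling its length whenever it has to grow.
    public void ensureCapacity(int capacity) {
        if (capacity > elementData.length) {
            int newCapacity = Math.max(capacity, elementData.length * 2);
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
    }

    public void add(int value) {
        ensureCapacity(size + 1);
        elementData[size] = value;
        size++;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return elementData[index];
    }

    public int size() {
        return size;
    }

    // exposed only so the example can show when the array grows
    public int capacity() {
        return elementData.length;
    }
}
```

Adding a fifth value to this list fills the array of length 4, so ensureCapacity replaces it with an array of length 8 and the next three adds need no copying.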

Then we spent some time discussing how the iterator for ArrayIntList is implemented. We want to be able to write code like the following:

        int[] data = {18, 4, 97, 3, 4, 18, 72, 4, 42, 42, -3};
        ArrayIntList numbers = new ArrayIntList();
        for (int n : data) {
            numbers.add(n);
        }
        System.out.println("numbers = " + numbers);

        // remove multiples of 3 from the list
        Iterator<Integer> itr = numbers.iterator();
        while (itr.hasNext()) {
            int n = itr.next();
            if (n % 3 == 0) {
                itr.remove();
            }
        }
        System.out.println("numbers = " + numbers);
Notice that we want to use the Iterator interface, so our class will have to implement Iterator<Integer>. The main function the iterator performs is to keep track of a particular position in a list, so the primary field will be an integer variable for storing this position:

        public class ArrayIterator implements Iterator<Integer> {
            private int position;

            public ArrayIterator(?) {
                position = 0;
            }

            public Integer next() {
                // still to do: return the value at the current position
                position++;
            }

            ...
        }
I asked people how we would implement hasNext and someone said we'd have to compare the position against the size of the list. I then said, "What list?" Obviously the iterator also needs to keep track of which list it is iterating over. We can provide this information in the constructor for the iterator. So the basic outline became:

        public class ArrayIterator implements Iterator<Integer> {
            private ArrayIntList list;
            private int position;

            public ArrayIterator(ArrayIntList list) {
                position = 0;
                this.list = list;
            }

            public Integer next() {
                // use get method of list & position
                position++;
            }

            public boolean hasNext() {
                // check position against size
            }

            ...
        }
We briefly discussed how to implement remove. We have to keep track of when it's legal to remove a value. Recall that you can't call remove before you have called next and you can't call remove twice in a row. We decided that this could be implemented with a boolean flag inside the iterator that would keep track of whether or not it is legal to remove at any given point in time. Using this flag, we can throw an exception in remove if it is not legal to remove at that point in time:

        import java.util.Iterator;

        public class ArrayIterator implements Iterator<Integer> {
            private ArrayIntList list;
            private int position;
            private boolean removeOK;

            public ArrayIterator(ArrayIntList list) {
                position = 0;
                this.list = list;
                removeOK = false;
            }

            public Integer next() {
                int result = list.get(position);
                position++;
                removeOK = true;
                return result;
            }

            public boolean hasNext() {
                return position < list.size();
            }

            public void remove() {
                if (!removeOK) {
                    throw new IllegalStateException();
                }
                // the value most recently returned by next is at position - 1
                list.remove(position - 1);
                position--;  // later values have shifted left by one
                removeOK = false;
            }
        }
This is a fairly complete sketch of the ArrayIterator code.

Then I showed a new version of the IntList interface with some extra operations:

        public interface IntList {
            public int size();
            public int get(int index);
            public int indexOf(int value);
            public boolean isEmpty();
            public boolean contains(int value);
            public void add(int value);
            public void add(int index, int value);
            public void addAll(IntList other);
            public void remove(int index);
            public void removeAll(IntList other);
            public void set(int index, int value);
            public void clear();
        }
We expect that each of our implementations will provide all of this functionality. As we go to implement these different operations, we'll find that some of them are quite different. For example, when we write the "get" method to return a value at a particular index, for the array we'll be able to just ask for elementData[index], but for the linked list, we'll have to start at the beginning of the list and keep doing some kind of "current = current.next" operation to position ourselves to the right spot in the list. So the implementations of "get" will be very different in the two classes.
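
As a rough illustration of that difference, here are two stripped-down classes that show only the get method. The class names, the hard-coded data and the ListNode class are assumptions made for this sketch, not the actual course code.

```java
// Hypothetical, stripped-down versions of the two classes, showing only get.
class ArrayGetDemo {
    private int[] elementData = {10, 20, 30, 40};

    public int get(int index) {
        return elementData[index];   // one step, no matter what the index is
    }
}

class LinkedGetDemo {
    // minimal node class assumed for this example
    private static class ListNode {
        int data;
        ListNode next;

        ListNode(int data, ListNode next) {
            this.data = data;
            this.next = next;
        }
    }

    private ListNode front =
        new ListNode(10, new ListNode(20, new ListNode(30, new ListNode(40, null))));

    public int get(int index) {
        ListNode current = front;
        for (int i = 0; i < index; i++) {
            current = current.next;  // one hop for each position we skip
        }
        return current.data;
    }
}
```

Both versions return the same answers, but the array version does a fixed amount of work while the linked version does work proportional to the index.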

But what about a method like addAll? It is supposed to add all of the values from one list to another list. We'll probably build it on top of low-level operations like add. So perhaps we'll write the same code for each class.

How do we eliminate redundancy? Should we change the interface to an abstract class instead? Someone said we want both and I said that's a good idea. Because IntList is an interface, anyone can implement their own version of IntList in any way they want. If we were to change it to an abstract class, then we'd be forcing people to extend our class. You only get one inheritance relationship, so it would be very annoying for someone to tell you that it has to be used to extend a particular abstract class.

Instead, we have both. We keep IntList as an interface, but we also introduce an abstract class that we can use to factor out common code between our two implementations:

              AbstractIntList----(implements)----> IntList
               /          \
              /            \
        ArrayIntList  LinkedIntList
This is a very flexible approach. The abstract class allows us to eliminate any redundant code for our two implementations. Anyone who wants to can extend our abstract class as well to take advantage of that common code. But they can also do something completely different with no connection to our abstract class as long as they implement the IntList interface. This pattern is so useful that you'll find it used throughout the collections framework. For example, there is a Map interface that has two implementations called TreeMap and HashMap, and each of the implementations extends a class called AbstractMap. There is a similar structure for sets and lists.
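
Here is a small sketch of the pattern. It uses a cut-down interface with only four methods, and all three type names (SmallIntList, AbstractSmallIntList, FixedArrayIntList) are invented for this example. The point is that the abstract class writes isEmpty and contains once, in terms of other interface methods, and each concrete class supplies only the low-level operations.

```java
// Cut-down interface for illustration; the real IntList has more methods.
interface SmallIntList {
    int size();
    int indexOf(int value);
    boolean isEmpty();
    boolean contains(int value);
}

// Shared code lives here, written only in terms of other interface methods.
abstract class AbstractSmallIntList implements SmallIntList {
    public boolean isEmpty() {
        return size() == 0;
    }

    public boolean contains(int value) {
        return indexOf(value) >= 0;
    }
}

// One concrete implementation; it supplies only the low-level operations.
class FixedArrayIntList extends AbstractSmallIntList {
    private final int[] elementData;

    public FixedArrayIntList(int... values) {
        elementData = values.clone();
    }

    public int size() {
        return elementData.length;
    }

    public int indexOf(int value) {
        for (int i = 0; i < elementData.length; i++) {
            if (elementData[i] == value) {
                return i;
            }
        }
        return -1;
    }
}
```

A linked implementation could extend the same abstract class and inherit isEmpty and contains unchanged, while a class written by someone else could skip the abstract class entirely and implement the interface directly.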

Then I turned to the question of how we would implement an operation like addAll to include in the AbstractIntList class. How do we do that? The idea is to get values from the second list to add them to "this" list. How do we get those values? Someone said we'd call get. We could do so, writing code along these lines:

        for (int i = 0; i < other.size(); i++) {
            add(other.get(i));
        }
This will work, but there is a problem with it. The idea is that we're writing one version of the code that works for both implementations. This will work fine for the array-based implementation, but it will end up being very slow for the linked list implementation.

Think about how get will be implemented for the linked list. We'll start a variable current at the front of the list and we'll move along until we get to the right spot. This is fairly quick when you want to get to something towards the front of the list. But what happens when you have a really long list? You might find yourself asking the list for the element at index 1000. That takes a lot of work (moving current over 1000 times). Then you'd ask it for the value at index 1001, which requires again a lot of work (moving current over 1001 times).
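
To see how quickly this adds up, here is a sketch of a linked list that counts every "current = current.next" step performed by get. HopCounter and its hops field are hypothetical and exist only for this measurement.

```java
// Hypothetical linked list that counts how many times "current" moves.
class HopCounter {
    private static class ListNode {
        int data;
        ListNode next;

        ListNode(int data) {
            this.data = data;
        }
    }

    private ListNode front;
    private int size;
    long hops = 0;   // total number of current = current.next steps so far

    HopCounter(int n) {
        // build the list 0, 1, ..., n - 1 by adding at the front in reverse
        for (int i = n - 1; i >= 0; i--) {
            ListNode node = new ListNode(i);
            node.next = front;
            front = node;
        }
        size = n;
    }

    int size() {
        return size;
    }

    int get(int index) {
        ListNode current = front;
        for (int i = 0; i < index; i++) {
            current = current.next;
            hops++;
        }
        return current.data;
    }

    public static void main(String[] args) {
        HopCounter list = new HopCounter(1000);
        for (int i = 0; i < list.size(); i++) {
            list.get(i);   // same access pattern as the get-based loop
        }
        System.out.println("hops = " + list.hops);
    }
}
```

For a list of 1000 values, the loop makes 0 + 1 + ... + 999 = 499,500 hops, which is n(n-1)/2 for a list of length n.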

In fact, writing the code this way will turn addAll into an O(n^2) operation for the linked list implementation. We obviously don't want it to be that slow if we can avoid it. So how do we access the values more efficiently? Someone mentioned iterators. But what kind of iterator? I showed people Java's generic iterator interface:

        public interface Iterator<E> {
            public boolean hasNext();
            public E next();
            public void remove();
        }
Someone said that we want an Iterator<Integer>. But that means we would need to add something to the IntList interface that indicates that we have a method that we can call to produce an iterator. The convention in Java is to call the method "iterator":

        public interface IntList {
            ...
            public Iterator<Integer> iterator();
        }
If we assume this is part of the interface, then we can write the following code:

        public void addAll(IntList other) {
            Iterator<Integer> i = other.iterator();
            while (i.hasNext()) {
                add(i.next());
            }
        }
Then I reminded people about the foreach loop. Sun added this with Java 5. It has a simpler syntax than the while loop above. There is an interface in Java that is known as the Iterable interface. Saying that you implement the Iterable interface means that you can produce an iterator:

        public interface Iterable<E> {
            public Iterator<E> iterator();
        }
So we modified the header for the IntList interface to specify that it also extends the Iterable interface (remember that you use the "extends" keyword when one interface is being related to another interface):

        public interface IntList extends Iterable<Integer> {
            ...
        }
With this change, we were able to use a foreach loop for a method like addAll:

        public void addAll(IntList other) {
            for (int i : other) {
                add(i);
            }
        }
This version works exactly the same as the previous version, because the foreach loop is implemented using iterators.
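
To show why the iterator helps the linked list in particular, here is a sketch of a linked list whose iterator remembers its current node, so each call on next is a single hop instead of a walk from the front. IterableLinkedList, its ListNode class and the back reference (used to keep add fast) are assumptions made for this example, and this iterator does not support remove.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical linked list of ints that supports the foreach loop.
class IterableLinkedList implements Iterable<Integer> {
    private static class ListNode {
        int data;
        ListNode next;

        ListNode(int data) {
            this.data = data;
        }
    }

    private ListNode front;
    private ListNode back;   // last node, so add does not walk the list
    private int size;

    public void add(int value) {
        ListNode node = new ListNode(value);
        if (front == null) {
            front = node;
        } else {
            back.next = node;
        }
        back = node;
        size++;
    }

    public int size() {
        return size;
    }

    // foreach works on "other" because it implements Iterable<Integer>
    public void addAll(IterableLinkedList other) {
        for (int value : other) {
            add(value);
        }
    }

    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private ListNode current = front;  // remembers where we are

            public boolean hasNext() {
                return current != null;
            }

            public Integer next() {
                if (current == null) {
                    throw new NoSuchElementException();
                }
                int result = current.data;
                current = current.next;        // one hop per call
                return result;
            }
        };
    }
}
```

With this iterator, addAll visits each node of the other list exactly once, so the work grows in proportion to the length of that list.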

We then spent a few minutes looking at the implementation of the removeAll method. This method takes an IntList as a parameter and removes all values from this list that appear in the other list. Here is the code:

        public void removeAll(IntList other) {
            Iterator<Integer> i = iterator();
            while (i.hasNext()) {
                if (other.contains(i.next())) {
                    i.remove();
                }
            }
        }
This method works by calling a method of the iterator called remove. The idea is that while you are iterating over a list, you might want to remove the last value you saw. It's like saying, "I didn't like that last value...get rid of it."

The iterator's remove method is supposed to remove the value that was most recently returned by a call on next. As a result, it's not legal to call remove two times in a row or to call remove before next has been called. I asked people to think about how this would be implemented. Someone said we could use a boolean variable that keeps track of whether it's okay to remove. When the iterator is first constructed, it is not okay to call remove. Once you call next, then remove becomes okay. But then when you call remove, okay goes back to false, to prevent two calls on remove in a row. This is exactly how it is implemented.
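
The same rules can be observed with the built-in java.util.ArrayList, whose iterator throws an IllegalStateException when remove is called at the wrong time. The sketch below uses only standard library classes; RemoveRulesDemo and its two methods are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveRulesDemo {
    // Returns true if a second remove in a row is rejected.
    static boolean secondRemoveFails() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> itr = list.iterator();
        itr.next();
        itr.remove();          // legal: removes the 1 that next just returned
        try {
            itr.remove();      // illegal: no call on next since the last remove
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Same shape as the removeAll method above, written for java.util lists.
    static void removeAll(List<Integer> list, List<Integer> other) {
        Iterator<Integer> i = list.iterator();
        while (i.hasNext()) {
            if (other.contains(i.next())) {
                i.remove();
            }
        }
    }
}
```

Calling removeAll on a list holding 1, 2, 3, 4, 2 with an other list holding 2 and 4 leaves just 1 and 3.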

We then spent a few minutes looking at the new version of the ArrayIntList class. We saw that the iterator method constructs an object of type ArrayIterator. ArrayIterator is a private inner class. Because it is defined inside the ArrayIntList class, each ArrayIterator instance has access to the ArrayIntList object that constructed it. This allows it to call methods like size to decide what value hasNext should return.
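
As a rough picture of how the inner class version fits together, here is a self-contained sketch. InnerIteratorList is a simplified stand-in for ArrayIntList (the name, the initial capacity of 10 and the omitted bounds checks are assumptions of this example). Notice that the iterator no longer stores a list field, because an inner class object can reach the fields and methods of the list that created it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified stand-in for ArrayIntList with its iterator as an inner class.
class InnerIteratorList implements Iterable<Integer> {
    private int[] elementData = new int[10];
    private int size = 0;

    public void add(int value) {
        if (size == elementData.length) {
            elementData = Arrays.copyOf(elementData, size * 2);
        }
        elementData[size] = value;
        size++;
    }

    public int size() {
        return size;
    }

    public int get(int index) {
        return elementData[index];   // bounds checks omitted for brevity
    }

    public void remove(int index) {
        for (int i = index; i < size - 1; i++) {
            elementData[i] = elementData[i + 1];   // shift later values left
        }
        size--;
    }

    public Iterator<Integer> iterator() {
        return new ArrayIterator();
    }

    public String toString() {
        return Arrays.toString(Arrays.copyOf(elementData, size));
    }

    // Inner (non-static) class: it uses size, get and remove of the list
    // that created it without storing a reference to that list.
    private class ArrayIterator implements Iterator<Integer> {
        private int position = 0;
        private boolean removeOK = false;

        public boolean hasNext() {
            return position < size;
        }

        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            int result = get(position);
            position++;
            removeOK = true;
            return result;
        }

        public void remove() {
            if (!removeOK) {
                throw new IllegalStateException();
            }
            // the qualified name selects the list's remove(int), because a
            // plain call would look only at the iterator's own remove()
            InnerIteratorList.this.remove(position - 1);
            position--;
            removeOK = false;
        }
    }
}
```

Running the "remove multiples of 3" loop from earlier in these notes against this class, with the same data, leaves the list as [4, 97, 4, 4].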


Stuart Reges
Last modified: Wed Nov 27 16:41:34 PST 2019