CSE143 Notes for Friday, 1/13/06

I began by reviewing some aspects of the new version of IntList (handout #5). I mentioned that the documentation at the beginning of the class is the kind of specification you'd want to give to a client of the class. For Java's ArrayList class, we had been looking at nice html pages for the documentation. Those pages are generated by a utility program called "javadoc", which is a great way to produce documentation automatically. There is a certain format you have to follow to include javadoc comments in your class files. When you do that, you can run javadoc to generate html files like the one we saw for ArrayList.
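
For anyone curious about the format, here is a rough sketch of what javadoc comments look like. The Counter class is made up purely for illustration and is not from the handout; the parts that matter are the /** ... */ comment form and the @param, @return and @throws tags.

```java
/**
 * A simple counter, used here only to show the javadoc comment format.
 * (Hypothetical class, not part of the course handouts.)
 */
class Counter {
    private int count;

    /**
     * Adds the given amount to this counter.
     *
     * @param amount the amount to add; must not be negative
     * @return the new total
     * @throws IllegalArgumentException if amount is negative
     */
    public int add(int amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount: " + amount);
        }
        count += amount;
        return count;
    }
}
```

Running the javadoc utility on a file with comments like these produces html pages in the same style as the ArrayList documentation.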

I consider javadoc to be one of those aspects of Java that is useful, but not as important as the other topics we need to cover. It's something that would be good to learn on your own if you have the time and the interest, but we don't have time to cover it in detail in cse143. I also find the javadoc comments themselves fairly ugly. The html pages are pretty, but what you end up with in the class file isn't very pretty.

In the list of public methods, I pointed out that I had separated them into three groups:

I mentioned that Java does not make a big distinction between accessors and mutators, but the concept is considered an important one for object-oriented programming. In C++ there is a way to tell the compiler which methods are accessors and which ones are mutators, so C++ programmers tend to care about this more.

I pointed out that the new version of IntList has code that throws exceptions when preconditions are not satisfied. Three of the methods ended up with the exact same test (get, set and remove). When you have this kind of duplication, it's a good idea to put the code in a method. But you wouldn't want to add this to the client interface. This isn't a method you want the client to see. It's an internal detail of the implementation. This is a great place to use a private method. You can have as many private methods as you want in your classes. The client would never see these methods. So my checkIndex method is a good example of a private method.
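
As a minimal sketch of the idea (not the handout's actual code), the class below shares one private checkIndex method among get, set and remove. The class name CheckedIntList, the fixed capacity of 100 and the choice of IndexOutOfBoundsException are all assumptions made to keep the example short.

```java
// Sketch only: a fixed-capacity list, not the handout's IntList.
class CheckedIntList {
    private int[] elementData = new int[100]; // capacity fixed for brevity
    private int size;

    // Appends the value; this sketch does not grow the array.
    public void add(int value) {
        elementData[size] = value;
        size++;
    }

    public int get(int index) {
        checkIndex(index);
        return elementData[index];
    }

    public void set(int index, int value) {
        checkIndex(index);
        elementData[index] = value;
    }

    public void remove(int index) {
        checkIndex(index);
        // shift later values left to fill the gap
        for (int i = index; i < size - 1; i++) {
            elementData[i] = elementData[i + 1];
        }
        size--;
    }

    // Private helper: the client never sees this method.
    // The exception type is an assumption for this sketch.
    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }
}
```

Because checkIndex is private, a client calling list.checkIndex(0) would get a compile error, while the three public methods all get the same precondition test from one place.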

I also pointed out that my new version calls ensureCapacity inside of methods like add to make sure that there is room for any new value being added to the list. In the ensureCapacity code, I pointed out that when the method needs to make a larger array, it always makes one that is at least twice as big. It could be even bigger because this is a public method that the client can call. To make this work, we allocate the new array, copy the existing values into the new array, and then reassign this.elementData to point to the new array.
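
The allocate, copy and reassign steps might look something like the sketch below. This is not the handout's code: the class name GrowableIntList, the starting capacity of 2, the capacity() method and the use of Math.max to pick the new length are assumptions made for illustration.

```java
// Sketch of the doubling technique; details differ from the handout.
class GrowableIntList {
    private int[] elementData = new int[2]; // tiny start so growth is easy to see
    private int size;

    public void add(int value) {
        ensureCapacity(size + 1); // make room before storing
        elementData[size] = value;
        size++;
    }

    // Public, so a client may ask for more than double the current length.
    public void ensureCapacity(int capacity) {
        if (capacity > elementData.length) {
            // at least twice as big, or bigger if the client asked for more
            int newCapacity = Math.max(capacity, elementData.length * 2);
            int[] newData = new int[newCapacity];
            for (int i = 0; i < size; i++) {
                newData[i] = elementData[i]; // copy existing values
            }
            this.elementData = newData; // the old array is now unreferenced
        }
    }

    public int get(int index) {
        return elementData[index];
    }

    // Hypothetical method, included only so the growth can be observed.
    public int capacity() {
        return elementData.length;
    }
}
```

With a starting capacity of 2, adding five values grows the array to length 4 and then to length 8, so only two copies are needed rather than one per add.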

Someone asked if this isn't wasteful, because now that other array is floating off in memory somewhere. It is certainly expensive to have to do this, so we don't want to do it often (that's why we use the doubling technique described in Wednesday's lecture). But we have no choice if we're trying to make a larger array.

What happens internally is that the other array will eventually be found by what is known as the "garbage collector." The garbage collector will notice that nobody is referring to the array and will reclaim the space. I said that you could imagine objects like arrays as being like helium balloons. Normally we have a variable that keeps track of each object (like a string tied to the balloon). If a variable stops referring to such an object (like when we change the value of this.elementData to point to the new array instead of the old array), this is like letting go of the string on a helium balloon. It floats away. The garbage collector is a program that wakes up every once in a while and looks at what has floated up to the ceiling of the room (which objects are no longer being held onto). It reuses the space for those objects.

We also spent some time looking at this code for the addAll method:

        // post: appends all values in the given list to the end of this list
        public void addAll(IntList other) {
            for (int i = 0; i < other.size; i++)
                this.add(other.elementData[i]);
        }
It may seem a little odd to have one IntList dealing with another IntList, but this actually happens fairly often. The idea is that "this" IntList is supposed to add all of the values from the "other" IntList. I asked if there is anything odd about this. Someone said that the IntList is accessing the private data fields of "other". That's right. This IntList is referring to other.size and other.elementData. But this works because the understanding of the word private is that it is "private to the class." This is not at all the way that we as humans understand the meaning of private (if something is private to me, then it shouldn't be available to other humans). But in Java, one IntList object can access private elements of another IntList object because they are both of the same class.

Then we turned our attention to what are known as linked lists. We have been examining how to store a list of values in an array and we saw that arrays use a contiguous block of memory. One downside to that is that it is not easy to expand the array to store more values. So why have contiguous memory? Someone pointed out that it makes access fast. That's true. Arrays are what we call "random access" structures because we can quickly access any value within the array. Even if you ask for element 5000, you'd know right where to find it in memory because everything is stored together as one big block.

We'll find that linked lists have what we would call "sequential access." That means that it can be slow to access things in the middle of the list. A good analogy is to think of CDs versus cassette tapes. With a CD, you can quickly jump from track 2 to track 18. With a cassette tape, you have to fast forward through the tracks in between. This can take a long time. The same is true with linked lists. In fact, we'll find that the things arrays do particularly well, linked lists tend to do badly, and vice versa.

I asked people if they could think of some other thing that arrays do badly. Someone mentioned that inserting or removing in the middle can be expensive. That's exactly right. If you have something like 10 thousand values in an array and you want to get rid of the first one, you have to shift 9,999 values over to fill in the gap. This will be a case where linked lists are much faster than arrays.
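
To make the cost concrete, here is a small sketch that counts how many values get moved when one is removed from an array. The class and method names (ShiftDemo, removeAt) are made up for this example.

```java
class ShiftDemo {
    // Removes the value at the given index from the first "size" slots
    // of data by shifting later values left. Returns how many values moved.
    static int removeAt(int[] data, int size, int index) {
        int moved = 0;
        for (int i = index; i < size - 1; i++) {
            data[i] = data[i + 1];
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] data = new int[10000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // removing the first of 10,000 values shifts the other 9,999
        System.out.println(removeAt(data, data.length, 0)); // prints 9999
    }
}
```

Removing the last value, by contrast, moves nothing, which is why the cost depends so heavily on where in the array the removal happens.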

Linked lists are composed of individual elements called nodes. Each node is like a Lego building block. It looks unimpressive by itself, but once you put a bunch of them together, it can form an interesting structure.

A basic list node looks like this:

        +------+------+
        | data | next |
        |  18  |  +---+--->
        +------+------+
It's an object with two data fields: one for storing a single item of data and one for storing a reference to the next node in the list. For a list of int values, we'd declare this as follows:

        public class ListNode {
            public int data;
            public ListNode next;
        }
I pointed out that this isn't a nicely encapsulated object because of the public data fields. I said that I'd discuss this next week (why this is okay to do). I also pointed out that this is a recursive data structure (a class that is defined in terms of itself in that the class is called ListNode and it has a data field of type ListNode).

Then we wrote some code that would build up the list (3, 7, 12). Obviously we're going to need three nodes that are linked together. With linked lists, if you have a reference to the front of the list, then you can get to anything in the list. So we'll usually have a single variable of type ListNode that refers to (or points to) the front of the list. So we began with this declaration:

        ListNode list;
The variable "list" is not itself a node. It's a variable that is capable of referring to a node. So we'd draw it something like this:

         +---+
    list | ? |
         +---+
where we understand that the "?" is going to be replaced with a reference to a node. So this box does not have a "data" field or a "next" field. It's a box where we can store a reference to such an object.

We don't have an actual node until we call new:

        list = new ListNode();
This constructs a new node and tells Java to have the variable "list" refer to it:

                    +------+------+
         +---+      | data | next |
    list | +-+--->  |      |      |
         +---+      +------+------+
What do we want to do with this node? We want to store 3 in its data field (list.data) and we want its next field to point to a new node:

        list.data = 3;
        list.next = new ListNode();
which leads us to this situation:

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+
When you program linked lists, you have to be careful to keep track of what you're talking about. The variable "list" stores a reference to the first node. We can get inside that node with the dot notation (list.data and list.next). So "list.next" is the way to refer to the "next" box of the first node. We wrote code to assign it to refer to a new node, which is why "list.next" is pointing at this second node.

Now we want to assign the second node's data field (list.next.data) to the value 7 and assign the second node's next field to refer to a third node:

        list.next.data = 7;
        list.next.next = new ListNode();
which leads us to this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+      +------+------+
I again stressed the importance of paying close attention to list versus list.next versus list.next.next and remembering which box each of those refers to:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+      +------+------+
           |                   |                    |
           |                   |                    |
          list             list.next          list.next.next
Finally, we want to set the data field of this third node to 12 (list.next.next.data) and we want to set its next field to null. The keyword "null" is a Java word that means "no object". This provides a "terminator" for the linked list (a special value that indicates that we are at the end of the list). So we'd execute these statements:

        list.next.next.data = 12;
        list.next.next.next = null;
which leaves us in this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |  12  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
We draw a diagonal line through the last "next" field as a way to indicate that its value is null. The assignment to null is actually unnecessary. Java will initialize all data fields to the "zero equivalent" for that particular type. For type int, that means initializing to 0. For double, it initializes to 0.0. For boolean, it initializes to false. For arrays and other objects, it initializes to null. But it's not a bad idea to include the code to make it perfectly clear what's going on.
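
Collected into one runnable file, the statements above might look like the sketch below. The BuildList class and its build method are made up to hold the code, and ListNode is declared without "public" here only so that both classes fit in a single file.

```java
class ListNode {
    public int data;
    public ListNode next;
}

class BuildList {
    // Builds the list (3, 7, 12) one field at a time.
    static ListNode build() {
        ListNode list = new ListNode();
        list.data = 3;
        list.next = new ListNode();
        list.next.data = 7;
        list.next.next = new ListNode();
        list.next.next.data = 12;
        list.next.next.next = null; // redundant, but makes the intent clear
        return list;
    }

    public static void main(String[] args) {
        ListNode list = build();
        System.out.println(list.data + " " + list.next.data + " "
                           + list.next.next.data);    // prints 3 7 12

        // a freshly constructed node has the "zero equivalent" in each field
        ListNode fresh = new ListNode();
        System.out.println(fresh.data);               // prints 0
        System.out.println(fresh.next == null);       // prints true
    }
}
```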

Obviously this is a very tedious way to manipulate a list. It's much better to write code that involves loops to manipulate lists. But it takes a while to get used to this idea, so we're first going to practice how to do some raw list operations without a loop. In section we will go over 10 different exercises that involve list operations. Each will have a "before" picture and an "after" picture. The challenge is to write code that gets you from the one state to the other state.

As an example, suppose that you have two variables of type ListNode called p and q and that you have the following situation:

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   /  |
         +---+      +------+------+      +------+------+

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       q | +-+--->  |   3  |   +--+--->  |   9  |   /  |
         +---+      +------+------+      +------+------+
and you want to get to this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   +--+--->  |   3  |   /  |
         +---+      +------+------+      +------+------+      +------+------+

                    +------+------+
         +---+      | data | next |
       q | +-+--->  |   9  |   /  |
         +---+      +------+------+
How do we do it? I started by asking people how many variables of type ListNode we have. I got various answers. Some people said two (probably thinking of p and q). Other people said four (probably thinking of p, q and the two non-null links). But in fact, there are six different variables of type ListNode. I numbered each one:

                              2                    3
           1        +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   /  |
         +---+      +------+------+      +------+------+

                               5                    6
           4        +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       q | +-+--->  |   3  |   +--+--->  |   9  |   /  |
         +---+      +------+------+      +------+------+
Then I asked which of these variables has to change in value. The answer is that the boxes numbered 3, 4 and 5 have to be changed. If we change them appropriately, we'll be done. But we have to be careful of how we do so. Order can be important. For example, suppose we were going to start by changing box 4. In the final situation, it's supposed to point at the node with 9 in it. But if we started with that change, then what would happen to the node with 3 in it? We'd lose track of it. This is potentially a problem.
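
Here is a small sketch of what goes wrong if box 4 is changed first. The LostNodeDemo class and its node helper are made up for illustration; the helper just builds a node in one call.

```java
class ListNode {
    public int data;
    public ListNode next;
}

class LostNodeDemo {
    // Hypothetical helper that constructs a node and fills in both fields.
    static ListNode node(int data, ListNode next) {
        ListNode result = new ListNode();
        result.data = data;
        result.next = next;
        return result;
    }

    public static void main(String[] args) {
        ListNode p = node(2, node(4, null));
        ListNode q = node(3, node(9, null));

        q = q.next; // box 4 changed first: q now skips the node with 3

        // The only reachable nodes are 2 and 4 (through p) and 9 (through q).
        // Nothing refers to the node with 3, so it cannot be linked after 4.
        System.out.println(q.data);      // prints 9
        System.out.println(p.next.next); // prints null
    }
}
```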

Of the three values we have to change to solve this problem, the one that is safe to change is box 3 because it's currently null. So we begin by setting it to point to the node with 3 in it:

        p.next.next = q;
Now that we've used the value of box 4 to reset box 3, we can reset box 4. It's supposed to point to the node that has 9 in it. We can do this by "leap frogging" over the current node it's pointing to:

        q = q.next;
Now we just have to reset box 5. But we can no longer refer to box 5 as q.next because we've changed q. Now we have to refer to it this way:

        p.next.next.next = null;
Putting these three lines together, we see the code that is needed to get from the initial state to the final state:

        p.next.next = q;
        q = q.next;
        p.next.next.next = null;
Obviously this can be very confusing. It is essential that you draw pictures to keep track of what is pointing where and what is going on when this code executes. It's the only way to master linked list code. We'll practice these small problems in section so that in lecture on Wednesday we can turn to the question of how to use loops to do more generalized processing of linked lists.
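
One way to check a solution like this, in addition to drawing the pictures, is to run it. The sketch below sets up the "before" picture, runs the three statements, and prints both lists. The RelinkDemo class and its node and describe helpers are made up for this example, and describe uses a loop of the kind we haven't covered yet, so treat it as a black box for now.

```java
class ListNode {
    public int data;
    public ListNode next;
}

class RelinkDemo {
    // Hypothetical helper that constructs a node and fills in both fields.
    static ListNode node(int data, ListNode next) {
        ListNode result = new ListNode();
        result.data = data;
        result.next = next;
        return result;
    }

    // Hypothetical helper that lists the values reachable from front.
    static String describe(ListNode front) {
        StringBuilder text = new StringBuilder();
        for (ListNode current = front; current != null; current = current.next) {
            if (text.length() > 0) {
                text.append(" -> ");
            }
            text.append(current.data);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // the "before" picture
        ListNode p = node(2, node(4, null));
        ListNode q = node(3, node(9, null));

        p.next.next = q;         // box 3 now points to the node with 3
        q = q.next;              // box 4 leapfrogs to the node with 9
        p.next.next.next = null; // box 5 is cut off from the node with 9

        System.out.println("p: " + describe(p)); // prints p: 2 -> 4 -> 3
        System.out.println("q: " + describe(q)); // prints q: 9
    }
}
```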


Stuart Reges
Last modified: Fri Jan 13 17:20:27 PST 2006