CSE143X Notes for Wednesday, 10/24/18

We started a new topic: linked lists. In doing so, we will explore in great detail how a linked list differs from an array.

We have been examining how to store a list of values in an array and we saw that arrays use a contiguous block of memory. One downside to that is that it is not easy to expand the array to store more values. So why have contiguous memory? Someone pointed out that it makes it fast. That's true. Arrays are what we call "random access" structures because we can quickly access any value within the array. Even if you ask for element 5000, you'd know right where to find it in memory because everything is stored together as one big block.

We'll find that linked lists have what we would call "sequential access." That means that it can be slow to access things in the middle of the list. A good analogy is to think of CDs versus cassette tapes. With a CD, you can quickly jump from track 2 to track 18. With a cassette tape, you have to fast forward through the tracks in between. This can take a long time. The same is true with linked lists. In fact, we'll find that the things arrays do particularly well, linked lists tend to do badly, and vice versa.

I asked people if they could think of some other thing that arrays do badly. Someone mentioned that inserting or removing in the middle can be expensive. That's exactly right. If you have something like 10 thousand values in an array and you want to get rid of the first one, you have to shift 9,999 values over to fill in the gap. This will be a case where linked lists are much faster than arrays.
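The shifting cost can be made concrete with a small sketch. The class and method names here (ArrayShiftDemo, removeFirst) are invented for illustration, and the code assumes the array holds at least one value:

```java
import java.util.Arrays;

class ArrayShiftDemo {
    // Removes the value at index 0 by sliding every later value one slot
    // to the left. Returns the number of values that had to be moved.
    // Assumes 1 <= size <= a.length.
    static int removeFirst(int[] a, int size) {
        int moves = 0;
        for (int i = 1; i < size; i++) {
            a[i - 1] = a[i];
            moves++;
        }
        return moves;
    }

    public static void main(String[] args) {
        int[] a = {0, 2, 40, 23, 14, 72};
        int moves = removeFirst(a, a.length);
        // The first five slots hold the remaining values; the last slot is stale.
        System.out.println(Arrays.toString(Arrays.copyOf(a, a.length - 1)));
        System.out.println(moves + " values shifted");
    }
}
```

With 6 values the loop moves 5 of them; with 10 thousand values it would move 9,999.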

With arrays, we might store a list of 6 ints as follows:

                         [0]   [1]   [2]   [3]   [4]   [5]
             +---+     +-----+-----+-----+-----+-----+-----+
        list | +-+-->  |  0  |  2  |  40 |  23 |  14 |  72 |
             +---+     +-----+-----+-----+-----+-----+-----+
Imagine cutting this array up into individual variables and scattering them throughout memory:

        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+
        |  23 |   |  2  |   |  40 |   |  0  |   |  14 |   |  72 |
        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+
If the values are going to be scattered throughout memory, we would have to somehow connect them to each other to keep track of the order of our list:

           +---->---->---->---->---->---->---->----+
           ^                                       |
           |         +----<----<----<----+         V
           ^         |                   ^         |
           |         V                   |         V
        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+
        |  23 |   |  2  |-->|  40 |   |  0  |   |  14 |-->|  72 |-end
        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+
           ^                   |
           |                   V
           +----<----<----<----+
Each bit of data is going to point to the next bit of data and the final bit of data (72) will have a special value that will indicate that we are at the end of the list. You might think that even with this interconnected structure, we'd have to keep track of where each value is stored. In fact, we just need a reference to the front of the list. So if we can get to the value that stores 0 in it, then from there we can get to every other value in the list. This is the basic idea that we are going to explore with linked lists.

Linked lists are composed of individual elements called nodes. Each node is like a Lego building block. It looks unimpressive by itself, but once you put a bunch of them together, it can form an interesting structure.

A basic list node looks like this:

        +------+------+
        | data | next |
        |  18  |  +---+--->
        +------+------+
It's an object with two data fields: one for storing a single item of data and one for storing a reference to the next node in the list. For a list of int values, we'd declare this as follows:

        public class ListNode {
            public int data;
            public ListNode next;
        }
I pointed out that this isn't a nicely encapsulated object because of the public data fields. I said that I'd discuss this later in the week (why this is okay to do). I also pointed out that this is a recursive data structure (a class that is defined in terms of itself in that the class is called ListNode and it has a data field of type ListNode).

Then we wrote some code that would build up the list (3, 7, 12). Obviously we're going to need three nodes that are linked together. With linked lists, if you have a reference to the front of the list, then you can get to anything in the list. So we'll usually have a single variable of type ListNode that refers to (or points to) the front of the list. So we began with this declaration:

        ListNode list;
The variable "list" is not itself a node. It's a variable that is capable of referring to a node. So we'd draw it something like this:

         +---+
    list | ? |
         +---+
where we understand that the "?" is going to be replaced with a reference to a node. So this box does not have a "data" field or a "next" field. It's a box where we can store a reference to such an object.

We don't have an actual node until we call new:

        list = new ListNode();
This constructs a new node and tells Java to have the variable "list" refer to it:

                    +------+------+
         +---+      | data | next |
    list | +-+--->  |      |      |
         +---+      +------+------+
What do we want to do with this node? We want to store 3 in its data field (list.data) and we want its next field to point to a new node:

        list.data = 3;
        list.next = new ListNode();
which leads us to this situation:

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+
When you program linked lists, you have to be careful to keep track of what you're talking about. The variable "list" stores a reference to the first node. We can get inside that node with the dot notation (list.data and list.next). So "list.next" is the way to refer to the "next" box of the first node. We wrote code to assign it to refer to a new node, which is why "list.next" is pointing at this second node.

Now we want to assign the second node's data field (list.next.data) to the value 7 and assign the second node's next field to refer to a third node:

        list.next.data = 7;
        list.next.next = new ListNode();
which leads us to this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+      +------+------+
I again repeated the idea of paying close attention to list versus list.next versus list.next.next and remembering which box each of those corresponds to:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+      +------+------+
           |                   |                    |
           |                   |                    |
          list             list.next          list.next.next
Finally, we want to set the data field of this third node to 12 (list.next.next.data) and we want to set its next field to null. The keyword "null" is a Java word that means "no object". This provides a "terminator" for the linked list (a special value that indicates that we are at the end of the list). So we'd execute these statements:

        list.next.next.data = 12;
        list.next.next.next = null;
which leaves us in this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |  12  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
We draw a diagonal line through the last "next" field as a way to indicate that its value is null. The assignment to null is actually unnecessary. Java will initialize all data fields to the "zero equivalent" for that particular type. For type int, that means initializing to 0. For double, it initializes to 0.0. For boolean, it initializes to false. For arrays and other objects, it initializes to null. But it's not a bad idea to include the code to make it perfectly clear what's going on.
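A quick way to check the default initialization is to construct a node and look at its fields before assigning anything. This is only a sketch: DefaultFieldsDemo is a made-up name, and ListNode is declared without "public" so that everything fits in one file.

```java
class DefaultFieldsDemo {
    public static void main(String[] args) {
        ListNode node = new ListNode();
        System.out.println(node.data);          // prints 0
        System.out.println(node.next == null);  // prints true
    }
}

// Same two fields as the ListNode class in these notes.
class ListNode {
    public int data;
    public ListNode next;
}
```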

Obviously this is a very tedious way to manipulate a list. It's much better to write code that involves loops to manipulate lists. But it takes a while to get used to this idea, so we're first going to practice how to do some raw list operations without a loop.

The calendar includes this simple code along with a new version of the ListNode class that includes several constructors:

        public class ListNode {
            public int data;
            public ListNode next;
        
            public ListNode() {
                this(0, null);
            }
        
            public ListNode(int data) {
                this(data, null);
            }
        
            public ListNode(int data, ListNode next) {
                this.data = data;
                this.next = next;
            }
        }
As with the other classes we've seen, there is one "real" constructor (the one that takes two arguments). The other two use the "this(...)" notation to call the third constructor with default values (0 for the data, null for next). With the new version of the class, it is possible to write a single line of code to construct the list.
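One way that single line might look is shown in the build method below. BuildListDemo and build are names chosen for this sketch, and ListNode is repeated without "public" only so the example compiles as one file.

```java
class BuildListDemo {
    // Builds the list (3, 7, 12) in one expression by nesting constructor calls.
    static ListNode build() {
        return new ListNode(3, new ListNode(7, new ListNode(12)));
    }

    public static void main(String[] args) {
        ListNode list = build();
        System.out.println(list.data);                    // prints 3
        System.out.println(list.next.data);               // prints 7
        System.out.println(list.next.next.data);          // prints 12
        System.out.println(list.next.next.next == null);  // prints true
    }
}

class ListNode {
    public int data;
    public ListNode next;

    public ListNode() {
        this(0, null);
    }

    public ListNode(int data) {
        this(data, null);
    }

    public ListNode(int data, ListNode next) {
        this.data = data;
        this.next = next;
    }
}
```

The innermost call runs first, so the node holding 12 is constructed before the node holding 7 that points to it.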

In section we will go over 10 different exercises that involve list operations. Each will have a "before" picture and an "after" picture. The challenge is to write code that gets you from the one state to the other state.

As an example, suppose that you have two variables of type ListNode called p and q and that you have the following situation:

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   /  |
         +---+      +------+------+      +------+------+

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       q | +-+--->  |   3  |   +--+--->  |   9  |   /  |
         +---+      +------+------+      +------+------+
and you want to get to this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   +--+--->  |   3  |   /  |
         +---+      +------+------+      +------+------+      +------+------+

                    +------+------+
         +---+      | data | next |
       q | +-+--->  |   9  |   /  |
         +---+      +------+------+
How do we do it? I started by asking people how many variables of type ListNode we have. I got various answers. Some people said two (probably thinking of p and q). Other people said four (probably thinking of p, q and the two non-null links). But in fact, there are six different variables of type ListNode. I numbered each one:

                              2                    3
           1        +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   /  |
         +---+      +------+------+      +------+------+

                               5                    6
           4        +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       q | +-+--->  |   3  |   +--+--->  |   9  |   /  |
         +---+      +------+------+      +------+------+
Then I asked which of these variables has to change in value. The answer is that the boxes numbered 3, 4 and 5 have to be changed. If we change them appropriately, we'll be done. But we have to be careful of how we do so. Order can be important. For example, suppose we were going to start by changing box 4. In the final situation, it's supposed to point at the node with 9 in it. But if we started with that change, then what would happen to the node with 3 in it? We'd lose track of it. This is potentially a problem.

Of the three values we have to change to solve this problem, the one that is safe to change is box 3 because it's currently null. So we begin by setting it to point to the node with 3 in it:

        p.next.next = q;
Now that we've used the value of box 4 to reset box 3, we can reset box 4. It's supposed to point to the node that has 9 in it. We can do this by "leap frogging" over the current node it's pointing to:

        q = q.next;
Now we just have to reset box 5. But we can no longer refer to box 5 as q.next because we've changed q. Now we have to refer to it this way:

        p.next.next.next = null;
Putting these three lines together, we see the code that is needed to get from the initial state to the final state:

        p.next.next = q;
        q = q.next;
        p.next.next.next = null;
Obviously this can be very confusing. It is essential that you draw pictures to keep track of what is pointing where and what is going on when this code executes. It's the only way to master linked list code.
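If you want to check a solution like this against your picture, you can run it. The sketch below wraps the three statements in a hypothetical relink method. Because Java passes references by value, the method returns the final p and q in an array so the caller can see that q moved; that wrapper is a detail of this example, not part of the exercise.

```java
class RelinkDemo {
    // Applies the three statements from the notes and returns {p, q}.
    static ListNode[] relink(ListNode p, ListNode q) {
        p.next.next = q;          // box 3 now points at the node holding 3
        q = q.next;               // box 4 leapfrogs to the node holding 9
        p.next.next.next = null;  // box 5: the node holding 3 no longer points at 9
        return new ListNode[] {p, q};
    }

    public static void main(String[] args) {
        ListNode[] result = relink(new ListNode(2, new ListNode(4)),
                                   new ListNode(3, new ListNode(9)));
        ListNode p = result[0];
        ListNode q = result[1];
        System.out.println(p.data + " " + p.next.data + " " + p.next.next.data);  // prints 2 4 3
        System.out.println(p.next.next.next == null);                             // prints true
        System.out.println(q.data + " " + (q.next == null));                      // prints 9 true
    }
}

class ListNode {
    public int data;
    public ListNode next;

    public ListNode(int data) {
        this(data, null);
    }

    public ListNode(int data, ListNode next) {
        this.data = data;
        this.next = next;
    }
}
```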

Then we discussed how to write code that would print the values in a list one per line. For example, suppose we have a variable called list that stores a reference to the list (3, 5, 2):

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
We could refer to each of the three data fields: list.data (3), list.next.data (5) and list.next.next.data (2). This can work for very short lists, but obviously won't work when we have hundreds or thousands of nodes to process. So we want to write a loop for this.

We have just one variable to work with, so that's clearly where we have to start (the variable "list"). We could use it to move along the list and print things out, but then we would lose the original value of the variable which would mean that we would have lost the list. Instead, we declare a local variable of type ListNode that we will use to access the different data fields of the list:

        ListNode current = list;
This initializes current to point to the same value as list (the first node in the list). We want to have a loop that prints the various values and we want it to keep going as long as there is more data to print. After executing the statement above, we have the following situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
                           ^
         +---+             |
 current | +-+------>------+
         +---+
So how do we structure our loop? We want to keep going while there is more data to print. The variable current will end up referring to each different node in turn. The final node has the value null in its next field, so eventually the variable current will become null and that's when we know we're done. So our basic loop structure will be:

        ListNode current = list;
        while (current != null) {
            <process next value>
        }
To process a node, we need to print out its value, which we can get from current.data, and we need to move current to the next node over. The position of the next node is stored in current.next, so moving to that next node involves resetting current to current.next:

        ListNode current = list;
        while (current != null) {
            System.out.println(current.data);
            current = current.next;
        }
The first time through this loop, current is referring to the node with the 3 in it. It prints this value and then resets current, which causes current to refer to (or point to) the second node in the list:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
                                                ^
         +---+                                  |
 current | +-+------>------>------>------>------+
         +---+
Some people prefer to visualize this differently. Instead of thinking of the variable current as sitting still while its arrow moves, some people prefer to think of the variable itself moving. So for the initial situation they'd draw this picture:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
                           ^
                           |          
                         +-+-+
                 current | + |
                         +---+
And after executing the statement "current = current.next", we'd have this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
                                                ^
                                                |          
                                              +-+-+
                                      current | + |
                                              +---+
Either way of thinking about this works. Because in this new situation the variable current is not null, we once again go into the loop and print out current.data (which is now 5), and move current along again:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
                                                                     ^
         +---+                                                       |
 current | +-+------>------>------>------>------>------>------>------+
         +---+
Once again current is not null, so we go into the loop a third time and print the value of current.data (2) and reset current. But this time current.next has the value null, so when we reset current we get:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+

         +---+
 current | / |
         +---+
Because current has become null, we break out of the loop having produced the following output:

        3
        5
        2
I pointed out that the corresponding array code would look like this:

        int i = 0;
        while (i < a.length) {
            System.out.println(a[i]);
            i++;
        }
Assuming you have some comfort with array-style programming, this might give you some useful insight into linked list programming. There are direct parallels here in terms of typical code:

Array/List Equivalents

        Description               Array code      Linked list code
        ------------------------  --------------  ------------------------
        go to front of the list   int i = 0;      ListNode current = list;
        test for more elements    i < a.length    current != null
        current value             a[i]            current.data
        go to next element        i++;            current = current.next;

In fact, knowing that we like to use for loops for array processing, you can imagine writing for loops for the processing of linked lists as well. Our code above could be rewritten as:

        for (ListNode current = list; current != null; current = current.next) {
            System.out.println(current.data);
        }
Some people like to write their list code this way. I tend to use while loops for list code, but it's an issue of personal taste.
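To see the array/list parallel in something other than printing, here is a sketch that adds up the values both ways. The names SumDemo, sumArray and sumList are invented for this example. Each line of one method lines up with a line of the other.

```java
class SumDemo {
    // Array version: index i walks from 0 to a.length - 1.
    static int sumArray(int[] a) {
        int total = 0;
        int i = 0;
        while (i < a.length) {
            total += a[i];
            i++;
        }
        return total;
    }

    // Linked list version: current walks from the front until it becomes null.
    static int sumList(ListNode list) {
        int total = 0;
        ListNode current = list;
        while (current != null) {
            total += current.data;
            current = current.next;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] a = {3, 5, 2};
        ListNode list = new ListNode(3, new ListNode(5, new ListNode(2)));
        System.out.println(sumArray(a));    // prints 10
        System.out.println(sumList(list));  // prints 10
    }
}

class ListNode {
    public int data;
    public ListNode next;

    public ListNode(int data) {
        this(data, null);
    }

    public ListNode(int data, ListNode next) {
        this.data = data;
        this.next = next;
    }
}
```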

Then I spent some time talking about how we are going to use linked lists to define a new class called LinkedIntList. I asked what data fields will be needed and there were several suggestions. I said that for now we'll stick with the minimum and in this case the only data field you need is a reference to the front of the list:

        public class LinkedIntList {
            private ListNode front;

            <methods>
        }
Then I reminded people that our node class has public fields, which in general is a bad idea. It's not of great concern to us because we're going to make sure that only our LinkedIntList object will ever manipulate individual nodes. By the time we are done, we are going to have two classes: one for individual nodes of a list and one for the list itself. We'll be careful to have a clean, well-encapsulated list object, but we don't have to worry about doing the same thing for the node class.

I made the following analogy. Suppose I want to hire someone to paint my apartment. One contractor tells me I'll be carrying messy cans of paint around and another says I won't have to worry about that. Given that choice, I'd rather have the contractor who said I wouldn't have to worry about that. It's important to me that I stay clean. That's like a client talking to our list class. We'll want to make sure that the client has a clean interface. But I don't particularly care if the people doing the actual painting of my apartment get dirty. If one contractor told me he had bought some fancy paint cans that would keep his workers clean and that he was going to charge me more for them, I'd say I don't want to pay the extra money. I'd rather go with a cheaper contractor who uses messy cans of paint as long as I personally don't get dirty. So it's okay for the list itself to deal with these "messy" list node objects as long as the client of the list never sees them.

The "right" way to do this would be to declare the node class inside the list class itself. We'd make it a static inner class. But we haven't talked about the idea of an inner class, let alone the idea of a static inner class, so we'll keep the node in a separate class for now.
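For the curious, a rough sketch of that arrangement follows. It is not the version we will use in this course. The isEmpty and addFront methods are hypothetical helpers added only so the sketch does something observable. Java's own documentation calls this kind of class a "static nested class."

```java
class NestedNodeSketch {
    public static void main(String[] args) {
        LinkedIntList list = new LinkedIntList();
        System.out.println(list.isEmpty());  // prints true
        list.addFront(42);
        System.out.println(list.isEmpty());  // prints false
    }
}

class LinkedIntList {
    private ListNode front;

    // Hypothetical helper: reports whether the list has no nodes.
    public boolean isEmpty() {
        return front == null;
    }

    // Hypothetical helper: links a new node in ahead of the current front.
    public void addFront(int value) {
        front = new ListNode(value, front);
    }

    // Because the node class is private, a client of LinkedIntList
    // cannot refer to ListNode at all.
    private static class ListNode {
        public int data;
        public ListNode next;

        public ListNode(int data, ListNode next) {
            this.data = data;
            this.next = next;
        }
    }
}
```

Declaring the node class private inside the list means the "messy" public fields are visible only to the list's own code.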

Next I turned to the question of how we would implement the appending add operation from the old ArrayIntList class for our new LinkedIntList class that will look like this:

        public void add(int value) {
            ...
        }
The method is supposed to append the new value at the end of the list, which means we have to locate the end of the list. Let's think about the general case where we are appending to the end of a list that already has something in it. For example, suppose the list stores [3, 5, 2]:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
   front | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+
and we want to add the value 17 at the end of the list. First we have to get there. So here's a start:

        ListNode current = front;
        while (current != null) {
            current = current.next;
        }
What happens is that the variable current moves along the list from the first to the last node until it becomes null, leaving us in this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
   front | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+

         +---+
 current | / |
         +---+
Some people think that we could then execute this line of code to complete the task:

        current = new ListNode(value);
But that won't work. It leaves us in this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
   front | +-+--->  |   3  |   +--+--->  |   5  |   +--+--->  |   2  |   /  |
         +---+      +------+------+      +------+------+      +------+------+

                    +------+------+
         +---+      | data | next |
 current | +-+--->  |  17  |   /  |
         +---+      +------+------+
This allocates a new node, but this new node has no connection to the original list. The list is still composed of 3 nodes linked together. This fourth node has been constructed, but hasn't been properly linked into the list.
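The failure is easy to confirm by counting nodes before and after. In this sketch, LostNodeDemo, countNodes and brokenAppend are made-up names; brokenAppend is the incorrect attempt described above, written as a static method so it can be run on its own.

```java
class LostNodeDemo {
    // Counts the nodes reachable from front.
    static int countNodes(ListNode front) {
        int count = 0;
        ListNode current = front;
        while (current != null) {
            count++;
            current = current.next;
        }
        return count;
    }

    // The tempting but incorrect append: it runs off the end of the list
    // and then assigns the new node to the local variable only.
    static void brokenAppend(ListNode front, int value) {
        ListNode current = front;
        while (current != null) {
            current = current.next;
        }
        current = new ListNode(value);  // no node in the list refers to this
    }

    public static void main(String[] args) {
        ListNode front = new ListNode(3, new ListNode(5, new ListNode(2)));
        brokenAppend(front, 17);
        System.out.println(countNodes(front));  // prints 3, not 4
    }
}

class ListNode {
    public int data;
    public ListNode next;

    public ListNode(int data) {
        this(data, null);
    }

    public ListNode(int data, ListNode next) {
        this.data = data;
        this.next = next;
    }
}
```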

As an analogy, I mentioned the idea that you can think of the list nodes as being like railroad cars. You can drag the entire train by dragging the front car around, sort of the way we keep track of the front of the list to keep track of the whole thing. I then said to imagine the variable current as being a little like Sean Connery as James Bond. He starts out standing on top of the front car and then he leaps to the car behind that one (that's what happens when you set current to current.next). Then he jumps off the second car onto the third car. Then he jumps off the third car onto the tracks. At that point he notices that he has found the end of the train. But by jumping off the last car, he's jumped off the train. The train would be speeding off into the distance and he'd be standing there saying, "Come back. I want to attach a new car at the end."

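As you learn about linked list programming, you'll find that there are only two ways to change which nodes are in a list: assign a new value to the variable that refers to the front of the list (front), or assign a new value to the next field of a node that is already in the list. Assigning to a local variable like current does neither, which is why the attempt above had no effect on the list.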

To solve this problem, we have to stop one position early. We don't want to run off the end of the list as we did with the printing code. Instead, we want to position current to the final element. We can do this by changing our test. Instead of going until current becomes null, we want to go until current.next is null, because only the last node of the list will have a next field that is null:

        ListNode current = front;
        while (current.next != null) {
            current = current.next;
        }
        current.next = new ListNode(value);
The code above will correctly append a value to the end of the list.

We were trying to write code for the appending add. So this code would be included inside a method and we would have to alter it to use the name of the parameter:

        public void add(int value) {
            ListNode current = front;
            while (current.next != null) {
                current = current.next;
            }
            current.next = new ListNode(value);
        }
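Even this code isn't quite correct because we have to deal with the special case where the list might be empty. In that case front is null, so current is null and evaluating current.next in the loop test would throw a NullPointerException. We handle the empty list separately: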

        public void add(int value) {
            if (front == null) {
                front = new ListNode(value);
            } else {
                ListNode current = front;
                while (current.next != null) {
                    current = current.next;
                }
                current.next = new ListNode(value);
            }
        }
I said that we would discuss this code in more detail in section as well as implementing other methods of the LinkedIntList class.
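To try the finished add method, it can be dropped into a small program like the one below. The add method is the one from these notes; the toString method and the AddDemo class are assumptions made for this sketch so there is a way to see the result.

```java
class AddDemo {
    public static void main(String[] args) {
        LinkedIntList list = new LinkedIntList();
        list.add(3);   // empty-list case: sets front
        list.add(5);   // general case: walks to the last node
        list.add(2);
        list.add(17);
        System.out.println(list);  // prints [3, 5, 2, 17]
    }
}

class LinkedIntList {
    private ListNode front;

    // Appends value to the end of the list (the method from the notes).
    public void add(int value) {
        if (front == null) {
            front = new ListNode(value);
        } else {
            ListNode current = front;
            while (current.next != null) {
                current = current.next;
            }
            current.next = new ListNode(value);
        }
    }

    // Hypothetical helper for this example: comma-separated, bracketed values.
    @Override
    public String toString() {
        if (front == null) {
            return "[]";
        }
        String result = "[" + front.data;
        ListNode current = front.next;
        while (current != null) {
            result += ", " + current.data;
            current = current.next;
        }
        return result + "]";
    }
}

class ListNode {
    public int data;
    public ListNode next;

    public ListNode(int data) {
        this(data, null);
    }

    public ListNode(int data, ListNode next) {
        this.data = data;
        this.next = next;
    }
}
```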


Stuart Reges
Last modified: Wed Oct 24 18:37:24 PDT 2018