CSE143 Notes for Monday, 4/9/18

We started a new topic. We are going to learn about what are known as linked lists. In doing so, we will explore in great detail the difference between:

data
a reference to data

We have been examining how to store a list of values in an array and we saw that arrays use a contiguous block of memory. One downside to that is that it is not easy to expand the array to store more values. So why have contiguous memory? Someone pointed out that it makes it fast. That's true. Arrays are what we call "random access" structures because we can quickly access any value within the array. Even if you ask for element 5000, you'd know right where to find it in memory because everything is stored together as one big block.

We'll find that linked lists have what we would call "sequential access." That means that it can be slow to access things in the middle of the list. A good analogy is to think of CDs versus cassette tapes. With a CD, you can quickly jump from track 2 to track 18. With a cassette tape, you have to fast forward through the tracks in between. This can take a long time. The same is true with linked lists. In fact, we'll find that the things that arrays do particularly well linked lists tend to do badly and vice versa.

I asked people if they could think of some other thing that arrays do badly. Someone mentioned that inserting or removing in the middle can be expensive. That's exactly right. If you have something like 10 thousand values in an array and you want to get rid of the first one, you have to shift 9,999 values over to fill in the gap. This will be a case where linked lists are much faster than arrays.

With arrays, we might store a list of 6 ints as follows:

                         [0]   [1]   [2]   [3]   [4]   [5]
             +---+     +-----+-----+-----+-----+-----+-----+
        list | +-+-->  |  0  |  2  |  40 |  23 |  14 |  72 |
             +---+     +-----+-----+-----+-----+-----+-----+

Imagine cutting this array up into individual variables and scattering them throughout memory:

        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+
        |  23 |   |  2  |   |  40 |   |  0  |   |  14 |   |  72 |
        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+

If the values are going to be scattered throughout memory, we would have to somehow connect them to each other to keep track of the order of our list:

           +---->---->---->---->---->---->---->----+
           ^                                       |
           |         +----<----<----<----+         V
           ^         |                   ^         |
           |         V                   |         V
        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+
        |  23 |   |  2  |-->|  40 |   |  0  |   |  14 |-->|  72 |-end
        +-----+   +-----+   +-----+   +-----+   +-----+   +-----+
           ^                   |
           |                   V
           +----<----<----<----+

Each bit of data is going to point to the next bit of data and the final bit of data (72) will have a special value that will indicate that we are at the end of the list. You might think that even with this interconnected structure, we'd have to keep track of where each value is stored. In fact, we just need a reference to the front of the list. So if we can get to the value that stores 0 in it, then from there we can get to every other value in the list. This is the basic idea that we are going to explore with linked lists.

Linked lists are composed of individual elements called nodes. Each node is like a Lego building block. It looks unimpressive by itself, but once you put a bunch of them together, it can form an interesting structure.

A basic list node looks like this:

        +------+------+
        | data | next |
        |  18  |  +---+--->
        +------+------+

It's an object with two data fields: one for storing a single item of data and one for storing a reference to the next node in the list. For a list of int values, we'd declare this as follows:

        public class ListNode {
            public int data;
            public ListNode next;
        }

I pointed out that this isn't a nicely encapsulated object because of the public data fields. I said that I'd discuss this later in the week (why this is okay to do). I also pointed out that this is a recursive data structure (a class that is defined in terms of itself in that the class is called ListNode and it has a data field of type ListNode).

Then we wrote some code that would build up the list (3, 7, 12). Obviously we're going to need three nodes that are linked together. With linked lists, if you have a reference to the front of the list, then you can get to anything in the list. So we'll usually have a single variable of type ListNode that refers to (or points to) the front of the list. So we began with this declaration:

        ListNode list;

The variable "list" is not itself a node. It's a variable that is capable of referring to a node. So we'd draw it something like this:

         +---+
    list | ? |
         +---+

where we understand that the "?" is going to be replaced with a reference to a node. So this box does not have a "data" field or a "next" field. It's a box where we can store a reference to such an object.

We don't have an actual node until we call new:

        list = new ListNode();

This constructs a new node and tells Java to have the variable "list" refer to it:

                    +------+------+
         +---+      | data | next |
    list | +-+--->  |      |      |
         +---+      +------+------+

What do we want to do with this node? We want to store 3 in its data field (list.data) and we want its next field to point to a new node:

        list.data = 3;
        list.next = new ListNode();

which leads us to this situation:

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+

When you program linked lists, you have to be careful to keep track of what you're talking about. The variable "list" stores a reference to the first node. We can get inside that node with the dot notation (list.data and list.next). So "list.next" is the way to refer to the "next" box of the first node. We wrote code to assign it to refer to a new node, which is why "list.next" is pointing at this second node.

Now we want to assign the second node's data field (list.next.data) to the value 7 and assign the second node's next field to refer to a third node:

        list.next.data = 7;
        list.next.next = new ListNode();

which leads us to this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+      +------+------+

I again repeated the idea of paying close attention to list versus list.next versus list.next.next and remember which box each of those coincides with:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |      |      |
         +---+      +------+------+      +------+------+      +------+------+
           |                   |                    |
           |                   |                    |
          list             list.next          list.next.next

Finally, we want to set the data field of this third node to 12 (list.next.next.data) and we want to set its next field to null. The keyword "null" is a Java word that means "no object". This provides a "terminator" for the linked list (a special value that indicates that we are at the end of the list). So we'd execute these statements:

        list.next.next.data = 12;
        list.next.next.next = null;

which leaves us in this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
    list | +-+--->  |   3  |   +--+--->  |   7  |   +--+--->  |  12  |   /  |
         +---+      +------+------+      +------+------+      +------+------+

We draw a diagonal line through the last "next" field as a way to indicate that it's value is null. The assignment to null is actually unnecessary. Java will initialize all data fields to the "zero equivalent" for that particular type. For type int, that means initializing to 0. For double, it initializes to 0.0. For boolean, it initializes to false. For arrays and other objects, it initializes to null. But it's not a bad idea to include the code to make it perfectly clear what's going on.

Obviously this is a very tedious way to manipulate a list. It's much better to write code that involves loops to manipulate lists. But it takes a while to get used to this idea, so we're first going to practice how to do some raw list operations without a loop.

The calendar includes this simple code along with a new version of the ListNode class that includes several constructors:

        public class ListNode {
            public int data;
            public ListNode next;
        
            public ListNode() {
                this(0, null);
            }
        
            public ListNode(int data) {
                this(data, null);
            }
        
          public ListNode(int data, ListNode next) {
              this.data = data;
              this.next = next;
          }
        }

As with the other classes we've seen, there is one "real" constructor (the one that takes two arguments). The other two use the "this(...)" notation to call the third constructor with default values (0 for the data, null for next). With the new version of the class, it is possible to write a single line of code to construct the list.

In section we will go over 10 different exercises that involve list operations. Each will have a "before" picture and an "after" picture. The challenge is to write code that gets you from the one state to the other state.

As an example, suppose that you have two variables of type ListNode called p and q and that you have the following situation:

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   /  |
         +---+      +------+------+      +------+------+

                    +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       q | +-+--->  |   3  |   +--+--->  |   9  |   /  |
         +---+      +------+------+      +------+------+

and you want to get to this situation:

                    +------+------+      +------+------+      +------+------+
         +---+      | data | next |      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   +--+--->  |   3  |   /  |
         +---+      +------+------+      +------+------+      +------+------+

                    +------+------+
         +---+      | data | next |
       q | +-+--->  |   9  |   /  |
         +---+      +------+------+

How do we do it? I started by asking people how many variables of type ListNode we have. I got various answers. Some people said two (probably thinking of p and q). Other people said four (probably thinking of p, q and the two non-null links). But in fact, there are six different variables of type ListNode. I numbered each one:

                              2                    3
           1        +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       p | +-+--->  |   2  |   +--+--->  |   4  |   /  |
         +---+      +------+------+      +------+------+

                               5                    6
           4        +------+------+      +------+------+
         +---+      | data | next |      | data | next |
       q | +-+--->  |   3  |   +--+--->  |   9  |   /  |
         +---+      +------+------+      +------+------+

Then I asked which of these variables has to change in value. The answer is that the boxes numbered 3, 4 and 5 have to be changed. If we change them appropriately, we'll be done. But we have to be careful of how we do so. Order can be important. For example, suppose we were going to start by changing box 4. In the final situation, it's supposed to point at the node with 9 in it. But if we started with that change, then what would happen to the node with 3 in it? We'd lose track of it. This is potentially a problem.

Of the three values we have to change to solve this problem, the one that is safe to change is box 3 because it's currently null. So we begin by setting it to point to the node with 3 in it:

        p.next.next = q;

Now that we've used the value of box 4 to reset box 3, we can reset box 4. It's supposed to point to the node that has 9 in it. We can do this by "leap frogging" over the current node it's pointing to:

        q = q.next;

Now we just have to reset box 5. But we can no longer refer to box 5 as q.next because we've changed q. Now we have to refer to it this way:

        p.next.next.next = null;

Putting these three lines together, we see the code that is needed to get from the initial state to the final state:

        p.next.next = q;
        q = q.next;
        p.next.next.next = null;

Obviously this can be very confusing. It is essential that you draw pictures to keep track of what is pointing where and what is going on when this code executes. It's the only way to master linked list code. We'll practice these small problems in section so that in lecture on Wednesday we can turn to the question of how to use loops to do more generalized processing of linked lists.

I spent the last few minutes showing a program that demonstrates the basic idea of reference semantics. The program begins with these lines of code:

        int x = 18;
        System.out.println("initial x = " + x);
        int y = x;
        y++;
        System.out.println("x now     = " + x);
        System.out.println("y         = " + y);

For primitive types like int, variables store the actual values. So when you have an assignment statement like the one above that sets y to have the same value of x, you get an independent copy of the value. That's why incrementing y has no effect on x and we get the following output:

        initial x = 18
        x now     = 18
        y         = 19

The second example involves constructing a queue and then using an assgnment statement to set the variable copy to have the same value as the variable q:

        Queue<Integer> q = new LinkedList<Integer>();
        q.add(18);
        System.out.println("initial q = " + q);
        Queue<Integer> copy = q;
        copy.add(42);
        System.out.println("q now     = " + q);
        System.out.println("copy      = " + copy);

But these are objects, not primitive values. As a result, the variables store a reference to the object. The assignment statement causes copy to refer to the same object as q. At that point we have two variables both referring to one object. That is why using copy to change the structure produces a visible change when we print out q:

        initial q = [18]
        q now     = [18, 42]
        copy      = [18, 42]

We saw in the jGRASP debugger that it numbers the objects and that the variables q and copy both refer to the same object.

Stuart Reges

Last modified: Mon Apr 15 15:54:02 PDT 2019