Arrays and Nodes

Dynamic arrays, abstraction, linked nodes, and encapsulation.

  1. Dynamic Arrays
    1. A more efficient implementation
  2. Linked Nodes
    1. Encapsulation

All modern software is built with the help of other software. Millions of programmers have written code using Java list types like ArrayList. But have you ever wondered how lists are actually implemented in Java? Someone had to write the code to define the behavior for ArrayList. In this concept review, we’ll investigate several different ways to implement the list data type.

In software engineering, there’s a distinction between being a client of a program versus being an implementer of a program. If you’re working with a team of people on a software project, one person on your team might be responsible for writing the client while another person is responsible for writing the implementation. This distinction not only provides a blueprint for dividing a programming project into independent components, but it also allows for abstraction: the idea that we can change the implementation of the program without needing to rewrite all the client code too!

What would programming without abstraction look like? Without abstraction, everyone who wrote a program using lists would have to go back and fix their code every time the list data type is updated by the inventors of Java. There are millions of people using Java, and many of them use lists. Software development would grind to a halt if every update to Java required everyone who used a list to also make many changes just so that their code continues to work.

In this course, we’ll focus on being the implementer of data types such as lists. How might we actually implement an ArrayList in Java? What design decisions were made to implement the ArrayList? By studying the implementation for ArrayList, we’ll not only learn about how it works but also when and why we might choose to use an ArrayList in the future.

Dynamic Arrays

ResizingArrayStack.java

ArrayList is the first of many Java collections that we’ll learn about in this course.

Collection
An object that represents a group of zero or more objects called elements. In Java, lists, sets, and maps are examples of commonly-used collections.

In this lesson, we’ll study two approaches to implement ArrayList. Consider the following code that shows us how an ArrayList changes over time as we add numbers to it.

List<Integer> nums = new ArrayList<>();
nums.add(1); // [1]
nums.add(2); // [1, 2]
nums.add(3); // [1, 2, 3]

This example provides some insight into one possible approach. The brackets notation printed out after each example is what you get if you have exactly those values stored in an array! We can keep track of this array with a private field called elementData. Read the comments to see what’s happening in each step of the constructor and the add method.

// The use of E here indicates the type of elements in the ArrayList.
// Clients can say ArrayList<Integer> to represent a list of integers.
public class ArrayList<E> implements List<E> {
    private E[] elementData;

    public ArrayList() {
        // Create a zero-length array because it has exactly the
        // string representation that we want: []
        elementData = new E[0];
    }

    public void add(E element) {
        // Make space by creating a slightly longer new array.
        E[] newData = new E[elementData.length + 1];

        // Copy everything over from the old array to the new array.
        for (int i = 0; i < elementData.length; i += 1) {
            newData[i] = elementData[i];
        }

        // Add the element to the end of the new array.
        newData[newData.length - 1] = element;

        // Re-assign the private field to the new array.
        elementData = newData;
    }

    public int size() {
        return elementData.length;
    }

    public String toString() {
        return Arrays.toString(elementData);
    }
}

In this approach, our internal representation is exactly identical to the result that’s printed-out. So we have a working ArrayList. Job well done, right?

When running this code, clients complain that it's too slow. What could be causing the slowness?

The problem is that it takes too long to make a new elementData array every time we need to add just 1 element. Because arrays in Java have a fixed length, there’s not a simple way to make space for another element except by creating a brand new array of the desired length.

While this code works fine when the lists are small, even today’s fast computers and smartphones will struggle with lists that have thousands of elements. Imagine scrolling through your feed in an app and waiting a longer and longer time to load each element; as the number of elements increases, so too does the time it takes to copy everything over.

Optionally, open the preloaded Java Tutor environment and see how it takes 45 steps just to add the numbers 1, 2, 3 into an empty list! This approach takes a lot of time!

Java Tutor

A more efficient implementation

To address this inefficiency, we can introduce a second field called size.

public class ArrayList<E> implements List<E> {
    private E[] elementData;
    private int size;

    public ArrayList() {
        // Create a length-10 array to store elements.
        elementData = new E[10];

        // Assign size to 0 for an initially-empty list.
        size = 0;
    }

    public void add(E element) {
        // Add the element to the next space in the array.
        elementData[size] = element;

        // Increment the size to indicate the list has grown.
        size += 1;
    }

    public int size() {
        // Return the current size of the list.
        return size;
    }
}

Initially, size is 0. But, when we call add, the size is incremented by 1 to reflect the new element added to the array. The size tells us which values in the array are actually in-use by the list. Even though elementData has space for 10 elements, a new ArrayList should function as if it were empty. The size variable helps maintain the appearance of a dynamically-resizing list by treating only the first size-number of elements as actually in the list.

Open the preloaded Java Tutor environment and step through to see how the size field changes to match the actual number of elements added into the list. Note that this version only takes 28 steps to add the numbers 1, 2, 3. The difference will only become more appreciable as the number of elements increases.

Java Tutor

This approach avoids creating a new array each time we need to add an element, though this particular implementation is limited to just 10 elements. The actual Java implementation of the ArrayList class contains a little more code to create a new array double the current capacity when more space is needed, which is why we call this data structure the dynamic array.

Why would it be incorrect to implement size by returning the length of elementData?

In this second approach, the elementData is no longer exactly what the client sees, so the length of elementData is not the same as the size of the list. Instead, the elementData array will usually contain a lot of empty spaces that won’t be filled until the client adds elements to the list.

Linked Nodes

LinkedStack.javaLinkedQueue.java

Dynamic arrays aren’t the only way to implement lists. In fact, for some methods, they can actually be quite slow. Lists not only allow adding elements, but also removing elements by a given index. How might we implement the remove(int index) method for the above ArrayList class?

// This method also returns the element that was removed.
public E remove(int index) {
    // Return null if the index is not in the given list.
    if (index >= size) {
        return null;
    }

    // Hold onto the element so that it can be returned later.
    E result = elementData[index];

    // Shift all following elements left by 1 position, thereby
    // removing the element at the given index.
    for (int i = index + 1; i < size; i += 1) {
        elementData[i - 1] = elementData[i];
    }

    // Decrement the size to indicate the list has shrunk.
    size -= 1;

    // Return the saved element.
    return result;
}
Why do we need to shift all the following elements down by 1 spot?

We need to shift all the following elements down by 1 to maintain the ArrayList class invariant. The i-th index in the array must hold the i-th value in the list; we can’t have any gaps in the array! If we didn’t shift all the following elements down by 1, methods like toString would no longer work.

Shifting elements can take a lot of time: if we want to remove the first element in an ArrayList, every other element needs to move over too. Linked nodes are an alternative to arrays for representing an ordered sequence of elements. Whereas arrays store all the elements together in memory, each linked node contains its own element and a “link” or reference to the next node. For example, our LinkedList class can represent linked nodes with a Node class consisting of two fields:

  • The value of the node’s element as the value field.
  • A reference to the next element in the list.
public class Node<E> {
    private final E value;
    private Node<E> next;

    public Node(E value) {
        // Calls the other constructor to set the next field to null,
        // meaning that there is no next linked node in the sequence.
        this(value, null);
    }

    public Node(E value, Node<E> next) {
        this.value = value;
        this.next = next;
    }
}

It might seem strange for the Node class to contain a field referencing itself, but this is allowed in Java. We use the next field to access the node in the list after the current one. Remember that this is only a reference and does not store the actual object itself. A null next field indicates the end of a sequence of linked nodes.

The value field is marked final to prevent changes to value after the Node is created. In this course, we focus our study of linked nodes on changing only the references to next. For example, the following code represents the linked list [1, 2, 3].

Node<Integer> n1 = new Node<>(1);
Node<Integer> n3 = new Node<>(3);
Node<Integer> n2 = new Node<>(2, n3);
n1.next = n2;

Open the preloaded Java Tutor environment to visualize this code one step at a time. Before each time you press Next >, predict what will change in the visualization.

Java Tutor

After running the entire block of code, what is the value of n1.next.next.value?

Let’s reason about this one subexpression at a time.

  1. n1 refers to the node storing the value 1.
  2. n1.next refers to the node storing the value 2, or n2.
  3. n1.next.next refers to the node storing the value 3, or n3.
  4. n1.next.next.value evaluates to 3, the value stored in n3.

Encapsulation

Where do we write the add, remove, size, and toString methods? If each Node stores a single element’s value, how do we refer to all the elements? We need a LinkedList class that encapsulates all the Node objects by bundling together all the data with the methods for operating on all of it.

public class LinkedList<E> implements List<E> {
    // Reference to the first node in the sequence.
    private Node front;

    public void add(E element) {
        if (front == null) {
            // If the list is empty, assign the new node to the front.
            front = new Node(element);
        } else {
            // Otherwise, add the element to the end of the list.
            Node current = front;
            while (current.next != null) {
                current = current.next;
            }
            current.next = new Node(element);
        }
    }

    public E remove(int index) {
        // You'll implement this in your project.
    }

    public int size() {
        // Count the number of nodes starting from front.
        int result = 0;
        Node current = front;
        while (current != null) {
            result += 1;
            current = current.next;
        }
        return result;
    }

    private class Node {
        private final E value;
        private Node next;

        public Node(E value) {
            this(value, null);
        }

        public Node(E value, Node next) {
            this.value = value;
            this.next = next;
        }
    }
}

Java allows classes to be nested within other classes. In this case, we say that Node is an inner class of LinkedList. Note that the <E> generic type does not have to be repeated in the inner class because each Node is associated with its encapsulating LinkedList. We also changed the Node access modifier from public to private to indicate that Node is only relevant to the implementer of LinkedList.

Encapsulation doesn’t only describe the relationship between linked lists and their linked nodes. It can also be used to describe the relationship between array lists and their underlying arrays. Or really any class whose implementation details are hidden from clients—which is just about any program that we will write in this course! Encapsulation is the most common way to define an abstraction in data structures and algorithms.