Arrays and Nodes
Dynamic arrays, abstraction, linked nodes, and encapsulation.
All modern software is built with the help of other software. Millions of programmers have written code using Java list types like ArrayList
. But have you ever wondered how lists are actually implemented in Java? Someone had to write the code to define the behavior for ArrayList
. In this concept review, we’ll investigate several different ways to implement the list data type.
In software engineering, there’s a distinction between being a client of a program versus being an implementer of a program. If you’re working with a team of people on a software project, one person on your team might be responsible for writing the client while another person is responsible for writing the implementation. This distinction not only provides a blueprint for dividing a programming project into independent components, but it also allows for abstraction: the idea that we can change the implementation of the program without needing to rewrite all the client code too!
What would programming without abstraction look like? Without abstraction, everyone who wrote a program using lists would have to go back and fix their code every time the list data type is updated by the inventors of Java. There are millions of people using Java, and many of them use lists. Software development would grind to a halt if every update to Java required everyone who used a list to also make many changes just so that their code continues to work.
In this course, we’ll focus on being the implementer of data types such as lists. How might we actually implement an ArrayList
in Java? What design decisions were made to implement the ArrayList
? By studying the implementation for ArrayList
, we’ll not only learn about how it works but also when and why we might choose to use an ArrayList
in the future.
Dynamic Arrays
ArrayList
is the first of many Java collections that we’ll learn about in this course.
- Collection
- An object that represents a group of zero or more objects called elements. In Java, lists, sets, and maps are examples of commonly-used collections.
In this lesson, we’ll study two approaches to implement ArrayList
. Consider the following code that shows us how an ArrayList
changes over time as we add numbers to it.
List<Integer> nums = new ArrayList<>();
nums.add(1); // [1]
nums.add(2); // [1, 2]
nums.add(3); // [1, 2, 3]
This example provides some insight into one possible approach. The brackets notation printed out after each example is what you get if you have exactly those values stored in an array! We can keep track of this array with a private field called elementData
. Read the comments to see what’s happening in each step of the constructor and the add
method.
// The use of E here indicates the type of elements in the ArrayList.
// Clients can say ArrayList<Integer> to represent a list of integers.
public class ArrayList<E> implements List<E> {
private E[] elementData;
public ArrayList() {
// Create a zero-length array because it has exactly the
// string representation that we want: []
elementData = new E[0];
}
public void add(E element) {
// Make space by creating a slightly longer new array.
E[] newData = new E[elementData.length + 1];
// Copy everything over from the old array to the new array.
for (int i = 0; i < elementData.length; i += 1) {
newData[i] = elementData[i];
}
// Add the element to the end of the new array.
newData[newData.length - 1] = element;
// Re-assign the private field to the new array.
elementData = newData;
}
public int size() {
return elementData.length;
}
public String toString() {
return Arrays.toString(elementData);
}
}
In this approach, our internal representation is exactly identical to the result that’s printed-out. So we have a working ArrayList
. Job well done, right?
When running this code, clients complain that it's too slow. What could be causing the slowness?
The problem is that it takes too long to make a new elementData
array every time we need to add just 1 element. Because arrays in Java have a fixed length, there’s not a simple way to make space for another element except by creating a brand new array of the desired length.
While this code works fine when the lists are small, even today’s fast computers and smartphones will struggle with lists that have thousands of elements. Imagine scrolling through your feed in an app and waiting a longer and longer time to load each element; as the number of elements increases, so too does the time it takes to copy everything over.
Optionally, open the preloaded Java Tutor environment and see how it takes 45 steps just to add the numbers 1, 2, 3 into an empty list! This approach takes a lot of time!
A more efficient implementation
To address this inefficiency, we can introduce a second field called size
.
public class ArrayList<E> implements List<E> {
private E[] elementData;
private int size;
public ArrayList() {
// Create a length-10 array to store elements.
elementData = new E[10];
// Assign size to 0 for an initially-empty list.
size = 0;
}
public void add(E element) {
// Add the element to the next space in the array.
elementData[size] = element;
// Increment the size to indicate the list has grown.
size += 1;
}
public int size() {
// Return the current size of the list.
return size;
}
}
Initially, size
is 0. But, when we call add
, the size
is incremented by 1 to reflect the new element added to the array. The size
tells us which values in the array are actually in-use by the list. Even though elementData
has space for 10 elements, a new ArrayList
should function as if it were empty. The size
variable helps maintain the appearance of a dynamically-resizing list by treating only the first size
-number of elements as actually in the list.
Open the preloaded Java Tutor environment and step through to see how the
size
field changes to match the actual number of elements added into the list. Note that this version only takes 28 steps to add the numbers 1, 2, 3. The difference will only become more appreciable as the number of elements increases.
This approach avoids creating a new array each time we need to add an element, though this particular implementation is limited to just 10 elements. The actual Java implementation of the ArrayList
class contains a little more code to create a new array double the current capacity when more space is needed, which is why we call this data structure the dynamic array.
Why would it be incorrect to implement size by returning the length of elementData?
In this second approach, the elementData
is no longer exactly what the client sees, so the length of elementData
is not the same as the size of the list. Instead, the elementData
array will usually contain a lot of empty spaces that won’t be filled until the client adds elements to the list.
Linked Nodes
LinkedStack.javaLinkedQueue.java
Dynamic arrays aren’t the only way to implement lists. In fact, for some methods, they can actually be quite slow. Lists not only allow adding elements, but also removing elements by a given index. How might we implement the remove(int index)
method for the above ArrayList
class?
// This method also returns the element that was removed.
public E remove(int index) {
// Return null if the index is not in the given list.
if (index >= size) {
return null;
}
// Hold onto the element so that it can be returned later.
E result = elementData[index];
// Shift all following elements left by 1 position, thereby
// removing the element at the given index.
for (int i = index + 1; i < size; i += 1) {
elementData[i - 1] = elementData[i];
}
// Decrement the size to indicate the list has shrunk.
size -= 1;
// Return the saved element.
return result;
}
Why do we need to shift all the following elements down by 1 spot?
We need to shift all the following elements down by 1 to maintain the ArrayList
class invariant. The i-th index in the array must hold the i-th value in the list; we can’t have any gaps in the array! If we didn’t shift all the following elements down by 1, methods like toString
would no longer work.
Shifting elements can take a lot of time: if we want to remove the first element in an ArrayList
, every other element needs to move over too. Linked nodes are an alternative to arrays for representing an ordered sequence of elements. Whereas arrays store all the elements together in memory, each linked node contains its own element and a “link” or reference to the next node. For example, our LinkedList
class can represent linked nodes with a Node
class consisting of two fields:
- The value of the node’s element as the
value
field. - A reference to the
next
element in the list.
public class Node<E> {
private final E value;
private Node<E> next;
public Node(E value) {
// Calls the other constructor to set the next field to null,
// meaning that there is no next linked node in the sequence.
this(value, null);
}
public Node(E value, Node<E> next) {
this.value = value;
this.next = next;
}
}
It might seem strange for the Node
class to contain a field referencing itself, but this is allowed in Java. We use the next
field to access the node in the list after the current one. Remember that this is only a reference and does not store the actual object itself. A null
next
field indicates the end of a sequence of linked nodes.
The value
field is marked final
to prevent changes to value
after the Node
is created. In this course, we focus our study of linked nodes on changing only the references to next
. For example, the following code represents the linked list [1, 2, 3]
.
Node<Integer> n1 = new Node<>(1);
Node<Integer> n3 = new Node<>(3);
Node<Integer> n2 = new Node<>(2, n3);
n1.next = n2;
Open the preloaded Java Tutor environment to visualize this code one step at a time. Before each time you press Next >, predict what will change in the visualization.
After running the entire block of code, what is the value of n1.next.next.value?
Let’s reason about this one subexpression at a time.
n1
refers to the node storing the value 1.n1.next
refers to the node storing the value 2, orn2
.n1.next.next
refers to the node storing the value 3, orn3
.n1.next.next.value
evaluates to 3, the value stored inn3
.
Encapsulation
Where do we write the add
, remove
, size
, and toString
methods? If each Node
stores a single element’s value, how do we refer to all the elements? We need a LinkedList
class that encapsulates all the Node
objects by bundling together all the data with the methods for operating on all of it.
public class LinkedList<E> implements List<E> {
// Reference to the first node in the sequence.
private Node front;
public void add(E element) {
if (front == null) {
// If the list is empty, assign the new node to the front.
front = new Node(element);
} else {
// Otherwise, add the element to the end of the list.
Node current = front;
while (current.next != null) {
current = current.next;
}
current.next = new Node(element);
}
}
public E remove(int index) {
// You'll implement this in your project.
}
public int size() {
// Count the number of nodes starting from front.
int result = 0;
Node current = front;
while (current != null) {
result += 1;
current = current.next;
}
return result;
}
private class Node {
private final E value;
private Node next;
public Node(E value) {
this(value, null);
}
public Node(E value, Node next) {
this.value = value;
this.next = next;
}
}
}
Java allows classes to be nested within other classes. In this case, we say that Node
is an inner class of LinkedList
. Note that the <E>
generic type does not have to be repeated in the inner class because each Node
is associated with its encapsulating LinkedList
. We also changed the Node
access modifier from public
to private
to indicate that Node
is only relevant to the implementer of LinkedList
.
Encapsulation doesn’t only describe the relationship between linked lists and their linked nodes. It can also be used to describe the relationship between array lists and their underlying arrays. Or really any class whose implementation details are hidden from clients—which is just about any program that we will write in this course! Encapsulation is the most common way to define an abstraction in data structures and algorithms.