ArrayIntList

Letter Inventory
Collections
Class invariants
Pre/post conditions
Exceptions

Letter Inventory

Natural language processing (NLP) is an interdisciplinary area of computer science focused on programming computers to work with natural languages used by humans everyday. Since all data in computers are ultimately represented as numbers, a key challenge in NLP is vectorization: the process of turning a natural language text into a meaningful numeric representation.

A letter inventory is a data type that represents a character-by-character count vectorization for letters in the English alphabet. Each letter inventory keeps track of how many a’s, how many b’s, etc. are contained in a given string. For example, the letter inventory for the string “Washington State” corresponds to [aaeghinnosstttw]. The case of the letter is ignored, so ‘s’ and ‘S’ are exactly the same.

Letter Inventory

Oct 5

ArrayIntList: BJP 10.1, 15.1, 15.2

Define classes with an array (storing primitive data types or strings) as a field.
Define methods that reassign field values while maintaining class invariants.
Document pre/post conditions and throw exceptions to report problems to clients.

Oct 6

SectionArrayIntList

Oct 7

Stacks and Queues: BJP 14

Define static methods that accept multiple objects as parameters.
Loop over each item in a queue using the add, remove, and size methods.
Loop over each item in a stack, storing popped items in another queue or stack.

Oct 8

SectionStacks and Queues

Oct 9

Algorithm Analysis: BJP 13.2

Apply the runtime analysis process to formally describe an algorithm’s runtime.
Describe an example where a queue or stack would be preferred over a list.
Explain why ArrayList does not implement the Queue interface.

We can use letter inventories to autocorrect text as the user types!

Collections

Letter inventories are a specific type of collection. A collection is an object that stores elements (data). Almost every problem that we’ll solve in this course involves defining algorithms that use collections to store and manipulate data.

To help us learn how to design a letter inventory, we’ll study how Java implements a popular collection type called ArrayList by building a simplified ArrayIntList class. An ArrayIntList can be used to store a list (ordered sequence) of integer data.

// An ArrayIntList stores a list of integers
public class ArrayIntList {
    private int[] elementData;
    private int size;

    // Constructs an empty list
    public ArrayIntList() {
        elementData = new int[10];
        size = 0;
    }

    // Appends the given value to the end of the list
    public void add(int value) {
        elementData[size] = value;
        size += 1;
    }
}

Internally, the ArrayIntList class maintains two fields: an int[] elementData representing the list of integers and an int size representing the current number of elements in the list.

Class invariants

So what’s the difference between an ArrayIntList and a raw int[]? Lists disagree with two key features of arrays.

Array elements can be stored at any index.
Array size is fixed and cannot be changed after construction.

These two features are part of what make arrays so useful as building blocks for programs. But they also inconvenience client programmers in two ways.

Since array elements can be stored at any index, there can be empty indices between elements.
Since array size is fixed, full arrays cannot store any more elements. (A larger array needs to be constructed, and all of the elements need to be copied over.)

Lists help clients store ordered sequences of elements without having to worry about these inconveniences. Our study of ArrayIntList will primarily focus on the first point about the location of array elements. Unlike int[], ArrayIntList enforces a class invariant (a relationship that must always hold) between the values in elementData and the size field.

ArrayIntList class invariant: The i-th item in the list is always stored at elementData[i].

Before any public method call, the class invariant must hold for external correctness.
In the implementation, the class invariant can be temporarily broken so long as it is later restored.
After any public method call, the class invariant must hold for external correctness.

Pre/post conditions

A getter method for ArrayIntList elements can be written in a few lines.

// Returns the integer at the given index in the list
public int get(int index) {
    return elementData[index];
}

Give an example of a problematic input index to the get method.

Calling get(-8) or get(100000) would cause an ArrayOutOfBoundsException.

A more subtle example: any index greater than or equal to the size of the list but still within the bounds of the underlying elementData array. get would return whatever “garbage” value was stored in the vacant array index.

To tell the client about these errors, we’ll introduce a documentation format called pre/post conditions.

Preconditions: Conditions that the client needs to follow in order for the method to behave as intended.
Postconditions: The intended behavior of a method assuming preconditions hold.

// pre : 0 <= index < size()
// post: Returns the integer at the given index in the list
public int get(int index) {
    return elementData[index];
}

While these comments are helpful for the client, it would be better to add validation checks in the code to prevent unintended behavior from occurring when the preconditions are not met. Here are two approaches.

Returning a dummy value: We can return a specific dummy value to notify the client. For example, some implementations of indexOf (returns the index of a given value in a list) return -1 if the value is not contained in the list. Since the client knows that the only valid indices that should be returned from indexOf are in the range 0 <= index < size(), -1 signifies a false result. If implemented this way, this behavior should be clearly documented!

Why might it be bad for get to return -1 when given an invalid index?

The client might have actually added the value -1 as an element to the list earlier, so they wouldn’t know if the -1 returned from get represented the dummy value or the value that they added to the list.

Throwing an exception: The implementer can report a problem to the client by throwing an exception. Most of the time, we prefer this approach.

Exceptions

We can check preconditions by adding an if statement. In order to stop the program within the if statement, throw an exception.

// pre : 0 <= index < size() (throws IllegalArgumentException if not)
// post: Returns the integer at the given index in the list
public int get(int index) {
    if (index < 0 || index >= size) {
        throw new IllegalArgumentException();
    }
    return elementData[index];
}

In this example, we chose to throw new IllegalArgumentException(). There are many different types of exceptions, each of which informs the client of a different problem. Don’t worry about which one to use: for the purpose of this course, we’ll walk you through the exceptions that need to be thrown when they become relevant.

Note that we also documented this exception by adding it to our method comment.

ArrayIntList

Table of contents

Letter Inventory

Letter Inventory

Collections

Class invariants

Pre/post conditions

Exceptions