Chapter 7
Arrays

Copyright © 2004 by Stuart Reges

7.1 Introduction

The sequential nature of files severely limits the number of interesting things you can easily do with them. The algorithms you have examined so far have all been sequential algorithms: algorithms that can be performed by examining each data item once in sequence. There is an entirely different class of algorithms that can be performed when you have random access of the data, when you can access the data items as many times as you like and in whatever order you like.

This chapter examines a new object that provides this random access--the array. The discussion of array applications begins with some sequential algorithms that are enhanced by arrays.

7.2 Arrays

Arrays, like files, are structured objects that have individual components. The array structure, however, allows random access. You can access any component at any time, no matter what you have accessed already.

An array is a collection of data values that have the same type. These different components are indexed by integers. The indexes differentiate the different components. This is similar to the way post office boxes are set up. Each post office has a series of boxes that all store the same thing: mail. The boxes are indexed with numbers so that you can refer to an individual box by using a description like "post office box 884."

You might, for example, have a series of temperature readings that you want to store. You could keep them in a series of different variables:

        double temperature1;
        double temperature2;
        double temperature3;
But you can simplify this scheme by using an array:

        double[] temperature = new double[3];
This declares a variable called temperature of type "double[]" (an array of doubles). Arrays are objects, so they need to be constructed by a call on "new". The right-hand side of this declaration tells Java to construct an array of doubles that is 3 long. They are all initialized to 0.0:

                                    +-------+
                                [0] |  0.0  |
                    +---+           +-------+
        temperature | +-+--->   [1] |  0.0  |
                    +---+           +-------+
                                [2] |  0.0  |
                                    +-------+
As the picture above indicates, the variable temperature is not itself the array. Instead, it stores a reference to the array. Notice that the lowest index of the array is 0. This is a convention that goes back to the C programming language and is referred to as "0 based indexing".

Once declared, we can refer to individual elements of this array using the square bracket notation:

        temperature[0] = 75.2;
        temperature[1] = 68.4;
        temperature[2] = 70.3;
which would modify the array to have the following values:

                                    +-------+
                                [0] |  75.2 |
                    +---+           +-------+
        temperature | +-+--->   [1] |  68.4 |
                    +---+           +-------+
                                [2] |  70.3 |
                                    +-------+
As noted above, the type of this variable is "double[]". In general, you can add empty square brackets after any Java type to form a new type (an array of those type of values). For example, the following are legal types:

        int
        double
        char
        String
Which means that the following are legal types:

        int[]: an array of ints
        double[]: an array of double values
        char[]: an array of characters
        String[]: an array of Strings
In fact, you can apply this idea indefinitely to form what are known as multidimensional arrays:

        int: one int
        int[]: a one-dimensional array of ints
        int[][]: a 2-dimensional grid of ints
        int[][][]: a 3-dimensional collection of ints
        ...
Arrays are always indexed using integers with a starting index of 0, but when constructed, you specify how many elements you would like in this particular array. For example, to have a list of 100 temperatures, we could have instead declared our variable as follows.

        double[] temperature = new double[100];
Java always allocates a contiguous block of memory for each array, which is what allows you to quickly access any element quickly. If you executed the declaration above and put some values into the components of temperature, it would look something like this.

                             +-------+
                        [0]  |  74.3 |
                             +-------+
                        [1]  |  85.2 |
                             +-------+
                        [2]  | 102.3 |
            +---+            +-------+
temperature | +-+--->   [3]  |  99.8 |
            +---+            +-------+
                        [4]  | 101.7 |
                             +-------+
                      [...]  |  ...  |
                             +-------+
                       [99]  |  84.6 |
                             +-------+
Sometimes we draw the array sideways:

                         [0]     [1]     [2]     [3]     [4]    [...]    [99]
            +---+     +-------+-------+-------+-------+-------+-------+-------+
temperature | +-+---> |  74.3 |  85.2 | 102.3 |  99.8 | 101.7 |  ...  |  84.6 |
            +---+     +-------+-------+-------+-------+-------+-------+-------+
The key point is that this represents a contiguous block of memory that can be indexed using a specific array index. For example, temperature[3] refers to the value with index 3, which, because of zero-based indexing, turns out to be the fourth value in the array (in the picture above, the box with the value 99.8 in it).

You are not restricted to simple literal values inside of the brackets. You can use any integer expression. This allows you to combine arrays with loops, which greatly simplifies the code you write. For example, suppose we want to find the average of the values in our array. We could do so by finding the sum and dividing by the length.

        double sum = 0.0;
        sum += temperature[0];
        sum += temperature[1];
        sum += temperature[2];
        sum += temperature[3];
        ...
        sum += temperature[99];
        double average = sum/100;
Obviously this code is very tedious and redundant. The only thing that changes in the various summing statements is the specific index we are accessing. We can use a for loop to capture this redundancy:

        double sum = 0.0;
        for (int i = 0; i < 100; i++)
            sum += temperature[i];
        double average = sum/100;
This is a very concise way to access all elements of the array. The code works when the array has a length of 100, but you can imagine the array having a different length. We can improve this code by referring to the length field of the array. Every array keeps track of its length and we can ask for it by putting ".length" after the name of an array variable (temperature.length in our example). Notice that this is similar to how we ask a String for its length, although there we have to include parentheses as in str.length(). This is one of those unfortunate inconsistencies in Java that you just have to remember.

Using the length field, we can improve the code above as follows.

        double sum = 0.0;
        for (int i = 0; i < temperature.length; i++)
            sum += temperature[i];
        double average = sum/temperature.length;
This code provides a template that you will see often with array processing code: a for loop that starts at 0 and that continues while the loop variable is less than the length of the array, doing something with [i] in the body of the loop.

It is possible to refer to an illegal index of an array, in which case Java throws an exception. For example, if we were to accidentally use "<=" in the for loop above instead of "<", then the loop would attempt to access temperature[100], which is not a legal element of our array. If your code makes such an illegal reference, Java will halt your program with an ArrayIndexOutOfBoundsException.

7.2.1 Program Example--Tally

Suppose you have an input file of integers between 0 and 10 and you want to compute how many of each number are in the input file. To do so, you must first recognize that you are doing eleven separate tallies. You are tallying the occurrences of the number 0, the number 1, the number 2, the number 3, and so on up to and including the number 10. You will more easily solve the overall task if you first solve the task of performing one tally. To tally the occurrences of the number 0, you simply count how often they occur in the input file.

	set count to 0.
	while (more numbers to read) {
	    read a number.
	    if (the number is 0)
		increment total.
        }
	report the value of count.
To solve the task of tallying all eleven numbers, you can use an array with one component for each of the numbers you are tallying. Because the simple tally involves a single int variable, you will want your array to have int components. That way you will have a different int variable for each of the eleven tallying tasks. Here is the declaration you would use for the array.

	int[] count = new int[11];
Here is what count looks like initially:

                   [0]     [1]     [2]     [3]     [4]    [...]    [10]
      +---+     +-------+-------+-------+-------+-------+-------+-------+
count | +-+---> |   0   |   0   |   0   |   0   |   0   |  ...  |   0   |
      +---+     +-------+-------+-------+-------+-------+-------+-------+
You could solve the problem by processing the file once looking for 0's, then on a second pass looking for 1's, and then a third pass looking for 2's, and so on. This is highly inefficient, however. It is more efficient to do all the tallying operations in parallel. Thus, our pseudocode becomes:

	set all counts to 0.
	while (more numbers to read) {
	    read a number.
 	    increment the count for the component corresponding to number.
        }
	report the value of each count.
How do we increment the appropriate component of total depending on the number read in? We use the number read in to index the component we are interested in. Thus, instead of saying

        increment the count for the component corresponding to number.
You can approach Pascal more by saying

        increment count[number].
One last point you should consider is what happens if number is out of range. What if the file contains a number that is less than 0 or greater than 10? You would obtain an array index out of bounds. There are many ways to account for this problem. One simple solution is to keep track of how many illegal values appear. Thus, the line above is replaced by:

	if (number is in range)
	    increment count[number].
	else
	    increment number of illegal values.
This requires introducing a new variable that tallies the number of illegal values. You would initialize the variable to 0 before the loop.

	set all counts to 0.
        set illegal count to 0.
	while (more numbers to read) {
	    read a number.
	    if (number is in range)
	        increment count[number].
	    else
	        increment number of illegal values.
        }
	report the value of each count.
        report the number of illegal values.
This now can be fairly easily translated into Java. The program includes the boilerplate code from chapter 6 for opening a file and introduces a class constant for the maximum value to be tallied.

// This program reads a series of values between 0 and MAX from the user and // reports the frequency of occurrence of each value. import java.io.*; public class Tally { public static final int MAX = 10; // maximum legal value public static void main(String[] args) { System.out.println("This program will count the frequency of"); System.out.println("occurrence of various numbers in a file."); System.out.println(); Scanner console = new Scanner(System.in); Scanner input = getInput(console); tally(input); } public static Scanner getInput(Scanner console) { System.out.print("What is the name of the input file? "); String name = console.nextLine(); Scanner result; try { result = new Scanner(new File(name)); } catch (FileNotFoundException e) { throw new RuntimeException("file not found"); } System.out.println(); return result; } public static void tally(Scanner input) { int[] count = new int[MAX + 1]; int numBad = 0; while (input.hasNextInt()) { int next = input.nextInt(); if (next >= 0 && next <= MAX) count[next]++; else numBad++; } System.out.println("Value\tOccurrences"); for (int i = 0; i < count.length; i++) System.out.println(i + "\t" + count[i]); System.out.println("Other\t" + numBad); } } Given an input file like the following:

The program would execute as follows.

This program will count the frequency of occurrence of various numbers in a file. What is the name of the input file? tally.dat Value Occurrences 0 7 1 8 2 11 3 5 4 10 5 7 6 7 7 6 8 5 9 3 10 2 Other 7

7.3 Variable length data

So far we have discussed an array that is in some sense full: all of its elements were in use. This is not always the case. Often we build an array with a maximum in mind, but only fill it up partially.

Suppose we want to use an array to process an input file of integers where the various input lines have different lengths. Some lines might have many integers on them and some might have very few. Because the lines differ in length, we don't know how big to make the array. Therefore, the best we can do is to set some arbitrary maximum for the length of a line and make the array that large.

Suppose that we assume that no input line has more than 30 integers on it. Then we can define an array of length 30 as follows:

        int[] numbers = new int[30];
To solve this problem we define not only an array for the integers, but also an integer variable to keep track of how many elements of the array are currently being used. We can call this variable "length". If we were to read the following input line:

        18 203 14 9 8 7
We would expect the variables numbers and length to look like this.

                          [0]   [1]   [2]   [3]   [4]   [5]   [6]  [...]  [29]
            +---+       +-----+-----+-----+-----+-----+-----+-----+-----+-----+
    numbers | +-+--->   |  18 | 203 |  14 |  9  |  8  |  7  |  0  | ... |  0  |
            +---+       +-----+-----+-----+-----+-----+-----+-----+-----+-----+

            +---+
    length  | 6 |
            +---+
The integers from the input line appear in the first six cells of this array. Because six cells of the array are in use, we store the value 6 in the memory cell for length. The elements 6 through 29 are 0 in this picture, but they could really have any value at all. By keeping track of the length, we can be sure to use only elements of the array that are currently active.

Let's examine a program that uses this data structure. Consider the problem of reversing the integers on each line of a file. Why does this problem require an extra structure? With a file you can read the numbers from a line in sequential order. You can't, however, read them backwards. An array has no such limitations. To reverse the line you read it into an array and then manipulate the array. You should think of this as a transfer of information. The numbers from the input line are transferred from the file structure to the array structure. Once transferred, you can manipulate the numbers using the array and not worry about the file until it is time to process another line. Thus, your pseudocode description would be:

        while (more lines to process) {
            read next input line.
            transfer line from file to array.
            write array contents in reverse order.
        }
Let's assume that as in the program examples in the last chapter, we will read each line of the file into a String and use that String to construct a Scanner for that particular input line. To transfer numbers from the Scanner for a particular input line into the array, we'd do the following.

        while (more numbers to read)
            read a number into the next array component.
To improve this pseudocode we need to specify what we mean by the next array component. Because we want to store the first number in component [0], the next in component [1], and so on, we need an integer variable to keep track of this. We also need to keep track of the length. Fortunately, we can easily combine these two tasks.

        set length to 0.
        while (more numbers to read) {
            read a number into numbers[length].
            increment length.
        }
Some novices reverse the order of the two lines inside the while loop and that makes a certain amount of sense. It seems like you'd want to increase your length first and then store the value in the array. But remember that arrays use zero-based indexing. That means that length is always one ahead of where you last stored something. In other words:

        store the first value in position [0] and remember a length of 1
        store the second value in position [1] and remember a length of 2
        store the third value in position [2] and remember a length of 3
        store the fourth value in position [3] and remember a length of 4
        ...
As a result, it makes sense to store first and then increment length rather than the other way around.

Writing the contents of the array in reverse order is a simple operation. We run our loop backwards using i-- instead of i++. Below is a complete program that puts all of this together.

// This program takes an input file of numbers and prints the numbers with // each line reversed in order. import java.io.*; public class ReverseLines { public static final int MAX_NUMBERS = 30; // max # of numbers on a line public static void main(String[] args) { System.out.println("This program takes an input file of numbers and"); System.out.println("reverses the numbers on each line."); System.out.println(); Scanner console = new Scanner(System.in); Scanner input = getInput(console); processFile(input); } // prompts the user for a file name and opens and returns a file Scanner public static Scanner getInput(Scanner console) { System.out.print("What is the name of the input file? "); String name = console.nextLine(); Scanner result; try { result = new Scanner(new File(name)); } catch (FileNotFoundException e) { throw new RuntimeException("file not found"); } System.out.println(); return result; } // processes the file line by line, reversing each line public static void processFile(Scanner input) { while (input.hasNextLine()) { Scanner data = new Scanner(input.nextLine()); reverseLine(data); } } // reads the numbers in the given Scanner, writing them in reverse order public static void reverseLine(Scanner data) { int[] numbers = new int[MAX_NUMBERS]; int length = 0; while (data.hasNextInt()) { numbers[length] = data.nextInt(); length++; } for (int i = length - 1; i >= 0; i--) System.out.print(numbers[i] + " "); System.out.println(); } } Given the following input file:

18 203 14 9 8 7 2 3 4 5 6 8 4 12 14 19 23 43 23 33 34 43 23 98 76 36 57 37 28 38 54 58 39 29 39 18 42 19 23 40 702 83 4 2 3 1 0 4 2 The program would execute as follows.

This program takes an input file of numbers and reverses the numbers on each line. What is the name of the input file? reverse.dat 7 8 9 14 203 18 5 4 3 2 4 8 6 39 29 39 58 54 38 28 37 57 36 76 98 23 43 34 33 23 43 23 19 14 12 702 40 23 19 42 18 83 2 4 0 1 3 2 4

7.4 Limitations of arrays

You should be aware of some general limitations of arrays.

There is one distinct advantage of arrays being objects, which is that you can pass them as parameters to another method and that other method will be able to change the contents of the array.

7.5 Initializing arrays

Java has a special syntax for initializing an array when you know exactly what you want to put into it. For example, you could write the following code to initialize an array of ints and an array of Strings.

        int[] numbers = new int[5];
        numbers[0] = 13;
        numbers[1] = 23;
        numbers[2] = 480;
        numbers[3] = -18;
        numbers[4] = 75;

        String[] words = new String[6];
        words[0] = "hello";
        words[1] = "how";
        words[2] = "are";
        words[3] = "you";
        words[4] = "feeling";
        words[5] = "today";
This works, but it's a rather tedious way to declare these arrays. Java provides a shorthand:

        int[] numbers = {13, 23, 480, -18, 75};
        String[] words = {"hello", "how", "are", "you", "feeling", "today"};
You use the curly braces to enclose a series of values that will be stored in the array. Order is important. The first value will go into index 0, the second value will go into index 1 and so on. Java counts how many values you include and constructs an array of just the right size. It then stores the various values into the appropriate spots in the array.

This is one of only two places where Java will construct an object without an explicit call on new. The other place we saw this was with String literals where Java constructs String objects for you without you having to call new. Both of these are conveniences for programmers because these tasks are so common that the designers of the language wanted to make it easy to do them.

7.6 Arrays in the Java class libraries

You will find arrays appearing throughout the Java class libraries. In fact, there is a special class called Arrays that has a collection of utility programs for manipulating arrays. One particularly handy method in the Arrays class is the method sort. For example, given the following variable declarations:

        int[] numbers = {13, 23, 480, -18, 75};
        String[] words = {"hello", "how", "are", "you", "feeling", "today"};
We could make the following calls:
        Arrays.sort(numbers);
        Arrays.sort(words);
If you then examine the values of the arrays, you will see that the numbers appear in numerically increasing order and the words appear in alphabetically increasing order. Below is a complete program that demonstrates this. The Arrays class is part of the java.util package, so you will see that the program includes an import from java.util.

import java.util.*; public class SortingExample { public static void main(String[] args) { int[] numbers = {13, 23, 480, -18, 75}; String[] words = {"hello", "how", "are", "you", "feeling", "today"}; System.out.println("Before sorting:"); display(numbers, words); Arrays.sort(numbers); Arrays.sort(words); System.out.println("After sorting:"); display(numbers, words); } public static void display(int[] numbers, String[] words) { for (int i = 0; i < numbers.length; i++) System.out.print(numbers[i] + " "); System.out.println(); for (int i = 0; i < words.length; i++) System.out.print(words[i] + " "); System.out.println(); System.out.println(); } } The program produces the following output: Before sorting: 13 23 480 -18 75 hello how are you feeling today After sorting: -18 13 23 75 480 are feeling hello how today you Arrays appear in many other places in the Java class libraries. For example, the String class has a method called toCharArray that will construct an array of characters that stores the individual characters of the array. There is a corresponding constructor for the String class that allows you to construct a String from a character array. As an example, the code below converts a word into a corresponding character array, then sorts the characters in the array, then recombines the characters to make a string:

        public static String sorted(String s) {
            char[] letters = s.toCharArray();
            Arrays.sort(letters);
            String result = new String(letters);
            return result;
        }
This might seem like a strange method to define, but it can be very helpful for solving word puzzles known as anagrams. An anagram of a word is a word that has the same letters but in a different order. For example, the following words are all anagrams of each other: pears, spear, pares, reaps, spare. Using the method above, all of these words would be converted to "aeprs". So to solve an anagram, you can simply go through a dictionary, converting every word to its sorted form and see if it matches the sorted form of the anagram you are working with.

7.7 Programming problems

  1. Java's type int has a limit on how large of an integer it can store. This limit can be circumvented by representing an integer as an array of digits. Write an interactive program that adds two integers of up to 50 digits each.

  2. Personal mailing labels can prove quite useful. Write a program that reads a five-line address from an input file and produces an output file with the address repeated 50 times in three columns.

  3. Write a program that reads an input file of numbers and reports the average of the numbers as well as how far each number is from the average.

  4. Write a program that plays a variation of the game of Mastermind with a user. For example, the program can use pseudorandom numbers to generate a 4-digit number. The user should be allowed to make guesses until the user gets the number correct. Clues should be given to the user indicating how many digits of the guess are correct and in the correct place, and how many are correct but in the wrong place.