The sequential nature of files severely limits the number of interesting things you can easily do with them. The algorithms you have examined so far have all been sequential algorithms: algorithms that can be performed by examining each data item once in sequence. There is an entirely different class of algorithms that can be performed when you have random access of the data, when you can access the data items as many times as you like and in whatever order you like.
This chapter examines a new object that provides this random access--the array. The discussion of array applications begins with some sequential algorithms that are enhanced by arrays.
Arrays, like files, are structured objects that have individual components. The array structure, however, allows random access. You can access any component at any time, no matter what you have accessed already.
An array is a collection of data values that have the same type. These different components are indexed by integers. The indexes differentiate the different components. This is similar to the way post office boxes are set up. Each post office has a series of boxes that all store the same thing: mail. The boxes are indexed with numbers so that you can refer to an individual box by using a description like "post office box 884."
You might, for example, have a series of temperature readings that you want to store. You could keep them in a series of different variables:
double temperature1; double temperature2; double temperature3;But you can simplify this scheme by using an array:
double[] temperature = new double[3];This declares a variable called temperature of type "double[]" (an array of doubles). Arrays are objects, so they need to be constructed by a call on "new". The right-hand side of this declaration tells Java to construct an array of doubles that is 3 long. They are all initialized to 0.0:
+-------+ [0] | 0.0 | +---+ +-------+ temperature | +-+---> [1] | 0.0 | +---+ +-------+ [2] | 0.0 | +-------+As the picture above indicates, the variable temperature is not itself the array. Instead, it stores a reference to the array. Notice that the lowest index of the array is 0. This is a convention that goes back to the C programming language and is referred to as "0 based indexing".
Once declared, we can refer to individual elements of this array using the square bracket notation:
temperature[0] = 75.2; temperature[1] = 68.4; temperature[2] = 70.3;which would modify the array to have the following values:
+-------+ [0] | 75.2 | +---+ +-------+ temperature | +-+---> [1] | 68.4 | +---+ +-------+ [2] | 70.3 | +-------+As noted above, the type of this variable is "double[]". In general, you can add empty square brackets after any Java type to form a new type (an array of those type of values). For example, the following are legal types:
int double char StringWhich means that the following are legal types:
int[]: an array of ints double[]: an array of double values char[]: an array of characters String[]: an array of StringsIn fact, you can apply this idea indefinitely to form what are known as multidimensional arrays:
int: one int int[]: a one-dimensional array of ints int[][]: a 2-dimensional grid of ints int[][][]: a 3-dimensional collection of ints ...Arrays are always indexed using integers with a starting index of 0, but when constructed, you specify how many elements you would like in this particular array. For example, to have a list of 100 temperatures, we could have instead declared our variable as follows.
double[] temperature = new double[100];Java always allocates a contiguous block of memory for each array, which is what allows you to quickly access any element quickly. If you executed the declaration above and put some values into the components of temperature, it would look something like this.
+-------+ [0] | 74.3 | +-------+ [1] | 85.2 | +-------+ [2] | 102.3 | +---+ +-------+ temperature | +-+---> [3] | 99.8 | +---+ +-------+ [4] | 101.7 | +-------+ [...] | ... | +-------+ [99] | 84.6 | +-------+Sometimes we draw the array sideways:
[0] [1] [2] [3] [4] [...] [99] +---+ +-------+-------+-------+-------+-------+-------+-------+ temperature | +-+---> | 74.3 | 85.2 | 102.3 | 99.8 | 101.7 | ... | 84.6 | +---+ +-------+-------+-------+-------+-------+-------+-------+The key point is that this represents a contiguous block of memory that can be indexed using a specific array index. For example, temperature[3] refers to the value with index 3, which, because of zero-based indexing, turns out to be the fourth value in the array (in the picture above, the box with the value 99.8 in it).
You are not restricted to simple literal values inside of the brackets. You can use any integer expression. This allows you to combine arrays with loops, which greatly simplifies the code you write. For example, suppose we want to find the average of the values in our array. We could do so by finding the sum and dividing by the length.
double sum = 0.0; sum += temperature[0]; sum += temperature[1]; sum += temperature[2]; sum += temperature[3]; ... sum += temperature[99]; double average = sum/100;Obviously this code is very tedious and redundant. The only thing that changes in the various summing statements is the specific index we are accessing. We can use a for loop to capture this redundancy:
double sum = 0.0; for (int i = 0; i < 100; i++) sum += temperature[i]; double average = sum/100;This is a very concise way to access all elements of the array. The code works when the array has a length of 100, but you can imagine the array having a different length. We can improve this code by referring to the length field of the array. Every array keeps track of its length and we can ask for it by putting ".length" after the name of an array variable (temperature.length in our example). Notice that this is similar to how we ask a String for its length, although there we have to include parentheses as in str.length(). This is one of those unfortunate inconsistencies in Java that you just have to remember.
Using the length field, we can improve the code above as follows.
double sum = 0.0; for (int i = 0; i < temperature.length; i++) sum += temperature[i]; double average = sum/temperature.length;This code provides a template that you will see often with array processing code: a for loop that starts at 0 and that continues while the loop variable is less than the length of the array, doing something with
It is possible to refer to an illegal index of an array, in which case Java throws an exception. For example, if we were to accidentally use "<=" in the for loop above instead of "<", then the loop would attempt to access temperature[100], which is not a legal element of our array. If your code makes such an illegal reference, Java will halt your program with an ArrayIndexOutOfBoundsException.
Suppose you have an input file of integers between 0 and 10 and you want to compute how many of each number are in the input file. To do so, you must first recognize that you are doing eleven separate tallies. You are tallying the occurrences of the number 0, the number 1, the number 2, the number 3, and so on up to and including the number 10. You will more easily solve the overall task if you first solve the task of performing one tally. To tally the occurrences of the number 0, you simply count how often they occur in the input file.
set count to 0. while (more numbers to read) { read a number. if (the number is 0) increment total. } report the value of count.To solve the task of tallying all eleven numbers, you can use an array with one component for each of the numbers you are tallying. Because the simple tally involves a single int variable, you will want your array to have int components. That way you will have a different int variable for each of the eleven tallying tasks. Here is the declaration you would use for the array.
int[] count = new int[11];Here is what count looks like initially:
[0] [1] [2] [3] [4] [...] [10] +---+ +-------+-------+-------+-------+-------+-------+-------+ count | +-+---> | 0 | 0 | 0 | 0 | 0 | ... | 0 | +---+ +-------+-------+-------+-------+-------+-------+-------+You could solve the problem by processing the file once looking for 0's, then on a second pass looking for 1's, and then a third pass looking for 2's, and so on. This is highly inefficient, however. It is more efficient to do all the tallying operations in parallel. Thus, our pseudocode becomes:
set all counts to 0. while (more numbers to read) { read a number. increment the count for the component corresponding to number. } report the value of each count.How do we increment the appropriate component of total depending on the number read in? We use the number read in to index the component we are interested in. Thus, instead of saying
increment the count for the component corresponding to number.You can approach Pascal more by saying
increment count[number].One last point you should consider is what happens if number is out of range. What if the file contains a number that is less than 0 or greater than 10? You would obtain an array index out of bounds. There are many ways to account for this problem. One simple solution is to keep track of how many illegal values appear. Thus, the line above is replaced by:
if (number is in range) increment count[number]. else increment number of illegal values.This requires introducing a new variable that tallies the number of illegal values. You would initialize the variable to 0 before the loop.
set all counts to 0. set illegal count to 0. while (more numbers to read) { read a number. if (number is in range) increment count[number]. else increment number of illegal values. } report the value of each count. report the number of illegal values.This now can be fairly easily translated into Java. The program includes the boilerplate code from chapter 6 for opening a file and introduces a class constant for the maximum value to be tallied.
So far we have discussed an array that is in some sense full: all of its elements were in use. This is not always the case. Often we build an array with a maximum in mind, but only fill it up partially.
Suppose we want to use an array to process an input file of integers where the various input lines have different lengths. Some lines might have many integers on them and some might have very few. Because the lines differ in length, we don't know how big to make the array. Therefore, the best we can do is to set some arbitrary maximum for the length of a line and make the array that large.
Suppose that we assume that no input line has more than 30 integers on it. Then we can define an array of length 30 as follows:
int[] numbers = new int[30];To solve this problem we define not only an array for the integers, but also an integer variable to keep track of how many elements of the array are currently being used. We can call this variable "length". If we were to read the following input line:
18 203 14 9 8 7We would expect the variables numbers and length to look like this.
[0] [1] [2] [3] [4] [5] [6] [...] [29] +---+ +-----+-----+-----+-----+-----+-----+-----+-----+-----+ numbers | +-+---> | 18 | 203 | 14 | 9 | 8 | 7 | 0 | ... | 0 | +---+ +-----+-----+-----+-----+-----+-----+-----+-----+-----+ +---+ length | 6 | +---+The integers from the input line appear in the first six cells of this array. Because six cells of the array are in use, we store the value 6 in the memory cell for length. The elements 6 through 29 are 0 in this picture, but they could really have any value at all. By keeping track of the length, we can be sure to use only elements of the array that are currently active.
Let's examine a program that uses this data structure. Consider the problem of reversing the integers on each line of a file. Why does this problem require an extra structure? With a file you can read the numbers from a line in sequential order. You can't, however, read them backwards. An array has no such limitations. To reverse the line you read it into an array and then manipulate the array. You should think of this as a transfer of information. The numbers from the input line are transferred from the file structure to the array structure. Once transferred, you can manipulate the numbers using the array and not worry about the file until it is time to process another line. Thus, your pseudocode description would be:
while (more lines to process) { read next input line. transfer line from file to array. write array contents in reverse order. }Let's assume that as in the program examples in the last chapter, we will read each line of the file into a String and use that String to construct a Scanner for that particular input line. To transfer numbers from the Scanner for a particular input line into the array, we'd do the following.
while (more numbers to read) read a number into the next array component.To improve this pseudocode we need to specify what we mean by the next array component. Because we want to store the first number in component [0], the next in component [1], and so on, we need an integer variable to keep track of this. We also need to keep track of the length. Fortunately, we can easily combine these two tasks.
set length to 0. while (more numbers to read) { read a number into numbers[length]. increment length. }Some novices reverse the order of the two lines inside the while loop and that makes a certain amount of sense. It seems like you'd want to increase your length first and then store the value in the array. But remember that arrays use zero-based indexing. That means that length is always one ahead of where you last stored something. In other words:
store the first value in position [0] and remember a length of 1 store the second value in position [1] and remember a length of 2 store the third value in position [2] and remember a length of 3 store the fourth value in position [3] and remember a length of 4 ...As a result, it makes sense to store first and then increment length rather than the other way around.
Writing the contents of the array in reverse order is a simple operation. We run our loop backwards using i-- instead of i++. Below is a complete program that puts all of this together.
You should be aware of some general limitations of arrays.
Java has a special syntax for initializing an array when you know exactly what you want to put into it. For example, you could write the following code to initialize an array of ints and an array of Strings.
int[] numbers = new int[5]; numbers[0] = 13; numbers[1] = 23; numbers[2] = 480; numbers[3] = -18; numbers[4] = 75; String[] words = new String[6]; words[0] = "hello"; words[1] = "how"; words[2] = "are"; words[3] = "you"; words[4] = "feeling"; words[5] = "today";This works, but it's a rather tedious way to declare these arrays. Java provides a shorthand:
int[] numbers = {13, 23, 480, -18, 75}; String[] words = {"hello", "how", "are", "you", "feeling", "today"};You use the curly braces to enclose a series of values that will be stored in the array. Order is important. The first value will go into index 0, the second value will go into index 1 and so on. Java counts how many values you include and constructs an array of just the right size. It then stores the various values into the appropriate spots in the array.
This is one of only two places where Java will construct an object without an explicit call on new. The other place we saw this was with String literals where Java constructs String objects for you without you having to call new. Both of these are conveniences for programmers because these tasks are so common that the designers of the language wanted to make it easy to do them.
You will find arrays appearing throughout the Java class libraries. In fact, there is a special class called Arrays that has a collection of utility programs for manipulating arrays. One particularly handy method in the Arrays class is the method sort. For example, given the following variable declarations:
int[] numbers = {13, 23, 480, -18, 75}; String[] words = {"hello", "how", "are", "you", "feeling", "today"};We could make the following calls:
Arrays.sort(numbers); Arrays.sort(words);If you then examine the values of the arrays, you will see that the numbers appear in numerically increasing order and the words appear in alphabetically increasing order. Below is a complete program that demonstrates this. The Arrays class is part of the java.util package, so you will see that the program includes an import from java.util.
public static String sorted(String s) { char[] letters = s.toCharArray(); Arrays.sort(letters); String result = new String(letters); return result; }This might seem like a strange method to define, but it can be very helpful for solving word puzzles known as anagrams. An anagram of a word is a word that has the same letters but in a different order. For example, the following words are all anagrams of each other: pears, spear, pares, reaps, spare. Using the method above, all of these words would be converted to "aeprs". So to solve an anagram, you can simply go through a dictionary, converting every word to its sorted form and see if it matches the sorted form of the anagram you are working with.