Chapter 6
File Processing

Copyright © 2005 by Stuart Reges

6.1 Introduction

In chapter 4 we saw how to construct a Scanner object to read input from the Console. Now we will see how to construct Scanner objects to read input from files. The idea is fairly straightforward, but Java does not make it easy to read from input files. This is unfortunate because many interesting problems can be formulated as file processing tasks. Many introductory computer science classes have abandoned file processing altogether or the topic has moved into the second course because it is considered too advanced for novices.

There is nothing intrinsically complex about file processing. The languages C++ and C# provide mechanisms for easily reading and writing files. But Java was not designed for file processing and Sun has not been particularly eager to provide a simple solution. They did, however, introduce the Scanner class as a way to simplify some of the details associated with reading files. The result is that file reading is still awkward in Java, but at least the level of detail is manageable.

Before we can write a file processing program, we have to explore some issues related to Java exceptions. Remember that exceptions are errors that halt the execution of a program. In the case of file processing, we might try to open a file that doesn't exist, which would generate an exception.

6.2 Using a Scanner to Read an External File

An external file is a collection of characters and numbers that appear on one or more lines. Such files are external because they are not contained within the program and are not obtained from the user during execution. They are outside the scope of the program and its execution. You create external input files before the execution of a program. For example, you might create a file called "numbers.dat" with the following content.

        308.2 14.9 7.4
        2.8


        3.9 4.7 -15.4
        2.8
Then you might write a program that processes this external input file and produces some kind of report. Such a program should be general enough that it could process any external input file that matches a specified format. You can use simple text editors to create external input files. For example, on the Windows operating system you can use the NotePad editor to create such files.

We have been constructing our Scanner objects by passing System.in to the Scanner constructor:

        Scanner console = new Scanner(System.in);
This instructs the computer to construct a Scanner that reads from the console (i.e., pausing for input from the user). Instead of passing "System.in" to the constructor, we can pass information about the external input file to have the input come from the file rather than from the console. In particular, we can construct Scanner objects by passing an object of type File. A File object stores information about where to find an external input file. File objects in turn are constructed by passing a String that represents the file's name. For example, given our file called "numbers.dat", we can construct a File object that is linked to it:

        new File("numbers.dat")
And using this File object we can construct a Scanner object:

        new Scanner(new File("numbers.dat"))
Putting this all together, we'd say something like the following:

        Scanner input = new Scanner(new File("numbers.dat"));
This particular line of code or something like it will appear in all of your file processing programs. Once we've constructed the Scanner so that it reads from the file, we can manipulate it like any other Scanner. When we read from the console, we always prompt before reading to give the user an indication of what kind of data we want. When we read from a file, we don't need to prompt because the data is already there, stored in the file. For example, we might write the following short program to read 5 numbers from the file and to echo the five numbers along with their sum:

        // Flawed program--doesn't even compile.
        import java.io.*;
        import java.util.*;

        public class Echo1 {
            public static void main(String[] args) {
                Scanner input = new Scanner(new File("numbers.dat"));
                double sum = 0.0;
                for (int i = 1; i <= 5; i++) {
                    double next = input.nextDouble();
                    System.out.println("number " + i + " = " + next);
                    sum += next;
                }
                System.out.println("Sum = " + sum);
            }
        }
The File class that we want to use is in a package known as java.io ("io" is short for "input/output"), and the Scanner class is in java.util, which is why there are import declarations at the beginning of the program. Unfortunately, this program doesn't compile. The compiler will give a message like the following:

    Echo1.java:7: unreported exception java.io.FileNotFoundException; must be
    caught or declared to be thrown
            Scanner input = new Scanner(new File("numbers.dat"));
                            ^
    1 error
The issue involves exceptions, which were first introduced in chapter 4. Remember that exceptions are errors that prevent a program from continuing normal execution. In this case the compiler is worried that the program might not be able to find a file called "numbers.dat". What is it supposed to do if that happens? It wouldn't have any way to continue executing the rest of the code because it wouldn't have a file to read from.

If the program is unable to locate the specified input file, it will throw what is known as a FileNotFoundException. This particular exception is known as a checked exception.

Checked Exception

An exception that must be caught or specifically declared in the header of the method that might generate it.

Because FileNotFoundException is a checked exception, we can't just ignore it. Java provides a construct known as the try/catch statement for handling such errors. Later in this chapter we will see how to use a try/catch to handle this error. But for now we will use a less sophisticated but simpler approach. Java allows us to avoid handling this error as long as we clearly indicate the fact that we aren't handling the error. In particular, we can include what is known as a "throws" clause in the header for the main method to clearly state the fact that our main method might generate this exception:

        // This version does compile because of the throws clause in the header
        // for method main.
        import java.io.*;
        import java.util.*;

        public class Echo2 {
            public static void main(String[] args) throws FileNotFoundException {
                Scanner input = new Scanner(new File("numbers.dat"));
                double sum = 0.0;
                for (int i = 1; i <= 5; i++) {
                    double next = input.nextDouble();
                    System.out.println("number " + i + " = " + next);
                    sum += next;
                }
                System.out.println("Sum = " + sum);
            }
        }
This version of the program compiles and executes properly, generating the following output:

        number 1 = 308.2
        number 2 = 14.9
        number 3 = 7.4
        number 4 = 2.8
        number 5 = 3.9
        Sum = 337.19999999999993
If you add up those numbers by hand, you get the answer 337.2. Java comes up with a slightly different answer because of roundoff errors. These values are converted into base 2 and are stored with a limited accuracy, so it is possible to get these slight variations.
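
The roundoff effect can be reproduced without a file. The following sketch is an illustration only and is not one of the Echo programs: it adds the same five values once as type double and once with java.math.BigDecimal, a standard class that this chapter does not otherwise use and that performs decimal arithmetic exactly. The class name RoundoffDemo and its method names are made up for this example.

```java
import java.math.BigDecimal;

public class RoundoffDemo {
    // Adds the values as doubles, the way Echo2 does.
    public static double doubleSum(String[] values) {
        double sum = 0.0;
        for (String v : values) {
            sum += Double.parseDouble(v);
        }
        return sum;
    }

    // Adds the same values as exact decimal numbers.
    public static BigDecimal exactSum(String[] values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String v : values) {
            sum = sum.add(new BigDecimal(v));
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] values = {"308.2", "14.9", "7.4", "2.8", "3.9"};
        System.out.println("double sum = " + doubleSum(values));
        System.out.println("exact sum  = " + exactSum(values));
    }
}
```

The exact sum is 337.2, while the double sum should differ from it only far to the right of the decimal point, as in the Echo2 output.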

The preceding program read exactly 5 numbers from the file. More typically we read indefinitely using a while loop as long as there are more numbers to read. Remember that the Scanner class includes a series of "has" methods that parallel the various "next" methods. In this case, we are using nextDouble to read a value of type double, so we can use hasNextDouble to test whether there is such a value to read.

        // Variation that reads while there are more numbers to read.
        import java.io.*;
        import java.util.*;

        public class Echo3 {
            public static void main(String[] args) throws FileNotFoundException {
                Scanner input = new Scanner(new File("numbers.dat"));
                double sum = 0.0;
                int count = 0;
                while (input.hasNextDouble()) {
                    double next = input.nextDouble();
                    count++;
                    System.out.println("number " + count + " = " + next);
                    sum += next;
                }
                System.out.println("Sum = " + sum);
            }
        }
This program would work on an input file with any number of numbers. Our file happens to have eight numbers and when we run this version of the program, we get the following output:

        number 1 = 308.2
        number 2 = 14.9
        number 3 = 7.4
        number 4 = 2.8
        number 5 = 3.9
        number 6 = 4.7
        number 7 = -15.4
        number 8 = 2.8
        Sum = 329.29999999999995

6.2.1 Structure of Files and Consuming Input

We think of text as being two-dimensional, but from the computer's point of view, each file is really just a one-dimensional sequence of characters. For example, consider the file called "numbers.dat" that we saw in the last section:

        308.2 14.9 7.4
        2.8


        3.9 4.7 -15.4
        2.8
We think of this as a 6-line file with text going across and down. The computer views the file differently. When someone types in a file like this, they hit the "Enter" key to go to a new line. This inserts special "new line" characters in the file. We have seen that the escape sequence "\n" can be used to produce a newline character for output. We can annotate the file above with "\n" characters to indicate the end of each line:

        308.2 14.9 7.4\n
        2.8\n
        \n
        \n
        3.9 4.7 -15.4\n
        2.8\n
Once we have marked the end of each line, we no longer need to use a 2-dimensional representation. We can collapse this to a one-dimensional sequence of characters:

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
From this one-dimensional sequence, we can reconstruct the various lines of the file. This is how the computer views the file, as a one-dimensional sequence of characters including special characters that represent "new line". On some systems, including Windows machines, there are two different characters that represent "new line", but for our discussion, we'll use just "\n" to represent both. Objects like Scanner handle these differences for us so we can generally ignore them, but for those who are interested, the brief explanation is that Windows machines end each line with a "\r" followed by a "\n".
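
As a small check that a Scanner treats both line-ending conventions alike, the following sketch writes the same numbers to a temporary file twice, once with "\n" endings and once with "\r\n" endings, and sums each version. The class LineEndingDemo, its methods, and the sample values are assumptions made for this illustration; the temporary file is created with the standard java.nio.file.Files class, which the chapter itself does not use.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class LineEndingDemo {
    // Adds up every double token the Scanner can find.
    public static double sum(Scanner input) {
        double total = 0.0;
        while (input.hasNextDouble()) {
            total += input.nextDouble();
        }
        return total;
    }

    // Writes the text to a temporary file and returns the sum of its numbers.
    public static double sumOfFile(String contents) throws IOException {
        Path path = Files.createTempFile("numbers", ".dat");
        Files.write(path, contents.getBytes());
        Scanner input = new Scanner(path.toFile());
        double result = sum(input);
        input.close();
        Files.delete(path);
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumOfFile("1.5 2.5\n4.0\n"));      // "\n" line endings
        System.out.println(sumOfFile("1.5 2.5\r\n4.0\r\n"));  // "\r\n" line endings
    }
}
```

Both calls print 8.0, because the Scanner skips "\r" as whitespace just as it skips "\n".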

To process a file the Scanner object keeps track of a current position in the file. You can think of this as a cursor or pointer into the file.

Input cursor

A pointer to the current position in an input file.

When the Scanner object is first constructed, this cursor points to the beginning of the file. But as we perform various "next" operations, this cursor moves forward. The Echo3 program from the last section processes the file through a series of calls on nextDouble. Let's take a moment to simulate how that works. When the Scanner is first constructed, the input cursor will be positioned at the beginning of the file (indicated below with an up-arrow pointing at the first character in the file):

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
        ^
        |
      input
      cursor
After the first call on nextDouble, the cursor will be positioned in the middle of the first line after the token "308.2".

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
             ^
             |
           input
           cursor
We refer to this process as consuming input.

Consuming input

Moving the input cursor forward past some input.

The first call on nextDouble consumes the text "308.2" from the input file and leaves the input cursor positioned at the first character after this token. Notice that this leaves the input cursor positioned at a space. When the second call is made on nextDouble, the Scanner first skips past this space to get to the next token and then consumes the text "14.9" and leaves the cursor positioned at the space that follows it:

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                  ^
                  |
                input
                cursor
A third call on nextDouble skips the space it is positioned at and consumes the text "7.4".

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                      ^
                      |
                    input
                    cursor
At this point, the input cursor is positioned at the newline character at the end of the first line of input. A fourth call on nextDouble skips past this newline character and consumes the text "2.8".

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                           ^
                           |
                         input
                         cursor
At this point, the input cursor is positioned at the end of the second line of input. When a fifth call is made on nextDouble, the Scanner finds three newline characters in a row: the one that ends the second line and the two that make up the blank lines. This isn't a problem for the Scanner, because it simply skips past any leading whitespace characters (spaces, tabs, newline characters) until it finds an actual token. So it skips all of these newline characters and consumes the text "3.9".

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                    ^
                                    |
                                  input
                                  cursor
At this point the input cursor is positioned in the middle of the fifth line of input (the third and fourth lines of input were blank lines). The sixth call on nextDouble consumes the text "4.7":

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                        ^
                                        |
                                      input
                                      cursor
The seventh call consumes the text "-15.4":

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                              ^
                                              |
                                            input
                                            cursor
And the eighth call consumes the text "2.8":

        308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                                   ^
                                                   |
                                                 input
                                                 cursor
At this point the input cursor is positioned at the newline character at the end of the file. If you attempted to call nextDouble again, it would throw an exception because there are no more tokens left to process. But remember that the Echo3 program has a while loop that calls hasNextDouble() before calling nextDouble() to make sure that there actually is a double value to process. When you call methods like hasNextDouble, the Scanner looks ahead in the file to see whether there is a next token and whether it could be interpreted as being of that type (in this case a double). So the Echo3 program will continue executing until it reaches the end of the file or until it encounters a token that could not be interpreted as a double.
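
The second stopping condition is easy to overlook, so here is a minimal sketch of it. It assumes a hypothetical input file containing "1.5 2.5 oops 4.0" (written to a temporary file so the example is self-contained); the class name StopAtNonNumber and the method sumLeadingDoubles are invented for this illustration.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class StopAtNonNumber {
    // Sums leading doubles; stops at the first token that is not a double.
    public static double sumLeadingDoubles(Scanner input) {
        double sum = 0.0;
        while (input.hasNextDouble()) {
            sum += input.nextDouble();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("mixed", ".dat");
        Files.write(path, "1.5 2.5 oops 4.0\n".getBytes());
        Scanner input = new Scanner(path.toFile());
        System.out.println("Sum = " + sumLeadingDoubles(input));
        // The token that stopped the loop has not been consumed.
        if (input.hasNext()) {
            System.out.println("Stopped at token: " + input.next());
        }
        input.close();
        Files.delete(path);
    }
}
```

The loop ends with a sum of 4.0 and the cursor still in front of "oops", so the final 4.0 in the file is never read. A program like Echo3 would silently report a partial sum for such a file.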

Scanner objects are very flexible about when and how you consume input. You can consume part of the input in one section of code, then do some other work, then come back to consuming more of the input in another section of code. You decide exactly how much input to consume at a time through the calls you make on the Scanner object.
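
A sketch of consuming input in stages, under the assumption that we want the first three numbers of the sample file separately from the next two. The helper method sumNext and the class name StagedReading are hypothetical; the file contents are the "numbers.dat" data from this section, written to a temporary file so that the example runs on its own.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class StagedReading {
    // Reads exactly count numbers, starting wherever the input cursor is now.
    public static double sumNext(Scanner input, int count) {
        double sum = 0.0;
        for (int i = 1; i <= count; i++) {
            sum += input.nextDouble();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("staged", ".dat");
        Files.write(path,
            "308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n".getBytes());
        Scanner input = new Scanner(path.toFile());

        double firstThree = sumNext(input, 3);  // consumes 308.2, 14.9, 7.4
        System.out.println("Sum of first three = " + firstThree);

        // ... other work could happen here; the cursor stays where it is ...

        double nextTwo = sumNext(input, 2);     // picks up at 2.8, then 3.9
        System.out.println("Sum of next two = " + nextTwo);

        input.close();
        Files.delete(path);
    }
}
```

Because the Scanner object itself remembers the cursor position, the second call continues exactly where the first one stopped.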

Keeping track of exactly where the input cursor is positioned can be tricky. If the data is line-oriented, it is best to read it in a line-oriented manner. We will see how to do that in a later section.

6.2.2 File Names

In the previous section we used the file name "numbers.dat". When Java finds you using a simple name like that, it looks in the current directory to find the file. The definition of "current directory" varies depending upon what Java environment you are using. If you are using the TextPad editor, then the current directory is the directory in which your program appears.
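
If you are unsure which directory your environment treats as the current directory, you can ask Java. The following sketch uses the standard "user.dir" system property and the getAbsolutePath and exists methods of the File class; the class name WhereIsMyFile and the method fullPath are made up for this example.

```java
import java.io.*;

public class WhereIsMyFile {
    // Returns the full path that Java would use for a simple file name.
    public static String fullPath(String name) {
        return new File(name).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("Current directory: "
                           + System.getProperty("user.dir"));
        System.out.println("numbers.dat resolves to: "
                           + fullPath("numbers.dat"));
        System.out.println("Does it exist? "
                           + new File("numbers.dat").exists());
    }
}
```

Running this from the same environment as your file processing program shows where Java will look for "numbers.dat" and whether the file is actually there.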

You can also use a fully-qualified file name. For example, if you are on a Windows machine and you have stored the file in a directory known as c:\data, you could use a file name like this:

        Scanner input = new Scanner(new File("c:\\data\\numbers.dat"));
Notice that we have to use the escape sequence "\\" to represent a single backslash character. This approach works well when you know exactly where your file is going to be stored on your system.

Another alternative is to ask the user for a file name. In the last chapter we saw a program called FindSum that prompted the user for a series of numbers to add together. Below is a variation that prompts the user for the name of a file of numbers to be added together.

        // This program adds together a series of numbers from a file.  It prompts
        // the user for the file name, then reads the file and reports the sum.
        import java.io.*;
        import java.util.*;

        public class FindSum2 {
            public static void main(String[] args) throws FileNotFoundException {
                System.out.println("This program will add together a series of real");
                System.out.println("numbers from a file.");
                System.out.println();

                Scanner console = new Scanner(System.in);
                System.out.print("What is the file name? ");
                String name = console.nextLine();
                Scanner input = new Scanner(new File(name));
                System.out.println();

                double sum = 0;
                while (input.hasNextDouble()) {
                    double next = input.nextDouble();
                    sum += next;
                }
                System.out.println("Sum = " + sum);
            }
        }
We read the file name using a call on nextLine() to read an entire line of input from the user. This allows the user to type in file names that have spaces in them. Notice that we still need the "throws FileNotFoundException" in the header for main because even though we are prompting the user for a file name, there won't necessarily be a file of that name.

If we have this program read from the file "numbers.dat" that we saw in the last section, then the program would execute like this:

        This program will add together a series of real
        numbers from a file.

        What is the file name? numbers.dat

        Sum = 329.29999999999995
The user also has the option of specifying a full file name, as in:

        This program will add together a series of real
        numbers from a file.

        What is the file name? c:\data\numbers.dat

        Sum = 329.29999999999995
Notice that the user doesn't have to type two backslashes to get a single backslash. That's because the Scanner object that reads the user's input is able to read it without escape sequences.

6.2.3 A more complex input file

Suppose that you have an input file that has information about how many hours have been worked by each employee of a company. For example, it might look like the following:

        Erica 7.5 8.5 10.25 8 8.5
        Greenlee 10.5 11.5 12 11 10.75
        Simone 8 8 8
        Ryan 6.5 8 9.25 8
        Kendall 2.5 3
The idea is that we have a list of hours worked by each employee and we want to find out the total hours worked by each individual. We can construct a Scanner object linked to this file to solve this task. As you start writing more complex file processing programs, you will want to divide them into methods to break up the code into logical subtasks. In this case, we can open the file in main and write a separate method to process the file.

Most file processing will involve while loops because we won't know in advance how much data the file has in it. We'll choose different tests depending upon the particular file we are processing, but they will almost all be calls on the various "has" methods of the Scanner class. We basically want to say, "while you have more data for me to process, let's keep reading."

In this case we have a series of input lines that each begin with a name. For this program we are assuming that names are simple, with no spaces in the middle. That means we'll be reading them with a call on the next() method. As a result, our overall test involves seeing if there is another name in the input file:

        while (input.hasNext()) {
            <process next person>
        }
So how do we process one person? We have to read their name and then read their list of hours. If you look at the sample input file, you will see that the list of hours is not always the same length. This is a common occurrence in input files. For example, some employees might have worked on 5 different days while others worked only 2 days or 3 days. So we will use a loop for this as well. This is a nested loop. The outer loop is handling one person at a time and the inner loop will handle one number at a time. The task is a fairly straightforward cumulative sum:

	double sum = 0.0;
	while (input.hasNextDouble())
	    sum += input.nextDouble();
Putting this all together, we end up with the following complete program.

        // This program reads an input file of hours worked by various employees.  Each
        // line of the input file should have an employee's name (without any spaces)
        // followed by a list of hours worked, as in:
        //
        //     Erica 7.5 8.5 10.25 8
        //     Greenlee 10.5 11.5 12 11
        //     Ryan 6.5 8 9.25 8
        //
        // The program reports the total hours worked by each employee.
        import java.io.*;
        import java.util.*;

        public class HoursWorked {
            public static void main(String[] args) throws FileNotFoundException {
                Scanner input = new Scanner(new File("hours.dat"));
                process(input);
            }

            public static void process(Scanner input) {
                while (input.hasNext()) {
                    String name = input.next();
                    double sum = 0.0;
                    while (input.hasNextDouble())
                        sum += input.nextDouble();
                    System.out.println("Total hours worked by " + name + " = " + sum);
                }
            }
        }
Notice that we again need the "throws FileNotFoundException" in the header for main. We don't need to include this in the "process" method because the code to open the file appears in method main.

If we put the input above into a file called "hours.dat" and execute the program, we get the following result.

        Total hours worked by Erica = 42.75
        Total hours worked by Greenlee = 55.75
        Total hours worked by Simone = 24.0
        Total hours worked by Ryan = 31.75
        Total hours worked by Kendall = 5.5

6.3 Line-based input and String-based Scanners

The program in the last section required that names have no spaces in them. This isn't a very practical restriction. It would be more convenient to be able to type anything for a name, including numbers. One way to do that is to put the name on a separate line from the rest of the data. For example, suppose that you want to compute weighted GPAs for a series of students. Suppose, for example, that a student has a 3-unit 3.0, a 4-unit 2.9, a 3-unit 3.2 and a 2-unit 2.5. We can compute an overall GPA that is weighted by the individual units for each course.

So we might have an input file that has its data on pairs of lines. For each pair the name will appear on the first line and the grade data will appear on the second line. For example, we might have an input file that looks like this:

        Erica Kane
        3 2.8 4 3.9 3 3.1
        Greenlee Smythe
        3 3.9 3 4.0 4 3.9
        Ryan Laveree
        2 4.0 3 3.6 4 3.8 1 2.8
        Adam Chandler
        3 3.0 4 2.9 3 3.2 2 2.5
        Adam Chandler, Jr
        4 1.5 5 1.9
When you have data that appears on multiple lines, it is best to read entire lines of the input file using calls on the nextLine method. That means that we can control our overall file processing loop with a test on hasNextLine. For the input file above, our basic structure will be:

        while (input.hasNextLine()) {
            String name = input.nextLine();
            String grades = input.nextLine();
            <process this student's data>
        }
This works well for reading the name because it's all one piece of data. But the input line with grades has internal structure to it. Wouldn't it be nice to use a Scanner to process the individual parts of the line? Java makes this possible. We can construct a Scanner object from an individual String. So instead of reading each second line of input into a String, let's instead put it into a Scanner object:

        while (input.hasNextLine()) {
            String name = input.nextLine();
            Scanner grades = new Scanner(input.nextLine());
            <process this student's data>
        }
Notice that for each input line of grades we construct a Scanner object. Because it is inside the loop, we construct a different Scanner object for each such input line. We can process the input line the same way we process an input file. The Scanner object will have an input cursor to keep track of a position within the String and we can consume input through calls on various "next" and "has" methods.

This approach to file processing will work well for any input file that is line oriented. Some lines might represent a single value like the name in the example above. For those lines, we can use a call on nextLine to read the entire line as a String that we can keep track of. Other lines will have multiple data values on the line, in which case we can construct a Scanner object from the String that will allow us to extract the individual data values.
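
To make the idea of a String-based Scanner concrete, here is a minimal sketch that counts the tokens on a single line of text. The class LineTokens and the method countTokens are illustrative names, not part of the GPA program developed in this section.

```java
import java.util.*;

public class LineTokens {
    // Counts the whitespace-separated tokens in one line of text.
    public static int countTokens(String line) {
        Scanner data = new Scanner(line);  // reads from the String, not a file
        int count = 0;
        while (data.hasNext()) {
            data.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "3 2.8 4 3.9 3 3.1";
        System.out.println(countTokens(line) + " tokens in: " + line);
    }
}
```

This prints "6 tokens in: 3 2.8 4 3.9 3 3.1". No throws clause is needed here, because constructing a Scanner from a String cannot fail with a FileNotFoundException.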

Let's explore how we would process the grades using a Scanner. This is a place to introduce a static method. The code above involves processing the overall file. The task of processing one list of grades is a lower level task that can be split off into its own method. Let's call it processGrades. Obviously it can't do its work without the Scanner object that has the grades, so we'll pass that as a parameter. What exactly needs to be done? The plan was to compute a weighted GPA for each student. So this method needs to read the individual grades and turn that into a single GPA score.

Weighted GPAs involve computing a value known as the "quality points" for each grade. The quality points are defined as the units times the grade. The weighted GPA is calculated by dividing the total quality points by the total units. So we just need to add up the total quality points and add up the total units, then divide. This involves a pair of cumulative sum tasks that we can express in pseudocode as follows:

        set total units to 0.
        set total quality points to 0.
        while (more grades to process) {
            read next units and next grade.
            add next units to total units.
            add (next units) * (next grade) to total quality points.
        }
        set gpa to (total quality points)/(total units).
This is fairly simple to translate into Java code by incorporating our Scanner object called "data":

	double totalQualityPoints = 0.0;
	double totalUnits = 0;
	while (data.hasNextInt()) {
	    int units = data.nextInt();
	    double grade = data.nextDouble();
	    totalUnits += units;
	    totalQualityPoints += units * grade;
	}
        double gpa = totalQualityPoints/totalUnits;
Because our Scanner object data was constructed from a single line of input, we can process just one person's grades with this loop. There is still a potential problem. What if there are no grades? Some students might have dropped all of their classes, for example. There are several ways we might handle that situation, but let's assume that it is appropriate to use a GPA of 0.0 in that case.

Making that correction and putting this into a method, we end up with the following code.

    public static double processGrades(Scanner data) {
	double totalQualityPoints = 0.0;
	double totalUnits = 0;
	while (data.hasNextInt()) {
	    int units = data.nextInt();
	    double grade = data.nextDouble();
	    totalUnits += units;
	    totalQualityPoints += units * grade;
	}
	if (totalUnits == 0)
	    return 0.0;
	else
	    return totalQualityPoints/totalUnits;
    }
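
One way to try this method on its own, before connecting it to a file, is to call it with Scanner objects built from Strings. The sketch below repeats the method inside a test class (the name ProcessGradesDemo is made up) and calls it once with the sample grades 3-unit 3.0, 4-unit 2.9, 3-unit 3.2 and 2-unit 2.5, and once with an empty line.

```java
import java.util.*;

public class ProcessGradesDemo {
    // Same logic as the processGrades method above.
    public static double processGrades(Scanner data) {
        double totalQualityPoints = 0.0;
        double totalUnits = 0;
        while (data.hasNextInt()) {
            int units = data.nextInt();
            double grade = data.nextDouble();
            totalUnits += units;
            totalQualityPoints += units * grade;
        }
        if (totalUnits == 0)
            return 0.0;
        else
            return totalQualityPoints/totalUnits;
    }

    public static void main(String[] args) {
        // 35.2 quality points over 12 units
        System.out.println(processGrades(new Scanner("3 3.0 4 2.9 3 3.2 2 2.5")));
        // no grades at all, so the method falls back to 0.0
        System.out.println(processGrades(new Scanner("")));
    }
}
```

The first call works out to 35.2 quality points divided by 12 units, or roughly 2.93; the second call returns 0.0 without dividing by zero.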
Recall that our high-level code looked like this:

        while (input.hasNextLine()) {
            String name = input.nextLine();
            Scanner grades = new Scanner(input.nextLine());
            <process this student's data>
        }
We can now start to fill in the details of what it means to "process this student's data." We will call the method we just wrote to process the grades for this student and to turn it into a weighted GPA and then print the results:

	    double gpa = processGrades(grades);
	    System.out.println("GPA for " + name + " = " + gpa);
This would complete the program, but let's add one more calculation. Let's compute the max and min GPA that we see among these students. We can accomplish this fairly easily with some simple if statements after the println:

        if (gpa > max)
	    max = gpa;
        if (gpa < min)
	    min = gpa;
We simply compare the current gpa against what we currently consider the max and min, resetting if the new gpa represents a new max or a new min. But how do we initialize these variables? We have two approaches to choose from. One approach involves initializing the max and the min to the first value in the sequence. We could do that, but it would make our loop much more complicated than it is currently. The second approach involves setting the max to the lowest possible value and setting the min to the highest possible value. This approach isn't always possible because we don't always know how high or low our values might go. But in the case of GPAs, we know that they will always be between 0.0 and 4.0.

Thus, we can initialize the variables as follows:

	double max = 0.0;
	double min = 4.0;
It may seem odd to set the max to 0 and the min to 4, but that's because we are intending to have them reset inside the loop. If the first student has a GPA of 3.2, for example, then this will constitute a new max (higher than 0.0) and a new min (lower than 4.0). Of course, it's possible that all students end up with a 4.0, but then our choice of 4.0 for the min is appropriate. Or all students could end up with a 0.0, in which case our choice of a max of 0.0 is appropriate.
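
For comparison, here is a sketch of the first approach, in which the max and min start out as the first value in the sequence. It is not the approach used in this chapter's program. The class MinMaxFirstValue, the method minMax, and the decision to return null for an empty sequence are all assumptions made for this illustration.

```java
import java.util.*;

public class MinMaxFirstValue {
    // Returns {min, max} of the numbers in data, or null if there are none.
    public static double[] minMax(Scanner data) {
        if (!data.hasNextDouble()) {
            return null;  // no first value to start from
        }
        double first = data.nextDouble();
        double min = first;
        double max = first;
        while (data.hasNextDouble()) {
            double next = data.nextDouble();
            if (next > max)
                max = next;
            if (next < min)
                min = next;
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] result = minMax(new Scanner("3.2 3.9 1.7 2.9"));
        System.out.println("max = " + result[1]);
        System.out.println("min = " + result[0]);
    }
}
```

This version works for values in any range, but the first value has to be read outside the loop and the empty case needs separate handling, which is the extra complication mentioned above.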

Putting this all together we get the following complete program.

        // This program reads an input file with GPA data for a series of students
        // and reports a weighted GPA for each.  The input file should consist of
        // a series of line pairs where the first line has a student's name and the
        // second line has a series of grade entries.  The grade entries should be
        // a number of units (an integer) followed by a grade (a number between 0.0
        // and 4.0).  For example, the input might look like this:
        //
        //     Erica Kane
        //     3 2.8 4 3.9 3 3.1
        //     Greenlee Smythe
        //     3 3.9 3 4.0 4 3.9
        //     Ryan Laveree
        //     2 4.0 3 3.6 4 3.8 1 2.8
        //
        // The program reports the weighted GPA for each student along with the
        // max and min GPA.
        import java.io.*;
        import java.util.*;

        public class Gpa {
            public static void main(String[] args) throws FileNotFoundException {
                Scanner input = new Scanner(new File("gpa.dat"));
                process(input);
            }

            public static void process(Scanner input) {
                double max = 0.0;
                double min = 4.0;
                while (input.hasNextLine()) {
                    String name = input.nextLine();
                    Scanner grades = new Scanner(input.nextLine());
                    double gpa = processGrades(grades);
                    System.out.println("GPA for " + name + " = " + gpa);
                    if (gpa > max)
                        max = gpa;
                    if (gpa < min)
                        min = gpa;
                }
                System.out.println();
                System.out.println("max GPA = " + max);
                System.out.println("min GPA = " + min);
            }

            public static double processGrades(Scanner data) {
                double totalQualityPoints = 0.0;
                double totalUnits = 0;
                while (data.hasNextInt()) {
                    int units = data.nextInt();
                    double grade = data.nextDouble();
                    totalUnits += units;
                    totalQualityPoints += units * grade;
                }
                if (totalUnits == 0)
                    return 0.0;
                else
                    return totalQualityPoints/totalUnits;
            }
        }
Once again our main method has the "throws FileNotFoundException" in its header. This program executes as follows assuming the data above is placed in a file called "gpa.dat".

        GPA for Erica Kane = 3.3299999999999996
        GPA for Greenlee Smythe = 3.9299999999999997
        GPA for Ryan Laveree = 3.6799999999999997
        GPA for Adam Chandler = 2.9333333333333336
        GPA for Adam Chandler, Jr = 1.7222222222222223

        max GPA = 3.9299999999999997
        min GPA = 1.7222222222222223

6.4 Try/Catch Statements

Including the clause "throws FileNotFoundException" in the header for main allows our programs to compile, but it's not a very satisfying solution to the underlying problem. To actually handle the potential error, we'd want to use something called a try/catch statement. We will not be exploring all of the details of try/catch, but we will examine how to write some basic try/catch statements that we could use for file processing.

The try/catch statement has the following general syntax.

        try {
            <statement>;
            <statement>;
            ...
            <statement>;
        } catch (<type> <name>) {
            <statement>;
            <statement>;
            ...
            <statement>;
        }
Notice that it is divided into two blocks using the keywords "try" and "catch". The first block has the code you want to execute. The second block has error recovery code that should be executed if an exception is thrown. So think of this as saying, "Try to execute these statements, but if something goes wrong, I'm going to give you some other code in the catch part that you should execute if an error occurs."

Notice that the catch part of this statement has a set of parentheses in which you include a type and name. The type should be the type of exception you are trying to catch. The name can be any legal identifier. For example, in the case of our Scanner code, we know that a FileNotFoundException might be thrown. What do we do if the exception occurs? That's a tough question, but for now let's just write an error message.

        try {
            Scanner input = new Scanner(new File("numbers.dat"));
        } catch (FileNotFoundException e) {
            System.out.println("File not found");
        }
This code says to try constructing the Scanner from the file "numbers.dat" but if the file is not found, then print an error message instead. This is the basic idea we want to follow, but there are several issues we must address to make this code work for us. First of all, there is a scope issue. The variable input isn't going to be much use to us if it's trapped inside the try block. So we have to declare the Scanner variable outside the try/catch statement:

        Scanner input;
        try {
            input = new Scanner(new File("numbers.dat"));
        } catch (FileNotFoundException e) {
            System.out.println("File not found");
        }
We have a bigger problem in that simply printing an error message isn't a good way to recover from this problem. How is the program supposed to proceed with execution if it can't read from the file? It probably can't. So what would be a more appropriate way to recover from the error? That depends a lot on the particular program you are writing, so the answer is likely to vary from one program to the next.
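
One possible answer, for a program that simply cannot do its job without the file, is to put all of the code that uses the Scanner inside the try block and let the catch block report the problem. The sketch below takes that approach; it is only one option under that assumption, not the approach developed in the rest of this section. The class name SumOrReport and the method name report are made up for the illustration.

```java
import java.io.*;
import java.util.*;

public class SumOrReport {
    // Adds up the numbers in the named file and returns a one-line report.
    // If the file cannot be opened, the report describes the problem instead.
    public static String report(String fileName) {
        try {
            Scanner input = new Scanner(new File(fileName));
            // Everything that needs the Scanner lives inside the try block,
            // so it runs only if the file was opened successfully.
            double sum = 0.0;
            while (input.hasNextDouble())
                sum += input.nextDouble();
            return "sum = " + sum;
        } catch (FileNotFoundException e) {
            return "File not found: " + fileName;
        }
    }

    public static void main(String[] args) {
        System.out.println(report("numbers.dat"));
    }
}
```

This arrangement also sidesteps the scope issue, because the variable input is never used outside the try block. Its drawback is that the program gives up after one failure, which is not very friendly when the file name comes from a user.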

Let's explore how you might handle this when you are prompting the user for a file name in the console window. In that case, we could keep prompting the user until they give us a legal file name. Let's begin by modifying the code above to prompt and read a file name.

        Scanner input;
        System.out.print("What is the name of the input file? ");
        String name = console.nextLine();
        try {
            input = new Scanner(new File(name));
        } catch (FileNotFoundException e) {
            System.out.println("File not found");
        }
This code catches the potential exception and prints an error message, but we want to add a loop that executes while the user has not given us a legal file name. We want it to look something like this:

        Scanner input;
        while (user hasn't given a legal name) {
            <prompt for a file name>
            <try to open the file>
        }
We have a classic problem of how to prime this loop so that it enters the first time through. We're trying to construct a Scanner from a file. When we succeed, we'll be giving a value to the variable "input". Can we initialize input to something that would indicate that we aren't yet done? The answer is yes. There is a special keyword in Java called "null" that is used to represent "no object". We can initialize the variable input to null as a way to say, "This variable doesn't yet point to an actual object." The primary advantage of initializing the variable to null is that we can test whether it's null in the while loop.

        Scanner input = null;
        while (input == null) {
            <prompt for a file name>
            <try to open the file>
        }
We start the variable with the value null, so it enters the while loop the first time through. If the code in the try/catch fails to properly open the file, then the variable will still be null and we'll execute the loop a second time, prompting for another file name and trying to open it. If the code in the try/catch fails again, then we generate yet another error message and go through the loop a third time.

We can combine this pseudocode with the try/catch code we saw earlier. It seems prudent to modify the error message to make it clear that the user is being given another chance to enter a legal file name.

        Scanner input = null;
        while (input == null) {
            System.out.print("What is the name of the input file? ");
            String name = console.nextLine();
            try {
                input = new Scanner(new File(name));
            } catch (FileNotFoundException e) {
                System.out.println("File not found.  Please try again.");
            }
        }
This loop executes repeatedly until the call on "new Scanner" inside the try block succeeds and gives the variable input a non-null value. This code could be included in method main, although we'd have to construct a Scanner for console input to be able to prompt the user for a file name. Below is a variation of the HoursWorked program that prompts for a file name.

        import java.io.*;
        import java.util.*;

        public class HoursWorked2 {
            public static void main(String[] args) {
                Scanner console = new Scanner(System.in);
                Scanner input = null;
                while (input == null) {
                    System.out.print("What is the name of the input file? ");
                    String name = console.nextLine();
                    try {
                        input = new Scanner(new File(name));
                    } catch (FileNotFoundException e) {
                        System.out.println("File not found.  Please try again.");
                    }
                }
                process(input);
            }

            public static void process(Scanner input) {
                while (input.hasNext()) {
                    String name = input.next();
                    double sum = 0.0;
                    while (input.hasNextDouble())
                        sum += input.nextDouble();
                    System.out.println("Total hours worked by " + name + " = " + sum);
                }
            }
        }

Notice that we no longer need the "throws FileNotFoundException" in the header for main because we handle the potential exception. Here is a log of execution showing what happens when the user types in illegal file names:

        What is the name of the input file? ours.dat
        File not found.  Please try again.
        What is the name of the input file? hours.txt
        File not found.  Please try again.
        What is the name of the input file? data.dat
        File not found.  Please try again.
        What is the name of the input file? file.dat
        File not found.  Please try again.
        What is the name of the input file? hours.dat
        Total hours worked by Erica = 42.75
        Total hours worked by Greenlee = 55.75
        Total hours worked by Simone = 24.0
        Total hours worked by Ryan = 31.75
        Total hours worked by Kendall = 5.5

This code for opening a file is complicated enough that you might want to put it in its own static method. Below is a final variation that includes a method called getInput that prompts the user for a legal file name that can be used to construct a Scanner.

        import java.io.*;
        import java.util.*;

        public class HoursWorked3 {
            public static void main(String[] args) {
                Scanner console = new Scanner(System.in);
                Scanner input = getInput(console);
                process(input);
            }

            public static Scanner getInput(Scanner console) {
                Scanner result = null;
                while (result == null) {
                    System.out.print("What is the name of the input file? ");
                    String name = console.nextLine();
                    try {
                        result = new Scanner(new File(name));
                    } catch (FileNotFoundException e) {
                        System.out.println("File not found.  Please try again.");
                    }
                }
                System.out.println();
                return result;
            }

            public static void process(Scanner input) {
                while (input.hasNext()) {
                    String name = input.next();
                    double sum = 0.0;
                    while (input.hasNextDouble())
                        sum += input.nextDouble();
                    System.out.println("Total hours worked by " + name + " = " + sum);
                }
            }
        }

The code we have written for opening a file tends to be fairly standard in that we could use it without modification in many programs. We refer to this as "boilerplate" code.

Boilerplate Code

Code that tends to be the same from one program to another.

The method getInput is a good example of the kind of boilerplate code that you might use in many different file-processing programs.
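
Because getInput takes the console Scanner as a parameter, it can be tried out without typing anything: a Scanner constructed from a String can stand in for the user. The sketch below copies getInput unchanged and feeds it one bad file name followed by a good one. Everything else is an assumption made for the demonstration: the class name GetInputDemo, the method name demo, the file name "getinput-demo.dat", the use of the system's temporary directory, and the assumption that no file called "no-such-file.dat" exists in the current directory.

```java
import java.io.*;
import java.util.*;

public class GetInputDemo {
    // Copied unchanged from HoursWorked3.
    public static Scanner getInput(Scanner console) {
        Scanner result = null;
        while (result == null) {
            System.out.print("What is the name of the input file? ");
            String name = console.nextLine();
            try {
                result = new Scanner(new File(name));
            } catch (FileNotFoundException e) {
                System.out.println("File not found.  Please try again.");
            }
        }
        System.out.println();
        return result;
    }

    // Writes a small data file, then calls getInput with a simulated user
    // who types a wrong name first and the right name second.  Returns the
    // first token read from the file that getInput opened.
    public static String demo() {
        File data = new File(System.getProperty("java.io.tmpdir"), "getinput-demo.dat");
        try {
            PrintStream out = new PrintStream(data);
            out.println("Erica 7.5 8.5");
            out.close();
        } catch (FileNotFoundException e) {
            return "could not create " + data;
        }
        // Each "\n" plays the role of the user pressing Enter.
        Scanner console = new Scanner("no-such-file.dat\n" + data.getPath() + "\n");
        Scanner input = getInput(console);
        String first = input.next();
        input.close();
        data.delete();   // clean up the temporary file
        return first;
    }

    public static void main(String[] args) {
        System.out.println("first token = " + demo());
    }
}
```

The loop inside getInput rejects the first name, accepts the second, and hands back a Scanner that is ready to read the file. No other part of the program has to know that a retry took place.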

6.5 Programming Problems

  1. Write a program that takes as input a single-spaced text file and produces as output a double-spaced text file.

  2. Students are often told that their term papers should have a certain number of words in them. Counting words in a long paper is a tedious task, but the computer can help. Write a program that counts the number of words in a paper assuming that consecutive words are separated either by spaces or end-of-line characters. You could then extend the program to count not just the number of words, but the number of lines and the total number of characters in the file.

  3. Write a program that takes as input lines of text like:

        This is some text here.

    and produces as output the same text inside a box, as in:

    	+--------------+
    	| This is some |
    	| text here.   |
    	+--------------+
    
    Your program will have to assume some maximum line length (e.g., 12 above).