In chapter 4 we saw how to construct a Scanner object to read input from the console. Now we will see how to construct Scanner objects to read input from files. The idea is fairly straightforward, but Java does not make it easy to read from input files. This is unfortunate because many interesting problems can be formulated as file processing tasks. Many introductory computer science classes have abandoned file processing altogether, or have moved the topic into the second course, because it is considered too advanced for novices.
There is nothing intrinsically complex about file processing. The languages C++ and C# provide mechanisms for easily reading and writing files. But Java was not designed for file processing and Sun has not been particularly eager to provide a simple solution. They did, however, introduce the Scanner class as a way to simplify some of the details associated with reading files. The result is that file reading is still awkward in Java, but at least the level of detail is manageable.
Before we can write a file processing program, we have to explore some issues related to Java exceptions. Remember that exceptions are errors that halt the execution of a program. In the case of file processing, we might try to open a file that doesn't exist, which would generate an exception.
An external file is a collection of characters and numbers that appear on one or more lines. Such a file is external because it is not contained within the program and is not obtained from the user during execution. It is outside the scope of the program and its execution. You create external input files before the execution of a program. For example, you might create a file called "numbers.dat" with the following content.
We have been constructing our Scanner objects by passing System.in to the Scanner constructor:
    Scanner console = new Scanner(System.in);

This instructs the computer to construct a Scanner that reads from the console (i.e., pausing for input from the user). Instead of passing "System.in" to the constructor, we can pass information about the external input file to have the input come from the file rather than from the console. In particular, we can construct Scanner objects by passing an object of type File. A File object stores information about where to find an external input file. File objects in turn are constructed by passing a String that represents the file's name. For example, given our file called "numbers.dat", we can construct a File object that is linked to it:
    new File("numbers.dat")

And using this File object we can construct a Scanner object:

    new Scanner(new File("numbers.dat"))

Putting this all together, we'd say something like the following:

    Scanner input = new Scanner(new File("numbers.dat"));

This particular line of code or something like it will appear in all of your file processing programs. Once we've constructed the Scanner so that it reads from the file, we can manipulate it like any other Scanner. When we read from the console, we always prompt before reading to give the user an indication of what kind of data we want. When we read from a file, we don't need to prompt because the data is already there, stored in the file. For example, we might write the following short program to read five numbers from the file and to echo the five numbers along with their sum:
Compiling a program built around this line as written (here, Echo1.java) produces an error like the following:

    Echo1.java:7: unreported exception java.io.FileNotFoundException; must be caught or declared to be thrown
            Scanner input = new Scanner(new File("numbers.dat"));
                            ^
    1 error

The issue involves exceptions, which were first introduced in chapter 4. Remember that exceptions are errors that prevent a program from continuing normal execution. In this case the compiler is worried that it might not be able to find a file called "numbers.dat". What is it supposed to do if that happens? It wouldn't have any way to continue executing the rest of the code because it wouldn't have a file to read from.
If the program is unable to locate the specified input file, it will throw what is known as a FileNotFoundException. This particular exception is known as a checked exception.
Checked Exception
    An exception that must be caught or specifically declared in the header of the method that might generate it.
Because FileNotFoundException is a checked exception, we can't just ignore it. Java provides a construct known as the try/catch statement for handling such errors. Later in this chapter we will see how to use a try/catch to handle this error. But for now we will use a less sophisticated but simpler approach. Java allows us to avoid handling this error as long as we clearly indicate the fact that we aren't handling the error. In particular, we can include what is known as a "throws" clause in the header for the main method to clearly state the fact that our main method might generate this exception:
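A minimal sketch of how the throws clause might look in a complete program follows. The class name EchoSketch and the helper echoAndSum are assumptions made for illustration, not the original listing; the reading logic is placed in a helper that accepts any Scanner so that it can be tried without a file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class name; reads five numbers from numbers.dat and echoes them.
public class EchoSketch {
    // The throws clause states that main does not handle the exception itself.
    public static void main(String[] args) throws FileNotFoundException {
        Scanner input = new Scanner(new File("numbers.dat"));
        double sum = echoAndSum(input, 5);
        System.out.println("Sum = " + sum);
    }

    // Reads exactly count numbers, echoing each one, and returns their sum.
    public static double echoAndSum(Scanner input, int count) {
        double sum = 0.0;
        for (int i = 1; i <= count; i++) {
            double next = input.nextDouble();
            System.out.println("number " + i + " = " + next);
            sum += next;
        }
        return sum;
    }
}
```

Without the throws clause on main, the compiler reports the unreported-exception error for the line that constructs the Scanner.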
The preceding program read exactly 5 numbers from the file. More typically we read indefinitely using a while loop as long as there are more numbers to read. Remember that the Scanner class includes a series of "has" methods that parallel the various "next" methods. In this case, we are using nextDouble to read a value of type double, so we can use hasNextDouble to test whether there is such a value to read.
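The indefinite version can be sketched as follows. The names EchoAllSketch and echoAll are hypothetical; the only change from a fixed-count loop is that hasNextDouble decides when to stop.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class name; echoes every number in numbers.dat and reports the sum.
public class EchoAllSketch {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner input = new Scanner(new File("numbers.dat"));
        System.out.println("Sum = " + echoAll(input));
    }

    // Keeps reading while the next token can be interpreted as a double.
    public static double echoAll(Scanner input) {
        double sum = 0.0;
        int count = 0;
        while (input.hasNextDouble()) {
            double next = input.nextDouble();
            count++;
            System.out.println("number " + count + " = " + next);
            sum += next;
        }
        return sum;
    }
}
```

The loop ends either at the end of the input or at the first token that is not a number, so a stray word in the file stops the reading at that point rather than causing an exception.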
We think of text as being two-dimensional, but from the computer's point of view, each file is really just a one-dimensional sequence of characters. For example, consider the file called "numbers.dat" that we saw in the last section:
To process a file the Scanner object keeps track of a current position in the file. You can think of this as a cursor or pointer into the file.
Input cursor
    A pointer to the current position in an input file.
When the Scanner object is first constructed, this cursor points to the beginning of the file. But as we perform various "next" operations, this cursor moves forward. The Echo3 program from the last section processes the file through a series of calls on nextDouble. Let's take a moment to simulate how that works. When the Scanner is first constructed, the input cursor will be positioned at the beginning of the file (indicated below with an up-arrow pointing at the first character in the file):
    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
    ^
    |
    input cursor

After the first call on nextDouble, the cursor will be positioned in the middle of the first line after the token "308.2".

    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
         ^
         |
         input cursor

We refer to this process as consuming input.

Consuming input
    Moving the input cursor forward past some input.
The first call on nextDouble consumes the text "308.2" from the input file and leaves the input cursor positioned at the first character after this token. Notice that this leaves the input cursor positioned at a space. When the second call is made on nextDouble, the Scanner first skips past this space to get to the next token and then consumes the text "14.9" and leaves the cursor positioned at the space that follows it:
    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
              ^
              |
              input cursor

A third call on nextDouble skips the space it is positioned at and consumes the text "7.4".

    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                  ^
                  |
                  input cursor

At this point, the input cursor is positioned at the newline character at the end of the first line of input. A fourth call on nextDouble skips past this newline character and consumes the text "2.8".

    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                       ^
                       |
                       input cursor

At this point, the input cursor is positioned at the end of the second line of input. When a fifth call is made on nextDouble, the Scanner finds two newline characters in a row. This isn't a problem for the Scanner, because it simply skips past any leading whitespace characters (spaces, tabs, newline characters) until it finds an actual token. So it skips both of these newline characters and consumes the text "3.9".

    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                ^
                                |
                                input cursor

At this point the input cursor is positioned in the middle of the fourth line of input (the third line of input was a blank line). The sixth call on nextDouble consumes the text "4.7":

    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                    ^
                                    |
                                    input cursor

The seventh call consumes the text "-15.4":

    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                          ^
                                          |
                                          input cursor

And the eighth call consumes the text "2.8":

    308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n
                                               ^
                                               |
                                               input cursor

At this point the input cursor is positioned at the newline character at the end of the file. If you attempted to call nextDouble again, it would throw an exception because there are no more tokens left to process. But remember that the Echo3 program has a while loop that calls hasNextDouble() before calling nextDouble() to make sure that there actually is a double value to process. When you call methods like hasNextDouble, the Scanner looks ahead in the file to see whether there is a next token and whether it could be interpreted as being of that type (in this case a double). So the Echo3 program will continue executing until it reaches the end of the file or until it encounters a token that could not be interpreted as a double.
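The walkthrough can be reproduced without a file, because a Scanner can also be constructed from a String. The sketch below (CursorSketch and countDoubles are hypothetical names) counts how many double tokens are consumed from the text shown in the diagrams, with each \n written as a Java escape sequence.

```java
import java.util.Scanner;

// Hypothetical class name; simulates the token-by-token walkthrough on a String.
public class CursorSketch {
    // Counts the double tokens consumed before hasNextDouble reports false.
    public static int countDoubles(String text) {
        Scanner input = new Scanner(text);
        int count = 0;
        while (input.hasNextDouble()) {
            input.nextDouble();   // skips leading whitespace, then consumes one token
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "308.2 14.9 7.4\n2.8\n\n\n3.9 4.7 -15.4\n2.8\n";
        System.out.println("tokens consumed: " + countDoubles(text));
    }
}
```

On a system whose default locale uses a period as the decimal separator, this should report eight tokens, one for each call on nextDouble traced above. The blank lines make no difference, because they are only whitespace between tokens.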
Scanner objects are very flexible about when and how you consume input. You can consume part of the input in one section of code, then do some other work, then come back to consuming more of the input in another section of code. You decide exactly how much input to consume at a time through the calls you make on the Scanner object.
Keeping track of exactly where the input cursor is positioned can be tricky. If the data is line-oriented, it is best to read it in a line-oriented manner. We will see how to do that in a later section.
In the previous section we used the file name "numbers.dat". When Java finds you using a simple name like that, it looks in the current directory to find the file. The definition of "current directory" varies depending upon what Java environment you are using. If you are using the TextPad editor, then the current directory is the directory in which your program appears.
You can also use a fully-qualified file name. For example, if you are on a Windows machine and you have stored the file in a directory known as c:\data, you could use a file name like this:

    Scanner input = new Scanner(new File("c:\\data\\numbers.dat"));

Notice that we have to use the escape sequence "\\" to represent a single backslash character. This approach works well when you know exactly where your file is going to be stored on your system.
Another alternative is to ask the user for a file name. In the last chapter we saw a program called FindSum that prompted the user for a series of numbers to add together. Below is a variation that prompts the user for the name of a file of numbers to be added together.
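One way such a variation might look is sketched here. The class name FindSumSketch and the helper sumAll are assumptions, and the prompt wording is borrowed from the prompt used later in this chapter; the original listing may differ.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class name; asks for a file name and adds up the numbers in that file.
public class FindSumSketch {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("What is the name of the input file? ");
        String name = console.nextLine();

        // A second Scanner reads from the file the user named.
        Scanner input = new Scanner(new File(name));
        System.out.println("Sum = " + sumAll(input));
    }

    // Adds up every number that can be read from the given Scanner.
    public static double sumAll(Scanner input) {
        double sum = 0.0;
        while (input.hasNextDouble()) {
            sum += input.nextDouble();
        }
        return sum;
    }
}
```

Note that the program uses two Scanner objects: one attached to the console for the file name and one attached to the file for the data.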
If we have this program read from the file "numbers.dat" that we saw in the last section, then the program would execute like this:
Suppose that you have an input file that has information about how many hours have been worked by each employee of a company. For example, it might look like the following:
Most file processing will involve while loops because we won't know in advance how much data the file has in it. We'll choose different tests depending upon the particular file we are processing, but they will almost all be calls on the various "has" methods of the Scanner class. We basically want to say, "while you have more data for me to process, let's keep reading."
In this case we have a series of input lines that each begin with a name. For this program we are assuming that names are simple, with no spaces in the middle. That means we'll be reading them with a call on the next() method. As a result, our overall test involves seeing if there is another name in the input file:
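Once a name has been read, the numbers that follow it can be added up with a cumulative sum loop that tests hasNextDouble:

    double sum = 0.0;
    while (input.hasNextDouble())
        sum += input.nextDouble();

Putting this all together, we end up with the following complete program.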
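A sketch of such a program, under stated assumptions, might look like this. It assumes that each employee's entry is a one-word name followed by that employee's hours, and that no name looks like a number; the class name HoursWorkedSketch, the helper sumHours, and the output wording are illustrative rather than taken from the original listing.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class name; reports total hours for each employee in hours.dat.
public class HoursWorkedSketch {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner input = new Scanner(new File("hours.dat"));
        while (input.hasNext()) {            // is there another name?
            String name = input.next();      // names are assumed to contain no spaces
            double sum = sumHours(input);
            System.out.println("Total hours worked by " + name + " = " + sum);
        }
    }

    // Adds up numbers until the next token is not a number (the next name) or input ends.
    public static double sumHours(Scanner input) {
        double sum = 0.0;
        while (input.hasNextDouble()) {
            sum += input.nextDouble();
        }
        return sum;
    }
}
```

The inner loop stops by itself at the next employee's name, because a name cannot be interpreted as a double.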
If we put the input above into a file called "hours.dat" and execute the program, we get the following result.
The program in the last section required that names have no spaces in them. This isn't a very practical restriction. It would be more convenient to be able to type anything for a name, including numbers. One way to do that is to put the name on a separate line from the rest of the data. For example, suppose that you want to compute weighted GPAs for a series of students. Suppose, for example, that a student has a 3-unit 3.0, a 4-unit 2.9, a 3-unit 3.2 and a 2-unit 2.5. We can compute an overall GPA that is weighted by the individual units for each course.
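To make the arithmetic concrete, here is the example above worked out in code. It reads "weighted by units" as: multiply each grade by its units, add the products, and divide by the total units. The class and method names are hypothetical.

```java
// Hypothetical class name; works through the four-course example by hand.
public class WeightedGpaExample {
    public static double exampleGpa() {
        // 3 * 3.0 + 4 * 2.9 + 3 * 3.2 + 2 * 2.5 = 9.0 + 11.6 + 9.6 + 5.0 = 35.2
        double weightedSum = 3 * 3.0 + 4 * 2.9 + 3 * 3.2 + 2 * 2.5;
        int totalUnits = 3 + 4 + 3 + 2;      // 12 units in all
        return weightedSum / totalUnits;     // 35.2 / 12, roughly 2.93
    }

    public static void main(String[] args) {
        System.out.println("weighted GPA = " + exampleGpa());
    }
}
```

The 4-unit course pulls the result toward 2.9 more strongly than the 2-unit course pulls it toward 2.5, which is the point of weighting by units.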
So we might have an input file that has its data on pairs of lines. For each pair the name will appear on the first line and the grade data will appear on the second line. For example, we might have an input file that looks like this:
This approach to file processing will work well for any input file that is line oriented. Some lines might represent a single value like the name in the example above. For those lines, we can use a call on nextLine to read the entire line as a String that we can keep track of. Other lines will have multiple data values on the line, in which case we can construct a Scanner object from the String that will allow us to extract the individual data values.
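The two-Scanner idea can be sketched on a small scale as follows. The names LineBasedSketch and describePair are hypothetical, and the sample names and values in main are invented for illustration; the input is supplied as a String so the sketch runs without a file.

```java
import java.util.Scanner;

// Hypothetical class name; reads a name line, then tokenizes the data line after it.
public class LineBasedSketch {
    // Reads one name line and one data line, returning "name: N values".
    public static String describePair(Scanner input) {
        String name = input.nextLine();        // the whole line, spaces included
        String dataLine = input.nextLine();    // the line of values for that name
        Scanner data = new Scanner(dataLine);  // a Scanner over just this one line
        int count = 0;
        while (data.hasNext()) {
            data.next();
            count++;
        }
        return name + ": " + count + " values";
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("Mary Ann Smith\n3 4 5\nJo\n1 2\n");
        while (input.hasNextLine()) {
            System.out.println(describePair(input));
        }
    }
}
```

Because the inner Scanner sees only one line, its loop cannot run past the end of that line into the next person's data.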
Let's explore how we would process the grades using a Scanner. This is a place to introduce a static method. The code above involves processing the overall file. The task of processing one list of grades is a lower level task that can be split off into its own method. Let's call it processGrades. Obviously it can't do its work without the Scanner object that has the grades, so we'll pass that as a parameter. What exactly needs to be done? The plan was to compute a weighted GPA for each student. So this method needs to read the individual grades and turn that into a single GPA score.
Weighted GPAs involve computing a value known as the "quality points" for each grade. The quality points are defined as the units times the grade. The weighted GPA is calculated by dividing the total quality points by the total units. So we just need to add up the total quality points and add up the total units, then divide. This involves a pair of cumulative sum tasks that we can express in pseudocode as follows:
    set total units to 0.
    set total quality points to 0.
    while (more grades to process) {
        read next units and next grade.
        add next units to total units.
        add (next units) * (next grade) to total quality points.
    }
    set gpa to (total quality points)/(total units).

This is fairly simple to translate into Java code by incorporating our Scanner object called "data":

    double totalQualityPoints = 0.0;
    double totalUnits = 0;
    while (data.hasNextInt()) {
        int units = data.nextInt();
        double grade = data.nextDouble();
        totalUnits += units;
        totalQualityPoints += units * grade;
    }
    double gpa = totalQualityPoints/totalUnits;

Because our Scanner object data was constructed from a single line of input, we can process just one person's grades with this loop. There is still a potential problem. What if there are no grades? Some students might have dropped all of their classes, for example. There are several ways we might handle that situation, but let's assume that it is appropriate to use a GPA of 0.0 in that case.
Making that correction and putting this into a method, we end up with the following code.
    public static double processGrades(Scanner data) {
        double totalQualityPoints = 0.0;
        double totalUnits = 0;
        while (data.hasNextInt()) {
            int units = data.nextInt();
            double grade = data.nextDouble();
            totalUnits += units;
            totalQualityPoints += units * grade;
        }
        if (totalUnits == 0)
            return 0.0;
        else
            return totalQualityPoints/totalUnits;
    }

Recall that our high-level code looked like this:

    double gpa = processGrades(grades);
    System.out.println("GPA for " + name + " = " + gpa);

This would complete the program, but let's add one more calculation. Let's compute the max and min GPA that we see among these students. We can accomplish this fairly easily with some simple if statements after the println:

    if (gpa > max)
        max = gpa;
    if (gpa < min)
        min = gpa;

We simply compare the current gpa against what we currently consider the max and min, resetting if the new gpa represents a new max or a new min. But how do we initialize these variables? We have two approaches to choose from. One approach involves initializing the max and the min to the first value in the sequence. We could do that, but it would make our loop much more complicated than it is currently. The second approach involves setting the max to the lowest possible value and setting the min to the highest possible value. This approach isn't always possible because we don't always know how high or low our values might go. But in the case of GPAs, we know that they will always be between 0.0 and 4.0.
Thus, we can initialize the variables as follows:
    double max = 0.0;
    double min = 4.0;

It may seem odd to set the max to 0 and the min to 4, but that's because we are intending to have them reset inside the loop. If the first student has a GPA of 3.2, for example, then this will constitute a new max (higher than 0.0) and a new min (lower than 4.0). Of course, it's possible that all students end up with a 4.0, but then our choice of 4.0 for the min is appropriate. Or all students could end up with a 0.0, in which case our choice of a max of 0.0 is appropriate.
Putting this all together we get the following complete program.
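A sketch of how the pieces might fit together is shown below. The class name GpaSketch, the file name "gpa.dat", and the wording of the final two println calls are assumptions; the processGrades method and the max/min logic follow the fragments developed above. The sketch also assumes that the file really does consist of complete pairs of lines.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class name; reports a weighted GPA per student plus the max and min.
public class GpaSketch {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner input = new Scanner(new File("gpa.dat"));  // assumed file name
        double max = 0.0;   // lowest possible GPA, so any real value can replace it
        double min = 4.0;   // highest possible GPA, so any real value can replace it
        while (input.hasNextLine()) {
            String name = input.nextLine();                    // first line of the pair
            Scanner grades = new Scanner(input.nextLine());    // second line of the pair
            double gpa = processGrades(grades);
            System.out.println("GPA for " + name + " = " + gpa);
            if (gpa > max)
                max = gpa;
            if (gpa < min)
                min = gpa;
        }
        System.out.println("max GPA = " + max);
        System.out.println("min GPA = " + min);
    }

    // Computes total quality points divided by total units, or 0.0 if there are no grades.
    public static double processGrades(Scanner data) {
        double totalQualityPoints = 0.0;
        double totalUnits = 0;
        while (data.hasNextInt()) {
            int units = data.nextInt();
            double grade = data.nextDouble();
            totalUnits += units;
            totalQualityPoints += units * grade;
        }
        if (totalUnits == 0)
            return 0.0;
        else
            return totalQualityPoints / totalUnits;
    }
}
```

If the file ended with a name line that had no grade line after it, the second call on nextLine would throw an exception; a more defensive version would test hasNextLine again before reading the grades.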
Including the clause "throws FileNotFoundException" in the header for main allows our programs to compile, but it's not a very satisfying solution to the underlying problem. To actually handle the potential error, we'd want to use something called a try/catch statement. We will not be exploring all of the details of try/catch, but we will examine how to write some basic try/catch statements that we could use for file processing.
The try/catch statement has the following general syntax.
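The general shape is a try block containing the statements that might fail, followed by a catch block that names an exception type and a variable. The sketch below illustrates that shape with the file-opening code from this chapter; the class name TryCatchShape and the method tryToOpen are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class name; shows the parts of a try/catch statement.
public class TryCatchShape {
    // Returns a short message saying whether the named file could be opened.
    public static String tryToOpen(String fileName) {
        try {
            // statements that might throw the exception go here
            Scanner input = new Scanner(new File(fileName));
            input.close();
            return "opened " + fileName;
        } catch (FileNotFoundException e) {   // (type name) of the exception to catch
            // statements to execute if that exception is thrown go here
            return "File not found";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryToOpen("numbers.dat"));
    }
}
```

If nothing in the try block throws a FileNotFoundException, the catch block is skipped entirely.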
Notice that the catch part of this statement has a set of parentheses in which you include a type and name. The type should be the type of exception you are trying to catch. The name can be any legal identifier. For example, in the case of our Scanner code, we know that a FileNotFoundException might be thrown. What do we do if the exception occurs? That's a tough question, but for now let's just write an error message.
    try {
        Scanner input = new Scanner(new File("numbers.dat"));
    } catch (FileNotFoundException e) {
        System.out.println("File not found");
    }

This code says to try constructing the Scanner from the file "numbers.dat" but if the file is not found, then print an error message instead. This is the basic idea we want to follow, but there are several issues we must address to make this code work for us. First of all, there is a scope issue. The variable input isn't going to be much use to us if it's trapped inside the try block. So we have to declare the Scanner variable outside the try/catch statement:

    Scanner input;
    try {
        input = new Scanner(new File("numbers.dat"));
    } catch (FileNotFoundException e) {
        System.out.println("File not found");
    }

We have a bigger problem in that simply printing an error message isn't a good way to recover from this problem. How is the program supposed to proceed with execution if it can't read from the file? It probably can't. So what would be a more appropriate way to recover from the error? That depends a lot on the particular program you are writing, so the answer is likely to vary from one program to the next.
Let's explore how you might handle this when you are prompting the user for a file name in the console window. In that case, we could keep prompting the user until they give us a legal file name. Let's begin by modifying the code above to prompt and read a file name.
    Scanner input;
    System.out.print("What is the name of the input file? ");
    String name = console.nextLine();
    try {
        input = new Scanner(new File(name));
    } catch (FileNotFoundException e) {
        System.out.println("File not found");
    }

This code catches the potential exception and prints an error message, but we want to add a loop that executes while the user has not given us a legal file name. We want it to look something like this:
    Scanner input;
    while (user hasn't given a legal name) {
        prompt for and read a file name, then try to open the file.
    }

We have a classic problem of how to prime this loop so that it enters the first time through. We're trying to construct a Scanner from a file. When we succeed, we'll be giving a value to the variable "input". Can we initialize input to something that would indicate that we aren't yet done? The answer is yes. There is a special keyword in Java called "null" that is used to represent "no object". We can initialize the variable input to null as a way to say, "This variable doesn't yet point to an actual object." The primary advantage of initializing the variable to null is that we can test whether it's null in the while loop.

    Scanner input = null;
    while (input == null) {
        prompt for and read a file name, then try to open the file.
    }

We start the variable with the value null, so it enters the while loop the first time through. If the code in the try/catch fails to properly open the file, then the variable will still be null and we'll execute the loop a second time, prompting for another file name and trying to open it. If the code in the try/catch fails again, then we generate yet another error message and go through the loop a third time.
We can combine this pseudocode with the try/catch code we saw earlier. It seems prudent to modify the error message to make it clear that the user is being given another chance to enter a legal file name.
    Scanner input = null;
    while (input == null) {
        System.out.print("What is the name of the input file? ");
        String name = console.nextLine();
        try {
            input = new Scanner(new File(name));
        } catch (FileNotFoundException e) {
            System.out.println("File not found. Please try again.");
        }
    }

This loop executes repeatedly until the call on "new Scanner" inside the try block succeeds and gives the variable input a non-null value. This code could be included in method main, although we'd have to construct a Scanner for console input to be able to prompt the user for a file name. Below is a variation of the HoursWorked program that prompts for a file name.
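A sketch of such a variation follows. The signature of getInput (a console Scanner in, a file Scanner out) is an assumption, as are the class name GetInputSketch and the output wording; the loop inside getInput is the one developed above. The processing in main assumes that each employee's entry is a one-word name followed by hours.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical class name; prompts for a file until one can be opened, then processes it.
public class GetInputSketch {
    public static void main(String[] args) {
        Scanner console = new Scanner(System.in);
        Scanner input = getInput(console);
        while (input.hasNext()) {
            String name = input.next();
            double sum = 0.0;
            while (input.hasNextDouble()) {
                sum += input.nextDouble();
            }
            System.out.println("Total hours worked by " + name + " = " + sum);
        }
    }

    // Prompts repeatedly until the user names a file that can be opened.
    public static Scanner getInput(Scanner console) {
        Scanner result = null;   // null means "no file opened yet"
        while (result == null) {
            System.out.print("What is the name of the input file? ");
            String name = console.nextLine();
            try {
                result = new Scanner(new File(name));
            } catch (FileNotFoundException e) {
                System.out.println("File not found. Please try again.");
            }
        }
        return result;
    }
}
```

Because getInput handles the exception itself, main no longer needs a throws clause in its header.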
Boilerplate Code
    Code that tends to be the same from one program to another.
The method getInput is a good example of the kind of boilerplate code that you might use in many different file-processing programs.
This is some text here. and produces as output the same text inside a box, as in:
    +--------------+
    | This is some |
    | text here.   |
    +--------------+

Your program will have to assume some maximum line length (e.g., 12 above).