In Chapter 4 we saw how to construct a Scanner object to read input from the console. Now we will see how to construct Scanner objects to read input from files. The idea is fairly straightforward, but Java does not make it easy to read from input files. This is unfortunate because many interesting problems can be formulated as file processing tasks. As a result, many introductory computer science classes have abandoned file processing altogether, or have moved the topic into the second course, because it is considered too advanced for novices.
There is nothing intrinsically complex about file processing. The languages C++ and C# provide mechanisms for easily reading and writing files. But Java was not designed for file processing and Sun has not been particularly eager to provide a simple solution. They did, however, introduce the Scanner class as a way to simplify some of the details associated with reading files. The result is that file reading is still awkward in Java, but at least the level of detail is manageable.
Before we can write a file processing program, we have to explore some issues related to Java exceptions. Remember that exceptions are errors that halt the execution of a program. In the case of file processing, we might try to open a file that doesn't exist, which would generate an exception.
Java has a special construct for catching exceptions that is known as the try/catch statement. We will not be exploring all of the details of try/catch, but we will explore how to write some basic try/catch statements that we will need for file processing. Let's first see why we need this. We have been constructing our Scanner objects by passing System.in to the Scanner constructor:
    Scanner console = new Scanner(System.in);
You are also allowed to construct Scanner objects by passing an object of type File. File objects in turn are constructed by passing a String that represents the file's name. For example, suppose that we have a file called "numbers.dat" that contains a sequence of real numbers. Using this file name we can construct a File object:
    new File("numbers.dat")
And using this File object we can construct a Scanner object:
    new Scanner(new File("numbers.dat"))
Putting this all together, we'd say something like the following:
    Scanner input = new Scanner(new File("numbers.dat"));
But what if Java can't find a file named "numbers.dat"? Then what happens? The answer is that this version of the Scanner constructor throws an exception known as a FileNotFoundException. This particular exception is known as a checked exception.
Checked Exception: An exception that must be caught or specifically declared in the header of the method that might generate it.
Because it is a checked exception, we can't just ignore it. One alternative is to include what are known as "throws" clauses in the header of any method that might generate such an exception. This approach works, but it can be rather tedious. Instead, we will see how to use a try/catch statement to handle the error.
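For comparison, the throws-clause alternative might look like the following minimal sketch. The names ThrowsExample, openFile and firstToken are made up for illustration. The point is only that every method along the chain of calls has to repeat the clause in its header.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Illustrative sketch; ThrowsExample, openFile and firstToken are made-up names.
public class ThrowsExample {
    // The throws clause declares that this method might generate the exception.
    public static Scanner openFile(String fileName) throws FileNotFoundException {
        return new Scanner(new File(fileName));
    }

    // This method does not construct the Scanner itself, but it calls a
    // method that might throw, so it needs a throws clause of its own.
    // It assumes the file contains at least one token.
    public static String firstToken(String fileName) throws FileNotFoundException {
        Scanner input = openFile(fileName);
        return input.next();
    }
}
```

A main method that calls firstToken would need the same clause in its header, which is why this approach becomes tedious as a program grows.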
The try/catch statement has the following general syntax.
    try {
        <statement>;
        ...
    } catch (<type> <name>) {
        <statement>;
        ...
    }
Notice that the catch part of this statement has a set of parentheses in which you include a type and name. The type should be the type of exception you are trying to catch. The name can be any legal identifier. For example, in the case of our Scanner code, we know that a FileNotFoundException might be thrown. What do we do if the exception occurs? That's a tough question, but for now let's just write an error message.
    try {
        Scanner input = new Scanner(new File("numbers.dat"));
    } catch (FileNotFoundException e) {
        System.out.println("File not found");
    }
This code says to try constructing the Scanner from the file "numbers.dat" but if the file is not found, then print an error message instead. This is the basic idea we want to follow, but there are several issues we must address to make this code work for us. First of all, there is a scope issue. The variable input isn't going to be much use to us if it's trapped inside the try block. So we have to declare the Scanner variable outside the try/catch statement:
    Scanner input;
    try {
        input = new Scanner(new File("numbers.dat"));
    } catch (FileNotFoundException e) {
        System.out.println("File not found");
    }
We have a bigger problem in that simply printing an error message isn't a good way to recover from this problem. How is the program supposed to proceed with execution if it can't read from the file? It probably can't. So what would be a more appropriate way to recover from the error? That depends a lot on the particular program you are writing, so the answer is likely to vary from one program to the next. Later in the chapter we'll explore putting this into a loop where we keep prompting for a legal file name until the user gives us something that works.
For now we'll look at an alternative that just stops the program from executing. One way to do this is to call a special method System.exit. Some people would write the following code:
    Scanner input;
    try {
        input = new Scanner(new File("numbers.dat"));
    } catch (FileNotFoundException e) {
        System.exit(1);
    }
The call on System.exit stops the program from executing. The value you pass to System.exit (a 1 in the example above) indicates the conditions under which you exited. It is a convention to return the value 0 as a way to say, "We exited normally without errors." By passing a value like 1 you are saying, "We exited abnormally, with error code 1." This solution works, but there is a better solution.
Java has a family of exceptions that are unchecked. In particular, we can throw something called a RuntimeException.
    Scanner input;
    try {
        input = new Scanner(new File("numbers.dat"));
    } catch (FileNotFoundException e) {
        throw new RuntimeException("File not found");
    }
In effect, we are turning the FileNotFoundException, which is a checked exception, into a RuntimeException, which is not checked. There are several advantages to this. If someone who calls our code wants to write their own try/catch statement for our RuntimeException, they can handle this error. If not, then this will halt the program in the same way that the call on System.exit halts the program, but in this case Java will display a stack trace showing where the exception was thrown and how Java ended up there (a list in backwards order of each method called).
We will use this code snippet as a model to follow in the programs that we write.
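One way to reuse this model, sketched here under the assumption that several programs need the same file-opening logic, is to wrap it in a small helper method. The names OpenFileExample and openFile are hypothetical and not part of any library.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Illustrative sketch; OpenFileExample and openFile are made-up names.
public class OpenFileExample {
    // Returns a Scanner for the named file, converting the checked
    // FileNotFoundException into an unchecked RuntimeException.
    public static Scanner openFile(String fileName) {
        try {
            return new Scanner(new File(fileName));
        } catch (FileNotFoundException e) {
            throw new RuntimeException("File not found: " + fileName);
        }
    }
}
```

A caller could then write Scanner input = OpenFileExample.openFile("numbers.dat"); without any throws clause in its own header.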
We are now ready to look at a complete program that reads an input file. Suppose that we have used a text editor to create a file called "numbers.dat" with the following content.
To process the file the Scanner object keeps track of a current position in the file. You can think of this as a cursor or pointer into the file.
Input cursor: A pointer to the current position in an input file.
When the Scanner object is first constructed, this cursor points to the beginning of the file. But as we perform various "next" operations, this cursor moves forward. After the first call on nextDouble, the cursor will be positioned in the middle of the first line after the token "308.2". After another call on nextDouble the cursor is positioned between the tokens "14.9" and "7.4". And so on.
We refer to this process as consuming input.
Consuming input: Moving the input cursor forward past some input.
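The movement of the cursor can be observed with a small experiment. The sketch below scans a String rather than a file so that it is self-contained. The class name CursorExample and the method restAfterTwo are made up, and the call on useLocale is an added assumption so that "308.2" parses the same way on every system.

```java
import java.util.Locale;
import java.util.Scanner;

// Illustrative sketch; CursorExample and restAfterTwo are made-up names.
public class CursorExample {
    // Consumes two numbers, then returns whatever is left on the current line.
    public static String restAfterTwo(String text) {
        Scanner input = new Scanner(text).useLocale(Locale.US);
        input.nextDouble();  // cursor moves past the first token
        input.nextDouble();  // cursor moves past the second token
        return input.nextLine();  // the unconsumed rest of the line
    }

    public static void main(String[] args) {
        // prints [ 7.4] because the cursor sits just after "14.9"
        System.out.println("[" + restAfterTwo("308.2 14.9 7.4\n2.8") + "]");
    }
}
```

The leading space in the result shows that the cursor stops immediately after a token and does not skip the whitespace that follows it.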
Scanner objects are very flexible about when and how you consume input. You can consume part of the input in one section of code, then do some other work, then come back to consuming more of the input in another section of code. You decide exactly how much input to consume at a time through the calls you make on the Scanner object.
The various "has" methods of the Scanner class look ahead in the input, but they do not consume it. Consider our sample program. The fourth call on "nextDouble" will read in the value 2.8. This leaves the input cursor positioned at the end of the line with 2.8. The program then performs the while loop test again, which has a call on "hasNextDouble". But the input file has two blank lines after the line with 2.8. The Scanner object has to look past these blank lines to find the value 3.9 at the beginning of the fifth line of input, and the next call on "nextDouble" then consumes the blank lines along with the token. Keeping track of exactly where the input cursor is positioned can be tricky. If the data is line-oriented, it is best to read it in a line-oriented manner. We will see how to do that in a later section.
Notice that the Echo1 program does not necessarily consume the entire input file. It has a while loop that continues as long as it sees a double. If it encounters anything other than a double, it will stop reading without processing that part of the input file.
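The stopping behavior can be sketched as follows. This is not the Echo1 listing itself, only a loop written in the same spirit. The names EchoSketch and echoDoubles and the sample text are assumptions, and the input comes from a String so the example runs without a data file.

```java
import java.util.Locale;
import java.util.Scanner;

// Illustrative sketch; EchoSketch and echoDoubles are made-up names.
public class EchoSketch {
    // Echoes numbers until the next token is not a double; returns how many were read.
    public static int echoDoubles(Scanner input) {
        int count = 0;
        while (input.hasNextDouble()) {
            double next = input.nextDouble();
            count++;
            System.out.println("number " + count + " = " + next);
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up data: the token "oops" is not a double, so the loop stops there.
        Scanner input = new Scanner("308.2 14.9 oops 7.4").useLocale(Locale.US);
        int count = echoDoubles(input);
        System.out.println(count + " numbers read; next token is " + input.next());
    }
}
```

The value 7.4 is never processed, because the loop ends as soon as it sees a token that is not a double.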
In the previous section we used the file name "numbers.dat". When Java finds you using a simple name like that, it looks in the current directory to find the file. The definition of "current directory" varies depending upon what Java environment you are using. If you are using the TextPad editor, then the current directory is the directory in which your program appears.
You can also use a fully-qualified file name. For example, if you are on a Windows machine and you have stored the file in a directory known as c:\data, you could use a file name like this:
    Scanner input;
    try {
        input = new Scanner(new File("c:\\data\\numbers.dat"));
    } catch (FileNotFoundException e) {
        throw new RuntimeException("File not found");
    }
Notice that we have to use the escape sequence "\\" to represent a single backslash character. This approach works well when you know exactly where your file is going to be stored on your system.
Another alternative is to ask the user for a file name. In the last chapter we saw a program called FindSum that prompted the user for a series of numbers to add together. Below is a variation that prompts the user for the name of a file of numbers to be added together.
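One way such a variation might look is sketched below. It is a minimal sketch, not the FindSum program itself: the class name FindSumSketch, the method sumNumbers and the prompt wording are assumptions.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Illustrative sketch; FindSumSketch and sumNumbers are made-up names.
public class FindSumSketch {
    // Adds up the numbers at the front of the input, stopping at the first
    // token that is not a number or at the end of the input.
    public static double sumNumbers(Scanner input) {
        double sum = 0.0;
        while (input.hasNextDouble()) {
            sum += input.nextDouble();
        }
        return sum;
    }

    public static void main(String[] args) {
        Scanner console = new Scanner(System.in);
        System.out.print("What is the file name? ");
        if (!console.hasNextLine()) {
            return;  // nothing was typed, so there is no file to read
        }
        String name = console.nextLine();

        Scanner input;
        try {
            input = new Scanner(new File(name));
        } catch (FileNotFoundException e) {
            throw new RuntimeException("File not found");
        }
        System.out.println("Sum = " + sumNumbers(input));
    }
}
```

Two Scanner objects are in play here: one reads the file name from the console and the other reads the numbers from the file.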
Suppose that you have an input file that has information about how many hours have been worked by each employee of a company. For example, it might look like the following:
We have already looked in detail at how to open a file and the code for doing so tends to be fairly standard. We refer to this as "boilerplate" code.
Boilerplate Code: Code that tends to be the same from one program to another.
The more interesting code involves processing the file. Most file processing will involve while loops because we won't know in advance how much data the file has in it. We'll choose different tests depending upon the particular file we are processing, but they will almost all be calls on the various "has" methods of the Scanner class. We basically want to say, "while you have more data for me to process, let's keep reading."
In this case we have a series of input lines that each begin with a name. For this program we are assuming that names are simple, with no spaces in the middle. That means we'll be reading them with a call on the next() method. As a result, our overall test, a call on hasNext(), checks whether there is another name in the input file. For each person, we then add up the hours that follow the name, continuing as long as the next token is a number:
    double sum = 0.0;
    while (input.hasNextDouble()) {
        sum += input.nextDouble();
    }
Putting this all together, we end up with the following complete program.
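A minimal sketch of how the two loops might fit together appears below. The class name HoursSketch, the method totalHours, the sample names and hours, and the output wording are all assumptions made for illustration. The Scanner reads from a String so that the sketch runs without a data file.

```java
import java.util.Locale;
import java.util.Scanner;

// Illustrative sketch; HoursSketch and totalHours are made-up names.
public class HoursSketch {
    // Reads one person's hours: every number up to the next name or the end of input.
    public static double totalHours(Scanner input) {
        double sum = 0.0;
        while (input.hasNextDouble()) {
            sum += input.nextDouble();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up data standing in for the input file.
        String text = "Ana 8 7.5 9\nBen 4 4\n";
        Scanner input = new Scanner(text).useLocale(Locale.US);
        while (input.hasNext()) {           // is there another name?
            String name = input.next();     // names have no spaces
            double sum = totalHours(input);
            System.out.println("Total hours worked by " + name + " = " + sum);
        }
    }
}
```

The inner loop ends when it reaches the next name, because a name is not a double, and the outer loop then picks up from exactly that position.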
The program in the last section required that names have no spaces in them. This isn't a very practical restriction. It would be more convenient to be able to type anything for a name, including numbers. One way to do that is to put the name on a separate line from the rest of the data. For example, suppose that you want to compute weighted GPAs for a series of students. Suppose, for example, that a student has a 3-unit 3.0, a 4-unit 2.9, a 3-unit 3.2 and a 2-unit 2.5. We can compute an overall GPA that is weighted by the individual units for each course.
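As a worked example of the weighting, take the student described above. Each grade is multiplied by its units, the products are added, and the total is divided by the total number of units:

```latex
\begin{align*}
\text{weighted sum} &= 3(3.0) + 4(2.9) + 3(3.2) + 2(2.5) = 9.0 + 11.6 + 9.6 + 5.0 = 35.2 \\
\text{total units} &= 3 + 4 + 3 + 2 = 12 \\
\text{GPA} &= \frac{35.2}{12} \approx 2.93
\end{align*}
```

Note that the 4-unit course pulls the result toward 2.9 more strongly than the 2-unit course pulls it toward 2.5.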
So we might have an input file that has its data on pairs of lines. For each pair the name will appear on the first line and the grade data will appear on the second line. For example, we might have an input file that looks like this:
This approach to file processing will work well for any input file that is line oriented. Some lines might represent a single value like the name in the example above. For those lines, we can use a call on nextLine to read the entire line as a String that we can keep track of. Other lines will have multiple data values on the line, in which case we can construct a Scanner object from the String that will allow us to extract the individual data values.
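The line-oriented pattern might be sketched as follows. The names LineSketch and countTokens and the sample data are assumptions, and the sketch simply counts the values on each data line to show the second Scanner at work. It assumes the lines come in complete pairs.

```java
import java.util.Scanner;

// Illustrative sketch; LineSketch and countTokens are made-up names.
public class LineSketch {
    // Counts the tokens on one line by constructing a Scanner from the String.
    public static int countTokens(String line) {
        Scanner data = new Scanner(line);
        int count = 0;
        while (data.hasNext()) {
            data.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up data standing in for the input file; lines come in pairs.
        Scanner input = new Scanner("Ana Maria Lopez\n3 3.0 4 2.9\nBen Wu\n2 2.5\n");
        while (input.hasNextLine()) {
            String name = input.nextLine();    // the whole line, spaces included
            String grades = input.nextLine();  // the data line that follows
            System.out.println(name + ": " + countTokens(grades) + " values");
        }
    }
}
```

Because the inner Scanner sees only one line, its loop cannot run past the end of that line into the next student's name.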
Let's explore how we would process the grades using a Scanner. This is a good place to introduce a static method. The code above involves processing the overall file. The task of processing one list of grades is a lower level task that can be split off into its own method. Let's call it processGrades. Obviously it can't do its work without the Scanner object that has the grades, so we'll pass that as a parameter. What exactly needs to be done? The plan was to compute a weighted GPA for each student. So this method needs to read the individual grades and turn them into a single GPA score.
Weighted GPAs involve computing a value known as the "quality points" for each grade. The quality points are defined as the units times the grade. The weighted GPA is calculated by dividing the total quality points by the total units. So we just need to add up the total quality points and add up the total units, then divide. This involves a pair of cumulative sum tasks that we can express in pseudocode as follows:
    set total units to 0.
    set total quality points to 0.
    while (more grades to process) {
        read next units and next grade.
        add next units to total units.
        add (next units) * (next grade) to total quality points.
    }
    set gpa to (total quality points)/(total units).
This is fairly simple to translate into Java code by incorporating our Scanner object called "data":
    double totalQualityPoints = 0.0;
    double totalUnits = 0;
    while (data.hasNextInt()) {
        int units = data.nextInt();
        double grade = data.nextDouble();
        totalUnits += units;
        totalQualityPoints += units * grade;
    }
    double gpa = totalQualityPoints/totalUnits;
Because our Scanner object data was constructed from a single line of input, we can process just one person's grades with this loop. There is still a potential problem. What if there are no grades? Some students might have dropped all of their classes, for example. There are several ways we might handle that situation, but let's assume that it is appropriate to use a GPA of 0.0 in that case.
Making that correction and putting this into a method, we end up with the following code.
    public static double processGrades(Scanner data) {
        double totalQualityPoints = 0.0;
        double totalUnits = 0;
        while (data.hasNextInt()) {
            int units = data.nextInt();
            double grade = data.nextDouble();
            totalUnits += units;
            totalQualityPoints += units * grade;
        }
        if (totalUnits == 0)
            return 0.0;
        else
            return totalQualityPoints/totalUnits;
    }
Recall that our high-level code looked like this:
    double gpa = processGrades(grades);
    System.out.println("GPA for " + name + " = " + gpa);
This would complete the program, but let's add one more calculation. Let's compute the max and min GPA that we see among these students. We can accomplish this fairly easily with some simple if statements after the println:
    if (gpa > max)
        max = gpa;
    if (gpa < min)
        min = gpa;
We simply compare the current gpa against what we currently consider the max and min, resetting if the new gpa represents a new max or a new min. But how do we initialize these variables? We have two approaches to choose from. One approach involves initializing the max and the min to the first value in the sequence. We could do that, but it would make our loop much more complicated than it is currently. The second approach involves setting the max to the lowest possible value and setting the min to the highest possible value. This approach isn't always possible because we don't always know how high or low our values might go. But in the case of GPAs, we know that they will always be between 0.0 and 4.0.
Thus, we can initialize the variables as follows:
    double max = 0.0;
    double min = 4.0;
It may seem odd to set the max to 0 and the min to 4, but that's because we are intending to have them reset inside the loop. If the first student has a GPA of 3.2, for example, then this will constitute a new max (higher than 0.0) and a new min (lower than 4.0). Of course, it's possible that all students end up with a 4.0, but then our choice of 4.0 for the min is appropriate. Or all students could end up with a 0.0, in which case our choice of a max of 0.0 is appropriate.
Putting this all together we get the following complete program.
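A minimal sketch of how these pieces might be assembled follows. The class name GpaSketch, the sample students and the wording of the final two println calls are assumptions. The data comes from a String rather than a file so that the sketch is self-contained, and the call on useLocale is an added assumption so that "3.0" parses the same way on every system.

```java
import java.util.Locale;
import java.util.Scanner;

// Illustrative sketch; GpaSketch and its sample data are made up.
public class GpaSketch {
    // Computes a weighted GPA from pairs of (units, grade); 0.0 if there are none.
    public static double processGrades(Scanner data) {
        double totalQualityPoints = 0.0;
        double totalUnits = 0;
        while (data.hasNextInt()) {
            int units = data.nextInt();
            double grade = data.nextDouble();
            totalUnits += units;
            totalQualityPoints += units * grade;
        }
        if (totalUnits == 0) {
            return 0.0;
        } else {
            return totalQualityPoints / totalUnits;
        }
    }

    public static void main(String[] args) {
        // Made-up data standing in for the input file; lines come in pairs.
        String text = "Ana Maria Lopez\n3 3.0 4 2.9 3 3.2 2 2.5\nBen Wu\n4 4.0\n";
        Scanner input = new Scanner(text);
        double max = 0.0;  // lowest possible GPA, reset inside the loop
        double min = 4.0;  // highest possible GPA, reset inside the loop
        while (input.hasNextLine()) {
            String name = input.nextLine();
            Scanner grades = new Scanner(input.nextLine()).useLocale(Locale.US);
            double gpa = processGrades(grades);
            System.out.println("GPA for " + name + " = " + gpa);
            if (gpa > max) {
                max = gpa;
            }
            if (gpa < min) {
                min = gpa;
            }
        }
        System.out.println("max GPA = " + max);
        System.out.println("min GPA = " + min);
    }
}
```

In a real program the outer Scanner would be constructed from a File using the try/catch model shown earlier in the chapter.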
This is some text here. and produces as output the same text inside a box, as in:
    +--------------+
    | This is some |
    | text here.   |
    +--------------+
Your program will have to assume some maximum line length (e.g., 12 above).