File I/O in C++ and C

Overview
C++ File I/O
C File I/O
Organizing the Input For Your Program
Creating Data Files
Appropriate Problem Size

This page provides an overview of how to do file I/O in C and C++.  At the end there are additional hints for the Assignment 1 program: on organizing the inputs to your program, creating data files from which to read your array values, and collecting timings. By all means, let the staff know if you find any mistakes in the code examples.

The point of Assignment 1 is to get you warmed up (not to stress you out and make you use all your late days). In some ways, this will be the hardest program you have to do this quarter, because it asks you to exercise lots of basic skills. Future programs will re-use these same skills to implement and/or use ADTs. Note that we don't necessarily expect the programs to get bigger and bigger as the quarter goes on.

Hopefully this page will answer all your burning questions about file I/O. If it doesn't, please e-mail the staff (cse373-staff@cs.washington.edu).



Overview

The most important thing to know about reading and writing files in C and C++ is that it is very similar to reading from and writing to the console (using cin and cout in C++ and scanf/printf in C). The difference is that rather than having the user sit there and type characters, the program just reads the characters from the file as though they were being typed in (or rather than printing things to the screen, it prints them to a file). If you understand this concept, the rest is just syntax.

The steps to reading or writing a file are:

1) to open the file
2) to read or write the data from/to the file
3) to close the file

Opening the file requires you to supply a filename (this is just an array of characters like any other string in C/C++) and indicate whether you want to read it or write it.

Reading or writing the file is similar to reading from and writing to the console, except that each operation names the file to use.

Closing the file just tells the operating system that you're done with it. (Generally, forgetting to close a file is no big deal -- when your program exits, it'll usually close the file automatically. It's good style to close the file after you use it, though).


C++ File I/O

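To use file I/O in C++ you typically include the iostream and fstream header files. (Older, pre-standard compilers name these iostream.h and fstream.h. With the standard headers the stream classes live in namespace std, hence the using line.)

#include <iostream>
#include <fstream>
using namespace std;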

In C++ you declare variables of the ofstream and ifstream classes to get output and input file streams, respectively. Output streams are used to write to files just as you would use cout. Input streams are used to read from files just as you would use cin.

Opening the file is done automatically by the stream's constructor. The constructor takes a filename and an indication of whether the stream is an input or output stream (ios::in for input, ios::out for output). So for example to declare an input stream named infile that reads from the file "mydata.txt", you would use the following declaration:

ifstream infile("mydata.txt",ios::in);
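PC Note:
With the older iostream.h library on the PCs, you may want to open your data file like this:  ifstream infile(filename,ios::in|ios::nocreate);
Apparently, if you open a file with just ios::in and it doesn't exist, that library will create an empty file for you. The ios::nocreate flag says "if the file doesn't exist, don't create it for me." Note that ios::nocreate is not part of standard C++; with the standard fstream header, opening a nonexistent file for input simply fails and no file is created.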

Then to read from the file, you use a variation of the operations that you'd normally use with cin. For example, to read an integer from mydata.txt, you would use:

infile >> myint;

Other than the fact that it's reading from a file, the use of >> is identical to its use with cin.
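As a minimal sketch of the same idea applied to a whole file, the function below keeps extracting integers with >> until the input runs out. The name sum_ints and its parameters are made up for illustration and are not part of the assignment; the sketch assumes the standard fstream header, where ifstream lives in namespace std.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads whitespace-separated integers from `filename`
// until extraction fails (end of file, or a token that is not an integer).
// Stores their total in `sum` and returns how many values were read.
// If the file cannot be opened, the loop never runs and the result is 0.
int sum_ints(const std::string& filename, long& sum) {
    std::ifstream infile(filename.c_str());
    int value;
    int count = 0;
    sum = 0;
    while (infile >> value) {   // same operator you would use with cin
        sum += value;
        ++count;
    }
    return count;
}
```

The loop condition works because infile >> value yields the stream itself, and a stream tests as false once a read has failed.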

The destructor for ifstream/ofstream automatically closes the file when the stream variable goes out of scope, but if you want to close it by hand, you can do so by calling the close method:

infile.close();

To write to a file, you do the exact same thing, except you declare the variable to be of type ofstream, and replace ios::in with ios::out. Then you'd use the << operator just as you would with cout.
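Here is a rough sketch of the writing side, assuming the standard fstream header. The helper name write_ints and its vector parameter are illustrative choices, not something the assignment requires.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes each value on its own line.
// Returns false if the file could not be opened or a write failed.
bool write_ints(const std::string& filename, const std::vector<int>& values) {
    std::ofstream outfile(filename.c_str());   // ios::out is ofstream's default mode
    for (std::size_t i = 0; i < values.size(); ++i) {
        outfile << values[i] << "\n";          // same operator you would use with cout
    }
    outfile.close();                           // flush now so errors show up in the check below
    return !outfile.fail();
}
```

Keep in mind that opening an existing file for output this way discards its previous contents.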

Potential Error Conditions:

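Whenever you open a file, it's a good idea to check and make sure that there weren't any problems in doing so. Problems can include trying to open a file for reading when it doesn't exist, or trying to open a file for writing in a directory that you don't have write permissions in. A C++ stream variable is an object rather than a pointer, so it is never literally NULL; instead, test the stream itself after creating it. The expression !infile is true when there was a problem (infile.fail() reports the same thing).
Reading from or writing to a stream that failed to open does nothing except leave the stream in its failed state, so your variables never receive the values you expected, and your program may then misbehave or crash.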
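One way to make this check, sketched here with the standard fstream header, is to test the stream with the ! operator right after declaring it and again after each read. The helper name read_first_int and the wording of the messages are assumptions made for the example.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: reads the first integer in `filename` into `value`.
// Returns false, after printing a message to cerr, if the file cannot be
// opened or does not start with an integer.
bool read_first_int(const std::string& filename, int& value) {
    std::ifstream infile(filename.c_str());
    if (!infile) {                       // true when the open failed
        std::cerr << "could not open " << filename << " for reading\n";
        return false;
    }
    if (!(infile >> value)) {            // true when the read failed
        std::cerr << filename << " does not start with an integer\n";
        return false;
    }
    return true;
}
```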
 

Runtime Hint:
 If (on the PCs) you're running your program using the "!" icon, we believe your input file should be in the same directory as your source code. If you run your program directly (by CD-ing into the directory that the .exe file is in and running it), the input file should be in the same directory as your .exe file.
 


C File I/O

To use file I/O in C, you generally have stdio.h included at the top of your file (most C programs include this anyway):

#include <stdio.h>

In C, you declare a file pointer variable for any file you want to use. For example, the following line declares a file pointer variable called "infile" (our standard name for the file to be read from; you can call yours anything you like):

FILE *infile;

To open the file, you call the function "fopen()", passing it a filename and "r" for reading or "w" for writing. For example, to open a file called mydata.txt, you would use:

infile = fopen("mydata.txt","r");

Then to read from the file, you use a variation of scanf() called fscanf() which takes a file as its first parameter. For example, to  read an integer from mydata.txt, you would use:

fscanf(infile,"%d",&myint);

Other than the fact that it's reading from a file, fscanf() is identical to scanf().

Once you've read all the data you want to, close the file using the  fclose() function:

fclose(infile);

To write to a file, you do the exact same thing, except you use "w" rather than "r" in the fopen() line and fprintf() rather than fscanf() to write the data out.

Potential Error Conditions:

Whenever you open a file, it's a good idea to check and make sure that there weren't any problems in doing so. Problems can include trying to open a file for reading when it doesn't exist, or trying to open a file for writing in a directory that you don't have write permissions in. If your C file pointer is NULL after calling fopen(), it indicates that there was a problem.
Passing a NULL file pointer to fscanf(), fprintf(), or fclose() is an error that will typically crash your program.
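The check might look something like the sketch below, which is written in the C subset so that it compiles as either C or C++. The helper name read_array and its parameters are invented for the example. It also relies on the fact that fscanf() returns the number of items it successfully converted.

```cpp
#include <stdio.h>

/* Hypothetical helper: reads up to max_count integers from filename into
   values[]. Returns how many were read, or -1 if the file cannot be opened. */
int read_array(const char *filename, int *values, int max_count) {
    FILE *infile = fopen(filename, "r");
    int count = 0;

    if (infile == NULL) {          /* fopen() failed */
        perror(filename);          /* prints the system's reason to stderr */
        return -1;
    }
    /* fscanf() returns the number of items it converted, so 1 means success */
    while (count < max_count && fscanf(infile, "%d", &values[count]) == 1) {
        count++;
    }
    fclose(infile);
    return count;
}
```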



Organizing the Input For Your Program

We don't want  to dictate how you input the parameters for your program because it doesn't matter to us and because we'd like you to use whatever's most convenient for you.  However, we *do* want the parameters to be input in some manner for two reasons: (1) it's a reasonable interface that a "real" search utility program might have; (2) it allows us to run your program for a variety of problem sizes, data sets, search algorithms, and target values without recompiling.

Here's one scheme that uses the console to get its input, though you're more than welcome to do something else:

1) Print to the console: "enter a problem size:"
2) Read an integer from the console for n
3) Print to the console: "enter a filename:"
4) Read a string from the console for the filename
5) Print to the console: "choose one of these search algorithms:"
6) Print a menu of the algorithms you implemented, having each represented by a letter or number
7) Read a letter or number indicating the algorithm
8) Print to the console: "enter a value to search for"
9) Read an integer from the console as your target value

Your program will then go on to allocate the array, read data values for it from the specified file, start the timer, perform the indicated search, check the timer, and print whatever output you think is appropriate.
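The prompting scheme above could be sketched in C++ roughly as follows. Everything named here (SearchParams, read_params, and the two menu entries) is a placeholder invented for the example, and you are free to organize your input differently. Taking the streams as parameters is a design choice of this sketch: from main you would pass cin and cout.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical container for the four inputs in the scheme above.
struct SearchParams {
    int n;                  // problem size
    std::string filename;   // data file to read the array from (no spaces)
    char algorithm;         // menu choice
    int target;             // value to search for
};

// Prompts on `out` and reads the four parameters from `in`.
// Returns false if any read fails or the data file cannot be opened.
// The two menu entries are placeholders, not the assignment's list.
bool read_params(std::istream& in, std::ostream& out, SearchParams& p) {
    out << "enter a problem size: ";
    if (!(in >> p.n) || p.n < 0) return false;

    out << "enter a filename: ";
    if (!(in >> p.filename)) return false;

    out << "choose one of these search algorithms:\n"
        << "  1) linear search (placeholder)\n"
        << "  2) binary search (placeholder)\n";
    if (!(in >> p.algorithm)) return false;

    out << "enter a value to search for: ";
    if (!(in >> p.target)) return false;

    // Catch a mistyped filename now rather than after the array is allocated.
    std::ifstream probe(p.filename.c_str());
    if (!probe) {
        out << "could not open " << p.filename << "\n";
        return false;
    }
    return true;
}
```

Because the function reads from whatever stream it is given, the same code works whether the answers are typed at the console or come from redirected input.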

Advanced tip: Note that if you use the above scheme and know how to redirect console I/O for your program, you can create configuration files that contain the four input parameters for a specific trial run and pipe them into your program. This isn't necessary, but you may find it to be an easy way to set up a bunch of runs without typing in the parameters each time.



Creating Data Files

The data files that contain integers are simply text files that contain values for your array in sorted order. Thus, one way to create them would be to open your favorite editor and start typing numbers in, saving the file as a text file. However, it's likely that to get times that are large enough for you to see asymptotic trends, you'll have to run on problem sizes that are larger than you're willing to type in (1000 values? 10,000 values? 1,000,000 values?)

So... an easy way to create these data files is just to write a small program that creates them (free advice: always make computers do work that you're not willing to do yourself whenever you can). Writing this small program from scratch would be a great chance to practice your file I/O skills. Or if you want to, you can look at the simple ones below. Note that they don't check to see whether there were problems opening the files. Note also that the consecutive numbers 1 through 10 may not be the best input set to cause the worst case performance for some of the search algorithms.

C++ version:

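#include <iostream>
#include <fstream>
using namespace std;

// Writes the integers 1 through 10 to out.dat, one per line.
int main() {
    int i;
    ofstream outfile("out.dat",ios::out);

    for (i=0;i<10;i++) {
        outfile << i+1 << "\n";
    }
    return 0;  // outfile's destructor closes the file
}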
 

C version:

#include <stdio.h>

/* Writes the integers 1 through 10 to out.dat, one per line. */
int main(void) {
    int i;
    FILE *outfile;

    outfile = fopen("out.dat","w");
    for (i=0;i<10;i++) {
        fprintf(outfile,"%d\n",i+1);
    }
    fclose(outfile);
    return 0;
}
 

One more hint on the data file: You *could* create different data files for each problem size that you're planning on running. However, another idea would be to create a single data file that's as big as the biggest problem size you intend to use and then just read the first "n" elements out of it rather than the whole thing.
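Reading only the first n values of a larger file could be sketched like this in C++. The helper name read_first_n is hypothetical, and returning a std::vector is just one convenient choice; filling an array you allocated yourself works the same way.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads at most n integers from the start of `filename`.
// The returned vector holds fewer than n values if the file cannot be opened
// or runs out of data early, so callers should compare its size() against n.
std::vector<int> read_first_n(const std::string& filename, std::size_t n) {
    std::vector<int> values;
    std::ifstream infile(filename.c_str());
    int value;
    while (values.size() < n && infile >> value) {
        values.push_back(value);
    }
    return values;
}
```

Since the data file is sorted, the first n values of it are themselves a sorted data set of size n.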


Appropriate Problem Size

This depends entirely on the speed of your processor, the size of your memory, and so on. Here's the way you should think of this: pretend that you have a friend who doesn't believe all this "nonsense" in lecture about asymptotic speedup and primary and secondary performance differences. You want to prove to your friend that real code does exhibit these behaviors, so you code up some different search algorithms, and time how long they take for different problem sizes. Your goal is to find a range of problem sizes that illustrates the expected trends.

Note that (as in any scientific experiment), you should start by sampling a wide range of problem sizes to find out which ones take a reasonable amount of time to run. Modern computers are fast enough that "a reasonable amount of time" might be something like milliseconds, seconds, or a minute. You're probably not going to need to run any longer than that--the timers that we're using are accurate to milliseconds, at least. Note that the longer your program takes to run, the more accurate your timings will tend to be, since fluctuations in runtime will be smaller compared to your total runtime.

So for example, you might start by trying sizes like:

10, 100, 1000, 10000, 100000, 1000000, etc.

until you figure out which ones give times in the range that you want. Next, do some more timings in that range -- not tons of them, just enough to show the trends you want (5 is probably the minimum to shoot for; a dozen would probably make your curves even more convincing).
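A sweep over problem sizes might be sketched as below. This is only an illustration under several assumptions: std::chrono::steady_clock (which needs a C++11 or later compiler) stands in for whatever timer the course provides, linear_search is a placeholder for your own search routines, and repeating each search and averaging is a choice made here so that very short runs still produce a measurable time.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Placeholder for one of your own search routines: returns the index of
// target in data, or -1 if it is absent.
int linear_search(const std::vector<int>& data, int target) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == target) return static_cast<int>(i);
    }
    return -1;
}

// Average time in milliseconds for one search, measured over `repeats` runs.
double time_search_ms(const std::vector<int>& data, int target, int repeats) {
    typedef std::chrono::steady_clock Clock;
    volatile int sink = 0;   // helps keep the compiler from optimizing the searches away
    Clock::time_point start = Clock::now();
    for (int r = 0; r < repeats; ++r) {
        sink = linear_search(data, target);
    }
    Clock::time_point stop = Clock::now();
    (void)sink;
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}

// Prints one timing per problem size: 10, 100, ..., up to max_n.
void sweep_sizes(std::ostream& out, int max_n, int repeats) {
    for (int n = 10; n <= max_n; n *= 10) {
        std::vector<int> data;
        for (int i = 1; i <= n; ++i) data.push_back(i);   // sorted values 1..n
        // Searching for 0, which is absent, forces a full scan of the array.
        out << n << "\t" << time_search_ms(data, 0, repeats) << " ms\n";
    }
}
```

Calling sweep_sizes(std::cout, 1000000, 100) from main would print one line per power of ten, which is enough to see which sizes are worth sampling more densely.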