The UNIX I/O interface is rich, but can at times be daunting. There is often more than one way to do the same thing, but sometimes only one way to do something properly.
This note discusses UNIX I/O channels, including file descriptors,
FILE* streams, and select. It should be read in the context of the
UNIX man pages. For example, to learn more about select, you would
type man select
from a terminal window. You can also use
google to find man pages on the web, but be warned that man pages can
vary from version to version, so unless you are sure that you are
looking at the right version, be careful. In contrast, the shell
command man
should always return the appropriate man
page.
UNIX I/O is built on the concept of a channel, which is a named source or sink for data. Some channels are input channels (stdin or a file opened for reading), some are output channels (stdout/stderr or a file opened for writing), and some are both input and output channels (like a socket).
The operating system provides a set of system calls for manipulating
channels. These channels are named by file descriptors, which
are small integers that correspond to an index in a table in the
operating system. When you access a channel by way of a file
descriptor, for example using read or write, you are
making a direct request of the operating system to read or write
some number of bytes. The actual number of bytes produced or
consumed is returned as the result of the call. The typical OS
channel calls include open and close, which are used to create and
release file channels. Others, such as socket, connect, and accept, are
specific to creating networking channels. Certain routines, such as
read and write, can be used to move data across channels regardless
of their type. Others, such as sendmsg and recvmsg, are specific to
networking channels.
On top of the operating system's system call interface, the standard C library provides an additional set of services that offer a richer, and sometimes easier to use, interface to I/O. These routines generally (but not always) start with an f, as in fopen, fread, fwrite, and fclose. Rather than returning a file descriptor (small integer), they return an opaque reference to a data structure maintained in the C library (FILE *).
printf writes its output to a special channel called standard out (stdout)
that is by default buffered. Given this buffering, consider the
behavior of the following program:
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("hello world1");     /* no newline: held in the stdout buffer */
        sleep(10);
        printf(".1\n");             /* newline flushes the buffered line */
        printf("hello world2\n");
        sleep(10);
        printf(".2\n");
        return 0;
    }

When you run this from a terminal, nothing happens for the first 10 seconds, then the first hello world is printed (together with .1), followed immediately by the second hello world, followed another 10 seconds later by .2 and a newline. This behavior reflects the fact that stdout is buffered, and that the buffer is flushed at each newline.
Alternatively, consider this program:
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        fprintf(stderr, "hello world1");    /* appears immediately */
        sleep(10);
        printf(".1\n");
        fprintf(stderr, "hello world2\n");
        sleep(10);
        printf(".2\n");
        return 0;
    }

Note that fprintf() takes a named FILE* stream, and that printf is simply a wrapper around fprintf(stdout, ...). Here, the standard error channel (stderr) is not buffered, and as a result, the characters are displayed to the output channel as soon as they are produced by the program.
Buffered output can sometimes be very confusing when trying to
associate a sequence of program events with a sequence of characters
generated by the program. The output may be produced long after
the event described by the output occurs. To mitigate this, use
stderr to display debugging information, or use the setbuf call to
turn off buffering.
You can get a buffered channel either by creating one directly (fopen, for example), or by turning an OS descriptor into a buffered channel (fdopen). Both produce an opaque descriptor (FILE *) which corresponds to an underlying OS file descriptor (int fd). You can also map from a FILE* to the underlying descriptor using fileno().
For any given channel, it's best to use that channel in a consistent
way. As a general rule, for 461, I recommend that you use the C
library interfaces for stdin, stdout, and stderr, but use the
underlying OS interfaces for sockets. See the solution for HW1 for
an example. The reason is
that the C library interfaces are really designed for terminal and
file I/O and do not provide access to many socket-oriented interfaces.
For files, it's your call, although I often prefer to use the underlying
read/write routines and do the buffering myself.
Sometimes, though, a program needs to adapt its behavior to an otherwise blocking I/O request. For example, when reading from a socket that is expected to produce data within a second but doesn't, the program may need to take some alternative action.
The basic way to avoid blocking indefinitely is to first check if the
channel is ready, and then to access it if it is. Under UNIX,
the select system call is used to determine if one or more
given channels are ready for an I/O operation. The basic model is to
give select a list of file descriptors on which you'd like to
read and/or write and an optional timeout. The call will block for up to the
time specified in the timeout, returning which, if any, of the
descriptors are ready for I/O. If none are ready within the specified
timeout and no error has otherwise occurred, select
returns 0, indicating that there are no ready descriptors.
Here's some sample code that shows how a
program can use select to block on an input channel (in this
case stdin) for a little while, ultimately giving up
after a couple of little whiles. In addition to showing the use of
select, it also shows the proper way to check return codes from read
(< 0 means error, 0 means EOF, > 0 means number of characters
returned).