CSE 303, Spring 2008, Assignment 3

Due: Tuesday, April 29, 2008 at 9 pm

Assignment goal

The purpose of this assignment is to gain some experience with C programming by implementing a utility program that is similar to grep, but without the ability to process regular expressions. In particular, in this assignment, you will:

This assignment does not include any particularly complicated algorithms, but it will require you to organize your code well and make effective use of the C language and libraries.

Synopsis

Implement in C a Unix utility program named gasp. The command

        gasp [options] STRING FILE...

should read the listed files (FILE...) and copy each line from an input file to stdout if it contains STRING. Each output line should be preceded by the name of the file that contains it. The argument STRING may be any sequence of characters (as expanded, of course, by the shell depending on how it is quoted). There are two available options, which may appear in any order if both are present:

(This is basically the same output produced by grep if the STRING argument is treated as literal data and not as a regular expression.)
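The matching and output rule can be illustrated with the hypothetical helper print_if_match below. It relies on strstr from <string.h>, which treats STRING as literal characters. A pattern such as a.c therefore matches only the three characters a, '.', c, and not "abc" as a regular expression would. The "name:" prefix is an assumption borrowed from grep's multi-file output, because the assignment only says that the file name must precede the line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print line, prefixed by filename, if it contains
 * pattern as a literal substring. Returns 1 if printed, 0 otherwise.
 * Assumes line still ends in '\n', as fgets leaves it, and assumes a
 * "filename:" prefix in the style of grep. */
int print_if_match(const char *filename, const char *line, const char *pattern)
{
    if (strstr(line, pattern) == NULL)   /* no regex: '.' is just a dot */
        return 0;
    printf("%s:%s", filename, line);
    return 1;
}
```

Note that strstr reports an empty pattern as found in every line, so gasp "" FILE would print every line under this sketch.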

Technical Requirements

Besides the general specification given above, your program should meet the following requirements to receive full credit.

  1. Be able to handle input lines containing up to 500 characters (including the terminating \0). This number should not be hard-wired in the code, but should be specified with an appropriate #define preprocessor command so it can be changed easily. Your program is allowed to produce incorrect results or fail if presented with input data containing lines longer than this limit.
  2. If you need to create a copy of a string or other variable-size data, you should dynamically allocate an appropriate amount of storage using malloc and return the storage with free when you are done with it. The amount allocated should be based on the actual size needed, not some arbitrary size that is assumed to be "large enough". Exception: you may allocate a character array or two that is large enough for the largest input line (see #1) and reuse it if needed to process each input line, without having to count the characters in each input line and (re-)allocate a new array for each one.
  3. Use standard C library functions where possible; do not reimplement operations available in the basic libraries. For instance, strcpy in <string.h> can be used to copy \0-terminated strings; you should not be writing loops to copy such strings one character at a time.
  4. For the -i option, two characters are considered to be equal ignoring case if they are the same when translated by the tolower(c) function (or, alternatively, toupper(c)) in <ctype.h>.
  5. If an error occurs when opening or reading a file, the program should write an appropriate error message to stderr and continue processing any remaining files on the command line.
  6. Your code must compile and run without errors or warnings when compiled with gcc -Wall on attu.
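Requirements 1, 2, 3 and 5 could be satisfied together along the lines of the sketch below. The names MAX_LINE, copy_string and process_file are hypothetical and the "name:" output prefix is assumed. The -i and -n options are omitted to keep the sketch short. When a file fails to open or read, process_file returns -1 so that the caller can report the problem and move on to the next file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Requirement 1: the line limit lives in one place (includes the '\0'). */
#define MAX_LINE 500

/* Requirement 2: copy s into exactly strlen(s) + 1 bytes.
 * The caller must free() the result. Returns NULL if malloc fails. */
char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);   /* requirement 3: library call, not a loop */
    return copy;
}

/* Print each line of filename containing pattern, prefixed by the file
 * name (assumed "name:" format). Returns the number of matching lines,
 * or -1 if the file could not be opened or read (requirement 5). */
int process_file(const char *filename, const char *pattern)
{
    char line[MAX_LINE];   /* one buffer, reused for every line */
    int matches = 0;
    FILE *fp = fopen(filename, "r");

    if (fp == NULL) {
        perror(filename);  /* writes "filename: <reason>" to stderr */
        return -1;
    }
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, pattern) != NULL) {
            printf("%s:%s", filename, line);
            matches++;
        }
    }
    if (ferror(fp)) {
        fprintf(stderr, "gasp: error reading %s\n", filename);
        matches = -1;
    }
    fclose(fp);
    return matches;
}
```

A caller would call process_file once per FILE argument and keep looping regardless of a -1 result, so a single bad file does not stop the run. The fragment is intended to compile cleanly under gcc -Wall.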

Code Quality Requirements

As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.

  1. Divide your program into suitable functions, each of which does a single well-defined task. For example, there should almost certainly be a function that processes a single input file, which is called as many times as needed to process the list of files on the command line. Your program most definitely may not consist of one huge main function that does everything. If you wish, you may include all of your functions in a single C source file, since the total size of the program will be fairly small. Be sure to include appropriate function prototypes near the beginning of the file.
  2. Comment sensibly, but not excessively. You should not use comments to repeat the obvious or explain how the C language works; assume that the reader knows C at least as well as you do. Your code should, however, include the following minimum comments:
     In addition, there should be a comment at the top of the file giving basic identifying information, including your name, the date, and the purpose of the file.
  3. Use appropriate names for variables and functions: nouns or noun phrases suggesting the contents of variables or the results of value-returning functions; verbs or verb phrases for void functions that perform an action without returning a value. Variables of local significance like loop counters or indices should be given simple names like i or n, and do not require further comments.
  4. No global variables. Use parameters (particularly pointers) appropriately. Exception: if you wish, you may have two global variables that indicate whether the -i or -n options are selected or not.
  5. No unnecessary computation. For example, if you need to translate the STRING argument to lower- or upper-case, make a copy of it and translate that copy once; don't do this repeatedly for each input line or even for each input file. Don't use malloc or free excessively - they are expensive. Don't make unnecessary copies of large data structures; use pointers. (Copies of ints, pointers, and similar things are cheap; copies of arrays and large structs are expensive.)
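The sketch below shows one way the option-flag exception in requirement 4 might look, with argument scanning in its own function instead of inside main. The global names ignore_case and opt_n and the function parse_options are hypothetical. What -n actually does is left to the rest of the program.

```c
#include <stdio.h>
#include <string.h>

/* The one permitted pair of globals (code-quality requirement 4). */
int ignore_case = 0;   /* set by -i */
int opt_n = 0;         /* set by -n (hypothetical name) */

/* Scan leading "-i" / "-n" arguments, accepted in any order.
 * Returns the argv index of STRING, or -1 on an unknown option or
 * when STRING and at least one FILE are not both present. */
int parse_options(int argc, char *argv[])
{
    int i = 1;

    while (i < argc && argv[i][0] == '-') {
        if (strcmp(argv[i], "-i") == 0)
            ignore_case = 1;
        else if (strcmp(argv[i], "-n") == 0)
            opt_n = 1;
        else {
            fprintf(stderr, "gasp: unknown option %s\n", argv[i]);
            return -1;
        }
        i++;
    }
    if (argc - i < 2) {    /* need STRING plus at least one FILE */
        fprintf(stderr, "usage: gasp [-i] [-n] STRING FILE...\n");
        return -1;
    }
    return i;
}
```

With first = parse_options(argc, argv), main could use argv[first] as STRING and call its per-file function for argv[first + 1] through argv[argc - 1]. One limitation of this simple scan is that a STRING beginning with '-' is mistaken for an option. The assignment text does not say how that case should be handled.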

Implementation Hints

  1. There are a lot of things to get right here; the job may seem overwhelming if you try to do it all at once. But if you break it into small tasks, each of which can be done on its own, it should be quite manageable. For instance, figure out how to process a single file before you implement the logic to process all of the files on the command line. Figure out how to open, read, and copy all of a file to stdout before you add the code to search for the STRING argument and selectively print lines containing it. Be able to search for exact matches before adding the -i option to ignore case. Add the -n option sometime when you're not trying to get something else to work.
  2. Every time you add something new to your code (see hint #1), test it. Right now! It is much easier to find and fix problems if you can isolate the potential bug to a small section of code you just added or changed. printf is your friend here: use it to print values while debugging.
  3. The standard C library contains many functions that you will find useful. In particular, look at the <stdio.h>, <string.h>, <ctype.h> and <stdlib.h> libraries.
  4. An easy way to implement the -i option is to translate both the STRING argument and each input line to lowercase, then search for the translated STRING in the translated input line.
  5. Be sure to check for errors like trying to open a nonexistent file to see if your error handling is working properly.
  6. Once you're done, read the instructions again to see if you overlooked anything.
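Hint 4, combined with code-quality requirement 5, might be implemented along these lines. The pattern is lowercased once into an exactly sized malloc'd copy. Each input line is copied into a reusable scratch buffer (the "character array or two" allowed by technical requirement 2), which keeps the original line available for printing. The function names are hypothetical. The cast to unsigned char before calling tolower avoids undefined behavior for characters with negative values on platforms where char is signed.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Convert s to lowercase in place using tolower() (technical requirement 4). */
void lowercase_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) tolower((unsigned char) *s);
}

/* Return an exactly sized, malloc'd lowercase copy of pattern, or NULL.
 * Intended to be called once, before any file is read. Caller frees it. */
char *make_lower_pattern(const char *pattern)
{
    char *lower = malloc(strlen(pattern) + 1);
    if (lower != NULL) {
        strcpy(lower, pattern);
        lowercase_in_place(lower);
    }
    return lower;
}

/* Case-insensitive containment test. line is copied into scratch (a
 * reusable buffer at least as large as the longest line) and lowercased
 * there, so line itself is left unchanged for printing. */
int contains_ignore_case(const char *line, const char *lower_pattern,
                         char *scratch)
{
    strcpy(scratch, line);
    lowercase_in_place(scratch);
    return strstr(scratch, lower_pattern) != NULL;
}
```

Because strcpy does no bounds checking, the scratch buffer must be at least as large as the line buffer (MAX_LINE in a full program).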

Extra Credit

A small amount of extra credit will be awarded for adding the following extensions to an already complete, working assignment. No extra credit will be awarded if the basic program is not fully implemented and substantially bug-free.