
CSE 374 - Homework 4
Due: Thursday, Feb. 7, at 11 pm.
Assignment goal
The purpose of this assignment is to gain some experience with C programming
  by implementing a utility program that is similar to grep, but
  without the ability to process regular expressions (i.e., a lot like a simple
  version of fgrep). In particular, in this assignment,
  you will:
- Gain experience creating and running C programs,
- Become familiar with some of the basic C libraries, including those for file and string handling,
- Get a better understanding of how Unix utilities are implemented, and
- Gain some basic experience with the Unix debugger, gdb.
This assignment does not include any particularly complicated logic or algorithms, but it will require you to organize your code well and make effective use of the C language and libraries. You will also have to explore the details of the C string and file I/O libraries to discover how to do various operations that should already be familiar from your programming experience in other languages, although the details in C will, of course, be different. It is meant as an orientation to the Unix/Linux C programming environment. You should do this assignment by yourself.
Synopsis
Implement in C a Unix utility program gasp. The command
gasp [options] STRING FILE...
should read the listed files (FILE...) and copy each line from
  the input to stdout if
  it contains STRING. Each output line should be preceded by the
  name of the file that contains it. The argument STRING may be
  any sequence of characters (as expanded, of course, by the shell depending
  on how it is quoted). There are two available options,
which may appear in any order if both are present:
- -i: Ignore case when searching for lines that contain STRING. If the -i option is used, the strings "this", "This", "THIS", and "thiS" all match; if -i is not used, they are all considered different.
- -n: Number lines in output. Each line copied to stdout should include the line number in the file where it was found in addition to the file name. The lines in each file are numbered from 1.
Your program does not need to be able to handle combinations of option letters
  written as a single multi-character option 
like -in or -ni. But it does need to be able to handle
any combination of either or both (or neither) option when they appear separately
on the command line.
(This is basically the same output produced by grep if
  the STRING argument is treated as literal data and not as a regular
  expression. You should pretty much match the output format of grep,
  although your program's output does not need to be byte-for-byte
  identical. One difference, though, is that a file name should be printed on
  every output line, even if only one file is specified on the gasp command
  line.)
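Because getopt is off limits (see the Technical Requirements), the options have to be recognized by hand. Below is a minimal sketch of one approach. It assumes a hypothetical helper, parse_options, that reports the two flags through pointer parameters instead of globals. The helper scans argv from index 1, accepts each argument that is exactly "-i" or "-n" (compared with strcmp), and treats the first argument that is neither as STRING.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Examines argv[1..argc-1] for the options -i and -n, which may appear
 * separately and in either order before STRING. Sets *ignore_case and
 * *number_lines to true if the matching option is present, false otherwise.
 * Returns the index in argv of STRING, or -1 if STRING or every FILE
 * argument is missing. (Hypothetical helper, for illustration only.)
 */
int parse_options(int argc, char *argv[], bool *ignore_case, bool *number_lines) {
    int i = 1;  /* index of the argument being examined */

    *ignore_case = false;
    *number_lines = false;
    while (i < argc) {
        if (strcmp(argv[i], "-i") == 0) {
            *ignore_case = true;
        } else if (strcmp(argv[i], "-n") == 0) {
            *number_lines = true;
        } else {
            break;  /* first non-option argument is STRING */
        }
        i++;
    }
    if (i + 1 >= argc) {
        return -1;  /* need STRING at argv[i] and at least one FILE after it */
    }
    return i;
}
```

If k is the returned index, the caller would use argv[k] as STRING and argv[k + 1] through argv[argc - 1] as file names. If -1 comes back, it would print a usage message to stderr instead. With this design, a STRING that is literally "-i" or "-n" cannot be searched for, and a combined "-in" is simply taken as STRING. Both behaviors seem acceptable under the specification above, although that is a judgment call.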
Technical Requirements
Besides the general specification given above, your program should meet the following requirements to receive full credit.
- Be able to handle input lines containing up to 500 characters (including
    the terminating \0). This number should be specified with an appropriate #define preprocessor command so it can be changed easily. Your program is allowed to produce incorrect results or fail if presented with input data containing lines longer than this limit.
- You may assume that the string pattern on the command line is no longer
    than 100 characters (including the terminating \0). This length should also be specified by an appropriate #define.
- Use standard C library functions where possible; do not reimplement operations
    available in the basic libraries. For instance, strncpy in <string.h> can be used to copy \0-terminated strings; you should not be writing loops to copy such strings one character at a time.
 Exception: there is a getopt function in the Linux library that provides simplified handling of command line options. For this assignment only, you may not use this function. You should implement the processing of command line options yourself, using the string library functions where they are helpful, of course.
- You should use "safe" versions of file and string handling routines
    such as fgets and strncpy instead of routines like gets and strcpy. The safe functions allow specification of maximum buffer or array lengths and will not overrun adjacent memory if used properly.
- For the -i option, two characters are considered to be equal ignoring case if they are the same when translated by the tolower(c) function (or, alternatively, toupper(c)) in <ctype.h>.
- If an error occurs when opening or reading a file, the program should
    write an appropriate error message to stderr and continue processing any remaining files on the command line.
- Your main function must be in a source file named gasp.c. We suggest you put all the functions that make up the program in this file (to keep things simple), but you may have additional source files if you wish.
- Your code must compile and run without errors or warnings
    when compiled and executed on klaatu or the CSE Fedora 17 Linux VM using gcc with the -Wall option. Since this assignment should not need any unusual or system-dependent code, you can almost certainly develop and test your code on any recent Linux system or other system that supports a standard C compiler. However, we will test your submissions on the CSE systems, so you should verify your program there before the submission deadline.
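Several of the requirements above come together in the function that processes one file: a #define'd line limit, reading with fgets, and reporting open and read errors on stderr before carrying on with the next file. The sketch below shows one possible shape for that function, not a required one. The names search_stream and process_file, the use of perror for the message, and passing the -n setting as a parameter instead of a global are all assumptions. Handling of -i is omitted to keep the example short.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 500     /* longest input line handled, including the '\0' */
#define MAX_PATTERN 100  /* longest STRING argument, including the '\0' */

/*
 * Copies to out each line read from in that contains pattern (exact,
 * case-sensitive match), prefixed by name and a colon. If number_lines is
 * true, the line number (counting from 1) and another colon follow the
 * name. Returns the number of lines copied. (Hypothetical helper.)
 */
int search_stream(FILE *in, const char *name, const char *pattern,
                  bool number_lines, FILE *out) {
    char line[MAX_LINE];  /* current input line, including its '\n' */
    int line_number = 0;  /* number of lines read so far */
    int matches = 0;      /* number of lines copied to out */

    while (fgets(line, sizeof line, in) != NULL) {
        line_number++;
        if (strstr(line, pattern) != NULL) {
            if (number_lines) {
                fprintf(out, "%s:%d:%s", name, line_number, line);
            } else {
                fprintf(out, "%s:%s", name, line);
            }
            matches++;
        }
    }
    return matches;
}

/*
 * Opens the file named filename and searches it as search_stream does.
 * If the file cannot be opened or a read error occurs, writes a message to
 * stderr and returns -1; otherwise returns the number of lines copied.
 */
int process_file(const char *filename, const char *pattern,
                 bool number_lines, FILE *out) {
    FILE *in = fopen(filename, "r");  /* the open input file, or NULL */
    int matches;                      /* result of searching the file */

    if (in == NULL) {
        perror(filename);  /* e.g. "x.txt: No such file or directory" */
        return -1;
    }
    matches = search_stream(in, filename, pattern, number_lines, out);
    if (ferror(in)) {
        fprintf(stderr, "%s: read error\n", filename);
        matches = -1;
    }
    fclose(in);
    return matches;
}
```

main would call process_file once per FILE argument, passing stdout as out, and simply move on to the next file when the result is -1. Splitting out search_stream, which works on any already-open FILE *, also anticipates the stdin extra credit: a call such as search_stream(stdin, "(standard input)", ...) would cover it, "(standard input)" being the label GNU grep uses. This sketch prints a final line that has no trailing newline without one. It also reads lines longer than MAX_LINE - 1 characters in pieces and counts each piece as a separate line, which the limit above permits.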
Code Quality Requirements
As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.
- Divide your program into suitable functions, each of which does a single
    well-defined task. For example, there should almost certainly be a function
    that processes a single input file, which is called as many times as needed
    to process the files listed on the command line (and which, in turn, might
    call other functions to perform identifiable subtasks). Your program most
    definitely may not consist of one huge main function that does everything. However, it should not contain tiny functions that only fragment the code instead of breaking it into coherent pieces. If you wish, you may include all of your functions in a single C source file, since the total size of this program will be fairly small. Be sure to include appropriate function prototypes near the beginning of the file so the actual function definitions can appear in whatever order is most appropriate for presenting the code in the remainder of the file.
- Comment sensibly, but not excessively. You should not use comments to repeat
    the obvious or explain how the C language works -- assume that the reader
    knows C at least as well as you. Your code should, however, include the
    following minimum comments:
	- Every function must include a heading comment that explains what the function does (not how it does it), including the significance of all parameters and any effects on or use of global variables (to the extent that there are any). It must not be necessary to read the function code to determine how to call it or what happens when it is called. (But these comments do not need to be nearly as verbose as, for example, Javadoc comments.)
	- Every significant variable must include a comment that is sufficient to understand what information is stored in the variable and how it is stored. It must not be necessary to read code that initializes or uses a variable to understand this. It may be helpful to describe several related variables in a single comment that explains their contents and relationship.
	- In addition, there should be a comment at the top of the file giving basic identifying information, including your name, the date, and the purpose of the file.
- Use appropriate names for variables and functions: nouns or noun phrases
    suggesting the contents of variables or the results of value-returning functions;
    verbs or verb phrases for void functions that perform an action without returning a value. Variables of local significance like loop counters, indices, or pointers should be given simple names like i, k, n, or p, and normally do not require further comments.
- No global variables. Use parameters (particularly pointers) appropriately.
    Exception: if you wish, you may have two global variables that indicate whether
    the -i or -n options are selected or not.
- No unnecessary computation. For example, if you need to translate the STRING argument to lower- or upper-case, translate it (or a copy of it) once; don't do this repeatedly for each input line. Don't make unnecessary copies of large data structures; use pointers. (Copies of ints, pointers, and similar things are cheap; copies of arrays and large structs are expensive.) Don't read the input by calling a library function to read each individual character. Read the input a line at a time (calling an I/O function to read an entire line into a char array costs just about the same as calling one to read a single character). But don't overdo it. Your code should be simple and clear, not complicated by lots of micro-optimizations that don't matter.
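The sketch below illustrates the "translate once" rule, together with the prototype and heading-comment conventions described above, by preparing the search pattern a single time before any file is read. prepare_pattern and to_lower are hypothetical names. The sketch also shows the usual safe strncpy idiom: the terminator has to be stored explicitly because strncpy does not add one when it truncates.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PATTERN 100  /* longest STRING argument, including the '\0' */

/* Prototypes first, so the definitions below can appear in any order. */
void prepare_pattern(char *dest, const char *arg, bool ignore_case);
void to_lower(char *s);

/*
 * Stores in dest (an array of at least MAX_PATTERN chars) a copy of the
 * STRING argument arg, truncated to MAX_PATTERN - 1 characters if
 * necessary. If ignore_case is true, the copy is converted to lower case.
 * Meant to be called once, before any file is searched. (Hypothetical.)
 */
void prepare_pattern(char *dest, const char *arg, bool ignore_case) {
    strncpy(dest, arg, MAX_PATTERN - 1);
    dest[MAX_PATTERN - 1] = '\0';  /* strncpy leaves a truncated copy unterminated */
    if (ignore_case) {
        to_lower(dest);
    }
}

/* Converts each character of the '\0'-terminated string s to lower case, in place. */
void to_lower(char *s) {
    for (; *s != '\0'; s++) {
        *s = (char) tolower((unsigned char) *s);
    }
}
```

In main, a single call such as prepare_pattern(pattern, argv[k], ignore_case) into a local char pattern[MAX_PATTERN] would run before the file loop. Every later search would then reuse pattern through a pointer instead of translating STRING again.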
Implementation Hints
- There are a lot of things to get right here; the job may seem overwhelming
    if you try to do all of it at once. But if you break it into small tasks,
    each one of which can be done individually by itself, it should be quite
    manageable.
    For instance, figure out how to process a single file before you implement
    the logic to process all of the files on the command line. Figure out how
    to open, read, and copy all of a file to stdout before you add the code to search for the STRING argument and selectively print lines containing it. Be able to search for exact matches before adding the -i option. Add the -n option separately when you're not trying to do something else.
- Every time you add something new to your code (see the previous hint), test it. Right
    now! Immediately! BEFORE YOU DO ANYTHING ELSE!!! (Did we
    mention that you should test new changes right away?) It is much easier to
    find and fix problems if you can isolate the potential bug
    to a small section of code you just added
    or changed. printf is your friend here to print values while debugging. The debugger is also your friend -- learn how to use it.
- The standard C library contains many functions that you will find useful.
    In particular, look at the <stdio.h>, <string.h>, <ctype.h>, and <stdlib.h> libraries.
- An easy way to implement the -i option is to translate both the STRING argument and each input line to lower case, then search for the translated STRING in the translated input line. (Translating a string to lower case sure sounds like a well-defined operation that should be in a separate function!) However, if the string is found, the original line from the input file should be printed, not the translated one.
- Be sure to test for errors like trying to open or read a nonexistent file to see if your error handling is working properly.
- Once you're done, read the instructions again to see if you overlooked anything.
- See the previous hint.
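The lower-case approach to -i described in the hints might look roughly like the sketch below. It assumes that the pattern has already been lower-cased once, up front. It lower-cases a local copy of each line, so the caller still has the original line to print. line_matches and to_lower are illustrative names only, and to_lower is defined here so the sketch compiles on its own.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LINE 500  /* longest input line handled, including the '\0' */

/* Converts each character of the '\0'-terminated string s to lower case, in place. */
void to_lower(char *s) {
    for (; *s != '\0'; s++) {
        *s = (char) tolower((unsigned char) *s);
    }
}

/*
 * Returns true if line contains pattern. If ignore_case is true, case is
 * ignored, and pattern must already be in lower case (it is translated once,
 * not once per line). line itself is never modified, so the caller can print
 * the original text. (Hypothetical helper.)
 */
bool line_matches(const char *line, const char *pattern, bool ignore_case) {
    char lowered[MAX_LINE];  /* lower-case copy of line, used only for -i */

    if (!ignore_case) {
        return strstr(line, pattern) != NULL;
    }
    strncpy(lowered, line, MAX_LINE - 1);
    lowered[MAX_LINE - 1] = '\0';
    to_lower(lowered);
    return strstr(lowered, pattern) != NULL;
}
```

When line_matches returns true, the caller would print line, not lowered, which satisfies the requirement that the original input line appears in the output.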
Debugging
Learning how to use a debugger effectively can be a big help in getting
  your programs to work (although it is no substitute for thinking and careful
  programming). To encourage you to gain a basic familiarity with gdb,
  you are required to do the following.
- Be sure your program is compiled with the -g option, to include debugging information in the executable file.
- Run the script program to capture the following console session in a text file named debug.txt.
- Start gdb with your executable program as an argument.
- Set two breakpoints: one at the beginning of main, and the other at the point in your program right after the fopen function call that opens the input files.
- Start your program with the gdb run command, providing a search string and at least one file name as arguments.
- When the program hits the breakpoint at the beginning of main, use the appropriate gdb command to display the contents of the variable containing the search string (the first argument to the program following any options that are present). When you've done that, continue execution of the program.
- When the program hits the second breakpoint immediately after opening an
    input file, use the appropriate gdb commands to display (i) a backtrace showing the functions active at the time the breakpoint was reached, (ii) source code showing the line containing the breakpoint and a couple of surrounding lines, (iii) the name of the file that was supplied to the fopen function (this string should be in a variable somewhere), and (iv) the pointer value returned by fopen (presumably just a hex address, although it might be NULL if the file can't be opened).
- Continue execution of the program until it stops, then quit gdb and exit from the script program's shell. Save the debug.txt output file from script to turn in later.
Extra Credit
A small amount of extra credit will be awarded for adding the following extensions to an already complete, working assignment. No extra credit will be awarded if the basic program is not fully implemented and substantially bug-free.
If you do any extra credit parts, you should turn in both your original program
  without the extra credit and your extended program. The extra credit version
  should be in a separate file whose name ends with "-extra", like gasp-extra.c.
- Allow input file names to be omitted from the command line. In this
    case, the program should read data from stdin. This should be fairly easy to add if your code is organized as a well-designed collection of functions.
- Add an option -w to search for words separated by whitespace. If -w is specified, the STRING on the gasp command line should only be found if it is surrounded by whitespace (blanks, tabs, newlines, etc.) in the input file(s) and not as part of some other string. For example, the STRING foo would match foo but not food. A character c in the input should be treated as whitespace if the library function call isspace(c) returns true. If you implement this option, the program should find the word if it appears at the beginning or end of a line, as well as in the middle. You may also use an additional global variable to record the state of this option if you wish.
- Enhance your program so it can deal safely with input
    files containing lines of arbitrary length. Lines longer than the program
    is prepared to handle may be truncated by discarding excess input characters, but doing
    so should not cause the -n option to count line numbers incorrectly. However you decide to implement this, long input lines should not cause your program to fail or crash.
- A bit harder than above: Enhance your program so it correctly deals with
    input lines of any length, copying them to the output if they contain the
    STRING parameter anywhere in the line. If you read arbitrarily long input lines in chunks that each contain only part of a line, be sure you can correctly handle situations where the STRING value spans two chunks instead of falling entirely inside one.
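Two of the extra credit items can be sketched as small helpers. contains_word is one way to implement -w with strstr and isspace. It checks the characters just before and just after each occurrence and keeps searching if either one is not whitespace, so that "food foo" still matches foo. read_line is one way to handle over-long lines by truncation. It discards whatever part of a line did not fit, so each call consumes exactly one input line and -n numbering stays correct. Both helper names and the 128-byte scratch buffer are assumptions, and neither helper addresses the harder chunked-search variant.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if word occurs in line as a whole word: the occurrence must
 * start at the beginning of line or just after a whitespace character, and
 * must end at the end of line or just before one. Whitespace is whatever
 * isspace accepts, so the '\n' kept by fgets counts. An empty word never
 * matches. (Hypothetical helper.)
 */
bool contains_word(const char *line, const char *word) {
    size_t len = strlen(word);  /* length of word, without the '\0' */
    const char *p = line;       /* where the next strstr search begins */

    if (len == 0) {
        return false;
    }
    while ((p = strstr(p, word)) != NULL) {
        bool starts_ok = (p == line) || isspace((unsigned char) p[-1]);
        bool ends_ok = (p[len] == '\0') || isspace((unsigned char) p[len]);
        if (starts_ok && ends_ok) {
            return true;
        }
        p++;  /* this occurrence failed; look for a later one */
    }
    return false;
}

/*
 * Reads the next line from in into buf, which holds size chars, as fgets
 * does. If the line does not fit, the characters that did fit are kept and
 * the rest of the line is read and discarded, so the next call starts at
 * the next line. Returns false at end of file or on a read error.
 * (Hypothetical helper.)
 */
bool read_line(char *buf, int size, FILE *in) {
    char scratch[128];  /* receives the discarded excess of a long line */

    if (fgets(buf, size, in) == NULL) {
        return false;
    }
    if (strchr(buf, '\n') == NULL) {
        /* Truncated line (or a last line with no '\n'): skip past the next '\n'. */
        while (fgets(scratch, sizeof scratch, in) != NULL &&
               strchr(scratch, '\n') == NULL) {
            /* keep discarding */
        }
    }
    return true;
}
```

In a loop that calls read_line instead of fgets, incrementing the line counter once per successful call counts each input line once, no matter how long the line is.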
Turn-in Instructions
Use the regular online dropbox to turn in the source code for your program (file gasp.c and any other source files if you have them) and a copy of the script (console) output file debug.txt from the Debugging exercise above. If you did any extra credit, you should turn in a separate source file with your additions. You should also turn in a plain text file named README that describes any extra credit that you added to your program, or contains a brief note that you did not implement any of the extra credit parts. Be sure your name is included in the source code and README files.
Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX
