Homework 4

Due: Wednesday, May 5, 2021, at 11:59pm

Assignment Goal

The purpose of this assignment is to gain some experience with C programming. In particular, in this assignment, you will:

This project should be done independently. If you work with a classmate, make sure you create your own implementation.

The material that we have learned in lecture is not enough to complete this assignment. It is expected that you will investigate the resources and libraries mentioned in this document to learn about how to use them.

Synopsis

This project is meant to be fun and allow students to explore the material. As such, you may choose to implement a project of your choice that meets the following criteria:

Choosing a project

You may choose any project you wish that fulfills the criteria listed above. These criteria lead to something that processes a text file in some way - searching, sorting, or replacing text. One idea is to consider coding your own version of one of the Bash commands we have learned.

These are not the only projects available. Slightly more complicated ideas include frequency counting (for letters and words), encryption, decryption, or html generation. If you have an idea that you are unsure about, post it on Ed and the staff will take a look at it.

Your first step should be to specify your goal. You will want to focus on a small goal to start, and if you envision different options you can record them as 'stretch goals'. First create the basic features, and then enhance them if you have time. When describing your essential feature, be sure to specify what input is needed and what output is produced - your main function will use these as arguments and output.
Protip: Implement a main that gets the arguments and prints a placeholder for the output (either to screen or file), and you are off to a good start.
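
As a rough illustration of that protip, here is a minimal sketch of the argument-handling step. The function name describe_job and the "TODO" message are hypothetical, and it assumes a project that takes one or more input file names; adapt the checks to whatever arguments your project needs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Check the command-line arguments and print a placeholder line to out
 * for each input file that will eventually be processed.
 * Returns EXIT_SUCCESS, or EXIT_FAILURE if no input file was named.
 * (describe_job is a hypothetical name used only for this sketch.) */
int describe_job(int argc, char *argv[], FILE *out) {
  assert(argv != NULL && out != NULL);
  if (argc < 2) {
    fprintf(stderr, "usage: %s FILE...\n", argc > 0 ? argv[0] : "program");
    return EXIT_FAILURE;
  }
  for (int i = 1; i < argc; i++) {
    fprintf(out, "TODO: process %s\n", argv[i]);
  }
  return EXIT_SUCCESS;
}
```

With this in place, main can be a single statement, return describe_job(argc, argv, stdout);, and the placeholder can be swapped for real processing one step at a time.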

You will hand in (at least) three files for this project:

Debugging

Learning how to use a debugger effectively can be a big help in getting your programs to work (although it is no substitute for thinking and careful programming). To encourage you to gain a basic familiarity with gdb, you are required to do the following:

  1. Be sure your program is compiled with the -g option, to include debugging information in the executable file.
  2. Run the script program to capture the following console session in a text file named debug.script.
  3. Run valgrind with --leak-check=full on your code.
  4. Start gdb with your executable program as an argument.
  5. Set two breakpoints: one at the beginning of main, and the other at the point in your program right after the fopen function call that opens the input files.
  6. Start your program with the gdb run command, providing appropriate arguments.
  7. When the program hits the breakpoint at the beginning of main, use the appropriate gdb command to display the contents of the variable that holds the input filename. When you've done that, continue execution of the program.
  8. When the program hits the second breakpoint immediately after opening an input file, use the appropriate gdb commands to display a backtrace showing the functions active at the time the breakpoint was reached.
  9. Continue execution of the program until it stops, then quit gdb and exit from the script program's shell. Save the debug.script output file from script to turn in later.

You should use gdb's basic command-line interface for this part of the assignment, even if you use the -tui option for your routine debugging. The full-screen -tui interface generates a great deal of extra output in the script file, which makes it almost impossible to read.

Technical Requirements

You should pay attention to the following guidelines for meeting performance expectations.

  1. Use standard C library functions where possible; do not reimplement operations available in the basic libraries. For instance, strncpy in <string.h> can be used to copy \0-terminated strings; you should not be writing loops to copy such strings one character at a time.
  2. You should use "safe" versions of file and string handling routines such as fgets and strncpy instead of routines like gets and strcpy. The safe functions allow specification of maximum buffer or array lengths and will not overrun adjacent memory if used properly.
  3. If an error occurs when opening or reading a file, the program should write an appropriate error message to stderr and continue processing any remaining files on the command line.
  4. Since this program is likely relatively short, all of the functions should be in a single file. You should arrange your code so that related functions are grouped together in a logical order in the file.
  5. Your code must compile and run without errors or warnings when compiled and executed on klaatu or the current CSE Linux VM using gcc with the -Wall and -std=c11 options. Since this assignment should not need to use any unusual or system-dependent code you can probably develop and test your code on any recent Linux system or other system that supports a standard C compiler. However, we will test your submissions using the CSE systems, so you should verify your program there before the submission deadline.
  6. Your program must be robust. It should not crash (segfault or otherwise) or produce meaningless or incorrect output regardless of the contents of command line parameters or input files (except, of course, you are not required to deal with files or string parameters with lines longer than the limits given above). If the program terminates prematurely because of some error, it should print an appropriate error message to stderr and exit with an exit code of EXIT_FAILURE (defined in <stdlib.h> -- see the description of the exit() function).
  7. If the program terminates normally after attempting to open and process all of the files listed on the command line, it should terminate with an exit code of EXIT_SUCCESS (see <stdlib.h>). This is normally done by returning this value as the int result of the main function.
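
The phrase "if used properly" in guideline 2 matters for strncpy in particular: when the source string does not fit, strncpy fills the buffer without writing a terminating \0. The sketch below shows one common way to handle that. The helper name copy_bounded is hypothetical and is only one possible approach.

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst, which holds dst_size bytes, truncating if needed.
 * dst is always \0-terminated on return. dst_size must be at least 1.
 * (copy_bounded is a hypothetical helper name used for illustration.) */
void copy_bounded(char *dst, size_t dst_size, const char *src) {
  assert(dst != NULL && src != NULL && dst_size > 0);
  strncpy(dst, src, dst_size - 1);
  dst[dst_size - 1] = '\0';  /* strncpy does not terminate on truncation */
}
```

A caller would typically pass sizeof(buffer) as dst_size, for example copy_bounded(name, sizeof(name), argv[1]); for a local char array called name.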

Code Quality Requirements

As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.

  1. Divide your program into suitable functions, each of which does a single well-defined task. For example, there should almost certainly be a function that processes a single input file, which is called as many times as needed to process each of the files listed on the command line (and which, in turn, might call other functions to perform identifiable subtasks). Your program most definitely may not consist of one huge main function that does everything. However, it should not contain tiny functions that only contain isolated statements or code fragments instead of dividing the program into coherent pieces. If you wish, you may include all of your functions in a single C source file, since the total size of this program will be fairly small. Be sure to include appropriate function prototype declarations near the beginning of the file so the actual function definitions can appear in whatever order is most appropriate for presenting the code in the remainder of the file in a logical sequence and so that related functions are grouped together.
  2. Comment sensibly, but not excessively. You should not use comments to repeat the obvious or explain how the C language works -- assume that the reader knows C at least as well as you. Your code should, however, include the following minimum comments:
    • Every function must include a heading comment that explains what the function does (not how it does it), including the significance of all parameters and any effects on or use of global variables (to the extent that there are any). It must not be necessary to read the function code to determine how to call it or what happens when it is called. (But these comments do not need to be nearly as verbose as, for example, Java's JavaDoc comments.)
    • Every significant variable must include a comment that is sufficient to understand what information is stored in the variable and how it is stored. It must not be necessary to read code that initializes or uses a variable to understand this. It may be helpful to describe several related variables in a single comment that explains their contents and relationship.
    • Any code based on someone else's work must include a comment for citation. (If the code is longer than a couple of lines you should read for understanding and then create your own version. Citations are still appropriate.)
    • In addition, there should be a comment at the top of the file giving basic identifying information, including your name, the date, and the name and purpose of the file.
  3. Use appropriate names for variables and functions: nouns or noun phrases suggesting the contents of variables or the results of value-returning functions; verbs or verb phrases for void functions that perform an action without returning a value. Variables of local significance like loop counters, indices, or pointers should be given simple names like i, k, n, or p, and often do not require further comments.
  4. No global variables. Use parameters (particularly pointers) appropriately.
  5. No unnecessary computation. Don't make unnecessary copies of large data structures; use pointers. (Copies of ints, pointers, and similar things are cheap; copies of arrays and large structs are expensive.) Don't read the input by calling a library function to read each individual character. Read the input a line at a time (it costs just about the same to call an I/O function to read an entire line into a char array as it does to read a single character). But don't overdo it. Your code should be simple and clear, not complex and full of micro-optimizations that don't matter.
  6. You should use the clint.py style checker (right-click to download, and chmod +x to make it executable if needed) to review your code. While this checker may flag a few things that you wish to leave as-is, most of the things it catches, including whitespace errors in the code, should be fixed. We will run this style checker on your code to check for any issues that should have been fixed. Use the discussion board or office hours if you have questions about particular clint warnings.
    Hint: All reasonable programming text editors have commands or settings to use spaces instead of tabs at the beginning of lines, which is required by the style checker and is much more robust than having tabs in the code. For example, if you are an emacs user, you can add the following line to the .emacs file in your home directory to ensure that emacs translates leading tabs to spaces:
    (setq-default indent-tabs-mode nil)
  7. You should check your code with valgrind to ensure that you have handled memory correctly, and fix any outstanding issues.
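
To make a few of these requirements concrete, here is a small sketch showing a heading comment, commented struct fields, a prototype, and a pointer parameter used in place of a global variable or a struct copy. The struct text_stats and the function add_line are hypothetical names invented for this example and are not part of any required design.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Running totals for the text seen so far (hypothetical example type). */
struct text_stats {
  size_t lines;    /* number of lines processed */
  size_t letters;  /* number of alphabetic characters in those lines */
};

/* Add the counts for one \0-terminated line to *stats.
 * The struct is updated in place through the pointer, so no copy is made. */
void add_line(struct text_stats *stats, const char *line);

void add_line(struct text_stats *stats, const char *line) {
  assert(stats != NULL && line != NULL);
  stats->lines++;
  for (const char *p = line; *p != '\0'; p++) {
    if (isalpha((unsigned char)*p)) {
      stats->letters++;
    }
  }
}
```

The caller owns the struct (for instance as a local variable in the function that processes a file) and passes its address, which keeps the state out of global scope.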

Implementation Hints

  1. There are a lot of things to get right here; the job may seem overwhelming if you try to do all of it at once. But if you break it into small tasks, each one of which can be done individually by itself, it should be quite manageable. For instance, figure out how to process a single file before you implement the logic to process all of the files on the command line. Figure out how to open, read, and copy all of a file to stdout before adding another step.
  2. Think before you code. You will ultimately get the job done faster, better, and with less pain if you spend some time to sketch your design (which functions are needed? what exactly do they do? what are the main data structures?) before you write detailed code. Start coding by writing function headings and heading comments and creating significant variables -- and commenting those too. Then as you write detailed code and test it you will have your written design information in the comments to compare and check as you work on the code. That should greatly reduce the number of bugs that wind up in the code and ultimately help you get correct, working code faster and with less effort.
  3. Every time you add something new to your code (see hint #1), test it. Right Now! Immediately!! BEFORE YOU DO ANYTHING ELSE!!! (Did we mention that you should test new changes right away?) It is much easier to find and fix problems if you can isolate the potential bug to a small section of code you just added or changed. The debugger is your friend here -- learn how to use it (and you are required to do this). printf can also be your friend to print values while executing and testing the code.
  4. The standard C library contains many functions that you will find useful. In particular, look at the <stdio.h>, <string.h>, <ctype.h> and <stdlib.h> libraries. Use the cplusplus reference link on the course home page to look up details about functions and their parameters; use a good book like The C Programming Language for background and perspective about how they are intended to be used.
  5. Use the compiler -Wall option and (if you can) the runtime assert function (in assert.h) to catch coding bugs and to check for things that "must happen" or "can't happen" during execution. Don't waste time manually searching for errors that the compiler or run-time tests could have caught for you.
  6. Be sure to test for errors like trying to open or read a nonexistent file to see if your error handling is working properly.
  7. Once you're done, read the instructions again to see if you overlooked anything.
  8. See #8.
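
As a starting point for hints 1, 5, and 6, the sketch below copies each named file to an output stream a line at a time, reports unreadable files on stderr, and keeps going. The names copy_file and copy_all and the MAX_LINE value are assumptions made for this example; use the limits and structure that fit your own project.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 512  /* hypothetical line limit; use your project's value */

/* Copy every line of the file named path to out.
 * Returns 0 on success, or -1 (after printing a message to stderr)
 * if the file cannot be opened or a read error occurs. */
int copy_file(const char *path, FILE *out);

/* Call copy_file on each of the count names in paths, continuing past
 * failures. Returns the number of files that could not be processed. */
int copy_all(int count, char *paths[], FILE *out);

int copy_file(const char *path, FILE *out) {
  assert(path != NULL && out != NULL);
  FILE *in = fopen(path, "r");
  if (in == NULL) {
    fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
    return -1;
  }
  char line[MAX_LINE];  /* holds one chunk of input, at most MAX_LINE - 1 chars */
  while (fgets(line, sizeof(line), in) != NULL) {
    fputs(line, out);
  }
  int status = ferror(in) ? -1 : 0;
  if (status != 0) {
    fprintf(stderr, "error reading %s\n", path);
  }
  fclose(in);
  return status;
}

int copy_all(int count, char *paths[], FILE *out) {
  int failures = 0;  /* number of files that could not be processed */
  for (int i = 0; i < count; i++) {
    if (copy_file(paths[i], out) != 0) {
      failures++;
    }
  }
  return failures;
}
```

From main this could be called as copy_all(argc - 1, argv + 1, stdout). Running it once with a file name that does not exist is a quick way to confirm the error path works before adding the real processing step.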

Turning In

Please submit your files via the Gradescope HW4 Assignment. You should submit:

Be sure that your name is included in the source code and README files.

Canvas will link to Gradescope for you. If you wish to use late days, please comment on the Late Days assignment.