Homework 3
Due: Tuesday, Feb 7 2023, at 11:59 PM PST
Goal: Gain experience with C
- Create and run C programs,
- Use basic C libraries (i.e. for file and string handling),
- Use the unix debugger, gdb, and
- Use a style-checking tool to locate source code that may need attention
This project should be done independently. You should not copy and paste code.
Start early! You will need to investigate the resources and libraries mentioned in this document to learn
how to use them.
Hint: Error messages should be written to stderr. In order to do this, you can use the following syntax:
fprintf(stderr, "This is my error message: %s . Done!\n", var);  // but you are required to provide a better error message
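As a rough illustration of the hint above, the following sketch wraps fopen in a helper that reports a failure on stderr. The helper name open_or_report and the wording of the message are assumptions made for this example, not part of the assignment; if exact wording matters, follow the sample output in the Example Operation section. The sketch also assumes that fopen sets errno on failure, which holds on POSIX systems but is not guaranteed by the C standard.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: tries to open path for reading. On failure it
   writes a message naming the file and the reason to stderr and
   returns NULL, so the caller can skip the file and continue. */
FILE *open_or_report(const char *path) {
  FILE *fp = fopen(path, "r");
  if (fp == NULL) {
    /* Assumption: fopen set errno (true on POSIX systems). */
    fprintf(stderr, "%s will not open (%s). Skipping.\n",
            path, strerror(errno));
  }
  return fp;
}
```

A caller would check the result for NULL before reading, and call fclose on any stream that was opened successfully.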
Synopsis
You will create your own version of the unix command wc.
You will read in a file and report stats including the number of lines and number of words in the file.
Your code should behave as follows:
- Compile with the command
gcc -Wall -std=c11 -o wordcount wordcount.c
- Requires one or more input files.
- If there are no input files, reports an error and exits with an exit code of 1.
- Does not need to work with redirected input
- If a file cannot be opened, an error message is printed saying that the file is skipped.
- For each input file calculate the number of lines, words, and characters
- Can check against wc on any test file - the results should be the same.
- Prints a message with the results and the file name, similar to wc
- Additionally prints the total number of lines for all files
- Processes a limited number of options: -l, -w, -c
- If an option is detected the program will output only the number of lines, words, or characters respectively.
- The program will additionally not print the total number of lines in all the files
- The program shouldn't process more than one option.
If more than one option is submitted the first one should be activated.
- To simplify the option handling you will assume that any options will come before the names of the input files.
If you detect any input argument string that is not a valid option, you will assume that it and any subsequent arguments are file names.
- Hint: you may find strncmp helpful
- Be able to handle input lines containing up to 500 characters (including the terminating \0 byte).
The performance for other files is undefined, and will not be evaluated.
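The option rules above can be sketched as follows. This is only one possible reading of the rules, not the reference implementation: the function names option_letter and first_file_index are hypothetical, and the sketch assumes that a second valid option after the first is simply ignored.

```c
#include <string.h>

/* Hypothetical helper: returns 'l', 'w', or 'c' if arg is exactly
   "-l", "-w", or "-c", and 0 for anything else. */
char option_letter(const char *arg) {
  /* Comparing 3 bytes includes the terminating '\0', so "-wc" does
     not match "-w" and is treated as a file name. */
  if (strncmp(arg, "-l", 3) == 0) return 'l';
  if (strncmp(arg, "-w", 3) == 0) return 'w';
  if (strncmp(arg, "-c", 3) == 0) return 'c';
  return 0;
}

/* Hypothetical helper: scans the leading options, stores the first
   one in *option (0 if there is none), and returns the index of the
   first argument that should be treated as a file name. */
int first_file_index(int argc, char *argv[], char *option) {
  int i = 1;
  *option = 0;
  while (i < argc) {
    char letter = option_letter(argv[i]);
    if (letter == 0) {
      break;  /* first non-option: everything from here is a file */
    }
    if (*option == 0) {
      *option = letter;  /* only the first option is activated */
    }
    i++;
  }
  return i;
}
```

With the arguments from the last sample run (-l -wc shorttext), this sketch selects the -l option and returns index 2, so -wc is handled as a file name. If the returned index equals argc, no input files were given.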
Example files: shorttext, point.c
Example Operation:
$ ./wordcount -l
Usage: ./wordcount requires an input file.
$ echo $?
1
$ ./wordcount point.c "NON FILE" shorttext
73 247 1574 point.c
NON FILE will not open. Skipping.
4 13 68 shorttext
Total Lines = 77
$ wc point.c "NON FILE" shorttext
73 247 1574 point.c
wc: 'NON FILE': No such file or directory
4 13 68 shorttext
77 260 1642 total
$ ./wordcount -l point.c
73
$ ./wordcount -c point.c shorttext
1574
68
$ ./wordcount -l -wc shorttext
-wc will not open. Skipping.
4
Technical Requirements
- Use standard C library functions where possible; do not
reimplement operations available in the basic libraries.
For example, if you copy strings, use strncpy in <string.h>
to copy \0-terminated strings; do not write loops to copy such
strings one character at a time.
- Use "safe" versions of file and string handling routines. Do not
use gets and strcpy; instead use fgets and strncpy if you need
these functionalities. The safe functions allow specification of
maximum buffer or array lengths and will not overrun adjacent
memory if used properly.
- If an error occurs when opening or reading a file, the program
should write an appropriate error message to stderr and continue
processing any remaining files on the command line.
- All of the functions must be in a single file called wordcount.c.
- Your code must compile and run without errors or warnings when
compiled and executed on cancun with the -Wall and -std=c11 options.
- Your program must be robust. It should not
crash (segfault or otherwise) or produce meaningless or incorrect
output regardless of the contents of command line parameters or input
files (except, of course, you are not required to deal with files
or string parameters with lines longer than the limits given above).
If the program ends while running because of some error,
it should print an appropriate error message to stderr and
exit with an exit code of EXIT_FAILURE (defined in <stdlib.h> --
see the description of the exit() function).
- If the program ends normally after attempting to open and
process all of the files listed on the command line, it
should terminate with an exit code of EXIT_SUCCESS (see <stdlib.h>).
This is normally done by returning this value as the int
result of the main function.
- You may assume that any file we test against ends in a newline.
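To make the "safe" routines mentioned above concrete, here is a minimal sketch of bounded copying with strncpy and bounded reading with fgets. The function names copy_string and count_lines are hypothetical, and MAXLINE is assumed to be the 500-byte limit from the synopsis. The sketch counts a line only when the buffer ends in a newline, which relies on the stated assumption that test files end in a newline and that lines fit in the buffer.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 500  /* maximum line length, including the '\0' */

/* Hypothetical helper: copies src into dest, writing at most
   dest_size bytes and always leaving dest '\0'-terminated.
   dest_size must be at least 1. */
void copy_string(char *dest, const char *src, size_t dest_size) {
  strncpy(dest, src, dest_size - 1);
  dest[dest_size - 1] = '\0';  /* strncpy does not add this on truncation */
}

/* Hypothetical helper: returns the number of newline-terminated
   lines read from fp. fgets never writes more than MAXLINE bytes. */
int count_lines(FILE *fp) {
  char line[MAXLINE];
  int lines = 0;
  while (fgets(line, MAXLINE, fp) != NULL) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
      lines++;
    }
  }
  return lines;
}
```

Note that strncpy alone does not terminate the destination when the source is too long, which is why the sketch writes the final '\0' explicitly.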
Code Quality Requirements
Your code should be readable and
understandable to anyone who knows C. For full credit
your code must observe the following requirements.
- Divide your program into multiple functions, each of which does
a single well-defined task. For example, there should be a function that processes a single input file, which
is called as many times as needed to process each of the files
listed on the command line (and which, in turn, might call other
functions to perform identifiable subtasks).
Your program may not consist of one huge main function
that does everything. However, it should not contain tiny functions
that only contain isolated statements or code fragments instead of
dividing the program into coherent pieces. Guideline: we expect fewer than 5 functions.
- Include function declarations near the
beginning of the file so the function definitions can
appear in whatever order is most appropriate for presenting the
code in the remainder of the file in a logical sequence.
(We won't be too picky here, but main should be the first function, and related functions should be near each other).
- Comment sensibly. Provide a brief description of what each function
does at the beginning of the function (not how it works, but what the goal of the function is
and if the function changes any parameters and what it returns), and provide brief comments around any tricky
code that is hard for our TAs to understand. We want to give you these points and
won't be nearly as precise as 142 and 143 - the goal is just to make it easy for the
TAs to navigate your code
- Avoid global variables. Use parameters (particularly pointers)
appropriately.
- You may use an appropriate #define MAXLINE command to set the
maximum line length mentioned above.
- Use appropriate names for variables and functions: nouns or noun
phrases suggesting the contents of variables or the results of
value-returning functions; verbs or verb phrases for void functions
that perform an action without returning a value. Variables of local
significance like loop counters, indices, or pointers should be given
simple names like i or p.
- Don't make unnecessary copies of large data structures; use pointers.
(Copies of ints, pointers, and similar things are cheap; copies of arrays and large
structs are expensive.) Don't read the input by calling a library
function to read each individual character. Read the input a line
at a time (it costs just about the same to call an I/O function to
read an entire line into a char array as it does to read a single
character).
- You should use the cpplint.py style checker:
- Download it with wget from cancun and use chmod +x to make it executable.
- Use ./cpplint.py --clint wordcount.c to review your code.
Note: If this fails, check your python installation (whereis python). In some cases you must call python3 explicitly: python3 ./cpplint.py --clint wordcount.c, or modify the first line of cpplint.py to point to your system's python installation.
- There is more help for using this code on the CSE 333 page.
- cpplint.py is an example of a linter, which is a tool that checks code for compliance with style and/or coding standards. In this case, the linter checks that code complies with style guidelines developed at Google and used widely in industry. Compliance with style guidelines is essential when multiple developers are working on the same codebase.
- While this checker may
flag a few things that you wish to leave as-is, (Notably: You may ignore warnings
about 'strtok') most of the things it
catches, including whitespace errors in the code, should be fixed. We
will run this style checker on your code to check for any issues that
should have been fixed. Use the discussion board or office hours if
you have questions about particular clint warnings.
- Hint: All reasonable programming text editors have commands or
settings to use spaces instead of tabs at the beginning of lines,
which is required by the style checker and is much more robust than
having tabs in the code. For example, if you are an emacs user, you
can add the following line to the .emacs file in your home directory
to ensure that emacs translates leading tabs to spaces:
(setq-default indent-tabs-mode nil)
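The points above about avoiding global variables, passing pointers instead of copying structures, and commenting each function can be illustrated with a small sketch. The Counts struct and the add_counts function are hypothetical names chosen for this example; nothing in the assignment requires a struct, and separate int variables passed by pointer would work just as well.

```c
/* Hypothetical record of the three counts for one file. */
typedef struct {
  long lines;
  long words;
  long chars;
} Counts;

/* Adds the counts in *one to the running totals in *total.
   Modifies *total, leaves *one unchanged, and returns nothing.
   Both arguments are pointers, so no struct is copied and no
   global variable is needed to hold the totals. */
void add_counts(Counts *total, const Counts *one) {
  total->lines += one->lines;
  total->words += one->words;
  total->chars += one->chars;
}
```

In this sketch the totals would live as a local variable in main and be passed by address to each call, for example add_counts(&total, &file_counts).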
Implementation Hints
- If you break the assignment into small tasks, each of which can be done by
itself, it will be quite manageable. For example, figure out
how to process a single file before you implement the logic to
process all of the files on the command line. (Or vice versa, but
start small and test before you move on.) Figure out how to
open, read, and copy all of a file to stdout before
adding another step. Hint: We have a sample program from an earlier lecture that does this.
- You might notice that the structure of this program is similar
to your spellcheck script: You first check for usage
(is the command called correctly?), and then, for each input
file you perform a task. You write similar error messages, and exit
with a failure code under similar circumstances.
- This homework can be done in fewer than 90 lines of code.
More than that is ok, but if you find yourself writing significantly more than that,
there's a simpler way, and you may want to come to office hours.
- Implement a main that gets the arguments and prints a placeholder for the output
(either to screen or file), and you are off to a good start. Test this before you move on.
- Think before you code. You will ultimately get the job done
faster, better, and with less pain if you spend some time to sketch
your design (which functions are needed? what exactly do they do? what
are the main data structures?) before you write detailed code. Start
coding by writing function headings and heading comments and creating
significant variables -- and commenting those too. Then as you write
detailed code and test it you will have your written design
information in the comments to compare and check as you work on the
code.
- I/O is relatively expensive, while storing one more integer is relatively inexpensive.
As a result, you should write one function that calculates
all the potential output values in one go, and use the options to determine
which ones to print to stdout.
- Every time you add something new to your code (see the first hint),
test it. It is much easier to find and
fix problems if you can isolate the potential bug to a small
section of code you just added or changed.
- The standard C library contains many functions that you will
find useful. In particular, look at the <stdio.h>, <string.h>,
<ctype.h>, and <stdlib.h> libraries. Use the cplusplus reference
link on the course home page to look up details about functions
and their parameters; use a good book like The C Programming
Language for background and perspective about how they are
intended to be used. For example, strlen tells you how many
characters are in a string (not counting the terminating \0).
- Every file stream that is open should be subsequently closed.
- Use the compiler -Wall option. Don't waste time
searching for errors that the compiler or run-time tests could have
caught for you.
- Be sure to test for errors like trying to open or read a
nonexistent file to see if your error handling is working
properly.
- Once you're done, read the instructions again to see if you
overlooked anything.
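The hint above about computing every value once and letting the option decide what to print might look roughly like this. It is a sketch under stated assumptions, not the expected solution: count_words and format_counts are hypothetical names, a word is assumed to be any run of non-whitespace characters as classified by isspace, and the single-space output layout is a guess based on the sample runs, so compare against the provided binary for the exact format.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of whitespace-separated
   words in the '\0'-terminated string line. */
int count_words(const char *line) {
  int words = 0;
  int in_word = 0;  /* 1 while the scan is inside a word */
  for (const char *p = line; *p != '\0'; p++) {
    if (isspace((unsigned char)*p)) {
      in_word = 0;
    } else if (!in_word) {
      in_word = 1;  /* first character of a new word */
      words++;
    }
  }
  return words;
}

/* Hypothetical helper: writes the text for one file into out (at most
   size bytes). option is 'l', 'w', or 'c' to select a single count,
   or 0 to include all three counts followed by the file name. */
void format_counts(char *out, size_t size, char option,
                   int lines, int words, int chars, const char *name) {
  switch (option) {
    case 'l':
      snprintf(out, size, "%d", lines);
      break;
    case 'w':
      snprintf(out, size, "%d", words);
      break;
    case 'c':
      snprintf(out, size, "%d", chars);
      break;
    default:
      snprintf(out, size, "%d %d %d %s", lines, words, chars, name);
      break;
  }
}
```

Because all three counts are gathered in one pass over each line, the option only affects the final formatting step and the file never has to be read twice.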
Testing
We are providing you with the binary wordcount that you can test against.
You can run this from the command line the exact same way you would run your wordcount executable.
By comparing the output from the official binary we give you to the output from your executable, you can make sure that your program is performing as expected. Get the binary to cancun with:
wget https://courses.cs.washington.edu/courses/cse374/23wi/hw/hw3Files/wordcount
Assessment
The majority of your grade will be determined by the Gradescope autograder, which checks specified performance criteria.
Some additional manual grading will look at the requirements stated throughout this spec
(especially code efficiency and following the spec).
Turning In
Please submit your files via the Gradescope HW3 Assignment. You should submit one file named wordcount.c.
As usual, you may resubmit for an improved score on the autograder.