Due: Thursday, April 28, 2022, at 11 pm
The purpose of this assignment is to gain some experience with C programming by implementing a utility program that is similar to grep, but without the ability to process regular expressions (i.e., a lot like a simple version of fgrep). In particular, in this assignment, you will:
[...] gdb, and [...]
This assignment does not include any particularly complicated logic or algorithms, but it will require you to organize your code well and make effective use of the C language and libraries. You will also have to explore the details of the C string and file I/O libraries to discover how to do various operations that should already be familiar from your programming experience in other languages, but which are different in C. It is meant as an orientation to the Unix/Linux C programming environment. You should do this assignment by yourself.
Implement in C a Unix utility program gasp. The command
./gasp STRING FILE...
should read the listed files (FILE...) and copy each line from the input to stdout if it contains STRING anywhere in the input line. Each output line should be preceded by the name of the file that contains it. The argument STRING may be any sequence of characters (as expanded, of course, by the shell depending on how the argument is quoted).
This is the same output produced by fgrep and grep if the STRING argument is treated as literal data and not as a regular expression. Note that the STRING argument can contain any characters at all, although you may need to quote it when you enter the command in a terminal in order to prevent the shell from trying to expand various punctuation characters in the string using shell globbing rules.
You should match the output format of fgrep -H, where the input file name is always included in the output before each matched input line, even if only one input filename appears on the gasp command line. There should be a colon (:) separating the filename from the matched text line. Do not add or delete anything from the text line even if it begins or ends with whitespace. For example, if the first line of file story.txt is the string "In the beginning", then the output from the command gasp begin story.txt should include this line:
story.txt:In the beginning
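The behavior described above can be sketched roughly as follows. This is only an illustration, not the required design: the function name search_stream and the value 500 for MAX_LINE_LENGTH are assumptions made for the example. It reads a line at a time with fgets, looks for the literal STRING with strstr, and prints the file name, a colon, and the unmodified line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed buffer size for this sketch only (includes the \0 byte). */
#define MAX_LINE_LENGTH 500

/* Hypothetical helper: copy each line of `in` that contains `needle`
 * to `out`, preceded by "name:". Returns the number of lines copied. */
int search_stream(FILE *in, const char *name, const char *needle,
                  FILE *out) {
  assert(in != NULL && name != NULL && needle != NULL && out != NULL);
  char line[MAX_LINE_LENGTH];
  int matches = 0;
  /* fgets never writes more than sizeof(line) bytes into the buffer. */
  while (fgets(line, sizeof(line), in) != NULL) {
    if (strstr(line, needle) != NULL) {
      /* The line still holds its own newline, so none is added here. */
      fprintf(out, "%s:%s", name, line);
      matches++;
    }
  }
  return matches;
}
```

A caller would open each FILE with fopen and pass the resulting stream, the file name, STRING, and stdout to this helper. Lines longer than the buffer are split by fgets in this sketch.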
Besides the general specification given above, your program should meet the following requirements to receive full credit.
[...] \0 byte). This number should be specified with an appropriate #define preprocessor command so it can be changed easily. Your program is allowed to produce incorrect results or fail if presented with input data files containing lines longer than this limit.
[...] \0). This length should also be specified by an appropriate #define.
[...] strncpy in <string.h> can be used to copy \0-terminated strings; you should not be writing loops to copy such strings one character at a time.
[...] getopt function in the Linux library that provides simplified handling of command line options. For this assignment only, you may not use this function. You should implement the processing of command line options yourself, of course using the string library functions when these are helpful.
[...] fgets and strncpy instead of routines like gets and strcpy. The safe functions allow specification of maximum buffer or array lengths and will not overrun adjacent memory if used properly.
[...] stderr and continue processing any remaining files on the command line.
[...] gasp.c. Since this program is relatively short, all of the functions should be in this single file. You should arrange your code so that related functions are grouped together in a logical order in the file.
[...] cancun or the current CSE Linux VM using gcc with the -Wall and -std=c17 options. Since this assignment should not need to use any unusual or system-dependent code, you may be able to develop and test your code on any recent Linux system or other system that supports a recent gcc C compiler. However, we will test your submissions using the CSE systems (cancun or the VM), so you must verify your program there before turning it in.
[...] stderr and exit with an exit code of EXIT_FAILURE (defined in <stdlib.h>; see the description of the exit() function).
[...] EXIT_SUCCESS (see <stdlib.h>). This is normally done by returning this value as the int result of the main function.
As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.
[...] main function that does everything. However, it should not contain tiny functions that only contain isolated statements or code fragments instead of dividing the program into coherent pieces. Since this overall project is fairly small, and we haven't talked about dividing C programs into multiple files yet, you should include all of your functions in a single C source file named gasp.c. Be sure to include appropriate function prototype declarations near the beginning of the file so the actual function definitions can appear in whatever order presents the code most logically, with related functions grouped together so that the reader can easily locate them.
[...] void functions that perform an action without returning a value. Variables of local significance like loop counters, indices, or pointers usually should be given simple names like i, k, n, or p, and often do not require further comments.
[...] ints, pointers, and similar things are cheap; copies of arrays and large data structures are expensive.) Don't read the input by calling a library function to read each individual character. Read the input a line at a time (it costs just about the same to call an I/O function to read an entire line into a char array as it does to read a single character). Don't perform a computation repeatedly, say, once for each input line in a file. If a value is needed repeatedly, compute it once and store the result for later use, unless it is particularly cheap to re-compute it. But don't overdo it. Your code should be simple and clear, not complex and full of micro-optimizations that don't matter.
You should use the clint.py style checker (use chmod +x to make it executable if needed) to review your code. While this checker may flag a few things that you wish to leave as-is, most of the things it catches, including whitespace errors in the code, should be fixed. We will run this style checker on your code to check for any issues that should have been fixed. Use the discussion board or office hours if you have questions about particular clint warnings.
We will generally follow the parts of the Google C++ Style Guide (linked to the course resources web page) that apply to C. You should use that as a reference.
Hint: All reasonable programming text editors have commands or settings to use spaces instead of tabs at the beginning of lines, which is required by the style checker and is much more robust than having tabs in the code. For example, if you are an emacs user, you can add the following line to the .emacs file in your home directory to ensure that emacs translates leading tabs to spaces:
(setq-default indent-tabs-mode nil)
There are optional packages for VSCode to aid in formatting code to match different style guides, including one that specifically matches the Google guide.
[...] stdout before you add the code to search for the STRING argument and selectively print lines containing it.
[...] printf can also be your friend to print values while executing and testing the code.
[...] <stdio.h>, <string.h>, <ctype.h> and <stdlib.h> libraries. Use the reference link on the course home page to look up details about functions and their parameters; use a good book like The C Programming Language for background and perspective about how they are intended to be used.
[...] -Wall option and (if you can) the runtime assert function (in assert.h) to catch coding bugs and to check for things that "must happen" or "can't happen" during execution. Don't waste time manually searching for errors that the compiler or run-time tests could have caught for you.
Learning how to use a debugger effectively can be a big help in getting your programs to work (although it is no substitute for thinking and careful programming). To encourage you to gain a basic familiarity with gdb, you are required to do the following:
[...] -g option, to include debugging information in the executable file.
[...] script program to capture the following console session in a text file named debug.script.
[...] gdb with your executable program as an argument.
[...] main, and the other at the point in your program right after the fopen function call that opens the input files.
[...] gdb run command, providing a search string and at least one file name as arguments.
[...] main, use the appropriate gdb command to display the contents of the variable containing the search string (the first argument to the program). When you've done that, continue execution of the program.
[...] gdb commands to display (i) a backtrace showing the functions active at the time the breakpoint was reached, (ii) source code showing the line containing the breakpoint and a couple of surrounding lines, (iii) the name of the file that was supplied to the fopen function (this string should be in a variable somewhere), and (iv) the pointer value returned by fopen (presumably just a hex address, although it might be NULL if the file can't be opened).
[...] gdb and exit from the script program's shell. Save the debug.script output file from script to turn in later.
You should use gdb's basic command-line interface for this part of the assignment, even if you use the -tui option for your routine debugging. The full-screen -tui interface generates a great deal of extra output in the script file, which makes it almost impossible to read.
A small amount of extra credit will be awarded for adding the following extension to an already complete, working assignment. No extra credit will be awarded if the basic program is not fully implemented and substantially bug-free.
If you do any extra credit parts, you should turn in both your original program without the extra credit and your extended program. The extra credit version should be in a separate file named gasp-extra.c. Your README file (see turn-in instructions below) should contain a brief description of your extensions.
Extra credit really is extra. It can increase your score, but if you choose not to do it, it will not penalize or disadvantage you.
Description: add two command-line options to your gasp program. After this addition, the command
./gasp [options] STRING FILE...
should read the listed files (FILE...) and copy each line from the input to stdout if it contains STRING anywhere in the input line. Each output line should be preceded by the name of the file that contains it. The argument STRING may be any sequence of characters (as expanded, of course, by the shell depending on how the argument is quoted).
There are two available options, which may appear in any order if both are present:
-i Ignore case when searching for lines that contain STRING. If the -i option is used, the strings "this", "This", "THIS", and "thiS" all match; if -i is not used, they are all considered different.
-n Number lines in output. Each line copied to stdout should include the line number in the file where it was found in addition to the file name. The lines in each file are numbered from 1.
Your program does not need to be able to handle combinations of option letters written as a single multi-character option like -in or -ni. But it does need to be able to handle any combination of either or both (or neither) option in any order when they appear separately on the command line preceding the STRING argument. You may assume that no option occurs more than once (you do not need to check for this). You may assume that the string argument does not begin with the hyphen (-) character.
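One possible way to handle these options without getopt is sketched below. The struct options type, the parse_options name, and the choice to return -1 for a usage error are all assumptions made for this illustration. The sketch also assumes at least one FILE must follow STRING. It relies on the guarantees above: options appear separately, and STRING does not begin with a hyphen.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record of which options were seen. */
struct options {
  int ignore_case;   /* nonzero if -i was given */
  int number_lines;  /* nonzero if -n was given */
};

/* Scan leading arguments that start with '-'. Returns the index of
 * STRING in argv, or -1 for an unknown option or missing arguments. */
int parse_options(int argc, char *argv[], struct options *opts) {
  assert(argv != NULL && opts != NULL);
  opts->ignore_case = 0;
  opts->number_lines = 0;
  int i = 1;
  while (i < argc && argv[i][0] == '-') {
    if (strcmp(argv[i], "-i") == 0) {
      opts->ignore_case = 1;
    } else if (strcmp(argv[i], "-n") == 0) {
      opts->number_lines = 1;
    } else {
      return -1;  /* unknown option */
    }
    i++;
  }
  /* Assumption: STRING and at least one FILE must remain. */
  if (i + 1 >= argc) {
    return -1;
  }
  return i;
}
```

With this shape, main could treat a return value of -1 as a usage error and otherwise process argv[index + 1] onward as file names.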
For the -i option, two characters are considered to be equal ignoring case if they are the same when translated by the tolower(c) function (or, alternatively, toupper(c)) in <ctype.h>.
The output format if the -n option is present should match the format produced by fgrep -H -n. For example, if the first line of file story.txt is the string "In the beginning", then the output from gasp -n begin story.txt should begin like this:
story.txt:1:In the beginning
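The two output formats can be produced by a single small helper, sketched here under assumptions: the name print_match and its parameter list are hypothetical, and the caller is assumed to keep a per-file line counter that starts at 1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one matched line to `out`.
 * If number_lines is nonzero, the format is "name:N:line";
 * otherwise it is "name:line". Returns what fprintf returns. */
int print_match(FILE *out, const char *name, int line_number,
                const char *line, int number_lines) {
  assert(out != NULL && name != NULL && line != NULL);
  assert(line_number >= 1);  /* lines are numbered from 1 */
  if (number_lines) {
    return fprintf(out, "%s:%d:%s", name, line_number, line);
  }
  return fprintf(out, "%s:%s", name, line);
}
```

The line is printed exactly as read, including its trailing newline, so nothing is added to or removed from the matched text.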
If no options are provided on the command line, the gasp program should work exactly as it did before.
Generally programs should not have global variables, but, if you wish, you may have two global variables that indicate whether the -i or -n options are selected or not.
Hint: An easy way to implement the -i option is to translate both the STRING argument and each input line to lowercase, then search for the translated STRING in the translated input line. However, if the string is found, the original line from the input file should be printed, not the translated one. Efficiency tip: translate the search string once, not repeatedly for every input line.
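The hint above might look something like the following sketch. The helper names lowercase_copy and line_matches_ignoring_case and the 500-byte MAX_LINE_LENGTH are assumptions for illustration only. The caller is expected to lowercase STRING once, then pass the translated copy for every line while still printing the original line on a match.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed buffer size for this sketch only (includes the \0 byte). */
#define MAX_LINE_LENGTH 500

/* Copy src into dst, translating each character with tolower.
 * Copies at most size - 1 characters and always \0-terminates. */
void lowercase_copy(char *dst, const char *src, size_t size) {
  assert(dst != NULL && src != NULL && size > 0);
  size_t i;
  for (i = 0; i + 1 < size && src[i] != '\0'; i++) {
    /* Cast to unsigned char: tolower is undefined for negative values. */
    dst[i] = (char) tolower((unsigned char) src[i]);
  }
  dst[i] = '\0';
}

/* lowered_needle must already be lowercase (translated once by the
 * caller). Returns nonzero if line contains it, ignoring case. */
int line_matches_ignoring_case(const char *line,
                               const char *lowered_needle) {
  assert(line != NULL && lowered_needle != NULL);
  char lowered_line[MAX_LINE_LENGTH];
  lowercase_copy(lowered_line, line, sizeof(lowered_line));
  return strstr(lowered_line, lowered_needle) != NULL;
}
```

Because only the scratch copy is lowercased, the original input line remains untouched for output.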
As before, you can use any standard Linux or C library functions except that you (still) may not use the getopt function in this assignment, even for the extra-credit parts.
If you have added the -i and -n options from the previous extra credit section and still would like to expand on your program, here are some suggestions for things you could try. These also will be awarded a small amount of extra credit, but if you choose not to do them it will not lower your score. You should not attempt any of these until you have done the previous extra credit to add the -i and -n command-line options to the program.
[...] stdin. This should be fairly easy to add if your code is organized as a well-designed collection of functions.
[...] -w to search for words separated by whitespace. If -w is specified, the STRING on the gasp command line should only be found if it is surrounded by whitespace (blanks, tabs, newlines, etc.) in the input file(s) and not as part of some other string. For example, the string foo would match foo but not food. A character c in the input should be treated as whitespace if the library function call isspace(c) returns true. If you implement this option, the program should find the word if it appears at the beginning or end of a line, as well as in the middle. You may also use an additional global variable to record the state of this option if you wish.
[...] -n option to count line numbers incorrectly. However you decide to implement this, long input lines should not cause your program to fail or crash.
[...] STRING parameter anywhere in the line. If you read arbitrarily long input lines in chunks that contain only parts of an input line, be sure you can correctly handle situations where the STRING value appears in a line but spans two parts of the line when they are read instead of falling entirely inside of one chunk. For efficiency reasons you should continue to read the input in large chunks, not a character at a time.
Your solutions should be:
[...] cancun or the current CSE Linux virtual machine).
[...] clint.py style checker to locate potential problems, and use the parts of the Google C++ style guide that are appropriate for C for guidance
Identifying information, including your name and CSE 374 22sp Homework 4, should appear as comments in each of your files.
Use gradescope, linked on the course resources web page, to submit your files (drag them onto the gradescope page):
[...] gasp.c),
[...] debug.script file with the script (console) output from the Debugging exercise above,
[...] gasp-extra.c source file containing the extra-credit version of the program, and
[...] README that briefly describes the extra credit parts that you added to your program, or contains a note that you did not implement any of the extra credit parts.
[...] README files. Turn in separate files; do not turn in a tar, zip, or other kind of archive file.
Gradescope will allow you to turn in your homework up to two days late, if you choose to use one or two of your late days, but you should try to save your late days for much later in the quarter when you may really want them.