Due: Tuesday, November 12, 2024, at 11:59pm
Goals
Synopsis
Set-Up
Background Information
Provided Files
Your Task
Assessment
Turn-in instructions
In this assignment, you will develop a more complex program using dynamic data structures. In doing so you will:
In this assignment, you will build programs to implement T9 predictive text,
a text input mode developed originally for cell phones and
still used for numeric keypads. Each number
from 2-9 on the keypad represents three or four letters, the number 0 represents
a space, and 1 represents a set of symbols such as { , . ! ? } etc. The numbers
from 2-9 represent letters as follows:
2 => ABC
3 => DEF
4 => GHI
5 => JKL
6 => MNO
7 => PQRS
8 => TUV
9 => WXYZ
Since multiple letters map to a single number, many key sequences represent multiple words. For example, the input 2665 represents "book" and "cool", among other possibilities.
2 6 6 5 2 6 6 5 a B c m n O m n O j K l OR a b C m n O m n O j k L | | | | | | | | ----------------------- ----------------------- | | book cool
To translate from number sequences to words, we will use a data structure known as a trie. A trie is a multiway branching structure (tree) that stores the prefixes of sequences. As we travel down a path in the trie, we reach word sequences spelled out by the numbers along that path. Classic trie data structures have edges labeled with letters to store prefixes of strings. But for this application, we use a compressed trie that has only 10 possible branches at each node instead of 26, since the digits 0-9 represent the 26 letters, space and symbols. Because of this, an extra layer of complexity is needed to figure out the string represented by a path.
(Actually, for our application, each node only needs 8 possible children numbered 2-9, since digits 0 and 1 don't encode letters. But writing the code might be easier if nodes have 10 children numbered 0-9, since then subtree number n corresponds to digit n. Feel free to use either representation for the trie depending on which seems simpler to implement.)
For more information on the trie data structure, here is a link to the Wikipedia article.
This project should be done independently. If you work with a classmate, make sure you are each editing and working on your own set of files. You should not copy and paste code.
Before you get started, ensure that your set-up is up-to-date and appropriate.
cancun
, including gcc
git status
to determine if there are outstanding changes, and git add
and git commit
if you are unsure.)git pull upstream main
to pull the newest commit from the upstream repository.
This will give you access to the hw6
folder containing the materials for this
assignment. (upstream
specifies that you want to pull from the upstream
repository, and main
specifies the branch. You may see a text editor open to allow
you to edit the merge message. This will likely be Vim, so you can edit it and then save,
or just accept the current text and exit using :q!
.)You will do this assignment in your new hw6
folder.
You should get into the habit of committing and pushing code frequently.
One common style of software engineering is called “Test-Driven Development” (TDD). With this workflow, you write tests for the intended behavior before writing the code which implements that behavior. In the previous assignment you have written a complete set of tests for a T9 implemenation. In this assignment you will be doing the development portion of TDD.
Your hw6 folder contains the following files:
t9_lib.h
: The header describing T9’s interfaces. This is what you will be implementing.t9_lib.c
: You will modify this program to implement the T9 library.t9_demo.c
: Demo program to try out the correct T9 implementation.t9_tests.c
: Starter file for your tests. You will want to replace this file with the one created in HW6: Testing T9.safe_assert.h
: The test framework you will be using.dictionary.txt
: Large dictionary file, containing most of the English language.small_dictionary.txt
: Small dictionary file, containing just a handful of words.clint.py
: The linter you will use to check your coding style.Implement the data structure that supports the T9 demo, as described in the
t9_lib.h
file. This new library will be run with the provided
j9_demo.c
to allow for an interactive session.
Your code will implement a prefix tree, such that some nodes contain completed words.
For example, if the program reads the set of words "jello","rocks",
and "socks" from
the dictionary and adds it to an empty trie, the resulting trie should look
like this:
We provide you with two text files,
smallDictionary.txt
and
dictionary.txt
.
Each of these text files contains a list of words to be used in constructing
a trie. You may additionally use your own dictionary files to test.
Translate each word in the file into its associated T9 key sequence, and add the word to the trie. In the case of multiple words having the same key sequence k, let the first word encountered in the text file be represented by the key sequence k, the next encountered represented by k#, the next k##, etc. For example, 2273 can represent acre, bard, bare, base, cape, card, care, or case. To disambiguate, acre would be represented by 2273, bard by 2273#, bare by 2273##, and so forth.
For the interactive session, use the user-entered pattern to traverse the trie and return the correct word.
Your trie data structure should contain nodes to represent the tree, and strings (char arrays) containing copies of the words read from the dictionary file, linked to appropriate nodes in the trie.
Besides the general specification given above, your program should meet the following requirements to receive full credit.
Makefile
and use make
to compile your program. Your Makefile
should only
recompile the necessary part(s) of the program after changes are
made. Your Makefile
should include a target clean
to remove previously generated files. You may also want to include a target
test
for your own use during development.malloc
to dynamically allocate the nodes, strings, and
any other data that make up your trie. Make sure you free all the memory
at the end of the run. We will use valgrind
to evaluate how well
your code manages memory.malloc
and release the storage with free
when you are done with it. The amount allocated should be based on the
actual size needed, not some arbitrary size that is assumed to be "large
enough".strncpy
instead
of strcpy
malloc
will return NULL
if it is unable to
allocate storage. Although this is extremely unlikely to happen, a
robust program must check for the possibility and react appropriately if
it does.stderr
and terminate if necessary.free
everything acquired with
malloc
). This should be done explicitly without relying on
the operating system to clean up after the program finishes. Remember
to close any files you have opened.gcc -Wall -std=c11
on Cancun.
Your program should build without errors when
make
is used to run your Makefile
. You are,
of course, free to use other systems for development, and you should be
fine as long as you have a relatively recent version of gcc. But the code
must work on the standard linux distribution we use for the course.valgrind
. You must
load a dictionary and have a short interactive session while testing with
valgrind
.
(Warning: valgrind
slows down execution considerably. It
will take several minutes to load the full dictionary.txt file and then
free the resulting tree under valgrind
. We suggest you use
smaller input files during development to test for memory problems with
valgrind
.) If memory leaks are detected, compiling with the
-g
option and using
valgrind
's --leak-check=full
option will
generate more extensive messages with information about the
memory leaks. As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.
.h
) file for the trie should only declare items
that are shared between client programs
that use the header and the file(s) that implement it. Don't include in
the header file implementation details that should be hidden. If you make use
of helper functions within T9 you should declare them at the top of the
t9_lib.c
file. Be sure to include comments that describe their function.malloc
or free
- these are expensive. Don't make unnecessary
copies of large data structures; use pointers. (Copies of ints,
pointers, and similar things are cheap; copies of large arrays and
structs are expensive.)As with the previous assignment, you should use the cpplint.py style checker to review your code. While this checker may flag a few things that you wish to leave as-is, most of the things it catches, including whitespace errors in the code, should be fixed. We will run this style checker on your code to check for any issues that should have been fixed. Please use the class discussion board if you have questions about any of clint's warnings and whether they can be ignored.
struct
s) on paper or on a whiteboard. Be sure you
understand what you are trying to do before you start typing.gdb
and valgrind
are
your friends here to examine values while debugging.gdb
is your friend.if-elseif-else
statements, but another way to do it is to
have an array with one entry for each possible character. In the entry
for each character, store the corresponding digit code. Then you can
look up the code for a character without a complicated nest of
if
statements. (Recall that in C, as in Java and other
languages, a character can be used as a small integer. That is
particularly helpful for exactly this kind of application, where we
might want to use a character value as an index into a table of
data.)The sequences below can be used to validate your trie against the supplied
dictionary.txt
file.
This homework is worth a total of 65 points.
There will be an autograder which evaluates whether your T9 implementation meets the interface specification. You should use the test suite you created during the previous homework to test your T9 functionality. You will also want to use Valgrind, similar to the way you used it in the debugging homework, to evaluate your memory management. You are encouraged to resubmit until you achieve a full score on the autograder.
This homework will also be evaluated manually, specifically to look at code clarity, style, and commenting. You should be sure to include your name and date at the top of the file. You should also be sure to use brief comments to describe the pupose of the module, and functionality of other sections.
You will submit this homework to the Gradescope HW6: T9 Implementation, via Gitlab.
You will first update your Gitlab repository. Use git add
and git commit
to ensure that your updated t9_tests.c
, and any required additional
&ldquot;.txt&rdquot; files, are committed. This will be, at minimum, your t9_lib.c
and Makefile
.
These should be located in the hw6
folder, located at the top
level of your repository. User git push
to bring the origin/remote repository up-to-date.
Once you locate the Gitlab assignment you will tap the "GitLab" button on the bottom:
Once you submit your code the autograder may take some time to run. You may resubmit for a higher grade, but your should always do as much testing as possible on your own platform before resubmitting.