Due: Tues, Feb 28, 2023, at 11:59 pm
You will develop a more complex program using data structures. You will:
You are encouraged to use the starter files here for this assignment.
You will build programs to implement T9 predictive text,
a text input mode developed originally for cell flip phones and
still used for numeric keypads. Each number
from 2-9 on the keypad represent three or four letters, the number 0 represents
a space, and 1 represents a set of symbols such as { , . ! ? } etc. The numbers
from 2-9 represent letters as follows:
2 ABC
3 DEF
4 GHI
5 JKL
6 MNO
7 PQRS
8 TUV
9 WXYZ
Since multiple letters map to a single number, many key sequences represent multiple words. For example, the input 2665 represents "book" and "cool", among other possibilities.
To translate from number sequences to words, we will use a data structure known as a trie. A trie is a multiway branching structure (tree) that stores the prefixes of sequences. As we travel down a path in the trie, we reach word sequences spelled out by the numbers along that path. Classic trie data structures have edges labeled with letters to store prefixes of strings. But for this application, we used a compressed trie that has only 10 possible branches at each node instead of 26, since the digits 0-9 represent the 26 letters, space and symbols. Because of this, an extra layer of complexity is needed to figure out the string represented by a path.
(Actually, for our application, each node only needs 8 possible children numbered 2-9, since digits 0 and 1 don't encode letters. But writing the code might be easier if nodes have 10 children numbered 0-9, since then subtree number n corresponds to digit n. Feel free to use either representation for the trie depending on which seems simpler to implement.)
For more information on the trie data structure, here is a link to the Wikipedia article.
Implement in C a program t9
. The command
./t9 FILE
should read in a dictionary file (FILE
) that contains a list
of words. Translate each word in the dictionary into its numeric key sequence,
then add the
key sequence to your trie, with the word at the end of the path corresponding
to the digits. If a word with the same numeric sequence already exists in
the trie, add the new word to the trie as a link to a new node with an edge
labeled '#' instead of one of the digits 2-9. (The words linked from a
node by the '#' edges essentially form a "linked list" of words that have the
same numeric code, but to simplify the implementation
we'll use additional tree nodes to link together words with duplicate codes
instead of defining another kind of linked-list node to deal with this situation.)
For example, if the program reads the set of words "jello","rocks",
and "socks" from
the dictionary and adds it to an empty trie, the resulting trie should look
like this:
Once your program has read the dictionary and built the trie containing the words in it, it should start an interactive session. The user should be able to type numbers and the program should print out the word corresponding to the sequence of decimal digits in the numbers. Your program should use the numbers typed by the user to traverse the trie that has already been created, retrieve the word, and print it to the screen. If the user then enters '#', the program should print the next word in the trie that has the same numeric value, and so forth. The user can also type a number followed by one or more '#' characters - this should print the same word that would be found by typing the number and individual '#' characters on separate input lines. If the user enters '#' and there are no more words in the trie that have the same numeric digit sequence, then the program should print "There are no more T9onyms". If no word in the trie matches the input number, the program should print "Not found in current dictionary".
As an example, if we run the program using the above trie, an interactive session might look like this:
Enter "exit" to quit. Enter Key Sequence (or "#" for next word): > 76257 'rocks' Enter Key Sequence (or "#" for next word): > # 'socks' Enter Key Sequence (or "#" for next word): > 53556 'jello' Enter Key Sequence (or "#" for next word): > # There are no more T9onyms Enter Key Sequence (or "#" for next word): > 76257# 'socks' Enter Key Sequence (or "#" for next word): > 76257## There are no more T9onyms >4423 Not found in current dictionary >exit
The interactive session should terminate either when the user enters the word "exit" or when the end-of-file is reached on the interactive input (indicated by typing control-D on the keyboard on a separate line by itself).
Note: Be sure your program properly handles the case if the user types more "#"s than there are T9onyms for a particular number.
We provide you with two text files, smallDictionary.txt
and
dictionary.txt
. Each of these text
files contains a list of words to be used in constructing a trie - the
small one primarily for testing, and the large one for the final
program. Translate each word in the file into its associated T9 digit key
sequence, and add the word to the trie. In the case of multiple words
having the same key sequence k, the first word encountered
in the text file with that key sequence is represented by the key sequence k,
the next word from the input file with the same key sequence is
represented by k#, the next by k##, etc. For
example, 2273 can represent acre, bard, bare, base, cape, card, care, or
case. To disambiguate, acre would be represented by 2273, bard by
2273#, bare by 2273##, and so forth. When a user inputs a key sequence,
print the appropriate word.
Your trie data structure should contain nodes to represent the tree, and C strings (\0-terminated char arrays) containing copies of the words read from the dictionary file, linked to appropriate nodes in the trie.
Besides the general specification given above, your program should meet the following requirements to receive full credit.
Makefile
and use make
to compile your program. Your Makefile
should only
recompile the necessary part(s) of the program after changes are made.
Your Makefile
should include a clean
target so that
make clean
will remove the t9
executable file,
all .o
compiled files, and all files with names ending in
~
(tilde) - i.e., backup files created by text editors
and similar programs.
malloc
to dynamically allocate the nodes, strings, and
any other data that make up your trie.malloc
and return the storage with free
when you are done with it. The amount allocated should be based on the
actual size needed, not some arbitrary size that is assumed to be "large
enough".malloc
will return NULL
if it is unable to
allocate storage. Although this is extremely unlikely to happen, a
robust program must check for the possibility and react appropriately if
it does.stderr
and terminate if there
is no further work to be done.free
everything acquired with
malloc
). This should be done explicitly without relying on
the operating system to clean up after the program finishes.Makefile
compiles your code
with gcc -Wall -g -std=c17
on cancun
>.
Your program should build without errors when
make
is used to run your Makefile
.valgrind
.
(Warning: valgrind
slows down execution considerably. It
will take several minutes to load the full dictionary.txt file and then
free the resulting tree under valgrind
. We suggest you use
smaller input files during development to test for memory problems with
valgrind
.) If memory leaks or errors are detected,
valgrind
's --leak-check=full
option will be
useful to generate more extensive messages with information about the
memory problems. For full credit you must:
.h
) file for the trie (and any other header
files) should only declare items that are shared between client programs
that use the header and the file(s) that implement it. Don't include in
the header file implementation details that should be hidden, including
helper functions that are not meant to be used by other files. Be sure to
use the standard #ifndef
preprocessor trick..c
file for local functions).
The comment should not be repeated later where the function is defined (implemented).
malloc
or free
- these are relatively expensive. Don't make unnecessary
copies of large data structures; use pointers. (Copies of ints,
pointers, and similar things are cheap; copies of large arrays and
structs are expensive.)As with homework 3, we will be running a linter to check your files for certain style elements. We will be running the script which calls clint.py and then gets rid of certain output messages that we will not penalize, such as the "No copyright" warning and using tabs versus spaces. You can run the hw5Linter on your files one at a time to do the same checks that we will do on your assignment. If you have questions about warnings you are seeing when you run hw5Linter, you can make a post on edstem and we will take a look.
struct
s) on paper or on a whiteboard. Be sure you
understand what you are trying to do before you start typing.gdb
and printf
are
your friends here to examine values while debugging.gdb
is helpful here, but it can be tedious to visualize
a data structure with one gdb
print command at a time.gdb
is your friend.if-elseif-else
statements, but another way to do it is to
have an array with one entry for each possible character. In the entry
for each character, store the corresponding digit code. Then you can
look up the code for a character without a complicated nest of
if
statements. (Recall that in C, as in Java and other
languages, a character can be used as a small integer. That is
particularly helpful for exactly this kind of application, where we
might want to use a character value as an index into a table of
data.)The sequences below can be used to validate your trie against the supplied
dictionary.txt
file.
Your solutions should be:
cancun
hw5Linter
style checker to locate potential
problemsvalgrind
to help check for possible problems.Use gradescope, linked on the course resources web page, to submit your files (drag them onto the gradescope page):
Makefile
for your programtar
,
zip
, or other kind of archive file
(Gradescope should automatically unpack the files if you do use a zip
archive or something similar to drag your files to Gradescope's window,
but be sure it successfully unpacks the actual files.)
Gradescope will allow you to turn in your homework up to two days late, if you choose to use one or two of your late days.