Synopsis¶
In this assignment, you will implement the T9 predictive text library you wrote tests for in HW5. Here again is the map of the letters each number can correspond to:
Key | Letters |
---|---|
2 | ABC |
3 | DEF |
4 | GHI |
5 | JKL |
6 | MNO |
7 | PQRS |
8 | TUV |
9 | WXYZ |
You can refer back to the HW5 spec for examples and further explanation.
To translate from number sequences to words, we will use a data structure known as a trie. A trie is a multiway branching structure (tree) that stores the prefixes of sequences. As we travel down a path in the trie, we reach word sequences spelled out by the numbers along that path. For this application, we use a compressed trie that has 8 possible branches at each node, one for each possible digit (2-9).
Writing the code might be easier if nodes have 9 or 10 children numbered 0-9, since then subtree number n corresponds to digit n (either nodes 0 or 1 would be used to represen the “#” branch). Feel free to use either representation for the trie depending on which seems simpler to implement.
For more information on the trie data structure, you can refer to our lecture slides and this Wikipedia article.
Technical Requirements¶
Implement the T9 library you tested in HW5. Examine t9_lib.h
from the starter code, implementing each function in a corresponding t9_lib.c
file, and copy in t9_tests.c
from your HW5 submission to test your implementation.
Each node is to contain a word which terminates at that node. If there is already a word at the leaf node for the same pattern, build up a linked list of nodes using the “#” branch. For example, if the program reads the set of words “jello”,”rocks”, and “socks” from the dictionary and adds it to an empty trie, the resulting trie should look like this:
We have provided a demo program (t9_demo.c
) which utilizes the library you will be implementing. When provided a dictionary file (eg, ./t9_demo small_dictionary.txt
), you should be able to look up words using their T9 sequence.
As an example, if we run the program using the above trie, an interactive session might look like this:
Welcome to T9!
Type "exit" and press Enter to quit.
Enter key sequence:
> 76257
rocks
Enter key sequence:
> 76257#
socks
Enter key sequence:
> 53556
jello
Enter key sequence:
> 53556#
No entry for 53556#.
Enter key sequence:
> 76257##
No entry for 76257##.
Enter key sequence:
> 4423
No entry for 4423.
Enter key sequence:
> exit
We provide you with two text files, small_dictionary.txt
and dictionary.txt
. Each of these text files contains a list of words to be used in constructing a trie - the small one primarily for testing, and the large one for the final program.
Your library should translate each word in the file into its associated T9 key sequence, and add the word to the trie. In the case of multiple words having the same key sequence k, the first word encountered in the text file should be represented by the key sequence k, the next encountered represented by k#, the next k##, etc. For example, 2273 can represent acre, bard, bare, base, cape, card, care, or case. acre would then be represented by 2273, bard by 2273#, bare by 2273##, and so forth.
Your trie data structure should contain dynamically allocated nodes to represent the tree, and strings containing copies of the words read from the dictionary file, linked to appropriate nodes in the trie.
Besides the general specification given above, your program should meet the following requirements to receive full credit.
- You must write a Makefile which compiles your library. Your code must compile and run without errors or warnings when compiled with your Makefile on
calgary
. Your Makefile must invokegcc
with the-Wall
and-std=c11
options. In particular, we require the followingmake
targets:
t9_lib.o
: Compiles, but does not link, your t9 library code.t9_demo.o
: Compiles, but does not link, the t9 demo program.t9_demo
: Links yourt9_lib.o
andt9_demo.o
into an executable calledt9_demo
. Does not actually run the program.
Tip
You can use the Makefile from HW5 as a hint on how to write the Makefile.
- Files that need to be edit:
t9_priv.h
: You will need to declare and define the T9 struct here.t9_lib.c
: You implement the functions that are defined int9_lib.h
.t9_tests.c
: You will want to replace this file with your test suite from HW5.
-
Use
malloc
to dynamically allocate the nodes, strings, and any other data that make up your trie. You may assume that no word (and therefore no numerical input) is longer than 50 characters; you should otherwise be able to process word lists of arbitrary size. -
If an error occurs when opening or reading a file, your library function should return
NULL
. -
Your library should exhibit no memory leaks or other memory errors when the demo program is run using
valgrind
. If memory leaks are detected, valgrind’s--leak-check=full
option will be useful to generate more extensive messages with information about the memory leaks.
Caution
valgrind
slows down execution considerably! It will take several minutes to load the full dictionary.txt
file and then free the resulting tree under valgrind. We suggest you use smaller input files during development to test for memory problems with valgrind.
Test Sequences¶
The sequences below can be used to validate your trie against the supplied dictionary.txt
file.
- 22737: acres, bards, barer, bares, barfs, baser, bases, caper, capes, cards, carer, cares, cases
- 46637: goner, goods, goofs, homer, homes, honer, hones, hoods, hoofs, inner
- 2273: acre, bard, bare, barf, base, cape, card, care, case
- 729: paw, pax, pay, raw, rax, ray, saw, sax, say
- 76737: popes, pores, poser, poses, roper, ropes, roses, sords, sorer, sores
Code Quality Requirements¶
As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.
-
You should check your code with
clint.py
and ensure there are no style errors. Specifically, you should remove all the TODOs once you finish implementing T9. -
Use appropriate names for variables and functions. Avoid names like “var” or “flag” that provide no useful information, though counter variables in for loops are okay.
-
No global variables. Use parameters (particularly pointers) appropriately. Global constants are okay, however.
-
Avoid making excessive copies of large data structures. Consider using pointers instead.
-
If you need to create a copy of a string or other variable-size data, you should dynamically allocate an appropriate amount of storage using
malloc
and release the storage withfree
when you are done with it. The amount allocated should be based on the actual size needed, not some arbitrary size that is assumed to be “large enough”. -
Use standard C library functions where possible; do not reimplement operations available in the standard libraries.
-
You must check the return status (result code) of every library function you call to be sure that no errors occurred. In particular,
malloc
will return NULL if it is unable to allocate storage. Although this is extremely unlikely to happen, a robust program must check for the possibility and react appropriately if it does.
Implementation Hints¶
You should consider writing your Makefile and bringing in your test suite from HW5 first.
There are a lot of things to get right here; the job may seem overwhelming if you try to do it all at once. But if you break it into small tasks, each one of which can be done individually by itself, it should be quite manageable.
-
Design the structure of the trie and the structs (and struct fields) you need.
-
Figure out how to add a single word to the trie (
AddWordToT9
) before you implement the logic to process all of the words in the dictionary (InitializeFromFileT9
). -
The
fgets()
function may be helpful for reading in the dictionary line by line. -
Figure out how to add a few words that have different numeric codes before you handle words that have the same codes.
-
Implement the code to traverse the trie to translate an input key sequence into the corresponding word (
PredictT9
) once you’ve built the trie, not before. -
Before you start typing code into the computer, spend some time sketching out data structures and code (particularly trie node structs) on paper or on a whiteboard. Be sure you understand what you are trying to do before you start typing.
-
Test your code frequently – you can use both the demo program and your tests from HW5 for this. It is much easier to find and fix problems if you can isolate the potential bug to a small section of code you just added or changed. gdb and
printf
are your friends here to examine values while debugging. We also talked about debugging in lecture 12. -
You may find it very useful to include code that can print the contents of the trie in some understandable format. That way you can use these functions as your’re implementing and debugging your program.
-
Start with a small data file and figure out in advance what the resulting trie should look like. Then verify that the program does, in fact, create that trie.
-
To build the trie, you need some way to translate characters (primarily letters) from words in the dictionary file to the corresponding keypad digits. It is probably a great idea to include in your code a function that takes a character as an argument and returns the corresponding digit. This can be implemented with a series of if-elseif-else statements, but another way to do it is to have an array with one entry for each possible character. In the entry for each character, store the corresponding digit code. Then you can look up the code for a character without a complicated nest of if statements. Recall that in C, as in Java and other languages, a character can be used as a small integer. That is particularly helpful for exactly this kind of application, where we might want to use a character value as an index into a table of data.
-
Be sure to check for errors like trying to open a nonexistent file to see if your error handling is working properly. The comments in the provided
t9_lib.h
are there to help you – read them before implementing your functions! -
Once you’re done, you should read the instructions again to see if you overlooked anything.