Synopsis

In this assignment, you will implement the T9 predictive text library you wrote tests for in HW5. Here again is the map of the letters each number can correspond to:

Key Letters
2 ABC
3 DEF
4 GHI
5 JKL
6 MNO
7 PQRS
8 TUV
9 WXYZ

You can refer back to the HW5 spec for examples and further explanation.

To translate from number sequences to words, we will use a data structure known as a trie. A trie is a multiway branching structure (tree) that stores the prefixes of sequences. As we travel down a path in the trie, we reach word sequences spelled out by the numbers along that path. For this application, we use a compressed trie that has 8 possible branches at each node, one for each possible digit (2-9).

Writing the code might be easier if nodes have 9 or 10 children numbered 0-9, since then subtree number n corresponds to digit n (either nodes 0 or 1 would be used to represen the “#” branch). Feel free to use either representation for the trie depending on which seems simpler to implement.

For more information on the trie data structure, you can refer to our lecture slides and this Wikipedia article.

Technical Requirements

Implement the T9 library you tested in HW5. Examine t9_lib.h from the starter code, implementing each function in a corresponding t9_lib.c file, and copy in t9_tests.c from your HW5 submission to test your implementation.

Each node is to contain a word which terminates at that node. If there is already a word at the leaf node for the same pattern, build up a linked list of nodes using the “#” branch. For example, if the program reads the set of words “jello”,”rocks”, and “socks” from the dictionary and adds it to an empty trie, the resulting trie should look like this:

Trie example on jello, rocks, and socks

We have provided a demo program (t9_demo.c) which utilizes the library you will be implementing. When provided a dictionary file (eg, ./t9_demo small_dictionary.txt), you should be able to look up words using their T9 sequence.

As an example, if we run the program using the above trie, an interactive session might look like this:

Welcome to T9!
Type "exit" and press Enter to quit.
Enter key sequence:
> 76257
rocks
Enter key sequence:
> 76257#
socks
Enter key sequence:
> 53556
jello
Enter key sequence:
> 53556#
No entry for 53556#.
Enter key sequence:
> 76257##
No entry for 76257##.
Enter key sequence:
> 4423
No entry for 4423.
Enter key sequence:
> exit

We provide you with two text files, small_dictionary.txt and dictionary.txt. Each of these text files contains a list of words to be used in constructing a trie - the small one primarily for testing, and the large one for the final program.

Your library should translate each word in the file into its associated T9 key sequence, and add the word to the trie. In the case of multiple words having the same key sequence k, the first word encountered in the text file should be represented by the key sequence k, the next encountered represented by k#, the next k##, etc. For example, 2273 can represent acre, bard, bare, base, cape, card, care, or case. acre would then be represented by 2273, bard by 2273#, bare by 2273##, and so forth.

Your trie data structure should contain dynamically allocated nodes to represent the tree, and strings containing copies of the words read from the dictionary file, linked to appropriate nodes in the trie.

Besides the general specification given above, your program should meet the following requirements to receive full credit.

  • You must write a Makefile which compiles your library. Your code must compile and run without errors or warnings when compiled with your Makefile on seaside. Your Makefile must invoke gcc with the -Wall and -std=c11 options. In particular, we require the following make targets:
  1. t9_lib.o: Compiles, but does not link, your t9 library code.
  2. t9_demo.o: Compiles, but does not link, the t9 demo program.
  3. t9_demo: Links your t9_lib.o and t9_demo.o into an executable called t9_demo. Does not actually run the program.

Tip

You can use the Makefile from HW5 as a hint on how to write the Makefile.

  • Files that need to be edit:
  1. t9_priv.h: You will need to declare and define the T9 struct here.
  2. t9_lib.c: You implement the functions that are defined in t9_lib.h.
  3. t9_tests.c: You will want to replace this file with your test suite from HW5.
  • Use malloc to dynamically allocate the nodes, strings, and any other data that make up your trie. You may assume that no word (and therefore no numerical input) is longer than 50 characters; you should otherwise be able to process word lists of arbitrary size.

  • If an error occurs when opening or reading a file, your library function should return NULL.

  • Your library should exhibit no memory leaks or other memory errors when the demo program is run using valgrind. If memory leaks are detected, valgrind’s --leak-check=full option will be useful to generate more extensive messages with information about the memory leaks.

Caution

valgrind slows down execution considerably! It will take several minutes to load the full dictionary.txt file and then free the resulting tree under valgrind. We suggest you use smaller input files during development to test for memory problems with valgrind.

Test Sequences

The sequences below can be used to validate your trie against the supplied dictionary.txt file.

  1. 22737: acres, bards, barer, bares, barfs, baser, bases, caper, capes, cards, carer, cares, cases
  2. 46637: goner, goods, goofs, homer, homes, honer, hones, hoods, hoofs, inner
  3. 2273: acre, bard, bare, barf, base, cape, card, care, case
  4. 729: paw, pax, pay, raw, rax, ray, saw, sax, say
  5. 76737: popes, pores, poser, poses, roper, ropes, roses, sords, sorer, sores

Code Quality Requirements

As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.

  1. You should check your code with clint.py and ensure there are no style errors. Specifically, you should remove all the TODOs once you finish implementing T9.
  2. Use appropriate names for variables and functions. Avoid names like “var” or “flag” that provide no useful information, though counter variables in for loops are okay.
  3. No global variables. Use parameters (particularly pointers) appropriately. Global constants are okay, however.
  4. Avoid making excessive copies of large data structures. Consider using pointers instead.
  5. If you need to create a copy of a string or other variable-size data, you should dynamically allocate an appropriate amount of storage using malloc and release the storage with free when you are done with it. The amount allocated should be based on the actual size needed, not some arbitrary size that is assumed to be “large enough”.
  6. Use standard C library functions where possible; do not reimplement operations available in the standard libraries.
  7. You must check the return status (result code) of every library function you call to be sure that no errors occurred. In particular, malloc will return NULL if it is unable to allocate storage. Although this is extremely unlikely to happen, a robust program must check for the possibility and react appropriately if it does.

Implementation Hints

You should consider writing your Makefile and bringing in your test suite from HW5 first.

There are a lot of things to get right here; the job may seem overwhelming if you try to do it all at once. But if you break it into small tasks, each one of which can be done individually by itself, it should be quite manageable. For instance, design the structure of the trie and the structs (and struct fields) you need. Figure out how to add a single word to the trie (AddWordToT9) before you implement the logic to process all of the words in the dictionary (InitializeFromFileT9). The fgets() function may be helpful for reading in the dictionary line by line. Figure out how to add a few words that have different numeric codes before you handle words that have the same codes. Implement the code to traverse the trie to translate an input key sequence into the corresponding word (PredictT9) once you’ve built the trie, not before.

Before you start typing code into the computer, spend some time sketching out data structures and code (particularly trie node structs) on paper or on a whiteboard. Be sure you understand what you are trying to do before you start typing.

Test your code frequently – you can use both the demo program and your tests from HW5 for this. It is much easier to find and fix problems if you can isolate the potential bug to a small section of code you just added or changed. gdb and printf are your friends here to examine values while debugging. We also talked about debugging in lecture 11.

You may find it very useful to include code that can print the contents of the trie in some understandable format.

Start with a small data file and figure out in advance what the resulting trie should look like. Then verify that the program does, in fact, create that trie.

To build the trie, you need some way to translate characters (primarily letters) from words in the dictionary file to the corresponding keypad digits. It is probably a great idea to include in your code a function that takes a character as an argument and returns the corresponding digit. This can be implemented with a series of if-elseif-else statements, but another way to do it is to have an array with one entry for each possible character. In the entry for each character, store the corresponding digit code. Then you can look up the code for a character without a complicated nest of if statements. Recall that in C, as in Java and other languages, a character can be used as a small integer. That is particularly helpful for exactly this kind of application, where we might want to use a character value as an index into a table of data.

Be sure to check for errors like trying to open a nonexistent file to see if your error handling is working properly. The comments in the provided t9_lib.h are there to help you – read them before implementing your functions!

Once you’re done, you should read the instructions again to see if you overlooked anything.