(last updated 5/14 to fix bugs in the keypad and examples -- P is on the 7 key)
In this assignment, you will develop a more complex program using dynamic data structures. In doing so you will:
In this assignment, you will build programs to implement T9 predictive text,
a text input mode available on cellphones. On a cellphone, each number from
2-9 on the keypad represent three or four letters, the number 0 represents
a space, and 1 represents a set of symbols such as { , . ! ? } etc. The numbers
from
2-9
represent letters as follows:
2 ABC
3 DEF
4 GHI
5 JKL
6 MNO
7 PQRS
8 TUV
9 WXYZ
Since multiple letters map to a single number, many key sequences represent multiple words. For example, the input 2665 represents "book" and "cool", among other possibilities.
To translates from number sequences to words, we will use a data structure known as a trie. A trie is a multiway branching structure (tree) that stores the prefixes of sequences. As we travel down a path in the trie, we reach word sequences spelled out by the numbers along that path. Classic trie data structures have edges labeled with letters to store prefixes of strings. But for this application, we used a compressed trie that has only 10 possible branches at each node instead of 26, since the digits 0-9 represent the 26 letters, space and symbols. Because of this, an extra layer of complexity is needed to figure out the string represented by a path, which is the predictive component of T9.
For more information on the data structure trie, here is a link to the Wikipedia article. Here is a link to the powerpoint presentation given in class (pptx 2007 format; a ppt version is also available (but is much larger)).
Implement in C a program t9. The command
t9 FILE
should read in a dictionary file (FILE
) that contains a list
of words. Translate each word in the dictionary into its numeric key sequence,
then add the
key sequence to your trie, with the word at the end of the path corresponding
to the digits. If a word with the same numeric sequence already exists in
the trie, add the new word to the trie as a link to a new node with an edge
labeled '#' instead of one of the digits 2-9. (The words linked from a
node by the '#' edges essentially form a linked list of words that have the
same numeric code.)
For example, if the program reads the set of words "jello","rocks",
and "socks" from
the dictionary and adds it to an empty trie, the resulting trie should look
like this:
Once your program has read the dictionary and built the trie containing the words in it, it should start an interactive session. The user should be able to type numbers and the program should print out the word corresponding to the sequence of numbers entered. Your program should use the numbers typed by the user to traverse the trie that has already been created, retrieve the word, and print it to the screen. If the user then enters '#', the program should print the next word in the trie that has the same numeric value, and so forth.
As an example, if we run the program using the above trie,
an interactive session might look like this:
Enter "exit" to quit. Enter Key Sequence (or "#" for next word): > 76257 'rocks' Enter Key Sequence (or "#" for next word): > # 'socks' Enter Key Sequence (or "#" for next word): > 53556 'jello' Enter Key Sequence (or "#" for next word): > # There are no more T9onyms Enter Key Sequence (or "#" for next word): > 76257# 'socks' Enter Key Sequence (or "#" for next word): > 76257## There are no more T9onyms >4423 Not found in current dictionary. >exit
We provide you with two text files, smallDictionary.txt
and dictionary.txt
(right-click on the links to download the files). Each of these text
files contains a list of words to be used in constructing a trie
- the small one primarily for testing, and the large one for the
final program. Translate each
word
in
the
file
into
its
associated
T9 key
sequence, and add the word to the trie. In the case of multiple words having
the same key sequence k,
let the first word encountered in the text file be represented by the key sequence
k, the next encountered represented by k#, the next k##, etc. For example,
2273 can represent acre, bard, bare, base, cape, card,
care, or case. To disambiguate, acre would be represented by 2273, bard by 2273#
bare by 2273##, and so forth. When a user inputs a key sequence,
print
the appropriate word or words.
Your trie data structure should contain nodes to represent the tree, and strings (char arrays) containing (copies of) the words read from the dictionary file, linked to appropriate nodes in the trie.
Besides the general specification given above, your program should meet the following requirements to receive full credit.
Makefile
and use make
to compile
your program. Your Makefile
should only recompile the necessary
part(s) of the program after changes are made.malloc
to dynamically allocate the nodes, strings, and
any other data that make up your trie.malloc
and
return the storage with free
when you are done with it. The amount allocated should be based on the
actual size needed, not some arbitrary size that is assumed to be "large enough".stderr
and terminate if there
is no further work to be done.free
everything acquired with malloc
).gcc -Wall
on attu
. Your program should build without errors when
make
is used on attu
to run your Makefile
..As with any program you write, your code should be readable and understandable to anyone who knows C. In particular, for full credit your code must observe the following requirements.
.h
) file for the trie (and any other header files) should only
declare items that are shared between client programs that use the header and
the file(s)
that implement it. Don't include in the header file implementation details
that should be hidden. Be sure to use the standard #ifndef
preprocessor
trick so your header files work properly if included more than once in a source
file,
either directly or indirectly.void
functions that perform an action
without returning a value. Variables of local significance like loop counters
or indices should be given simple names like i
or n
and do not require further comments. Avoid names like count
or flag
that
provide no useful information - use names that suggests the values being counted
or
the condition that is represented.malloc
or free
-
these are expensive. Don't make unnecessary copies of large data structures;
use pointers. (Copies of
ints, pointers, and similar things are cheap; copies of large arrays
and structs are expensive.)struct
s) on paper or on a whiteboard.
Be sure you understand what you are trying to do before you start typing.printf
is your friend here to print values while debugging.if-elseif-else
statements, but another way to do
it is to have an
array with one entry for each possible character. In the entry for each character,
store the corresponding digit code. Then you can look up the code for a
character without a complicated nest of if
statements. (Recall
that in C, as in similar languages, a character can
be
used as a small integer. That is particularly helpful for exactly this kind
of application, where we might like to use a character as an index value
into a table of data.)A small amount of extra credit will be awarded for adding the following extensions to an already complete, working assignment. No extra credit will be awarded if the basic program is not fully implemented and substantially bug-free.
freq_Dictionary.txt
(right-click
to download) contains a list of words with, in addition, the frequency
of each word in ordinary
text. Use the information
in this file to construct your trie so the most likely words are reached
first.
If you include extensions in your program, you should also include a README file that describes what you added and how to demonstrate your addition(s) when your program is executed.
The sequences below can be used to validate your trie against the given dictionary. (updated 5/14 to fix a couple of bugs)
Create an uncompressed tar file containing your source code, Makefile
, and
(if you have it) your README file describing your extensions and submit that
using the normal class dropbox.