Due: Electronic turnin Monday Feb 1 at 5pm.
Printed receipt due in the box in Sieg by 10pm Monday 2/1 or may be submitted in section
Tuesday 2/2.
For this assignment, write a program that reads text and reports how frequently each word appears in the input. Your program should include a C++ class Dictionary that implements a dictionary data type to store the words and the number of occurrences of each word.
The main goal of this assignment is to learn how to create a class that contains member functions. A secondary goal is to learn how to use some basic functions in the standard C/C++ string library <string.h>, the character library <ctype.h>, and in <assert.h>.
For this assignment, use data structures with fixed sizes, not dynamically allocated arrays, linked lists, or trees. We haven't covered quite enough C++ yet to be able to write a proper class containing dynamic data. Also, the main program should read input from cin and write output to cout. We'll cover file I/O later in the course.
The main program should read a series of words and print a report of how many unique words appear in the input and how often each word appears. To simplify things, the input should contain no punctuation except for a single isolated period at the end. The words may contain upper and lower case letters; words differing only in capitalization are considered to be the same. The output should be a table of words sorted by frequency. Words appearing with the same frequency should be sorted alphabetically. For example, if the input is
The quick sly fox jumped over the lazy brown dog .
the output for this example should begin with
The input contains 9 unique words. Input words sorted by frequency, then alphabetically: 2 the 1 brown 1 dog 1 fox ...
You should implement a C++ class Dictionary to store the table of words and frequencies. The class should supply the following public operations.
Constructor | Dictionary() | Create empty dictionary |
Operations | add(w) | add new occurrence of word w to this dictionary |
size() | Return the number of words in this dictionary | |
Stream output | << | write dictionary to stream (details below) |
The stream output operator << should only write the contents of the dictionary, not the number of words in the dictionary or any other titles or descriptive information. The output should have one line for each word in the dictionary giving the number of occurrences of that word and the word itself.
The maximum length of a word in a Dictionary and the maximum number of words that can be stored should be specified with constants. You may make any reasonable assumption you wish about the values of these constants, but the maximum length of a word must be at least 20 characters (not including the \0 at the end of the C string) and that the maximum number of words should be at least 100.
The add operation should generate an execution error and terminate the program if an attempt is made to add a word that is too long to fit in the dictionary or to add a new word when the dictionary is already full. Use assert() for this.
You should read each word from cin into an array of characters that is substantially larger than the maximum size allowed for words in the dictionary. Reading a word containing, say, 65 characters should cause an assert error when an attempt is made to add it to a Dictionary, but it should not cause the program to crash because the array used in the cin operation had fewer than 65 characters. In picking a size for the input buffer, you may assume that no single input word will contain more than 199 characters.
One of the purposes of this assignment is to learn how to work with standard C/C++ null-terminated (\0) strings and the <string.h> library that supplies functions for these strings. Some of the more useful functions in that library are strcpy, strcmp, and strlen.
Recent versions of C++ include a new <string> library (without the .h), that provides a new implementation of strings. But it is important to learn how to use the original C/C++ <string.h> library, so, for this assignment, you must use the original <string.h> library, not the newer C++ library <string>.
Library <ctype.h> contains useful operations on characters, including functions that convert between upper- and lower-case letters.
In the implementation of class Dictionary, you will probably find it helpful to create additional, private functions to perform operations needed in the implementation. Examples might include a function that locates a given word in the dictionary if present, or one that sorts the dictionary into an order suitable for stream output. Use additional functions to organize your code into sensible, logical units. Don't mix unrelated code in a single function. For example, the stream output operation should call a separate function if the array needs to be sorted; it should not contain code to sort the array itself.
Your code should be written and commented clearly. Be sure to read the course style guide if you have not done so already.
If you wish, you can extend your program to, for example, properly handle input that contains punctuation. (Even if you do this, the end of the input should still be indicated by a single, isolated period.) No extra points will be awarded for any extensions, but you will have the reward of knowing that you have done a great job.
If you extend your program, be sure that your TA will be able to read and understand the work you hand in. If you're not sure whether a proposed extension is reasonable, please check with your TA in advance.
In any case, be sure that you get the basic assignment working properly before you attempt any extensions. Save a copy of your solution to the basic assignment so you can hand that in if you run out of time trying to get a fancier version working.
When you have finished, go back to the main homework 2 page and follow the directions for turning in your homework.