CSE 143 Autumn 2000
Homework 1
Due: Electronic submission by 10 pm, Wednesday 10/4. Paper receipt due in quiz section on Thursday 10/5.
For this assignment, write a program that reads a text file, counts the number of times each distinct word appears in the file, then prints a table of the most frequently occurring words, along with the number of times they appear. We will give you some advice and specify some of the data structures you are to use, but otherwise, you should create the program from scratch.
The purpose of this assignment is to review some ideas from previous courses and explore basic parts of C++. Concepts:
The program should ask the user to enter the name of a text file, read and count the number of times each word appears in the file, ask the user how much word/frequency information to display, then display the requested number of entries, starting with the most common word(s), sorted in descending order of word frequency. If two or more words in the list appear the same number of times, they should be further sorted alphabetically. Example:
Suppose the file demo.txt contains the following text:
It REALLY is #$!*&%? odd, this assignment, isn't it?
An execution of the program using that file as data should produce the following results (user input is in bold italics; everything else is generated by the program).
    Please enter file name: demo.txt
    How many word/frequency pairs do you want? 20
    2	it
    1	assignment
    1	is
    1	isnt
    1	odd
    1	really
    1	this
You can download this sample program and experiment on your own to see how it works.
Case is not significant: It, IT, it, and iT are considered to be the same.
Punctuation can change meaning in English (its and it's mean quite different things), but for this assignment the program should ignore all punctuation marks (for example, it's and its are both treated as being occurrences of its).
"Punctuation" is defined for this assignment as any character c for which the <cctype> library function ispunct(c) returns true.
A sequence of characters that consists entirely of punctuation is not counted as a word (#$!*&%? in the above example).
The user must enter the complete file name, including any extension (demo.txt in the example, not just demo). If a simple file name is entered, the file needs to be in the same folder as the Visual C++ project (or in the same folder as the demo program, if you're running the sample program). If the program is unable to open the file (most likely because it doesn't exist or its name was spelled incorrectly), it should report the error and terminate.
One of the objectives of this assignment is to learn how to use multiple source files to separate parts of the program into logical groups. This program is fairly simple, but it should be partitioned into a couple of files.
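As a rough illustration of the input rules listed above (ignore case, drop every character for which ispunct is true, skip tokens that end up empty, and report a file that cannot be opened), here is a minimal sketch. The function names normalizeWord and openInput and the wording of the error message are assumptions made for this example; they are not part of the assignment or of the sample program.

```cpp
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: lower-case a raw token and drop every character
// for which ispunct() is true. Returns "" if nothing is left, which the
// caller can treat as "not a word".
std::string normalizeWord(const std::string& raw) {
    std::string result;
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        // The <cctype> functions require a value representable as
        // unsigned char, so convert before calling them.
        unsigned char c = static_cast<unsigned char>(raw[i]);
        if (!std::ispunct(c)) {
            result += static_cast<char>(std::tolower(c));
        }
    }
    return result;
}

// Hypothetical helper: try to open fileName for reading. On failure,
// print a message (wording is an assumption) and return false so the
// caller can terminate.
bool openInput(const std::string& fileName, std::ifstream& in) {
    in.open(fileName.c_str());
    if (!in) {
        std::cout << "Unable to open file " << fileName << std::endl;
        return false;
    }
    return true;
}
```

A main program could then read whitespace-separated tokens with `in >> token`, pass each one through normalizeWord, and record only the non-empty results.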
The key data structure needed in this program is a list of word/frequency pairs. This list is an array of structs, along with a count of how many pairs are currently stored in the list. Here are C++ type definitions for this data structure. You must use this data structure to store the list of word/frequency pairs in your program.
    #include <string>
    using std::string;

    const int MAX_WORDS = 300;      // Max # distinct words in list

    struct WordInfo {               // information about a single word
        string word;                // the word itself
        int    frequency;           // # of times it has appeared so far
    };

    struct WordList {               // list of words and frequencies
        WordInfo entry[MAX_WORDS];  // Word/frequency pairs are
        int      nWords;            // stored in entry[0..nWords-1]
    };
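To make the data structure concrete, the following sketch shows one possible set of operations on it: emptying the list, recording a word, sorting by descending frequency with alphabetical tie-breaking, and printing with one tab between the frequency and the word. The function names, the bool return value of addWord, and the use of std::sort are all assumptions chosen for illustration; the assignment does not prescribe them, and a hand-written sort would serve equally well.

```cpp
#include <algorithm>
#include <iostream>
#include <string>

const int MAX_WORDS = 300;          // Max # distinct words in list

struct WordInfo {                   // information about a single word
    std::string word;               // the word itself
    int frequency;                  // # of times it has appeared so far
};

struct WordList {                   // list of words and frequencies
    WordInfo entry[MAX_WORDS];      // pairs are stored in
    int nWords;                     // entry[0..nWords-1]
};

// Make the list empty.
void initList(WordList& list) {
    list.nWords = 0;
}

// Record one occurrence of word (linear search). Returns false only
// when the word is new and the list already holds MAX_WORDS entries.
bool addWord(WordList& list, const std::string& word) {
    for (int i = 0; i < list.nWords; ++i) {
        if (list.entry[i].word == word) {
            ++list.entry[i].frequency;
            return true;
        }
    }
    if (list.nWords >= MAX_WORDS) {
        return false;
    }
    list.entry[list.nWords].word = word;
    list.entry[list.nWords].frequency = 1;
    ++list.nWords;
    return true;
}

// True if a should be listed before b: higher frequency first,
// ties broken alphabetically.
bool comesBefore(const WordInfo& a, const WordInfo& b) {
    if (a.frequency != b.frequency) {
        return a.frequency > b.frequency;
    }
    return a.word < b.word;
}

// Sort the occupied part of the array using the rule above.
void sortList(WordList& list) {
    std::sort(list.entry, list.entry + list.nWords, comesBefore);
}

// Print at most count entries as: frequency, tab, word.
void printList(const WordList& list, int count) {
    for (int i = 0; i < count && i < list.nWords; ++i) {
        std::cout << list.entry[i].frequency << '\t'
                  << list.entry[i].word << '\n';
    }
}
```

The linear search in addWord is simple rather than fast, which seems reasonable for a list capped at 300 distinct words.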
Here's the key idea for splitting the program into logical pieces. You will need functions in your program to initialize the list to empty, record a new occurrence of a word in the list, sort the list, print part or all of it, and possibly other operations. All of these functions manipulate internal details of the WordList data structure.
The main program, on the other hand, should not depend on the exact layout or representation of a WordList. The main program should include a WordList variable to store the word/frequency information, then call appropriate functions as needed to add words, sort, print, and so forth.
All of the functions that access internal details of the WordList data structure should be placed in a separate source file. No functions in other files should depend on any of the internal implementation details. You'll also need to have an appropriate header file containing the above data definitions along with function prototypes for functions that manipulate WordLists. This header file will be #included in both the file containing the WordList functions and the file that contains the main program.
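A header file along the lines described might look like the sketch below. The file name wordlist.h, the include-guard macro, and the four function names and signatures are hypothetical; only the constant and the two struct definitions come from the assignment.

```cpp
// wordlist.h (hypothetical file name)
// Data definitions and prototypes shared by the WordList source file
// and the main program.
#ifndef WORDLIST_H
#define WORDLIST_H

#include <string>

const int MAX_WORDS = 300;          // Max # distinct words in list

struct WordInfo {                   // information about a single word
    std::string word;               // the word itself
    int frequency;                  // # of times it has appeared so far
};

struct WordList {                   // list of words and frequencies
    WordInfo entry[MAX_WORDS];      // pairs are stored in
    int nWords;                     // entry[0..nWords-1]
};

// Prototypes only (names are assumptions); the definitions would live
// in the separate source file that knows the WordList layout.
void initList(WordList& list);
bool addWord(WordList& list, const std::string& word);
void sortList(WordList& list);
void printList(const WordList& list, int count);

#endif // WORDLIST_H
```

The include guard keeps the definitions from being seen twice in one source file if the header is included more than once. With this layout, the main program would declare a WordList variable and call only the prototyped functions, never touching entry or nWords directly.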
Use C++ stream I/O (cin, cout, ifstream, etc.). Do not use printf/scanf/fscanf or other C library functions for I/O.
Use the C++ string class (#include <string>), not the C library functions strcpy, strcmp, and so forth.
Use tab characters ('\t') to line up the output in columns. The sample program prints each frequency/word pair with a single tab character in between.
Use #include "file.h" to include header files that are part of your code. The angle brackets in #include <library> are used to indicate libraries that are part of Visual C++ or other implementations.
When you've finished your program, turn it in using this turnin form. Print out the receipt that appears, staple it, and hand it in during quiz section.