CSE 143 Autumn 2000
Homework 2
Due: Electronic submission by 10 pm, Wednesday, Oct. 11.
For this assignment, convert Homework 1 into a C++ program that includes classes and member functions for the main data structure (the WordList). In addition, change the main program so it reads an HTML file instead of a plain text file, but only counts ordinary words in the file, ignoring HTML tags, comments and special symbols.
The purpose of this assignment is to gain experience with the following new concepts:
The program should ask the user for the name of an HTML file, read the file and count how many times each plain word appears, ask the user how many word/frequency pairs to display, and then display that many entries, starting with the most common word(s) and sorted in descending order of frequency. Words that appear the same number of times should be further sorted alphabetically. HTML tags, HTML comments, and special symbols should all be ignored. The .html file should be read from the local disk; you are not expected to write a program that opens an HTTP connection to a web server and downloads the file.
Example: Suppose the file test.html contains the following text:
<html> This is a sample HTML file. Isn't it similar <!-- this is a comment --> to a normal file with a few &special; things that are new. Like < this is not a horse > tags. </html>
An execution of the program using that file as input should produce the following results (user input follows each prompt; everything else is generated by the program).
Please enter file name: test.html
How many word/frequency pairs do you want? 100
Total Number of Words: 19
3 a
2 file
1 are
1 few
1 html
1 is
1 isnt
1 it
1 like
1 new
1 normal
1 sample
1 similar
1 tags
1 that
1 things
1 this
1 to
1 with
You can download this sample program and experiment on your own to see how it works.
A key objective of this assignment is to gain experience with C++ classes. You are required to replace the WordList structure from Homework 1 with a proper C++ WordList class. This class should include member functions that perform appropriate operations on WordLists. Be sure that the representation of a WordList is private, and not accessible outside member functions of the WordList class. Any member functions that are not part of the public interface should also be private. Create an appropriate header file containing the class declaration and a companion C++ source file that contains the implementation of the WordList member functions.
Among other private data, the WordList class will contain an array of word/frequency pairs. These pairs can be implemented with a struct, as in Homework 1, or, if you wish, converted to a class. However, these pairs should remain a simple data structure, not a complicated class with lots of member functions.
When you've finished your program, turn it in using this turnin form. Print out the receipt that appears, staple it, and hand it in during quiz section.