Assignment 3 for CSE 373 (Winter 2010)
Document Distances
Due Monday, February 1 at 7:00 PM via Catalyst CollectIt.
CSE 373: Data Structures and Algorithms
The University of Washington, Seattle, Winter 2010
Overview: In this assignment you will create an implementation of a special dictionary abstract data type using AVL trees. You'll use the dictionary to represent the set of words in a document and the number of times each word occurs. If you complete the additional, optional parts of the assignment, you'll also be able to automatically download web pages and compare them to one another (in terms of the statistics of their word usage).
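To make the idea of a word-count dictionary backed by an AVL tree concrete, here is a minimal sketch. The class name AvlWordCounts and its methods add, count, and height are hypothetical and are not part of the provided resources; the sketch also ignores the applet and its command processor entirely, so treat it as an illustration of the balancing idea rather than the required design.

```java
// Hypothetical sketch: a dictionary from words to occurrence counts, kept balanced as an AVL tree.
class AvlWordCounts {
    private static final class Node {
        final String word;
        int count = 1;   // a node is created on the first occurrence of its word
        int height = 1;  // height of the subtree rooted here (a leaf has height 1)
        Node left, right;
        Node(String word) { this.word = word; }
    }

    private Node root;

    private static int height(Node n) { return n == null ? 0 : n.height; }

    private static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    private static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        update(y);
        update(x);
        return x;
    }

    private static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    // Restores the AVL property at n, assuming both subtrees already satisfy it.
    private static Node rebalance(Node n) {
        update(n);
        int balance = height(n.left) - height(n.right);
        if (balance > 1) {
            // Left-right case: rotate the left child first.
            if (height(n.left.left) < height(n.left.right)) n.left = rotateLeft(n.left);
            return rotateRight(n);
        }
        if (balance < -1) {
            // Right-left case: rotate the right child first.
            if (height(n.right.right) < height(n.right.left)) n.right = rotateRight(n.right);
            return rotateLeft(n);
        }
        return n;
    }

    private static Node add(Node n, String word) {
        if (n == null) return new Node(word);
        int cmp = word.compareTo(n.word);
        if (cmp == 0) {
            n.count++;   // word already present: just bump its count
            return n;
        }
        if (cmp < 0) n.left = add(n.left, word);
        else n.right = add(n.right, word);
        return rebalance(n);
    }

    // Records one more occurrence of word.
    void add(String word) { root = add(root, word); }

    // Returns how many times word has been added (0 if never).
    int count(String word) {
        Node n = root;
        while (n != null) {
            int cmp = word.compareTo(n.word);
            if (cmp == 0) return n.count;
            n = cmp < 0 ? n.left : n.right;
        }
        return 0;
    }

    int height() { return height(root); }

    public static void main(String[] args) {
        AvlWordCounts counts = new AvlWordCounts();
        for (String w : "the cat saw the other cat".split(" ")) counts.add(w);
        System.out.println(counts.count("cat")); // prints 2
        System.out.println(counts.count("dog")); // prints 0
    }
}
```

Storing the count inside the node means a repeated word never changes the shape of the tree, so rebalancing is only needed when a new word is inserted.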
Resources Provided: You'll start with the following resources:
0. Detailed description of the assignment.
1. A Java class called WebFile for downloading web pages.
2. A "Visual Data Structure Applet" for a default data structure, which is a 2D array. (Make sure you have the version updated Jan. 22, which makes it easier to implement the INSERT_TEXT command.)
3. A Java file "VisualStackApplet.java" that shows how to adapt the Visual Data Structure Applet for a custom data structure, in this case a stack.
4. A very brief introduction to document analysis, which explains the basic ideas behind the optional parts of this assignment. Note: In this assignment, the "reference vocabulary" referred to in this reading should be taken to be the union of the sets of words in the two documents being compared. Also, you are not expected to perform "stemming" as part of reducing documents. When processing web pages, on the other hand, it would be a good idea to eliminate very basic stop words such as "a", "an", "the", "and", "or", "I", "we", "you", "it", "they", as well as HTML tags.
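The note in item 4 can be sketched as follows. This is only an illustration under stated assumptions: the class DocumentDistanceSketch and its methods reduce and distance are hypothetical, java.util.TreeMap stands in for your own AVL dictionary, the tag stripping is a simple regular expression rather than a real HTML parser, and the Euclidean distance over raw counts is one possible measure that may differ from the one defined in the reading.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of reducing two documents and comparing them over their union vocabulary.
class DocumentDistanceSketch {
    // The basic stop words named in the assignment text, lowercased.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "the", "and", "or", "i", "we", "you", "it", "they"));

    // Strips HTML tags with a simple regex (an assumption; messy pages may need more care),
    // lowercases the text, and counts every remaining word that is not a stop word.
    static Map<String, Integer> reduce(String html) {
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase();
        Map<String, Integer> counts = new TreeMap<>(); // stand-in for the AVL dictionary
        for (String token : text.split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean distance between the two count vectors, indexed by the
    // reference vocabulary: the union of the words in both documents.
    static double distance(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> vocabulary = new TreeSet<>(a.keySet());
        vocabulary.addAll(b.keySet());
        double sumSquares = 0.0;
        for (String word : vocabulary) {
            int diff = a.getOrDefault(word, 0) - b.getOrDefault(word, 0);
            sumSquares += (double) diff * diff;
        }
        return Math.sqrt(sumSquares);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = reduce("<p>The cat and the dog</p>");
        Map<String, Integer> d2 = reduce("<p>A cat saw a bird</p>");
        System.out.println(d1); // prints {cat=1, dog=1}
        System.out.println(d2); // prints {bird=1, cat=1, saw=1}
        System.out.println(distance(d1, d2));
    }
}
```

In this example the union vocabulary is bird, cat, dog, saw; the two documents differ by one occurrence in three of those words, so the distance is the square root of 3.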
 
Just as in Assignment 1, your program will inherit the following behavior: a scrollable display area, a textual command processor, and buttons to control the overall execution of commands.
Turnin Instructions: Turn in all the Java source files needed to compile and run the most advanced version of your applet. This might be only the implementation of binary search trees, or it might be the AVL tree implementation, or it might be the works, with all optional features. Do not turn in .class files or Eclipse project files. Your classes should all be in the default package (no package declaration at the top of the file).
 
Last updated: 31 Jan 2010 at 21:51.
 