CSE 373 Winter 2017: Homework 3 - Text Associator

Due: Friday, February 10th, 11pm

Overview

Overview:

Your task is to write a general tool that will allow you to maintain one-directional word associations. In order to do so you will be implementing a TextAssociator, which maintains word associations as a built-from-scratch HashTable of WordInfo objects. Your final implementation will be a self-resizing, non-generic HashTable, using separate chaining as a collision resolution.

Once you have completed your implementation you will test the functionality and behavior of your TextAssociator by writing client code that leverages your collection of word-associations.

You have been given skeleton code to assist you in the design and structure of this data structure, however many design choices will be left up to you.

Suggested Steps:

  1. Download the skeleton code.
  2. Implement the method stubs in WordInfoSeparateChain class inside of TextAssociator.java
  3. Implement the method stubs in TextAssociator.java
  4. Run ThesaurusClient.java as part of the writeup
  5. Write your client code in MyClient.java
  6. Complete the rest of the writeup
Provided Files

Provided Files:

You can download all the files in cse373_17wi_hw3_supplied.zip.

  1. WordInfo.java.

    WordInfo represents a relationship between a source String and a collection of Strings it should be associated with. This class is fully implemented for you. You will use this class in your WordInfoSeparateChain class.

    word info relationship example

  2. TextAssociator.java.

    TextAssociator is Dictionary implemented using hashing in order to maintain associations between Strings. TextAssociator.java is mainly skeleton code for you to implement. You are to use an array of WordInfoSeparateChain objects to maintain your hashing structure of your TextAssociator. Certain methods in WordInfoSeparateChain private inner class also need to be implemented.

  3. ThesaurusClient.java.

    ThesaurusClient uses your TextAssociator to maintain associations between words and their synonyms. Included in the provided files, you will find simple_thesaurus.txt, and large_thesaurus.txt. These two files specify the relationship between words and their synonyms. Each line has a list of comma-separated words. The first word in each line is the source word, and the remaining words on that line are the synonyms. You can specify which thesaurus file you want to use by updating the THESAURUS_FILE class constant.

  4. simple_thesaurus.txt. The simple thesaurus text file to be used with ThesaurusClient.
  5. large_thesaurus.txt. The large thesaurus text file to be used with ThesaurusClient.

In your eclipse workspace, to obtain the required files:
      file >> import >> general >> Existing Projects into Workspace >> Select archive file >> cse373_17wi_hw3_supplied.zip

Functionality

Required Public Functionality

WordInfoSeparateChain private inner class

public boolean add(WordInfo wi) {
Adds a new WordInfo to the WordInfoSeparateChain. If the WordInfo already exists, you should do nothing and return false. Otherwise, add it and return true.

public boolean remove(WordInfo wi) {
Remove the given WordInfo from the WordInfoSeparateChain. Return true if it is removed, false if it did not exist. This should remove the entire WordInfo object

TextAssociator class

public TextAssociator() {
Constructor for a new TextAssociator. This should initializes all required fields. You can choose your initial capacity.

public boolean addNewWord(String word) {
Adds a new word to the TextAssociator. If the word already exists, you should do nothing and return false. Otherwise, add it and return true. One case to keep in mind is if the appropriate index in the array index doesn’t yet contain a WordInfoSeparateChain, then you will have to construct a new separate chain.

public boolean addAssociation(String word, String association) {
Associates the given word with the given association. If the given association already exists with the word, or if the word does not already exist in the TextAssociator, return false. Otherwise, add it and return true. Associations are one directional (i.e. add an association from word->association, not vice versa).

public boolean remove(String word) {
Remove the given word (and subsequently all of its associations) from the TextAssociator. Return true if it is removed, false if it did not exist. This should be removing an entire WordInfo object (not just an association).

public Set getAssociations(String word) {
Return a set of all the associations from the given word, or null if the word does not exist.

Note: You may add other methods to these two given skeleton code classes as you see fit. Make sure the visibility of these methods (private, public, etc) makes sense.

Implementation Notes

Implementation Notes

You will notice that the public interface to the client deals strictly with Strings, your WordInfoSeperateChain stores WordInfo objects behind the scenes. It will be your job to convert between Strings and WordInfo objects for varying method calls.

Your TextAssociator must be able to store an arbitrary number of WordInfo object. Remember that because we are using separate chaining, in theory we would never have to resize our array. However, as discussed in class, having a large load factor can start to degrade your runtime for many operations. When the load factor for your TextAssociator reaches a certain threshold, you should resize your internal Array.

Design Decisions:

  1. What should the starting capacity of your TextAssociator be?
  2. At what load factor should you expand your internal capacity? Remember that you must recalculate the destination of each WordInfo object when you expand your array.
  3. What should the new size of your array be?

Style:For this assignment we will be grading similarly, but a little more strictly for things like style and efficiency than we did on HW #2. Make sure you are commenting your code, looking for redundancy, structural mistakes, formatting of your code, efficiency of your algorithms, and preserving the abstraction of your classes through encapsulated fields, private methods when appropriate, and appropriate level of commenting on public functionality.

Your WordInfoSeparateChain is a private inner class. Clients of this program should never interact directly with an Instance of this class, and should not know that it exists (neither in comments or public interface).

Feel free to add additional public functionality that you think would be useful for a client (this is specifically applicable for the client code part of this assignment (see below).

There will be some redundant code between your public methods; it might help to make some private helper methods to clean up your code.

Mutability: Be careful with methods such as WordInfoSeparateChain’s getElements(), or WordInfo’s getAssociations(), as they return references to their internal fields. This means a client (in this case, YOU) can directly modify the fields. While this may be helpful, you should be careful what operations you perform, so you don’t introduce bugs into your code. You do not have to worry about preserving the abstraction from your Client through returning aliases to internal objects. You do not have to deep or shallow copy objects when returning them to the Client. You can trust the Client will not do something illegal to your internal objects with the aliases you return to them. If you want to protect your internal fields, you are allowed to practice perserving the abstraction and making copies.

Hash Function: You will need to implement a hash function in your TextAssociator to map any given input into an index in your hash table. You can do this however you'd like without importing any new Java libraries into your TextAssociator. You can use Java String's hashCode function, you can use the provided WordInfo hashCode, or something else that you think is better. You'll be discussing this decision in the write-up.

Implementation Hints

ThesaurusClient

Running Thesaurus Client

Once you think your implementation of TextAssociator is working, you can test it by running ThesaurusClient. A very simple test case to verify that your TextAssociator is working properly is to run this program with “simple_thesaurus.txt”. This text file contains a handful of words that will be replaced with synonyms when you run the program. A simple test case is to input the following:

     Input: “my code is really good and I am very smart”
     Output: “my code is absolutely marvelous and I am bona fide brainy”

From this example you can see that “really” was replace with “absolutely” because our text file specified that “absolutely” was the only synonym for “really”. You can also see that our simple_dictionary.txt did not specify any synonyms for the word “code”, so it was left unreplaced in our output.

Once you are convinced that your TextAssociator is working properly (and properly resizing), run the program and input the following sentence exactly as follows (using large_thesaurus):

                “hello world it is fun to write code and have fun with data structures”.

Include the response from in your write-up. Keep in mind your sentence will be probably be nonsensical and will be different each time you run it. Include your favorite in the write-up ☺

*Note*: ThesaurusClient is not very robust and doesn’t work very well with punctuation

MyClient

Your Client Code

Now that you have seen how one client could use your TextAssociator, your next task is to create your own client code that will use your TextAssociator in a different way. You will create a file named MyClient.java that has the following requirements:

  1. Initializes and populates a TextAssociator object with at least 20 associations
  2. Uses said associations to accomplish some goal (i.e. must be making calls to .getAssociations())
  3. Outputs some text to System.out explaining what your client code is doing, etc.

You can use ThesaurusClient as a helpful example in writing your client code if you would like. Please write a very explanative class comment on you're MyClient and in your write-up explaining what your client does and how your TextAssociator was used in order to accomplish this goal.

If you are having trouble coming up with ideas, think about the following: -spellchecker, contact list, auto-complete tool, etc.

Write-Up

Write-Up Questions

  1. What is your favorite sentence you generated with the ThesaurusClient as described above in how to run the ThesaurusClient?

  2. For each of the three Design Decisions mentioned above in Implementation Notes, please discuss possible options that you considered, what you ended up choosing, and why.

    For reference, the design decisions were:
    1. What should the starting capacity of your TextAssociator be?
    2. At what load factor should you expand your internal capacity? Remember that you must recalculate the destination of each WordInfo object when you expand your array.
    3. What should the new size of your array be?

  3. What hash function did you choose for your TextAssociator (i.e. did you did you use String’s hashcode method, did you use the WordInfo's hashCode method, did you incorporate your table size?) Why was this hash function effective, are there alternative hash functions that you considered? Also, please evaluate the hashCode inside of WordInfo, even if you did not use it. Do you think it is a good hashCode and why?

  4. We chose to implement this TextAssociator with separate chaining, if you were instead going to use a different collision resolution scheme, what would you choose? How and where would your code change? Give several specific examples to illustrate your understanding.
Turn-In

What to turn in:

Due: Friday, February 10th, 11pm

You should submit a zip file to the Canvas dropbox with the name cse373_17wi_hw3.zip. This zip should include the following files. Files:
  1. TextAssociator.java
  2. . Your implementation for the TextAssociator including your implementation of the private inner class, WordInfoSeparateChain.
  3. MyClient.java, Your client code described above.
  4. Electronic version of your project report in pdf, MS word, or text format (please check with us first if you need to submit in another format).