CSE 326 - Summer 2003 -- Homework #6

  Word Searching  

Assigned: Wednesday, August 6, 2003
 Electronic Turn-In Due: Tuesday, August 19, 11:00pm
Hard Copy Due: Beginning of Class on Wednesday, August 20
Sorry, NO LATE DAYS!!!!

 

Write a program to solve word searching puzzles, where words from a dictionary are to be found in a two-dimensional array of letters. For example, you might be given the following array:

 
                     m v j l i x a p e
                     j h b x e e n p p
                     h k t t h b s w y
                     r w a i n u y z h
                     p p f x r d z k q
                     t p n l q o y j y
                     a n h a p f g b g
                     h x m s h w y l y
                     u j f j h r s o a

Your program is to find words from the dictionary that can be spelled out in the puzzle by starting at any character, then moving in a straight line up, down, left, right, or on any of the four diagonals. For example, the above array contains the word algorithm because it can be spelled out by starting the lower right corner and moving up and to the left; and it contains the word syzygy because it can be spelled out by starting at the s on the third row and moving down.

The purpose of this assignment is to give you an opportunity to put to work the algorithms design skills that you have learned.  Thus, we will be saying very little about the algorithm for solving this problem, most of it is up to you!

Input format. Your programs should be run as follows:
               java –classpath . WordSearch1 array.txt dictionary.txt

array.txt is the name of the array of all lowercase letters.  We’ve provided some examples in /projects/cse/courses/cse326/03su/hw6. This file contains the value of N in the first line, followed by N lines of N characters each (with characters separated by spaces).  The aforementioned directory also contains a file wordPuzzleGen.java for creating more such files for testing.

dictionary.txt is a list of the words to search for, for instance, /projects/cse/courses/cse326/texts/words.txt.  You should convert each word to all lowercase (see String.toLowerCase() ).  You may ignore any words of less than four letters or that contain punctuation. You may assume that the dictionary is given to you in alphabetic order (after converting each word to all lowercase).

Output format. Print out every dictionary word (one per line) that is contained somewhere in the puzzle. If a word appears in the puzzle more than once, print it only once. You may print the words in any order.

Requirements. In order to gain experience with hashing and with comparing different implementations, you are required to write two separate solutions to this problem and compare them:

1.      WordSearch1.java – a solution that somehow involves hashing in any form.

2.      WordSearch2.java – a solution that does not make use of hashing.

You may certainly share code between your two solutions but they should represent different approaches to the problem.  You may also re-use any code that you have previously written for this course, whether individually or as part of a group.

Approach. We have covered a number of different algorithms that can be used effectively to solve this problem. Your task is to study the problem; come up with two possible strategies for solving it using some of the algorithms and data structures that you have learned; implement and test your solution; and then explain and defend your design in the writeup. A good writeup is particularly important for this assignment, because we expect to see a variety of solution strategies. You need to clearly explain not just what you did, but also why you did it, backed up with estimates of resource costs (time and space). Also, you should clearly explain whatever limitations you need to place on the puzzle size N and the dictionary size M. As usual, your goal is to find a decent method which works within reasonable bounds. You need not waste effort squeezing time and space out of a bad algorithm or implementing complex or highly tuned code. This problem can be solved efficiently with 200 or so lines of code (including comments and basic error checking).

General Notes and Advice

1.      This is an individual assignment, you must write all the code yourself.  As always, you may discuss general approaches with other students, but be sure to credit them in your README.

2.      We highly recommend that you carefully read this document twice through before starting any code.

Going Above and Beyond

You may do any, all, or none of the following. They will be worth a nominal amount of extra credit.

  1. Whup Luke. I’ve dug up my old solution to this problem from my undergraduate days. Think you can beat it? Well, give it a shot. You can try out the real thing (and compare your own answers) by running something like:
          java -classpath /cse/courses/cse326/03su/hw6/hw6sample.jar WordSearch array120.txt words.txt
    To time this without wasting time printing to the console, use something like:
          time java -classpath /cse/courses/cse326/03su/hw6/hw6sample.jar WordSearch array120.txt words.txt >& /dev/null
    There are two possible ways to whoop Luke (for credit, that is):
          a.) Smokin’ – your solution is consistently faster than Luke’s (which takes about four seconds on fiji for array120.txt and words.txt).
          b.) Stylin’ – your solution is so sleak and elegant that, while still very fast, it uses much less memory than Luke’s (which runs out on array300.txt). Demonstrate this by running quickly on much larger problem sizes (like array400.txt).
    To enter: Be sure to describe in your README which of your programs you think is better than Luke’s and why.
    Note: No promises are given about how easy or difficult this may be. Don’t waste your time trying to tweak the code too much – the underlying algorithm is the most important.

  2. Go 3-D. Modify your program to, instead of reading in N lines of N characters, read N2 squared lines in, and treat the whole file as a N x N x N array (treat the first N lines as z=0 the x/y/z space, then next N lines as z=1, etc.). Then extend your program to allow words to be formed by moving in any or all of the three dimensions. Be sure to turn in some test cases to try out. In your README,        
          a.) In 2-D, there are eight possible directions to move in to find a word. How many are there in 3-D?
          b.) Describe what you did.  Was this a simple extension of your program, or did the new dimension call for different data structures?

Turn-In Instructions

You should have the files WordSearch1.java and WordSearch2.java, plus associated helper files. Turn in electronically all of your source files (use *.java) plus your usual README. Follow the usual instructions.  This assignment is called “hw6” in the turn-in system. Your README should contain:

  1. Your name and section.
  2. (Answer this question before you begin) How long do you think this project would take you to finish?
  3. How much time did you actually spend on this project?
  4. Acknowledgement of any assistance you received from anyone but your team members, the 326 staff, or the Weiss book.
  5. An explanation of how to compile your program, if anything special is needed.
  6. A brief description of how your project goes "above and beyond" the basic requirements, if it does.
  7. An explanation and justification of the data structures that you used to solve this problem, as described in the “Approach” section above.
  8. Which of your two approaches is faster?  Why?
  9. What did you enjoy about this assignment?  What did you hate?  Could we have done anything better?

Bring to class the printouts of your source files as well as your README.  Please put your README file on top so it easy to find, and number the answers to the above questions.

Credits

This assignment and much of the text is based on an assignment from CSE 226 at Princeton University.

A Reminder

For unknown reasons, that literature professor from the last assignment is still involved in your life, and still demands that all the data structures you write be your own. She has seen lots of them on the web and in other places, but those simply don't cut it! Please be sure you do not reference library data structures or copy (even partially) any code that is available from a variety of sources. The goal of this class is for you to learn how to implement and use each data structure, not simply to copy and use.

About code from the textbook: Ideally, we would like you to write the code on your own based on the descriptions/examples of the ADT operations provided in the lectures and the textbook. In this case, you could look up the code given in the textbook whenever you run into problems. Doing it this way will help you to learn the transition from descriptions of ADT operations to their actual implementation, which is one of the main goals of a Data Structures course. However, we will not take off any points if you use any of the book code as a template -- we will simply assume that you fully understand what the code is doing before using it (you may be asked to write some code in the midterm/final!). If you use any of the book code, make sure you explicitly state which parts are from the book in your README.