The next programming assignment involves finding anagrams using recursive backtracking. The solution can be quite short, but it's logically interesting and is worth spending some time on.
An anagram of a phrase is a combination of words that together contain exactly the letters in that phrase. For example, one anagram for "rainstorm" is "smart iron". For the assignment, we want to write a program that finds anagrams of phrases that are given to it. Besides the phrases, the input to the program consists of a list or dictionary of words, and the anagrams are to be made from combinations of words in that list.
The basic idea is to go through the words in the dictionary and try all possible combinations, seeing if they produce an anagram for the phrase that has been entered. So, for example, "ant" is one possible word in a "rainstorm" anagram, since the letters a, n, and t appear in "rainstorm"; "zoo" is not, since "rainstorm" has only one o and no z's in it.
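To make the "ant" versus "zoo" check concrete, here is a minimal sketch that counts letters in a plain int[26] array. The class name LetterCounts and its methods count and subtract are made up for this illustration; they are not part of the assignment's required interface.

```java
class LetterCounts {
    // Count how many times each letter a-z occurs, ignoring case and non-letters.
    static int[] count(String text) {
        int[] counts = new int[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // Remove the word's letters from the phrase's letters.
    // Returns the leftover counts, or null if the word needs a letter
    // the phrase does not have enough of (i.e. it is not a subset).
    static int[] subtract(int[] phrase, int[] word) {
        int[] rest = new int[26];
        for (int i = 0; i < 26; i++) {
            rest[i] = phrase[i] - word[i];
            if (rest[i] < 0) {
                return null;
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        int[] phrase = count("rainstorm");
        System.out.println(subtract(phrase, count("ant")) != null); // true
        System.out.println(subtract(phrase, count("zoo")) != null); // false
    }
}
```

A null result from subtract means the word cannot be part of any anagram of the phrase, so the search can skip it.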
So we'll search for the first word in the dictionary whose letters are a subset of the letters in the phrase we're interested in. Then we need to continue our search by looking in the dictionary for words whose letters are a subset of the remaining letters in the phrase - the original phrase minus the letters that were contained in the first matching word. We successfully find an anagram when we find a word in the dictionary whose letters exactly match the remaining unmatched letters in the original phrase.
There are a couple of observations here: First, this is a backtracking problem. We start out by finding some word in the dictionary that matches part of the phrase. Then we recursively want to look for word(s) that match the remaining letters in the original phrase. When we've found an anagram, or when we've exhausted the word list, we go back to the previous step and resume the search for anagrams with the next words at that level. This is just like the 8 queens problem, except that we don't want to stop when we've found a single anagram. We want to search until we've found all matches.
The second observation is that to solve the problem, we need some way of figuring out what letters occur in the phrase, what letters occur in a particular word, whether the letters in the word are a subset of the letters in the phrase, and, if so, what letters in the phrase remain to be matched to form an anagram. It turns out that the LetterInventory class from assignment 2 provides just what we need. Just like in the 8 queens problem, where we separated the logic of searching for a solution from the details of keeping track of the board, for the anagram problem, we can separate the search for an anagram (the backtracking algorithm) from the details of keeping track of letters (the LetterInventories).
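The two observations can be put together in a rough sketch like the one below. It is only one possible shape for the search, not the required solution. Since the LetterInventory interface is not reproduced here, the sketch uses a plain int[26] array of letter counts as a stand-in, and every name in it (AnagramSketch, findAll, search, count, subtract) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnagramSketch {
    // Stand-in for letter bookkeeping: counts of a-z, ignoring other characters.
    static int[] count(String text) {
        int[] counts = new int[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') counts[c - 'a']++;
        }
        return counts;
    }

    // Leftover letters after removing word from remaining, or null if word does not fit.
    static int[] subtract(int[] remaining, int[] word) {
        int[] rest = new int[26];
        for (int i = 0; i < 26; i++) {
            rest[i] = remaining[i] - word[i];
            if (rest[i] < 0) return null;
        }
        return rest;
    }

    static boolean isEmpty(int[] counts) {
        for (int n : counts) {
            if (n != 0) return false;
        }
        return true;
    }

    // Collect every sequence of dictionary words that uses exactly the phrase's letters.
    static List<List<String>> findAll(String phrase, List<String> dictionary) {
        List<List<String>> results = new ArrayList<>();
        search(count(phrase), dictionary, new ArrayList<>(), results);
        return results;
    }

    private static void search(int[] remaining, List<String> dictionary,
                               List<String> chosen, List<List<String>> results) {
        if (isEmpty(remaining)) {
            results.add(new ArrayList<>(chosen)); // all letters matched: record a copy
            return;
        }
        for (String word : dictionary) {
            int[] letters = count(word);
            if (isEmpty(letters)) continue;       // a letterless word would recurse forever
            int[] rest = subtract(remaining, letters);
            if (rest != null) {
                chosen.add(word);                 // choose
                search(rest, dictionary, chosen, results); // explore
                chosen.remove(chosen.size() - 1); // un-choose, then try the next word
            }
        }
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("ant", "iron", "rain", "smart", "storm");
        // prints [[iron, smart], [rain, storm], [smart, iron], [storm, rain]]
        System.out.println(findAll("rainstorm", dictionary));
    }
}
```

Notice that the search method never looks at individual letters; it only asks whether a word fits and what is left over. That is the same separation of concerns described above, and it is why the same search would work unchanged if the int[26] bookkeeping were swapped for LetterInventory objects.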
Preprocessing
An exhaustive search for anagrams in a large dictionary is going to be expensive - there are a lot of possible combinations to try. In any problem like this there are often steps that can be done to substantially reduce the amount of work needed. In your solution you are required to do two things to cut down on the computation needed to search for anagrams.
Iterators and for-each loops
We've stressed several times that in most cases, code should be written using general interface types like List instead of particular implementations like ArrayList, particularly in variable and parameter declarations. For a problem like this assignment, the dictionary (word list) might be stored in an ArrayList, or it might be stored in some other data structure. Our code ought to use a List interface so it works properly with any kind of list. But if we use methods like get(i) to access individual words, although the code will still work, it may be quite inefficient if the actual list structure is something like a LinkedList where get(i) is an O(n) operation instead of O(1).
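As a small illustration of the point, the hypothetical method below is declared with a List parameter and uses get(i). It gives the same answer for an ArrayList and a LinkedList, but each get(i) call on the LinkedList has to walk the list to reach position i. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class IndexedAccessDemo {
    // Declared against the List interface, so any list implementation is accepted.
    static int totalLength(List<String> words) {
        int total = 0;
        for (int i = 0; i < words.size(); i++) {
            // get(i) is O(1) for ArrayList but O(n) for LinkedList,
            // so this loop is O(n^2) overall on a LinkedList.
            total += words.get(i).length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>(Arrays.asList("smart", "iron"));
        List<String> linkedList = new LinkedList<>(arrayList);
        System.out.println(totalLength(arrayList));  // 9
        System.out.println(totalLength(linkedList)); // 9, but with slower element access
    }
}
```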
For a small amount of extra credit, you are encouraged to use iterators to process lists in this assignment. Iterators are guaranteed to provide fast access to successive items, so they are appropriate for any type of collection. If you recall, to access the items in a collection, we get an Iterator object from the collection, then use its hasNext and next methods to get successive elements. For example, the following code would process the items in a list of strings:
    List<String> words = ...;
    ...
    Iterator<String> iter = words.iterator();
    while (iter.hasNext()) {
        String s = iter.next();
        ... process string s ...
    }
This pattern is so frequent that in Java 5 a new version of the for statement (with ugly syntax) was introduced to make it easier to write loops like this. The above example can also be written as follows:
    List<String> words = ...;
    ...
    for (String s : words) {
        ... process string s ...
    }
Underneath, the for-each statement (as it's called) is implemented with iterators, just as in the previous code fragment. But it saves having to write out all the details. It also works with arrays.
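Here is a short runnable sketch showing the same for-each syntax over a collection and over an array. The class ForEachDemo and its totalLength methods are invented for the illustration.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class ForEachDemo {
    // Works for any Iterable (ArrayList, LinkedList, ...); uses an iterator underneath.
    static int totalLength(Iterable<String> words) {
        int total = 0;
        for (String s : words) {
            total += s.length();
        }
        return total;
    }

    // The same loop syntax applied to an array.
    static int totalLength(String[] words) {
        int total = 0;
        for (String s : words) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("smart", "iron"));
        String[] array = {"smart", "iron"};
        System.out.println(totalLength(list));  // 9
        System.out.println(totalLength(array)); // 9
    }
}
```

Because the first method relies only on iteration, it runs in linear time even when the argument is a LinkedList.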