CSE 303, Autumn 2009
Homework 2B: anagram++
Due Tuesday, October 20, 2009, 11:30 PM
50 points total
For Part 2B, you have a choice of what to do. Because I am
announcing this two days late, it is now due on Tuesday, October
20, 2009 at 11:30 PM.
Choice I: for those of you who
(probably) were writing scripts for the first time and feel that you
"got it done" but don't feel that it's clean enough, good enough, fast
enough, etc.
Choice II: for those of you
who don't feel that there is much improvement left in your 2A scripts.
Both choices have the same deadline, earn the same points, etc.
Choice I:
The objective of this choice is for you to do what we as programmers
should do more often -- rewrite our programs, because the process of
writing them the first time teaches us better ways to write them again
later on.
Rewrite -- I recommend from scratch -- your solution to 2A. Added
requirements:
- It must deal consistently with non-alphabetic characters, under
the following rules:
- If allgrams.sh
receives anything other than precisely one parameter consisting only
of alphabetic characters, it should output the message
"Error: ill-formed input"
- Well-formed parameters match precisely those entries in the
dictionary that have the same alphabetic characters (in any order)
ignoring case (and contain only alphabetic characters -- that is, no
dictionary entry with any non-alphabetic character can ever be returned
as an anagram). Although the anagram match itself is
case-insensitive, the original dictionary entry must be output with its
case unchanged. That means that if there are two entries in the
dictionary that are identical except for case, both would be returned
as anagrams of any matching parameter.
- You must revise test-allgrams.sh to demonstrate each of the
varying cases: normal usage, ill-formed input, etc. Your test
cases must show that your program respects, precisely, the rules above.
- You must include, as a text file named differences.txt, a description
of the key changes you made from your original submission for 2A to
your new solution for 2B. At the highest level, I want you to
explicitly convey what you learned in rewriting the scripts. You
should include any changes in your basic approach (algorithm), why you
made different decisions, and what you think are the biggest
improvements in the new version.
- Note: if you had a strong solution to 2A, patching it to deal
with the (relatively) minor added requirements will not likely give you
enough to explain in this differences file -- so you should consider
doing Choice II, in those cases.
- Excessively slow performance -- roughly, over a minute to process
a normal input properly -- should be eliminated. If you cannot
get the running time under a minute, then you must explain (in
differences.txt) why you think the script is slow.
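One possible shape for the input check and the matching rule above is sketched here. It is an illustration under stated assumptions, not a reference solution: the function names are hypothetical, the DICT variable and its default path /usr/share/dict/words are assumptions (use whatever dictionary 2A specified), and the non-zero return status on error is my own choice, since the rules only fix the message. The sketch makes a single awk pass over the dictionary, which is one way to stay well under a minute.

```shell
#!/bin/bash
# Sketch only; all names here are hypothetical.
# Use the C locale so that A-Z and a-z mean exactly the ASCII letters.
LC_ALL=C; export LC_ALL

# Succeed only when given precisely one non-empty, all-alphabetic parameter.
is_well_formed() {
    [ "$#" -eq 1 ] || return 1
    case "$1" in
        ''|*[!A-Za-z]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print every entry of dictionary file $2 that is an anagram of word $1.
# Matching ignores case; entries are printed with their case unchanged.
find_anagrams() {
    awk -v word="$1" '
        # Build a letter-count signature such as "a1b0c0..." for a word.
        function sig(w,    i, c, out) {
            w = tolower(w)
            for (i = 1; i <= 26; i++) {
                c = substr("abcdefghijklmnopqrstuvwxyz", i, 1)
                out = out c gsub(c, c, w)   # gsub returns the count of c
            }
            return out
        }
        BEGIN { target = sig(word); len = length(word) }
        # Skip entries of the wrong length or with any non-alphabetic character.
        length($0) == len && $0 !~ /[^A-Za-z]/ && sig($0) == target { print }
    ' "$2"
}

# Entry point: validate the parameters, then search the dictionary.
allgrams_sketch() {
    if ! is_well_formed "$@"; then
        echo "Error: ill-formed input"
        return 1    # assumption: the rules do not specify an exit status
    fi
    find_anagrams "$1" "${DICT:-/usr/share/dict/words}"
}
```

Calling allgrams_sketch with one alphabetic word prints the matching entries, one per line; calling it with no parameters, several parameters, or a parameter such as ab1 prints only the error message. A test script could call it once for each of those cases.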
Grading will be based on a combination of
- the correctness of your scripts for both normal and ill-formed
input
- the clarity of your script
- the efficiency of your script
- credit will be deducted for programs slower than described
above, although credit will be added in those cases for good
descriptions of why performance is poor, and
- no added credit will be given for programs faster than
described above
- and the quality (not quantity) of your differences.txt description.
Choice II:
The objective of this choice is for you to push your shell script
programming skills further. Look at the site
http://wordsmith.org/anagram/ -- which creates anagrams of multiple
words from multiple words. For example, if you enter "computer
science" you get a (large) number of anagrams including the following
(for which I've reordered a few words for fun):
Comic Cup Entrees
Necrotic Scum Pee
Teen Cop Rice Scum
Your assignment is to provide, in batch form, the same basic service as
this anagram server. That is, you must accept multiple words as
input and find anagrams comprising multiple words in the
dictionary. I have not done
this, so it might be harder than I think (and I don't think it's
trivial). You should use the same dictionary as in 2A (I couldn't
find quickly what dictionary this anagram server uses), and you should
follow the non-alphabetic character rules listed in Choice I
above. (You will, of course, allow multiple parameters as input
to your script, to enter multiple words.) Call your script multigrams, and provide a test-multigrams script for
testing.
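To make the task concrete, here is a rough sketch of one way to search for multi-word anagrams. It is only an illustration under my own assumptions, not a recommended design: the function name multigrams_sketch is hypothetical, the dictionary path is passed as the first parameter purely for convenience, and input validation is left out. The idea is to pool the letters of all input words, keep only the dictionary entries that fit inside that pool, and then recursively take words out of the pool until no letters remain. Words are tried in non-decreasing dictionary order, so each combination is printed once.

```shell
#!/bin/bash
# Sketch only; the name and the calling convention are hypothetical.
# Usage: multigrams_sketch DICTIONARY_FILE WORD [WORD ...]
multigrams_sketch() {
    dict=$1
    shift
    # Concatenate all words into one lowercase string of letters.
    letters=$(printf '%s' "$@" | tr '[:upper:]' '[:lower:]')
    LC_ALL=C awk -v letters="$letters" '
        # Try to remove every letter of w from the pool.
        # Returns 1 on success; on failure restores the pool and returns 0.
        function take(w,    i, k, c) {
            w = tolower(w)
            for (i = 1; i <= length(w); i++) {
                c = substr(w, i, 1)
                if (pool[c] < 1) {
                    for (k = 1; k < i; k++) pool[substr(w, k, 1)]++
                    return 0
                }
                pool[c]--
            }
            return 1
        }
        # Put the letters of w back into the pool.
        function give_back(w,    i) {
            w = tolower(w)
            for (i = 1; i <= length(w); i++) pool[substr(w, i, 1)]++
        }
        # Extend the phrase in sofar using candidates start..n until
        # no letters are left, printing each complete phrase.
        function search(start, left, sofar,    i) {
            if (left == 0) { print sofar; return }
            for (i = start; i <= n; i++) {
                if (length(cand[i]) <= left && take(cand[i])) {
                    search(i, left - length(cand[i]),
                           (sofar == "" ? cand[i] : sofar " " cand[i]))
                    give_back(cand[i])
                }
            }
        }
        BEGIN {
            total = length(letters)
            for (i = 1; i <= total; i++) pool[substr(letters, i, 1)]++
        }
        # Keep all-alphabetic entries whose letters fit inside the pool.
        length($0) > 0 && length($0) <= total && $0 !~ /[^A-Za-z]/ {
            if (take($0)) { give_back($0); cand[++n] = $0 }
        }
        END { search(1, total, "") }
    ' "$dict"
}
```

This search is exhaustive, so its running time grows very quickly with the length of the input and the size of the dictionary; pruning the candidate list or capping the number of words per phrase are the kinds of decisions left to you.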
Grading will be based on a combination of
- the correctness of your scripts for normal input (don't worry
about handling ill-formed input)
- the clarity and documentation of your script
- and, to a much lesser degree, the efficiency of your script --
the server itself is slow in many situations.
You are allowed to add extra bells and whistles, such as those allowed in
the advanced query engine on the server
(http://wordsmith.org/anagram/advanced.html). However, bells and
whistles that are included at the cost of basic features will hurt your
grade, not help it.
Turn-in information
Both choices will be turned in via Catalyst (for which I will try to
get the link right this time). Follow the same directions as in
2A for creating a hw2b.tar.gz
file to submit.