Homework 4 - History Sleuths!
Due by start of class (2:30 pm) on Wednesday, November 13

I. Introduction

You have just been approached by a world famous UW history professor. He would like you to settle a centuries-old debate on who wrote Shakespeare's plays, Shakespeare or Sir Francis Bacon? You protest that this question is surely outside of your area expertise. "Oh, no," chuckled the historian, stroking his snowy white beard. "I need a Computer Scientist!"
 

II. Word Frequency Analysis

The professor suspects that some authors use particular words more often than others. He hopes that if we study the frequency with which authors use words, we may be able to come up with a word usage "signature" for that author. This signature should be quite consistant across a particular author's works, but vary greatly between authors.

The professor wants you to take two works of Shakespeare (Hamlet, and All's Well That Ends Well), and two of Bacon (The New Atlantis, and The Essays), and count the number of times that each word occurs in each. The output should be in the following format:

970 the
708 and
666 of
632 to
521 I
466 a
444 my
391 in
383 you

...where the first number is the frequency that the second string occurs in the text. Strangely enough, the professor wants you to hand in this project using the turnin program to the CSE 326 database. He would like a copy of your source code, and a 1-2 paragraph answer to his question: based on the data you have accumulated, did Bacon write Shakespeare's plays?

III. The Nitty Gritty

On this project, your code will be expressedly analyzed for its efficiency. Since you are using large inputs, constants matter. That is, the difference between a 2n2 and a 4n2 algorithm will be noticeable. After your code is working, you should analyze each algorithm to make sure it is not doing unnecessary repetitive work, and then eliminate this excess. As always, correctness will count more than efficiency, so make sure your algorithms are correct before worrying about efficiency.

You will find that the text files we give you contain some extraneous information (e.g. copyright stuff). Don't waste your time trying to delete this, as it will have an negligable effect on the output.

Have fun, and uncover deep historical truths!


IV. Files and Sample Code

V. Extra Credit!


VI. Interesting Tidbits


VII. Credits

Inspiration for this assignment comes from the Winter 2002 offering of CSE 326.