Project 2 - Version 2.0
History Sleuths!
Due Friday November 2nd



O. Changes Between Version 1.0 and Version 2.0

We have relaxed the requirements for Project 2. You should have received an e-mail about the changes. You can still read the original version of the project here. The changes are as follows:

If you have already invested a lot of time in the original version of Project 2, don't panic! The additional step of outputting the file in frequency order is being offered as extra credit.

I. Introduction

You have just been approached by a world-famous UW history professor. He would like you to settle a centuries-old debate on who wrote Shakespeare's plays: Shakespeare or Sir Francis Bacon? You protest that this question is surely outside of your area of expertise. "Oh, no," chuckles the historian, stroking his snowy white beard. "I need a Computer Scientist!"
 

II. Word Frequency Analysis

The professor suspects that some authors use particular words more often than others. He hopes that if we study the frequency with which authors use words, we may be able to come up with a word usage "signature" for each author. This signature should be quite consistent across a particular author's works, but vary greatly between authors.

The professor wants you to take two works of Shakespeare (Hamlet and All's Well That Ends Well) and two of Bacon (The New Atlantis and The Essays), and count the number of times that each word occurs in each. He would like you to write a program that takes two command-line arguments. The first argument is the name of a text file to be parsed, and the second is the name of a text file that your program will write as output. The output file will be in the following format:

1 "AS-IS".
1 "Defect"
1 "Pro-
1 "Project
1 "Right
2 "Small
1 "small
1 #1787]
1 &c.'
1 ''Tis
5 'A

...where the first field is the number of times that the second string occurs in the text. Strangely enough, the professor wants you to hand in this project using the turnin program to the CSE 326 database. He would like a copy of your source code, an explanation of how to compile your program, and a 1-2 paragraph answer to his question: based on the data you have accumulated, did Bacon write Shakespeare's plays? In order to answer this question, you may find it useful to use the sort utility as follows:

sort -nr output.txt > output-sorted.txt

This will take your output file (which we are calling output.txt here) and create another file (called output-sorted.txt) which will be sorted by frequency, highest first! The new file will look like this:

970 the
708 and
666 of
632 to
521 I
466 a
444 my
391 in
383 you
358 Ham.

Note that you no longer have to do the sorting by frequency yourself, though that is offered as extra credit.
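
To make the file formats above concrete, here is a minimal sketch in Python. It is only an illustration, not the required solution: the function names (count_words, format_alphabetical, format_by_frequency, run) are made up for this example, and it leans on Python's built-in collections.Counter purely to show the input and output formats, not to stand in for whatever data structure the project asks you to build. It also assumes, based on the sample output above, that a "word" is any run of non-whitespace characters, that case and punctuation are kept, and that the unsorted output is ordered by character code.

```python
import sys
from collections import Counter


def count_words(text):
    """Count whitespace-delimited tokens, keeping case and punctuation.

    Assumption taken from the sample output: '"Small' and '"small'
    are counted as two different words.
    """
    return Counter(text.split())


def format_alphabetical(counts):
    """Return 'count word' lines ordered by the word's character codes."""
    return [f"{n} {word}" for word, n in sorted(counts.items())]


def format_by_frequency(counts):
    """Return 'count word' lines, highest count first.

    Ties are broken by word; that tie-break is an arbitrary choice here.
    """
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{n} {word}" for word, n in ordered]


def run(input_path, output_path):
    """Read input_path, count its words, and write the counts to output_path."""
    with open(input_path, encoding="utf-8", errors="replace") as infile:
        counts = count_words(infile.read())
    with open(output_path, "w", encoding="utf-8") as outfile:
        for line in format_alphabetical(counts):
            outfile.write(line + "\n")


# Only runs when invoked as: python wordcount.py INPUT_FILE OUTPUT_FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    run(sys.argv[1], sys.argv[2])
```

If this were saved as wordcount.py (a hypothetical file name), running "python wordcount.py hamlet.txt output.txt" would produce a file in the first format shown above. format_by_frequency gives roughly the ordering that sort -nr produces, though words with equal counts may come out in a different order.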
 

III. The Nitty Gritty - IMPORTANT!!!
 


IV. Files for Project 2
 

V. Extra Credit!


VI. Interesting Tidbits