CSE 341 -- Programming Languages

Autumn 2003

Department of Computer Science and Engineering, University of Washington

Steve Tanimoto (instructor)

Assignment 6

Version 1.0 of 17 November

The Empty Oyster

Due date and time: Wednesday, November 26. Web-based turn-in at 11:59 PM.

Title: The Empty Oyster.

Purposes: Gain experience with the Perl language. Create a web-based CGI program.

Individual Work: Do this assignment individually.

Instructions: In this assignment, you will provide an "oyster shell" online where people with "pearls of wisdom" can deposit them. There are three parts to this assignment. The first part is to create an HTML web page with a form on it, so that a user can go to that page in a browser and enter some information. The second part is to create a Perl script that processes the form data and responds to the user by creating a new web page. The third part is to get at least 3 friends to visit the site and try your program.

Part I: First, create a web page on the CSE departmental host named Abstract. You should create a subdirectory of your www directory with the name cse341 (lower case). Then put your web page in this directory with the file name oyster.html.

The web page should contain some text asking the user to enter a "Pearl of knowledge" consisting of a few sentences of text. This might be their outlook on life, or a description of a memorable experience. In addition to the text area for the "pearl", there should be a text field where the user is prompted for her or his name. There should also be a Submit button, so that the user can submit the form and receive some feedback.

Part II: In this part, you write a Perl script that processes the text coming in from your input page. Use the CGI Perl module to make it easy to get the user's text. Get the name of the user and filter out any non-alphabetic characters from it using a regular expression. (This is partly for security reasons, and partly to help you keep your file directory free of oddly named files.)
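The name filtering described above might be sketched as follows. This is only an illustration, not a required design: the subroutine name clean_name, the form field name 'name', and the 'anonymous' fallback are all assumptions made for the example.

```perl
use strict;
use warnings;

# Keep only ASCII letters; spaces, dots, slashes, digits and all other
# characters are dropped, so the result is safe to use in a filename.
sub clean_name {
    my ($raw) = @_;
    $raw = '' unless defined $raw;
    (my $clean = $raw) =~ s/[^A-Za-z]//g;
    # Assumption: fall back to a placeholder so the filename is never empty.
    return length($clean) ? $clean : 'anonymous';
}

# In the CGI script the raw value would come from the form, for example:
#   use CGI;
#   my $q    = CGI->new;
#   my $name = clean_name(scalar $q->param('name'));   # 'name' is a hypothetical field name
print clean_name('../../etc/passwd'), "\n";        # prints etcpasswd
print clean_name("Mary-Ann O'Brien 3rd"), "\n";    # prints MaryAnnOBrienrd
```

Deleting every character outside A-Z and a-z, rather than trying to list the dangerous ones, means path separators and dots can never reach the filename.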

Your program should do three things with the user's Pearl of knowledge. It should save it in a file having the user's name as the first part of the filename, and having the extension .txt. It should then analyze the text by comparing its word usage with that of the previous 5 users' texts. (So you will need to keep another file around that keeps track of who the last five users have been.) Finally, it should report on the results by printing out HTML for a web page that gives the following: the user's name, the user's input text, the names of the five most recent previous users, and for each of these: an analysis of words shared with the new input text, words not shared, and a scalar measure of similarity between the two texts. It should conclude by displaying the 5 previous texts, in order of decreasing similarity to the new text, on the web page.
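One possible way to handle the two files mentioned above is sketched here. The subroutine names, the file name recent.txt, and the one-name-per-line format are assumptions for illustration; the assignment does not prescribe them. The demo runs in a temporary directory rather than in a real www directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write the pearl to "<dir>/<name>.txt"; $name is assumed to be already filtered.
sub save_pearl {
    my ($dir, $name, $text) = @_;
    my $path = "$dir/$name.txt";
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $path: $!";
    return $path;
}

# Read the list of recent users (one name per line, newest first).
sub read_recent {
    my ($file) = @_;
    return () unless -e $file;
    open(my $fh, '<', $file) or die "Cannot read $file: $!";
    chomp(my @names = <$fh>);
    close($fh);
    return @names;
}

# Put $name at the front of the list and keep at most $max entries.
# Assumption: a returning user is moved to the front instead of listed twice.
sub update_recent {
    my ($file, $name, $max) = @_;
    my @names = ($name, grep { $_ ne $name } read_recent($file));
    splice(@names, $max) if @names > $max;
    open(my $fh, '>', $file) or die "Cannot write $file: $!";
    print {$fh} "$_\n" for @names;
    close($fh) or die "Cannot close $file: $!";
    return @names;
}

# Demo in a throwaway directory with six hypothetical users.
my $dir = tempdir(CLEANUP => 1);
for my $user (qw(Ann Bob Cy Dee Eve Fay)) {
    # A real script would call read_recent() here, BEFORE updating,
    # to find the previous users whose texts are to be compared.
    save_pearl($dir, $user, "A pearl from $user.\n");
    update_recent("$dir/recent.txt", $user, 5);
}
print join(' ', read_recent("$dir/recent.txt")), "\n";   # prints Fay Eve Dee Cy Bob
```

This sketch does no file locking, so two simultaneous submissions could interfere with each other; whether that matters for this assignment is left to your judgment.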

Part III: In this part, get at least 3 friends to enter their Pearls of Wisdom into your site, so that there is some good data there before your program is graded.

Extra Credit: Add a feature to your Perl script that tells the user's fortune, based on the comparison of the user's text with that in 3 to 5 built-in documents. You may do this by determining which built-in document is closest to the user's, and then printing out a canned fortune that corresponds to the closest document.

Evaluation: Part I: 10 points. Part II: 30 points. Part III: 10 points. Extra credit: 5 additional points.

Helpful Hints: When comparing two text documents T1 and T2, create for each document a hash in which each word is a key and the associated value is the number of times that word occurs in the document. Let n1 and n2 be the total number of word occurrences in T1 and T2 respectively. Let Wc be the set of words occurring in both T1 and T2, and for each w in Wc let min(w) be the minimum of the numbers of occurrences of w in T1 and T2. Let m = total of min(w) for all w in Wc. Then we can define a measure Sim(T1,T2) of similarity between T1 and T2 to be the value of 2 * m / (n1 + n2). Note that if T1 and T2 have no words in common, then Sim(T1,T2) will equal 0. Also, if T1 = T2, then Sim(T1,T2) will equal 1.
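The measure defined above can be sketched in Perl roughly as follows. The subroutine names and the way words are split (runs of letters and apostrophes, lowercased) are assumptions for the example; the hint does not say how to tokenize. Returning 0 when both texts are empty is also an assumption, made only to avoid dividing by zero.

```perl
use strict;
use warnings;

# Build a hash mapping each lowercased word to its number of occurrences.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{lc($_)}++ for $text =~ /[A-Za-z']+/g;
    return \%count;
}

# Sim(T1,T2) = 2 * m / (n1 + n2), with m, n1 and n2 as defined in the hint.
sub similarity {
    my ($t1, $t2) = @_;
    my ($c1, $c2) = (word_counts($t1), word_counts($t2));
    my ($n1, $n2, $m) = (0, 0, 0);
    $n1 += $_ for values %$c1;             # total word occurrences in T1
    $n2 += $_ for values %$c2;             # total word occurrences in T2
    for my $w (keys %$c1) {
        next unless exists $c2->{$w};      # only words in Wc contribute
        $m += $c1->{$w} < $c2->{$w} ? $c1->{$w} : $c2->{$w};   # min(w)
    }
    return 0 if $n1 + $n2 == 0;            # assumption: two empty texts score 0
    return 2 * $m / ($n1 + $n2);
}

# Worked example: n1 = 6, n2 = 3, shared words "the" (min 1) and "sat" (min 1),
# so m = 2 and Sim = 2 * 2 / (6 + 3) = 4/9.
printf "%.3f\n", similarity('the cat sat on the mat', 'the dog sat');   # prints 0.444
```

With such a subroutine, the previous texts could be ordered for display by sorting on their similarity values in descending order, for instance with Perl's built-in sort and the <=> operator.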