Part I:
First, create a web page on the CSE departmental host named
Abstract. You should create a subdirectory of your www directory
with the name cse341 (lower case). Then put
your web page in this directory
with the file name oyster.html.
The web page should contain some text asking the user
to enter a "Pearl of knowledge" consisting of a few
sentences of text. This might be their outlook on life,
or a description of a memorable experience.
There should be, in addition to the text area for the "pearl",
a text field where the user is prompted for her or his name.
There should also be a Submit button, so that the user
can submit the form and receive some feedback.
Part II:
In this part, you will write a Perl script that processes the
text coming in from your input page. Use the CGI Perl module
to make it easy to get the user's text. Get the name
of the user and filter out any non-alphabetic characters
from it using a regular expression (this is partly for
security reasons, and partly to help you keep your file
directory free of oddly named files).
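The name filtering described above might be sketched as follows. This is only an illustration: the helper name clean_name and the 'anonymous' fallback are assumptions, not requirements of the assignment. In the real script the raw value would come from the CGI module's param method, using whatever field name your form defines.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only ASCII letters from the submitted name.
sub clean_name {
    my ($raw) = @_;
    $raw = '' unless defined $raw;
    (my $clean = $raw) =~ s/[^A-Za-z]//g;   # delete every non-letter
    # Fall back to a placeholder (an assumption) if nothing is left.
    return length($clean) ? $clean : 'anonymous';
}

print clean_name('../../etc/passwd'), "\n";   # prints "etcpasswd"
print clean_name("Mary-Jane O'Neil"), "\n";   # prints "MaryJaneONeil"
```

Because slashes and dots are removed, the cleaned name cannot point outside the directory in which the script saves its files.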
Your program should do three things with the user's
Pearl of knowledge. It should save it in a file having
the user's name as the first part of the filename,
and having the extension .txt. It should then analyze
the text by comparing its word usage with that of
the previous 5 users' text. (So you will need to
keep another file around that keeps track of
who the last five users have been.)
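One possible way to handle the two files is sketched below. The file name recent_users.txt, the one-name-per-line layout, and the helper names are all assumptions made for illustration. The sketch also drops an earlier entry when the same user submits again (a design choice, since the new text overwrites that user's .txt file), and it ignores the possibility of two submissions arriving at the same moment.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical file name for the list of recent users, one name per line.
my $RECENT_FILE = 'recent_users.txt';

# Save the pearl as <name>.txt in directory $dir.
sub save_pearl {
    my ($dir, $name, $text) = @_;
    open(my $out, '>', "$dir/$name.txt") or die "Cannot write $name.txt: $!";
    print {$out} $text;
    close($out) or die "Cannot close $name.txt: $!";
}

# Return the stored names, oldest first (empty list if no file yet).
sub read_recent {
    my ($dir) = @_;
    my $path = "$dir/$RECENT_FILE";
    return () unless -e $path;
    open(my $in, '<', $path) or die "Cannot read $path: $!";
    chomp(my @names = <$in>);
    close($in);
    return @names;
}

# Add $name as the newest entry and keep only the five most recent.
sub remember_user {
    my ($dir, $name) = @_;
    my @names = grep { $_ ne $name } read_recent($dir);
    push @names, $name;
    shift @names while @names > 5;
    open(my $out, '>', "$dir/$RECENT_FILE") or die "Cannot write list: $!";
    print {$out} "$_\n" for @names;
    close($out) or die "Cannot close list: $!";
    return @names;
}

# Demonstration in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
for my $user (qw(Ann Bob Cy Dee Eve Fay)) {
    my @previous = read_recent($dir);   # the users to compare against
    save_pearl($dir, $user, "A pearl from $user.\n");
    remember_user($dir, $user);
}
print join(' ', read_recent($dir)), "\n";   # prints "Bob Cy Dee Eve Fay"
```

Note the order of operations in the loop: the list of previous users is read before the new user is added, so the new text is never compared with itself.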
Finally, it should report on the results by printing
out HTML for a web page that gives the following:
the user's name, the user's input text,
the names of the five most recent previous users,
and for each of these: an analysis of words shared
with the new input text, words not shared, and a scalar
measure of similarity between the two texts.
It should conclude by displaying the 5 previous texts,
in order of decreasing similarity to the new text, on
the web page.
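The word analysis and the final ordering could be approached along these lines. This is a sketch only: it assumes each text has already been reduced to a hash of word counts, and the helper names and the sample scores are made up for illustration.

```perl
use strict;
use warnings;

# Split the words of two texts into shared and unshared lists.
# Each argument is a hash ref mapping word => occurrence count.
sub compare_words {
    my ($new, $old) = @_;
    my (@shared, @only_new);
    for my $w (sort keys %$new) {
        if (exists $old->{$w}) { push @shared, $w } else { push @only_new, $w }
    }
    my @only_old = grep { !exists $new->{$_} } sort keys %$old;
    return (\@shared, \@only_new, \@only_old);
}

# Order user names by decreasing similarity score; ties break by name.
sub by_decreasing_similarity {
    my ($score) = @_;    # hash ref: user name => similarity score
    return sort { $score->{$b} <=> $score->{$a} or $a cmp $b } keys %$score;
}

my %new = (life => 2, is => 1, short => 1);
my %old = (life => 1, is => 3, long => 1);
my ($shared, $only_new, $only_old) = compare_words(\%new, \%old);
print "shared: @$shared\n";       # prints "shared: is life"
print "new only: @$only_new\n";   # prints "new only: short"
print "old only: @$only_old\n";   # prints "old only: long"

my %score = (Ann => 0.25, Bob => 0.8, Cy => 0.5);   # made-up scores
print join(' ', by_decreasing_similarity(\%score)), "\n";   # prints "Bob Cy Ann"
```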
Part III:
In this part, get at least 3 friends to enter their
Pearls of knowledge into your site, so that there is some
good data there before your program is graded.
Extra Credit:
Add a feature to your Perl script that tells the user's
fortune, based on the comparison of the user's text with
that in 3 to 5 built-in documents. You may do this by
determining which built-in document is closest to the user's,
and then printing out a canned fortune that corresponds to
the closest document.
Evaluation:
Part I: 10 points. Part II: 30 points. Part III: 10 points.
Extra credit: 5 additional points.
Helpful Hints:
When comparing two text documents T1 and T2, create for each document
a hash in which each word is a key and the associated value is the
number of times that word occurs in the document. Let n1 and n2 be the
total number of word occurrences in T1 and T2 respectively. Let Wc be
the set of words occurring in both T1 and T2, and for each w in Wc let
min(w) be the smaller of the numbers of occurrences of w in T1 and in T2.
Let m be the sum of min(w) over all w in Wc. Then we can define a
measure of similarity between T1 and T2 as
Sim(T1,T2) = 2 * m / (n1 + n2).
Note that if T1 and T2 have no words in common, then Sim(T1,T2) will
equal 0. Also, if T1 = T2, then m = n1 = n2, so Sim(T1,T2) will equal 1.
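The measure above might be coded roughly as follows. Treat this as a sketch under stated assumptions: the helper names are hypothetical, and the decisions to lowercase words, to count runs of letters and apostrophes as words, and to return 0 when both texts are empty are choices the assignment leaves open.

```perl
use strict;
use warnings;

# Build a hash ref of word => count. Lowercasing and treating runs of
# letters and apostrophes as words are assumptions, not requirements.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{lc $1}++ while $text =~ /([A-Za-z']+)/g;
    return \%count;
}

# Sim(T1,T2) = 2 * m / (n1 + n2), computed from two word-count hashes.
sub similarity {
    my ($c1, $c2) = @_;
    my ($n1, $n2, $m) = (0, 0, 0);
    $n1 += $_ for values %$c1;
    $n2 += $_ for values %$c2;
    return 0 if $n1 + $n2 == 0;          # both texts empty: avoid 0/0
    for my $w (keys %$c1) {
        next unless exists $c2->{$w};    # w must be in Wc
        $m += $c1->{$w} < $c2->{$w} ? $c1->{$w} : $c2->{$w};   # min(w)
    }
    return 2 * $m / ($n1 + $n2);
}

my $t1 = count_words("The sea is calm and the sky is clear");
my $t2 = count_words("The sky is grey");
printf "%.3f\n", similarity($t1, $t2);   # prints "0.462"
```

In this example n1 = 9 and n2 = 4. The shared words are "the", "is" and "sky", each with min(w) = 1, so m = 3 and Sim = 2 * 3 / 13, or about 0.462.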