CSE143 Homework

Project 2

Notes on Part B

Deadline Change: Electronically Wednesday, Feb. 5 (usual time); paperwork in class the next day.

These notes don't stand alone. Refer back to the main Project 2 instructions for a base description of the project. In particular, those instructions described briefly the three oracle types to be added to Part B. [New! Oracle Mall and OracleUtils (1/31). View README before using.]

New specifications:

Text searcher oracles should be of type GenericTextSearcherOracle, and their ~~oracles~~ omens should be of type GenericTextSearcherOmen.
The identity of the source text being searched should be available to clients of any type of text searcher oracle or omen. In particular:
- Such oracles and omens must implement a public method String getTextSource(); which gives the full name of the file being searched (path name or URL). The name should be in a format that would be directly usable in another Java program that needed to locate or search the same file. (In other words, don't abbreviate the name or add formatting, comments, etc.)
- The value returned by an interpretInDetail() method of an omen should include the text source (unlike the above requirement, the information is formatted as part of a larger message) in addition to other information expected of interpretInDetail.
The numeric result of an omen is refined and reported more carefully. In particular, all text searcher omen types should make available two pieces of numeric information to clients: the location of the match and the quality of the match. As before, the numeric information should also appear in a formatted form as part of the interpretInDetail output.
- ~~integer~~ int getMatchLocation() tells the line number within the file where the match occurred (or began, in the case of matches that span several lines). As before, lines are numbered from 1. The value should be negative if there was no match at all.
- double getMatchScore() tells how good the match was. This value should be 0.0 if there was no match at all; 1.0 if there was a perfect match. Values between 0.0 and 1.0 indicate some degree of partial match. (For the current project, no partial matches are defined or required; you may define partial match values if you can do it in a consistent way.)
- When two omens (or two matches) are compared, the one with the higher match score is considered more favorable. If the two have exactly the same match score, then the one which occurs earlier in the file is considered more favorable.
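
As a rough illustration of the comparison rule above, here is a minimal sketch. The interface TextSearcherOmenInfo and the classes SimpleOmen and OmenFavorability are hypothetical names invented for this example; only the three accessor methods come from these notes. Your real omens would extend GenericTextSearcherOmen instead.

```java
import java.util.Comparator;

/** Hypothetical interface: just the three accessors named in these notes. */
interface TextSearcherOmenInfo {
    String getTextSource();   // full path name or URL of the searched file
    int getMatchLocation();   // 1-based line number; negative if no match
    double getMatchScore();   // 0.0 = no match, 1.0 = perfect match
}

/** Hypothetical value holder, used only to exercise the comparator. */
final class SimpleOmen implements TextSearcherOmenInfo {
    private final String source;
    private final int location;
    private final double score;

    SimpleOmen(String source, int location, double score) {
        this.source = source;
        this.location = location;
        this.score = score;
    }

    public String getTextSource() { return source; }
    public int getMatchLocation() { return location; }
    public double getMatchScore() { return score; }
}

/** Sorts omens so that the most favorable one comes first. */
final class OmenFavorability implements Comparator<TextSearcherOmenInfo> {
    @Override
    public int compare(TextSearcherOmenInfo a, TextSearcherOmenInfo b) {
        // Higher score is more favorable, so compare b against a.
        int byScore = Double.compare(b.getMatchScore(), a.getMatchScore());
        if (byScore != 0) {
            return byScore;
        }
        // Same score: the match that occurs earlier in the file wins.
        return Integer.compare(a.getMatchLocation(), b.getMatchLocation());
    }
}
```

With this ordering, a perfect match on line 12 sorts ahead of a perfect match on line 90, and both sort ahead of an omen with no match. How two no-match omens compare is not specified in these notes; the sketch simply falls through to the location comparison.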

Clarifications

Your program should be able to correctly process any normal (ASCII) text file, regardless of file name or length. If the program is given a file of some other format, such as a binary file, your program does not have to interpret the data -- but it must not blow up, under any circumstance. It should simply return no omen, or return an omen which shows no match for the required data. To achieve this, you will need to pay attention to the exceptions which can arise from methods that you call.
The file and stream processing you do should use Java classes of the io package. Do not use any uwcse or other external code. As mentioned originally, process the file data as you get it, rather than storing the whole file in an array or other structure to search later. Of course, you are welcome to look at the sample solution to project 1 for ideas on how to process the file.
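
To make the two points above concrete (process the data as it arrives, and never let an exception escape), here is a minimal sketch using only java.io. The class LineScanner and its method are hypothetical names, and the plain substring test is only a stand-in for whatever matching rule your oracle actually implements.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

final class LineScanner {
    /**
     * Returns the 1-based number of the first line that contains target,
     * or -1 if there is no such line or the file cannot be read.
     */
    static int firstLineContaining(String path, String target) {
        if (path == null || target == null) {
            return -1;
        }
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            int lineNumber = 0;
            String line;
            // One line at a time: nothing but the current line is stored.
            while ((line = in.readLine()) != null) {
                lineNumber++;
                if (line.contains(target)) {   // placeholder matching rule
                    return lineNumber;
                }
            }
        } catch (IOException | RuntimeException e) {
            // Missing file, unreadable file, etc.: report "no match".
            return -1;
        }
        return -1;
    }
}
```

A binary file read this way will usually just produce odd-looking "lines" rather than an exception, which is fine: the search fails to match and the method returns -1.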
For the Secret Prophecy Finder Plus! oracle/omen only...
- The requests are taken from a file. This should be a text file. The user should select it using a File Browser. Each line of the file is taken to be a complete and independent request. All requests are processed against the same source text file.
- The interpretInDetail method should list the best matches found. In particular, it should list all of the perfect matches, or state that there were none. For each match listed, it should include the original request string, the match location, and the match score.
- The (overall) getMatchLocation and getMatchScore should both refer to the same match, and that should be the "best" match found.
The requirements from Part A, including the operation of the Luck Tester Oracle, still apply. You will have to turn in Luck Tester again and it may be tested again. You are free to use or adapt any official sample solution code that is posted.
There will probably be a new version of OracleMall within the next couple of days.
Contest. If there is a contest, separate instructions will follow. Contest results will not affect your project grade.

Examples

A "letter" is any of the letters a-z of the English alphabet. A "word" is a run of consecutive letters, preceded by whitespace or punctuation, and followed by whitespace or punctuation. Nothing else is a word. (Note that under this definition, a word cannot span more than one line.) In the following

R U aware, CSE142 and 143 is the funnest d*** course, on this plnet!

the words are:

R U aware and is the funnest course on this plnet

Each of the following contains two words:

O'Brien, helter-skelter, MyProject.java, base-10-Ethernet, Nick's

"P2Main.java" is one word ("java").

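One possible way to apply the word definition above to a single line is sketched here. The class WordSplitter is a hypothetical name. Two things are assumptions of this sketch, not rules from these notes: the start and end of a line count as word boundaries, and "punctuation" means the characters listed in PUNCTUATION. The asterisk is deliberately left out of that set so that the sample line gives the same word list as shown above.

```java
import java.util.ArrayList;
import java.util.List;

final class WordSplitter {
    // Assumed punctuation set; the notes do not enumerate one.
    private static final String PUNCTUATION = ".,;:!?'\"()[]{}-/";

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** True if position index is off the line, whitespace, or punctuation. */
    private static boolean isBoundary(String line, int index) {
        if (index < 0 || index >= line.length()) {
            return true;   // assumption: line start and line end are boundaries
        }
        char c = line.charAt(index);
        return Character.isWhitespace(c) || PUNCTUATION.indexOf(c) >= 0;
    }

    /** Returns the words of one line, in order. */
    static List<String> words(String line) {
        List<String> result = new ArrayList<String>();
        int i = 0;
        while (i < line.length()) {
            if (!isLetter(line.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < line.length() && isLetter(line.charAt(i))) {
                i++;
            }
            // A run of letters is a word only if both neighbors are boundaries,
            // so "CSE" in "CSE142" and "Main" in "P2Main" are rejected.
            if (isBoundary(line, start - 1) && isBoundary(line, i)) {
                result.add(line.substring(start, i));
            }
        }
        return result;
    }
}
```

For example, WordSplitter.words("P2Main.java") yields just java, and WordSplitter.words("O'Brien") yields O and Brien.
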
When matching a request string to the file, you should ignore any characters in the request that are not English letters. Case is not considered in making a match. For example, the following three requests should give identical results:

"July 4, 1776" " Jul Y 5 1976" "1776, ju (1776)-ly"

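A minimal sketch of that request clean-up, assuming you reduce each request to lowercase letters before searching (RequestNormalizer is a hypothetical name, and lowercase is an arbitrary choice; uppercase would work equally well):

```java
final class RequestNormalizer {
    /** Keeps only the letters a-z and A-Z, converted to lowercase. */
    static String normalize(String request) {
        StringBuilder letters = new StringBuilder();
        for (int i = 0; i < request.length(); i++) {
            char c = request.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c - 'A' + 'a');   // fold case
            }
            if (c >= 'a' && c <= 'z') {
                letters.append(c);            // digits, spaces, punctuation are dropped
            }
        }
        return letters.toString();
    }
}
```

All three requests above reduce to the same string, july, which is why they must give identical results.
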
The following line of a text file:

"Someone took Oscar Peterson's last song"

will match any of these requests (and many others):

"STOP" "top" "tops"

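The matching rule itself is given in the main Project 2 instructions, not in these notes. The example above is consistent with one reading, used here strictly as an assumption: the letters of the request are compared with the first letters of consecutive words (Someone, took, Oscar, Peterson give S, T, O, P; the trailing s of "Peterson's" is a word of its own). A minimal sketch under that assumption, with the hypothetical class InitialsMatcher, taking words that have already been split out of the line and a request already reduced to lowercase letters:

```java
import java.util.List;

final class InitialsMatcher {
    /** Builds a lowercase string from the first letter of each non-empty word. */
    static String initials(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (!word.isEmpty()) {
                sb.append(Character.toLowerCase(word.charAt(0)));
            }
        }
        return sb.toString();
    }

    /** True if the request's letters appear as consecutive word initials. */
    static boolean matches(List<String> words, String normalizedRequest) {
        if (normalizedRequest.isEmpty()) {
            return false;   // assumption: an empty request matches nothing
        }
        return initials(words).contains(normalizedRequest);
    }
}
```

For the line above, the word initials are s, t, o, p, s, l, s, so the requests stop, top, and tops all match, while a request such as pots does not.
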
Just as with the original Text Searcher, your program should stop when it finds the first (complete) match to the request. The line number reported should be the line of the file where the match begins. The text captured in the omen should include that line plus any following lines that are part of the match. For example, given the following lines (with line numbers shown):

55 a b c d

56 e f g

57 h i j k

58 l m n o p

If the request is "DE", then the omen will contain 55 as the line number, and it should report both of the matching lines:

a b c d

e f g

~~h i j k~~
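
A match that spans lines can still be found while reading one line at a time, by remembering only the last few word initials and the line each came from. The sketch below assumes, as one reading of the examples in these notes rather than a rule stated here, that a request is matched against the first letters of consecutive words. SpanningMatcher is a hypothetical name, the lines are passed in as a list only to keep the example self-contained, and words are split on whitespace for brevity instead of using the full word definition.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class SpanningMatcher {
    /**
     * Returns {firstLine, lastLine} of the first match, or null if none.
     * request must already be reduced to lowercase letters.
     */
    static int[] findFirst(List<String> lines, int firstLineNumber, String request) {
        int n = request.length();
        if (n == 0) {
            return null;
        }
        StringBuilder window = new StringBuilder();                // last n initials
        Deque<Integer> windowLines = new ArrayDeque<Integer>();    // line of each initial
        int lineNumber = firstLineNumber;
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;   // blank line
                }
                window.append(Character.toLowerCase(word.charAt(0)));
                windowLines.addLast(lineNumber);
                if (window.length() > n) {
                    // Slide the window: forget the oldest initial.
                    window.deleteCharAt(0);
                    windowLines.removeFirst();
                }
                if (window.length() == n && window.toString().equals(request)) {
                    // Stop at the first complete match.
                    return new int[] { windowLines.getFirst(), lineNumber };
                }
            }
            lineNumber++;
        }
        return null;
    }
}
```

Given the four lines above starting at line 55 and the request de, this returns the pair 55 and 56: the match begins on line 55 and the text to capture runs through line 56.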

Further Hints, etc. on the main P2 page.