CSE143 Autumn 2004 Project #2 Part B

Eine Kleine Google - Searching for Documents

Due: Wednesday, November 10, at 9:00 pm. No late assignments will be accepted.

In this part of the project, you'll take a query from the user, find the documents that best match that query, and report them. You should continue to work with the same partner you had for part A of this project, using the pair-programming style discussed in class. Remember that not only is it important to get the project done, but it is also important that you learn how to work effectively with someone else.

Grading: You and your partner will receive the same scores on the programming parts of the project. Programming projects will be evaluated both for how well the code works and how well it is written and tested, each on a scale of 0 to 4. Be sure to include appropriate JavaDoc and other comments in your code, use meaningful names, indent sensibly, and so forth.

Keep track of the major design decisions, algorithms, and tests you perform to verify your code is working. You will be asked to include a summary of these in the report that you will write individually at the end of the final part of this project.

Overview

The scenario for this project was described in part A. You've already gotten and analyzed the documents; now comes the part where you let the user search their contents.

Basic Requirement

To receive full credit for this project (assuming everything works and is well written, of course), you are only required to do the following:

Implementation Hints

The simplest way to find documents which match the query terms is to go through the list of documents and look for the ones that contain all the search terms. That's fine for this project. However, if there is a large number of documents, this would be very inefficient. A much more efficient implementation would include a way to look up documents based on a single word. This could be done with a HashMap where the keys are
words and the values are Sets of Documents.
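
The word-to-documents lookup described above could be sketched roughly as follows. This is only an illustration, not the required design: the Doc class is a hypothetical stand-in for whatever Document class you wrote in part A, the names WordIndex, matchAll, and basicScore are made up for this example, and the code uses generics and lambdas from newer versions of Java. The basicScore method follows the basic scoring rule stated in this handout (add up the number of occurrences of the query words).

```java
import java.util.*;

/** Hypothetical stand-in for the Document class from part A. */
class Doc {
    final String title;
    final Map<String, Integer> counts = new HashMap<>();

    Doc(String title, String text) {
        this.title = title;
        // Count how often each lower-cased word occurs in the text.
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }
}

/** Hypothetical index: each word maps to the set of documents containing it. */
class WordIndex {
    private final Map<String, Set<Doc>> index = new HashMap<>();

    void add(Doc d) {
        for (String w : d.counts.keySet()) {
            index.computeIfAbsent(w, k -> new HashSet<>()).add(d);
        }
    }

    /** Returns the documents that contain every query word. */
    Set<Doc> matchAll(List<String> words) {
        Set<Doc> result = null;
        for (String w : words) {
            Set<Doc> hits = index.getOrDefault(w.toLowerCase(), Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(hits);   // copy so the index is not modified
            } else {
                result.retainAll(hits);         // keep only documents seen for every word
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    /** Basic score: total number of occurrences of the query words in one document. */
    static int basicScore(Doc d, List<String> words) {
        int total = 0;
        for (String w : words) total += d.count(w.toLowerCase());
        return total;
    }

    public static void main(String[] args) {
        WordIndex idx = new WordIndex();
        idx.add(new Doc("pets", "the cat sat on the mat"));
        idx.add(new Doc("dogs", "the dog sat"));
        List<String> query = Arrays.asList("the", "sat");
        for (Doc d : idx.matchAll(query)) {
            System.out.println(d.title + " scores " + basicScore(d, query));
        }
    }
}
```

With an index like this, each query word costs one map lookup instead of a pass over every document, and intersecting the resulting sets leaves only the documents that contain all of the terms.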

To display the list of hits, you can just print them out in the interactions
pane or console window (remember to take out the extraneous

To keep track of which documents are getting what scores for a given
query, consider using a

Most people probably won't need to use this, but in case you're interested, the Java regular expression package provides powerful tools for changing strings and finding patterns in them. It is, however, very tricky to get the hang of. If you have the basic requirements working, consider spiffing up the existing code with regular expressions if it helps, or using them for some of the extra credit.

Interacting with the User

To read user input from the console, you can use code like this:

    // Requires java.io.*; note that readLine() can throw an IOException.
    Reader systemIn = new InputStreamReader(System.in);
    BufferedReader input = new BufferedReader(systemIn);
    String lineFromConsole = input.readLine();

Another, probably easier, way to read input is to use the static showInputDialog method
of JOptionPane to pop up a dialog box with a message and a space
for the user to enter a response.
See the JavaDoc pages for JOptionPane for the details.
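
As a rough sketch of the dialog approach, the following shows one way to prompt for a query and split the response into search terms. The class name QueryDialog and the helper toTerms are hypothetical names chosen for this example, and the prompt text is arbitrary. The one behavior relied on here is that showInputDialog returns null when the user cancels the dialog.

```java
import javax.swing.JOptionPane;

/** Hypothetical helper for reading a query through a dialog box. */
class QueryDialog {
    /** Splits a raw response into lower-case terms; null or blank input gives no terms. */
    static String[] toTerms(String response) {
        if (response == null || response.trim().isEmpty()) {
            return new String[0];
        }
        return response.trim().toLowerCase().split("\\s+");
    }

    public static void main(String[] args) {
        // Pops up a dialog with a message and a text field for the user's response.
        String response = JOptionPane.showInputDialog("Enter search terms:");
        String[] terms = toTerms(response);
        System.out.println(terms.length + " search term(s) entered");
    }
}
```

Keeping the parsing in a separate method means it can be tested with JUnit without any dialog appearing.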
You can use the same method to get the user's choice of which document to display.

Extra Credit

There are plenty of things that can be done for extra credit.

One requirement about extra credit: If you do implement one or more additional scoring algorithms, you must also implement the basic scoring algorithm described above (search for all the words in the query and add up the number of occurrences), and you must provide some way in the user interface for the user to select which scoring algorithm to use. This could be as simple as entering a number to select the algorithm from a list, or you could use radio buttons in a GUI or something similar.

If you implement additional scoring algorithms, you might do some experiments to see which algorithms give the best results. You'll want to discuss the different algorithms, describe the results, and discuss why you think you got the results you did in your final project report.

Testing

Be sure to try your program on some small examples so you can verify that it is actually analyzing the input correctly and producing appropriate output. For instance, run it on a very small section of the catalog, one where you can be sure that it returns the documents that really are the best match. You should use JUnit to run these sample tests. Remember to test that your program gets all the documents that match all the search terms when you use the basic algorithm.

What to turn in

Use this online turnin form to turn in the files that make up your project, including all the tests from both parts of the project (A and B). You should also turn in at least two different text files containing sample output generated by your program for different pairs of starter words (these should be relatively short, and you can cut and paste text or screen snapshots into a text editor to do this - you don't need to include the ability to write files in your program). Do not turn the project in under two different names.