CSE143 Autumn 2004 Project #2 Part A
Eine Kleine Google - Reading Words
Due: Wednesday, November 3, at 9:00 pm. No late assignments will be accepted.

In this assignment you will be reading text from files and searching it for strings. By part B, the program will take a query and return the documents that match that query. For part A, we're concerned only with reading the file, parsing it into documents, parsing the words out, and storing them in …

Work with your assigned partner. The same rules as always apply: try to do most of the actual coding while sitting at the same computer, trading off the keyboard. The two of you should turn in a single set of files. After the final part of the project is done, each of you will individually write a final report.

Grading: When the project is complete, your project will be evaluated both for how well the code works and for how well it is written and tested. For this part of the project, we will try to give you quick feedback on a scale of 0-3: 3 = no major problems, 2 = something needs to be fixed, 1 = serious trouble, and 0 = no credible effort. For all phases of the project, be sure to include appropriate JavaDoc and other comments in your code, use meaningful names, indent sensibly, provide toString() methods where appropriate, and so forth. Keep track of the major design decisions, algorithms, and tests you use to verify that your code is working. You will be asked to include a summary of these in the report that you will write individually at the end of the final part of this project.

Overview
When presented with several documents, it's natural to want to pick out the few that are of interest. On the Internet, the documents are web pages. You're all familiar with searching the 'net: you provide a query of some search terms, and the search engine returns the documents that match those terms. For this project, rather than searching the 'net, you'll be searching part of the UW course catalog. There are two major steps in running a search engine.
We're going to do something unusual on this project for the sake of convenience. While a "document" normally comes in its own file, here we're redefining "document" to mean one entry of the UW course catalog. We're providing a text file containing part of the course catalog. Your code should treat adjacent lines as part of the same document: a group of consecutive non-blank lines is one document, and a blank line ends it so that the next non-blank line starts a new one. The file is here: coursecatelog.txt. If you choose to make more files to search, be careful that they follow the same format of blank lines separating the documents. (Also beware of tricky situations, such as blank lines at the beginning of the file.)
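The blank-line rule above can be sketched in Java as follows. This is a minimal sketch, not the required design: it assumes the file's lines have already been read into a List<String>, the names DocumentSplitter and split are hypothetical, and it treats a line containing only whitespace as blank, which is an assumption in keeping with the Testing advice about leading and trailing whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Groups lines into "documents": a run of adjacent non-blank lines is one
 * document, and a blank (or whitespace-only) line ends it.
 * Hypothetical class name; an illustrative sketch only.
 */
class DocumentSplitter {

    /** Splits the given lines into documents, each represented as a list of its lines. */
    static List<List<String>> split(List<String> lines) {
        List<List<String>> documents = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // Assumption: whitespace-only lines count as blank.
                // A blank line closes the document in progress, if any;
                // extra blank lines (including leading ones) add nothing.
                if (!current.isEmpty()) {
                    documents.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        // The last document may not be followed by a blank line.
        if (!current.isEmpty()) {
            documents.add(current);
        }
        return documents;
    }
}
```

Because a blank line only closes a document that is already in progress, blank lines at the start or end of the file, or several blank lines in a row, never produce empty documents.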
What we're doing for part A
For part A of the project, you and your partner should implement the part of the program that opens an input file, reads lines from the file, separates out the individual "documents", and breaks the input into individual words. Parts of what you write for part A are likely to change when you use them in part B, but part A will provide the basic infrastructure for part B. To gain a bit of experience with …

For our purposes in this project, a word is any sequence of letter or digit characters, separated by punctuation, end-of-line separators, or whitespace (such as space, tab, return, and so on). Since a word has the same meaning regardless of the punctuation next to it or its capitalization, we will consider “example”, “Example”, “example,”, “example:”, “EXAMPLE!”, and “EXAMPLE!!!” all to be the same word. It's best to remove the punctuation and capitalization and store them all as "example" in the frequency table. This is not required in part A: you can store the words without converting them to lowercase or stripping punctuation, but you can do this now if you want to get a bit ahead. Hints: Look at classes …

Implementation Requirements
The purpose of this project is to give you practice with various file and text processing techniques. Unlike the previous project, which was almost totally free-form, this project has some specific requirements for how you implement your program.
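The word definition from part A can be sketched as a character-by-character scan. This is only an illustration, not one of the implementation requirements: the names WordSplitter, words, and countWords are hypothetical, and using Character.isLetterOrDigit with a Map<String, Integer> frequency table is one possible approach among several (Map.merge needs Java 8 or later). It lowercases as it goes, which part A allows but does not require.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Breaks text into words, where a word is a maximal run of letters or
 * digits, and counts word frequencies. Hypothetical names; a sketch only.
 */
class WordSplitter {

    /** Returns the words in line, lowercased, in the order they appear. */
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
            } else if (word.length() > 0) {
                // Any other character (whitespace or punctuation) ends the word.
                result.add(word.toString());
                word.setLength(0);
            }
        }
        if (word.length() > 0) {
            result.add(word.toString()); // a word that runs to the end of the line
        }
        return result;
    }

    /** Adds every word in line to a frequency table mapping each word to its count. */
    static void countWords(String line, Map<String, Integer> frequencies) {
        for (String w : words(line)) {
            // Stores 1 for a new word, or adds 1 to the existing count.
            frequencies.merge(w, 1, Integer::sum);
        }
    }
}
```

Under this definition a contraction such as "don't" becomes the two words "don" and "t". Passing a TreeMap as the frequency table keeps the words in alphabetical order; a HashMap would work as well.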
Implementation Hints
As usual on a software project, it's best to start small and gradually add to what you've got, eventually arriving at a program that does the entire job. That means looking at the entire problem and figuring out the smallest part that you can implement without having to do anything else, then which small part can be added next, and so on. At each stage of the implementation, test your code to be sure that the new part you've added works properly and didn't break any of the previous code. While it's up to you and your partner to figure out how to proceed, here is a suggested order that you might find useful. If you and your partner decide to do something different, think about the implications and be as sure as you can that your strategy makes sense. We recommend that you start by implementing the code to open a file and initialize a BufferedReader object to read from it. Check that the code works correctly before you move on to the rest of the assignment.
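A minimal sketch of that first step, assuming Java 7 or later for try-with-resources; the class and method names (LineReaderSketch, readFile, readLines) are hypothetical. It wraps a FileReader in a BufferedReader, collects lines until readLine returns null, and reports errors on System.err. Returning null on failure is a simplification chosen for this sketch; throwing an exception to the caller is a reasonable alternative.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Reads all lines of a text source. Hypothetical names; a sketch only. */
class LineReaderSketch {

    /**
     * Opens the named file and returns its lines, or null (after printing
     * an error message) if the file cannot be opened or read.
     */
    static List<String> readFile(String fileName) {
        try {
            return readLines(new FileReader(fileName));
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open file: " + fileName);
            return null;
        }
    }

    /**
     * Reads every line from source until end of input, then closes it.
     * Returns null (after printing an error message) if reading fails.
     */
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the BufferedReader, and with it the source.
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) { // null signals end of input
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            System.err.println("Error while reading: " + e.getMessage());
            return null;
        }
    }
}
```

Because readLines accepts any Reader, you can check it against a StringReader before touching real files: readLines(new StringReader("a\n\nb")) yields the three lines "a", "", and "b".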
As always, you need to think about how to divide your code into appropriate classes, each of which does one thing well (high cohesion) and is as loosely coupled to other classes as possible.

Testing
Be sure to test your code thoroughly. In particular, make sure you can handle various kinds of input: extra blank lines in the middle or at the beginning or end of the file, leading or trailing whitespace (blanks, tabs, etc.) on a line, an arbitrary number of whitespace characters between words, and so on. You and your partner should think through the possibilities and create a comprehensive (which does not necessarily mean numerous) set of tests. Use JUnit to verify that the code that breaks words apart works properly, then use some plain text files (created with NotePad or something similar) to test that your program can open a file and read it. To the extent you can, you should also test your exception handling code to see whether it properly reports errors. This may be a bit tricky to do, but you might be able to try things like deleting a file after selecting its name but before clicking OK in a …

What to turn in
Use this online turnin form to turn in the files that make up your project, along with the test programs and test input files you used to verify that it works. Do not turn the project in under two different names.