CSE 163, Winter 2020: Homework 4: Part 3

Testing and Running the Search Engine

Now that you have implemented a SearchEngine that supports multi-word queries, you can use it to query directories full of text documents!

Running the Search Engine Locally

main.py is a program we have implemented for you that runs your SearchEngine. It can be run in VS Code just like any other Python program (Right-click -> “Run Python File in Terminal”). When run, main.py prints a series of prompts to the console that let the user enter the directory to search and then enter search terms.

We have included a directory called wikipedia in hw4.zip that contains a large number of HTML Wikipedia pages. You can run main.py and enter wikipedia as the directory. Note that it will take a few minutes for the SearchEngine to be constructed because there are so many files. You can also use small-wiki as a smaller example.

You can now query terms over Wikipedia files! Note how fast the ranking is for the terms you search, even though the number of documents is huge. Because you precomputed all of the values when the SearchEngine was constructed, computing a document ranking for a query is very fast.
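To see why precomputation pays off, consider the minimal sketch below. It is not the assignment's SearchEngine: the names build_index and lookup are hypothetical, and it stores raw word counts rather than the scores your implementation computes. The point is only that the expensive pass over every document happens once, and each query afterwards is a dictionary lookup.

```python
from collections import defaultdict


def build_index(docs):
    """Scan every document once and record term counts (hypothetical helper).

    docs maps a document name to its text.
    """
    index = defaultdict(dict)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word][name] = index[word].get(name, 0) + 1
    return index


def lookup(index, term):
    """Return names of documents containing term, highest count first."""
    matches = index.get(term.lower(), {})
    return sorted(matches, key=lambda name: (-matches[name], name))


docs = {
    'a.txt': 'the cat sat with the other cat',
    'b.txt': 'the dog chased the cat',
    'c.txt': 'dogs bark',
}
index = build_index(docs)      # slow part: done once, up front
print(lookup(index, 'cat'))    # fast part: one dictionary lookup plus a small sort
```

Here the query never rereads the documents, which is the same reason your search results appear almost instantly once the SearchEngine has been constructed.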

Running the Search Engine on Ed

You can run the search engine with the Run button. The wikipedia directory included for the people running locally is too large for Ed. Instead, you should use small-wiki. The directory for small-wiki can be found at /course/small-wiki.

You can now query terms over Wikipedia files! Note how fast the ranking is for the terms you search, even though the number of documents is huge. Because you precomputed all of the values when the SearchEngine was constructed, computing a document ranking for a query is very fast.

To run your own tests, you will have to open the terminal and run python hw4_test.py.

Testing

You will also need to test your document.py and search_engine.py classes. We have included a file called hw4_test.py where you should write your tests.

You should write three tests for document.py and three tests for search_engine.py. Testing these classes will require you to create your own test corpus. To test each class, you can construct a new instance of the class and pass in your own custom documents. Then you can call the functions of the class and verify that each returned value matches what you expected.

Since the SearchEngine reads documents from a folder, to test it you will likely need to create your own directories containing files in your hw4 directory. You should include these test document directories in your submission so we can run your tests.
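One way to build such a corpus is to use a few very short files whose contents you choose, so that every expected value can be worked out by hand. The sketch below is illustrative only: the directory name test-corpus, the file names, and the helper make_corpus are all hypothetical, and you could just as well create the files by hand in your editor.

```python
from pathlib import Path


def make_corpus(directory, files):
    """Write each (file name -> text) pair into directory (hypothetical helper)."""
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    for name, text in files.items():
        (root / name).write_text(text)
    return root


corpus = make_corpus('test-corpus', {
    'doc1.txt': 'the cat sat',
    'doc2.txt': 'the dog ran',
})
# Two files of three words each: small enough to compute expected results by hand.
print(sorted(path.name for path in corpus.iterdir()))
```

Keeping the documents this small means that a failing test points at your code rather than at a miscounted expected value.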

Recall that on Ed, you will need to specify the paths to these directories with an absolute path (e.g. /home/test-dir if you uploaded test-dir).

As in previous homeworks, we have provided a function called assert_equals that takes an expected value and the value returned by your function, and compares them: if they don't match, the function will crash the program and tell you what was wrong. For more instructions and examples of how to write and call tests, see Homework 1 - Part 1.
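As a reminder of the calling convention, here is a minimal sketch. Because cse163_utils.py is not reproduced here, the block defines a simplified stand-in for assert_equals so that it runs on its own; the provided function may report failures differently. word_count is a hypothetical function under test and is not part of the assignment.

```python
def assert_equals(expected, received):
    """Simplified stand-in for the course-provided assert_equals."""
    assert expected == received, \
        f'Failed: Expected {expected}, but received {received}'


def word_count(text):
    """Hypothetical function under test: number of whitespace-separated words."""
    return len(text.split())


def test_word_count():
    # Expected value first, then the value your code actually returned.
    assert_equals(3, word_count('the cat sat'))
    assert_equals(0, word_count(''))


test_word_count()
```

In hw4_test.py you would rely on the assert_equals that is already imported for you rather than defining your own.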

As in previous homeworks, assert_equals lives in a file called cse163_utils.py. We imported the function in hw4_test.py in a special way, so this shouldn't change how you call assert_equals. You should not modify anything in cse163_utils.py.

Grading

For full credit, your hw4_test.py must satisfy all of the following conditions:

  • Use the main method pattern shown in class.
  • Have 3 functions that test document.py and 3 functions that test search_engine.py.
  • Give each of these test functions a descriptive name that indicates which function is being tested (e.g. test_funky_sum).
  • Each of the test functions must be called from main.
  • Turn in any test files you generate.
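
Put together, a file meeting these conditions might be laid out like the skeleton below. The test function names are hypothetical, chosen only to illustrate descriptive naming; use the names of the functions you actually wrote. The bodies are left as docstrings for you to replace with calls to assert_equals.

```python
def test_term_frequency():
    """Hypothetical name: would test one function in document.py."""


def test_get_words():
    """Hypothetical name: would test a second function in document.py."""


def test_get_path():
    """Hypothetical name: would test a third function in document.py."""


def test_search_single_term():
    """Hypothetical name: would test search_engine.py with a one-word query."""


def test_search_multiple_terms():
    """Hypothetical name: would test search_engine.py with a multi-word query."""


def test_search_missing_term():
    """Hypothetical name: would test search_engine.py with an absent term."""


def main():
    # Every test function must be called from main.
    test_term_frequency()
    test_get_words()
    test_get_path()
    test_search_single_term()
    test_search_multiple_terms()
    test_search_missing_term()


if __name__ == '__main__':
    main()
```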