Now that you have implemented a SearchEngine that supports multi-word queries, you can use it to query directories full of text documents!
main.py
is a program for running your SearchEngine that we have implemented for you. It can be run in VS Code just like any other Python program (Right-click -> "Run Python File in Terminal"). When run, main.py
will output a series of prompts to the console, which allow the user to input the directory to be searched, and then enter search terms.
We have included a directory called wikipedia
in hw4.zip
which contains a large number of HTML Wikipedia pages. You can run main.py
and enter wikipedia
as the directory. Note that it will take a few minutes for the SearchEngine
to be constructed because there are so many files. You can also use small-wiki
as a smaller example.
You can now query terms over Wikipedia files! Note how fast the ranking is for the terms you search, even though the number of documents is huge. Because you precomputed all of the values when the SearchEngine was constructed, computing a document ranking is very fast.
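The reason queries stay fast can be illustrated with a small sketch. It assumes an inverted index (a dictionary from each term to the documents containing it) that is built once up front. The names build_index and lookup are hypothetical and are not part of the provided main.py; the values your own SearchEngine precomputes may differ.

```python
def build_index(docs):
    """Build a term -> set of document names mapping once, up front.

    docs is a dict from document name to its text (a hypothetical input).
    """
    index = {}
    for name, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index


def lookup(index, term):
    """Answer a query with one dictionary lookup; no files are re-read."""
    return sorted(index.get(term.lower(), set()))


docs = {
    'a.txt': 'the quick brown fox',
    'b.txt': 'the lazy dog',
}
index = build_index(docs)    # the slow part, done once
print(lookup(index, 'the'))  # the fast part, repeated per query
```

The cost of reading and splitting every document is paid once in build_index; each later query only touches the dictionary.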
You can run the search engine with the Run button. The wikipedia
directory included for the people running locally is too large for Ed. Instead, you should use small-wiki
. The directory for small-wiki
can be found at /course/small-wiki
.
To run your own tests, you will have to open the terminal and run python hw4_test.py
.
You will also need to test your document.py
and search_engine.py
classes. We have included a file called hw4_test.py
where you should write your tests.
You should write three tests for document.py
and three tests for search_engine.py
. Testing these classes will require you to create your own test corpus. To test each class, you can construct a new instance of the class and pass in your own custom documents. Then you can call the methods of the class and verify that the returned values match what you expected.
Since the SearchEngine
reads documents from a folder, to test it you will likely need to create your own directories containing files in your hw4
directory. You should include these test document directories in your submission so we can run your tests.
Recall that on Ed, you will need to specify the paths to these directories as absolute paths (e.g. /home/test-dir
if you uploaded test-dir
).
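One way to set up a small test corpus is sketched below. The directory name test-corpus, the file names, and the helper write_corpus are all hypothetical. Creating the files by hand works equally well, as long as the directory ends up in your submission.

```python
import os


def write_corpus(directory, docs):
    """Create directory (if needed) and write one file per entry in docs.

    docs maps a file name to the text that file should contain.
    """
    os.makedirs(directory, exist_ok=True)
    for name, text in docs.items():
        with open(os.path.join(directory, name), 'w') as f:
            f.write(text)


# Hypothetical corpus: small enough to compute expected results by hand.
write_corpus('test-corpus', {
    'doc1.txt': 'dogs are the greatest pets',
    'doc2.txt': 'cats seem pretty okay',
    'doc3.txt': 'I love dogs',
})

# An absolute path to the directory, useful where a relative path
# will not resolve (for example on Ed).
print(os.path.abspath('test-corpus'))
```

Keeping the documents this short makes it practical to work out the expected result of each test on paper before running it.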
As in previous homeworks, we have provided a function called assert_equals
that takes an expected value and the value returned by your function, and compares them: if they don't match, the function will crash the program and tell you what was wrong. See Homework 1 - Part 1 for more instructions and examples of how to write and call tests.
As in the previous homeworks, assert_equals
lives in a file called cse163_utils.py
. We imported the function in hw4_test.py
in a special way so this shouldn't change how you call assert_equals
. You should not modify anything in cse163_utils.py
.
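A test file might be organized roughly as follows. Because cse163_utils.py is not reproduced here, the sketch defines a simplified stand-in for assert_equals; the provided version may behave differently, and in hw4_test.py you should use the one that is already imported. The function count_words is a hypothetical placeholder for whichever Document or SearchEngine method you are testing.

```python
def assert_equals(expected, received):
    """Simplified stand-in for the provided assert_equals (an assumption):
    crash with a message if the two values differ."""
    assert expected == received, \
        f'Failed: Expected {expected}, but received {received}'


def count_words(text):
    """Hypothetical function under test; swap in calls to your own classes."""
    return len(text.split())


def test_count_words():
    """Compare hand-computed expected values against the actual results."""
    assert_equals(3, count_words('I love dogs'))
    assert_equals(0, count_words(''))


def main():
    test_count_words()
    print('All tests passed')


if __name__ == '__main__':
    main()
```

Note the argument order: the expected value comes first, then the value your code returned.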
For full credit, your hw4_test.py
must satisfy all of the following conditions:
Has 3 functions to test document.py
and 3 functions to test search_engine.py
test_funky_sum
)