CSE 163, Winter 2020: Homework 4: Part 2

The SearchEngine Class (multi-word queries)

Now that you have implemented the SearchEngine, we want to extend the search function to support multi-word queries. Note that you should only need to change the search function of your SearchEngine class implemented in Part 1.

Multi-Word Queries

To compute the TF-IDF of a multi-word query, we compute the TF-IDF for each individual term in the query and then sum these terms to compute the total TF-IDF.

For example, if the query was β€œa cute dog”, the TF-IDF of the query for document D would be

\( TFIDF(``\text{a cute dog}", D) = TFIDF(``\text{a}", D) + TFIDF(``\text{cute}", D) + TFIDF(``\text{dog}", D)\)

Similar to part 1, we compute the TF-IDF for each document that contains a term in the query and sort the documents by descending order of TF-IDF.

Finding Relevant Documents

However, finding the relevant documents for a multi-word query is a bit more challenging. Instead of looking at a single entry in the dictionary, we must look at all Documents which contain at least one word in the query. This means if the query is β€œa cute dog” the TF-IDF should be computed for all documents in the SearchEngine that contain the word β€œa”, all documents that contain the word β€œcute”, and all documents that contain the word β€œdog”. Even if a document only contains the word β€œa”, it should still be included in the ranking - its ranking will just be lower because its TF-IDF score for β€œcute” and β€œdog” will be 0.