Now that you have implemented the SearchEngine, we want to extend the search function to support multi-word queries. Note that you should only need to change the search function of the SearchEngine class you implemented in Part 1.
To compute the TF-IDF of a multi-word query, we compute the TF-IDF for each individual term in the query and then sum these per-term values to get the total TF-IDF.
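The summation can be sketched in Python as follows. This is only an illustration, not the required implementation: the helper names `term_tf_idf` and `query_tf_idf` are hypothetical, documents are represented as plain lists of words, and the TF and IDF definitions (term frequency as a fraction of the document's words, IDF as the natural log of total documents over documents containing the term) are assumptions that may differ from your Part 1 code.

```python
import math


def term_tf_idf(term, doc_words, all_docs):
    """TF-IDF of a single term for one document (assumed definitions)."""
    if not doc_words:
        return 0.0
    # Assumed TF: occurrences of the term divided by the document's word count.
    tf = doc_words.count(term) / len(doc_words)
    # Assumed IDF: ln(total documents / documents containing the term).
    containing = sum(1 for words in all_docs if term in words)
    if containing == 0:
        return 0.0
    return tf * math.log(len(all_docs) / containing)


def query_tf_idf(query, doc_words, all_docs):
    """TF-IDF of a multi-word query: the sum of the per-term scores."""
    terms = query.lower().split()
    return sum(term_tf_idf(term, doc_words, all_docs) for term in terms)


# Hypothetical toy corpus: each document is a list of lowercase words.
docs = [["a", "cute", "dog"], ["a", "cat"], ["the", "dog", "barks"]]
score = query_tf_idf("cute dog", docs[0], docs)
```

A term that does not appear in the document contributes 0 to the sum, so the query score never needs a special case for partially matching documents.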
For example, if the query was "a cute dog", the TF-IDF of the query for document D would be

TF-IDF("a cute dog", D) = TF-IDF("a", D) + TF-IDF("cute", D) + TF-IDF("dog", D)
Similar to Part 1, we compute the TF-IDF for each document that contains a term in the query and sort the documents in descending order of TF-IDF.
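As a small sketch of the ranking step, assuming the query scores have already been collected into a dictionary that maps each document name to its TF-IDF (the function name `rank_documents` and the file names are made up for illustration):

```python
def rank_documents(scores):
    """Return document names ordered from highest to lowest score."""
    # sorted() is stable, so documents with equal scores keep their
    # original dictionary order.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked]


# Hypothetical scores for three documents.
scores = {"doc1.txt": 0.12, "doc2.txt": 0.48, "doc3.txt": 0.05}
ranking = rank_documents(scores)
```

Whether your search function returns names, paths, or Document objects depends on your Part 1 design; only the sort on the score in descending order is the point here.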
However, finding the relevant documents for a multi-word query is a bit more challenging. Instead of looking at a single entry in the dictionary, we must look at all Documents that contain at least one word in the query. This means that if the query is "a cute dog", the TF-IDF should be computed for all documents in the SearchEngine that contain the word "a", all documents that contain the word "cute", and all documents that contain the word "dog". Even if a document only contains the word "a", it should still be included in the ranking; its ranking will just be lower because its TF-IDF score for "cute" and "dog" will be 0.
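One way to gather these documents is to take the union of the dictionary entries for every query term. The sketch below assumes the dictionary from Part 1 maps each word to a list of the documents containing it; the name `candidate_documents` and the document labels are hypothetical, and your own index may store Document objects rather than strings.

```python
def candidate_documents(query, inverted_index):
    """Collect every document that contains at least one query term."""
    candidates = set()
    for term in query.lower().split():
        # Terms missing from the index simply contribute no documents.
        candidates.update(inverted_index.get(term, []))
    return candidates


# Hypothetical index: word -> documents containing that word.
index = {
    "a": ["d1", "d2", "d3"],
    "cute": ["d2"],
    "dog": ["d2", "d3"],
}
found = candidate_documents("a cute dog", index)
```

Using a set means a document that matches several terms is still scored only once. Here "d1" contains only "a", yet it is kept as a candidate, matching the rule described above.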