Link Search Menu Expand Document

Search Engine

Building a search engine with maps and sets.1

Web search engines are software systems that organize information on the internet so that users can discover web pages related to a search query. We can implement a basic web search engine by building an inverted index (also known as an index) that maps each term (word) to the set of web pages in which it appears. A search query, which is a sequence of one or more terms, can be answered by returning the web pages that contain all of the terms.

Search Engine

Oct 12
Sets and Maps
BJP 11.2, 11.3
  1. Describe an example where a set would be preferred over a list, and vice versa.
  2. Loop over each item in a collection (or an array) with a for-each loop.
  3. Apply the if-missing-then-put pattern to provide default map values.
Oct 13
SectionSets and Maps
Oct 14
Nested Collections
  1. Trace the execution of programs with reference data types.
  2. Define methods that use collections containing other collection types.
  3. Describe the relationship between object reference equality vs. value equality.
Oct 15
SectionNested Collections
Oct 16
Linked Nodes
BJP 16.1
  1. Evaluate and reassign variable references to deeply linked nodes.
  2. Trace the execution of programs with references to deeply linked nodes.
  3. Identify and apply additive changes before destructive linked node changes.

Specification

Implement a SearchEngine class that represents an inverted index.

SearchEngine()
Constructs an empty SearchEngine instance.
void index(String document)
Indexes the document by associating each term in the document with the document itself. This method will be called for each web page that the client wants to add to the search engine.
Set<String> search(String query)
Returns the set of documents where each document contains all of the terms in the given query ignoring any never-indexed terms. If none of the terms have been indexed or if the query is empty, then an empty set is returned. One way to implement this is to start with all of the documents associated with any one of the indexed terms in the query and then remove all of the documents not associated with each of the other indexed terms in the query.

Split strings using the provided split method that returns a Set<String> terms.

Web app

To launch the web app, Open Terminal , paste the following command, and press Enter.

javac Server.java && java Server bbcnews/*.txt; rm *.class

Then, open the Network dropdown and select host 0.0.0.0:8000.

When you’re done, return to the terminal and enter the key combination Ctrl C to close the server.

Grading

Mark the program to submit it for grading. In addition to the automated tests, the final submission will also undergo code review for internal correctness on comments, variable names, indentation, data fields, and code quality.

Method errors
Not decomposing complex methods into logical subtasks that each have their own method. Create private “helper” methods to capture repeated code. A good rule of thumb for this assignment is that no method should have more than 20 lines of code in its body.
Data structure errors
Excessive inefficiency when using collections. For example, using a for-each loop to find a value of a map when a call to get would return the value.
Using a collection when not necessary, or using the wrong collection type. For example, using a list instead of a set when we only want to maintain unique elements.
  1. Chris Piech and Mehran Sahami. 2020. Bajillion Search Engine. In CS 106A Spring 2020. https://web.stanford.edu/class/archive/cs/cs106a/cs106a.1206/assn/15-assignment7.pdf