CSE 373, Spring 2018: Project 3: Search Engine

Overview

In this project, you will:

  • Implement a heap to help sort data.
  • Practice writing more tests.
  • Get more practice working with sets and dictionaries while implementing TF-IDF ranking, an algorithm for computing how relevant a document is to a query.
  • Get some practice working with and manipulating graphs while implementing PageRank, an algorithm for assessing the quality of a webpage based on the number of inbound links.
  • Combine these features to build a search engine.

This assignment has three parts:

  • Part 1 is due on Fri, May 11 at 11:59pm
  • Parts 2a and 2b are due on Fri, May 18 at 11:59pm

Expectations

Here are the baseline expectations you should meet:

  • Follow the course collaboration policies

  • DO NOT use any classes from java.util.*. There are only two exceptions to this rule:

    1. You may import and use java.util.Iterator and java.util.NoSuchElementException.

    2. You may import and use anything from java.util.* within your testing code.

  • DO NOT modify instructor-provided code (unless told otherwise)

  • DO NOT produce excess output to the console; you should remove any print statements used for debugging purposes before submitting your assignment.
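
As an illustration of the java.util rule above, here is a minimal sketch of a class whose only java.util imports are the two permitted ones. The names ArrayIterator and IteratorDemo are hypothetical and are not part of the provided project code; the sketch only shows the shape of code that stays within the rule.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example class (not from the project): iterates over a plain
// array, so no java.util collection is needed.
class ArrayIterator<T> implements Iterator<T> {
    private final T[] items;
    private int index;

    ArrayIterator(T[] items) {
        this.items = items;
        this.index = 0;
    }

    @Override
    public boolean hasNext() {
        return index < items.length;
    }

    @Override
    public T next() {
        // The Iterator contract calls for NoSuchElementException once the
        // iteration has no more elements.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = items[index];
        index++;
        return item;
    }
}

// Hypothetical driver class; it deliberately prints nothing, in keeping with
// the "no excess output" expectation.
class IteratorDemo {
    public static void main(String[] args) {
        Iterator<String> iter = new ArrayIterator<>(new String[] {"a", "b", "c"});
        int count = 0;
        while (iter.hasNext()) {
            iter.next();
            count++;
        }
        // count is 3 here: one increment per array element.
    }
}
```

Under the second exception, test code could additionally import other java.util classes (for example, to build expected results), but non-test code like the sketch above could not.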

Table of Contents

This project is split into three parts. Part 1 is about heaps, Part 2a is meant to give you more practice working with dictionaries and sets, and Part 2b is meant to give you practice working with graphs.

One important thing to note is that you will generally be given very few tests for any of the three parts. This is intentional and is meant to help you develop strong debugging and testing skills. We strongly encourage you to add your own tests to supplement the ones you were given.