CSE 373, Spring 2019: HW5 - Search Engine

Overview

In this project, you will:

  • Implement a heap to help sort data.
  • Practice writing more tests.
  • Get more practice working with sets and dictionaries while implementing TF-IDF ranking, an algorithm for computing how relevant a document is to a query.
  • Get some practice working with and manipulating graphs while implementing PageRank, an algorithm for assessing the quality of a webpage based on the number of inbound links.
  • Combine these features to build a search engine.
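
To make the TF-IDF idea mentioned above more concrete, here is a minimal sketch. It assumes one common formulation (term frequency as a fraction of the document's words, and inverse document frequency as the natural log of total documents over documents containing the term), which may differ from the exact definition this assignment asks you to implement. The class and method names are hypothetical, and plain arrays are used so that nothing from java.util is needed.

```java
// Illustrative sketch only: one common TF-IDF formulation, not necessarily
// the exact definition used in this assignment. All names are hypothetical.
final class TfIdfSketch {
    // Term frequency: the fraction of words in `document` equal to `term`.
    static double termFrequency(String term, String[] document) {
        if (document.length == 0) {
            return 0.0;
        }
        int count = 0;
        for (String word : document) {
            if (word.equals(term)) {
                count++;
            }
        }
        return (double) count / document.length;
    }

    // Inverse document frequency: ln(total documents / documents containing term).
    // Assumption of this sketch: return 0 when no document contains the term.
    static double inverseDocumentFrequency(String term, String[][] documents) {
        int containing = 0;
        for (String[] document : documents) {
            for (String word : document) {
                if (word.equals(term)) {
                    containing++;
                    break; // count each document at most once
                }
            }
        }
        if (containing == 0) {
            return 0.0;
        }
        return Math.log((double) documents.length / containing);
    }

    // TF-IDF score of `term` for one document within a collection.
    static double tfIdf(String term, String[] document, String[][] documents) {
        return termFrequency(term, document) * inverseDocumentFrequency(term, documents);
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"the", "cat", "sat"},
            {"the", "dog", "ran"},
            {"the", "cat", "ran", "far"}
        };
        // "the" appears in every document, so its IDF (and score) is 0.0.
        System.out.println(tfIdf("the", docs[0], docs));
        // "dog" appears in only one document, so it scores higher there.
        System.out.println(tfIdf("dog", docs[1], docs));
    }
}
```

The point of the example is that a word appearing in every document contributes nothing to relevance, while a rarer word counts for more in the documents that contain it.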

This assignment has three parts (1, 2a, and 2b), with two deadlines:

  • Part 1 is due on Thursday, May 16 at 11:59pm
  • Parts 2a and 2b are due on Wednesday, May 22 at 11:59pm

Expectations

Here are some baseline expectations you should meet in all projects:

  • Follow the course collaboration policies.

  • DO NOT use or import any classes from java.util.*. There are only two exceptions to this rule:

    1. You may import and use the following classes:

      • java.util.Iterator
      • java.util.NoSuchElementException
      • java.util.Objects
      • java.util.Arrays

    2. You may import and use anything from java.util.* within your testing code.

  • DO NOT make modifications to instructor-provided code (unless told otherwise). If you need to temporarily change our code for debugging, make sure to change it back afterwards.
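
As an illustration of the java.util rule above, the following sketch shows non-testing code that stays within the allowed imports. The class ArrayIteratorSketch is hypothetical and is not part of the provided project code; it only demonstrates using java.util.Iterator and java.util.NoSuchElementException without touching any other java.util class.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class for illustration only. It imports just two of the
// java.util classes that the rule above explicitly allows.
final class ArrayIteratorSketch<T> implements Iterator<T> {
    private final T[] items;
    private int index = 0; // position of the next element to return

    ArrayIteratorSketch(T[] items) {
        this.items = items;
    }

    @Override
    public boolean hasNext() {
        return index < items.length;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items[index++];
    }

    public static void main(String[] args) {
        Iterator<String> it = new ArrayIteratorSketch<>(new String[] {"a", "b"});
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

By contrast, importing something like java.util.ArrayList or java.util.HashMap here would break the rule, although the same imports would be fine inside a test class.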

Table of Contents

This project is split up into three parts. Part 1 is about heaps, Part 2a is meant to give you more practice working with dictionaries and sets, and Part 2b is meant to give you practice working with graphs.

Note that you will be given very few tests for any of the three parts. This is intentional and is meant to help you develop strong debugging and testing skills. We strongly encourage you to write your own tests to supplement the ones provided.