CSE 373, Winter 2019: Homework 5: Search Engine

Overview

In this project, you will:

  • Implement a heap to help sort data.
  • Practice writing more tests.
  • Get more practice working with sets and dictionaries while implementing TF-IDF ranking, an algorithm for computing how relevant a document is to a query.
  • Get some practice manipulating graphs while implementing PageRank, an algorithm for assessing the quality of a webpage based on its inbound links.
  • Combine these features to build a search engine.
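
As a rough illustration of the first goal above, here is a minimal sketch of how a heap can help sort data. It assumes a plain binary min-heap of ints backed by an array. The class name IntMinHeapSketch and all of its methods are hypothetical and are not the interface you will implement in Part 1. The sketch avoids java.util entirely.

```java
// Hypothetical sketch: an array-backed binary min-heap of ints.
// Names here are illustrative only, not the assignment's interface.
final class IntMinHeapSketch {
    private int[] data = new int[8];
    private int size = 0;

    boolean isEmpty() {
        return size == 0;
    }

    // Adds a value, doubling the backing array when it is full.
    void insert(int value) {
        if (size == data.length) {
            int[] bigger = new int[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size] = value;
        percolateUp(size);
        size++;
    }

    // Removes and returns the smallest value currently stored.
    int removeMin() {
        if (size == 0) {
            throw new IllegalStateException("heap is empty");
        }
        int min = data[0];
        size--;
        data[0] = data[size];
        percolateDown(0);
        return min;
    }

    // Moves the item at index i up until its parent is no larger than it.
    private void percolateUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (data[i] >= data[parent]) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    // Moves the item at index i down until both children are no smaller than it.
    private void percolateDown(int i) {
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int smallest = i;
            if (left < size && data[left] < data[smallest]) {
                smallest = left;
            }
            if (right < size && data[right] < data[smallest]) {
                smallest = right;
            }
            if (smallest == i) {
                return;
            }
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int a, int b) {
        int tmp = data[a];
        data[a] = data[b];
        data[b] = tmp;
    }

    // Sorts by inserting every value and then repeatedly removing the minimum.
    static int[] heapSorted(int[] input) {
        IntMinHeapSketch heap = new IntMinHeapSketch();
        for (int value : input) {
            heap.insert(value);
        }
        int[] out = new int[input.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = heap.removeMin();
        }
        return out;
    }
}
```

Calling IntMinHeapSketch.heapSorted on an array returns a new array with the same values in ascending order. Inserting n items and removing the minimum n times takes O(n log n) time overall.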

This assignment has three parts, due on two dates:

  • Part 1 is due on Friday, February 22 at 11:59pm.
  • Parts 2a and 2b are due on Friday, March 1 at 11:59pm.

Expectations

Here are some baseline expectations you should meet in all projects:

  • Follow the course collaboration policies.

  • DO NOT use or import any classes from java.util.*. There are only two exceptions to this rule:

    1. You may import and use java.util.Iterator and java.util.NoSuchElementException.

    2. You may import and use anything from java.util.* within your testing code.

  • DO NOT make modifications to instructor-provided code (unless told otherwise). If you need to temporarily change our code for debugging, make sure to change it back afterwards.

Table of Contents

This project is split into three parts. Part 1 is about heaps, Part 2a gives you more practice working with dictionaries and sets, and Part 2b gives you practice working with graphs.

Note that you will generally be given very few tests for any of the three parts. This is intentional: it is meant to help you develop strong debugging and testing skills. We strongly encourage you to write your own tests to supplement the ones you are given.
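
To make "add your own tests" a little more concrete, here is a minimal sketch of the kinds of edge cases a supplementary test might cover. It is written in plain Java so that it runs without a test framework. In the project itself you would likely express the same checks using the testing setup you were given. Everything named here (EdgeCaseTestSketch, sortAscending, check, runAll) is hypothetical, and sortAscending is only a stand-in for whatever code you are actually testing. Because this is testing code, it uses java.util.Arrays, which the rules above allow.

```java
// Hypothetical sketch of supplementary edge-case tests in plain Java.
final class EdgeCaseTestSketch {
    // Stand-in for the code under test (an assumption for illustration only):
    // returns a sorted copy of the input using insertion sort.
    static int[] sortAscending(int[] input) {
        int[] out = input.clone();
        for (int i = 1; i < out.length; i++) {
            int key = out[i];
            int j = i - 1;
            while (j >= 0 && out[j] > key) {
                out[j + 1] = out[j];
                j--;
            }
            out[j + 1] = key;
        }
        return out;
    }

    // Fails loudly, naming the case that broke.
    static void check(boolean condition, String caseName) {
        if (!condition) {
            throw new AssertionError("failed: " + caseName);
        }
    }

    static void checkSorts(int[] input, int[] expected, String caseName) {
        check(java.util.Arrays.equals(sortAscending(input), expected), caseName);
    }

    // Runs every case and returns how many passed.
    static int runAll() {
        int passed = 0;

        checkSorts(new int[] {}, new int[] {}, "empty input");
        passed++;

        checkSorts(new int[] {7}, new int[] {7}, "single element");
        passed++;

        checkSorts(new int[] {3, 1, 3, 1}, new int[] {1, 1, 3, 3}, "duplicates");
        passed++;

        checkSorts(new int[] {5, 4, 3, 2, 1}, new int[] {1, 2, 3, 4, 5}, "reverse order");
        passed++;

        checkSorts(new int[] {0, -5, 2}, new int[] {-5, 0, 2}, "negative values");
        passed++;

        // The caller's array should be left untouched.
        int[] original = {2, 1};
        sortAscending(original);
        check(java.util.Arrays.equals(original, new int[] {2, 1}), "input not mutated");
        passed++;

        return passed;
    }
}
```

Each case targets one situation the provided tests may not cover: empty input, a single element, duplicates, worst-case ordering, negative values, and unintended mutation of the caller's data.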