CSE 373, Winter 2018: Project 3: Search Engine

Overview

In this project, you will:

  • Implement a heap to help sort data.
  • Practice writing more tests.
  • Get more practice working with sets and dictionaries while implementing TF-IDF ranking, an algorithm for computing how relevant a document is to a query.
  • Get some practice working with and manipulating graphs while implementing PageRank, an algorithm for assessing the quality of a webpage based on the number of inbound links.
  • Combine these features to build a search engine.
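
To make the TF-IDF idea above concrete, here is a minimal sketch of one common weighting scheme. It is an illustration under stated assumptions, not the implementation you are expected to write: the class name TfIdfSketch and its method names are hypothetical, and the exact formulas (term frequency as a fraction of the document's words, inverse document frequency as a natural log, and a score of 0 for terms that appear nowhere) are assumptions that may differ from what Part 2 specifies. It deliberately uses plain arrays rather than dictionaries and sets so that it stays self-contained and needs nothing from java.util.

```java
// Illustrative sketch only. Names and formulas are assumptions, not project code.
class TfIdfSketch {
    // Term frequency: the fraction of words in one document that equal the term.
    static double termFrequency(String term, String[] document) {
        if (document.length == 0) {
            return 0.0;
        }
        int count = 0;
        for (String word : document) {
            if (word.equals(term)) {
                count++;
            }
        }
        return (double) count / document.length;
    }

    // Inverse document frequency: ln(total documents / documents containing the term).
    // Assumption: a term that appears in no document gets a score of 0.
    static double inverseDocumentFrequency(String term, String[][] documents) {
        int containing = 0;
        for (String[] document : documents) {
            if (termFrequency(term, document) > 0.0) {
                containing++;
            }
        }
        if (containing == 0) {
            return 0.0;
        }
        return Math.log((double) documents.length / containing);
    }

    // TF-IDF score of a single term for a single document.
    static double tfIdf(String term, String[] document, String[][] documents) {
        return termFrequency(term, document) * inverseDocumentFrequency(term, documents);
    }

    public static void main(String[] args) {
        String[][] documents = {
            {"the", "mouse", "ran"},
            {"the", "cat", "sat"},
            {"the", "cat", "chased", "the", "mouse"}
        };
        // A word found in every document scores 0; a rarer word scores higher.
        System.out.println(tfIdf("the", documents[2], documents));
        System.out.println(tfIdf("ran", documents[0], documents));
    }
}
```

In this sketch, a word such as "the" that appears in every document contributes nothing to relevance, while a word that appears in only one document is weighted most heavily. A real implementation would likely precompute these values with dictionaries instead of rescanning every document for each term.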

This assignment has three parts:

  • Part 1 is due on Fri, Feb 16 at 11:30pm
  • Parts 2 and 3 are due on Fri, Feb 23 at 11:30pm

Note: the three parts are largely independent of one another. Feel free to work on them in any order you want, as long as you meet the listed deadlines.

Expectations

Here are some baseline expectations for this project:

  • Follow the course collaboration policies

  • DO NOT use any classes from java.util.*. There are only two exceptions to this rule:

    1. You may import and use java.util.Iterator and java.util.NoSuchElementException.

    2. You may import and use anything from java.util.* within your testing code.

  • DO NOT modify instructor-provided code (unless told otherwise)

Table of Contents

This project is split into three parts. Part 1 covers heaps, Part 2 gives you more practice working with dictionaries and sets, and Part 3 gives you practice working with graphs.

Note that you will be given very few tests for any of the three parts. This is intentional and is meant to help you develop strong debugging and testing skills. We strongly encourage you to write your own tests to supplement the ones provided.