University of Washington Department of Computer Science & Engineering
 CSE454 Project Description
The basic idea is to build a Google-style search engine that excels at finding pages (initially restricted to those on the University of Washington family of web sites).

Crucial to this will be the spidering strategy, index structures, snippet summary extraction, and ranking algorithms (specifically those based on hypertext analysis techniques). There are several possible extensions, including, to list just two: 1) an efficient implementation of PageRank, and 2) focusing the crawler so that it finds and classifies webcams and allows searching for them by geographic queries.
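As a point of reference for the PageRank extension, here is a minimal sketch of the standard power-iteration formulation on a tiny in-memory graph. It is not the efficient implementation the extension asks for. The class name `PageRankSketch`, the toy link graph, the damping factor of 0.85, and the iteration count are all illustrative assumptions, not part of the course materials.

```java
import java.util.Arrays;

public class PageRankSketch {
    // links[i] lists the pages that page i links to (hypothetical toy graph).
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no out-links

            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    // Split this page's rank evenly among its out-links.
                    double share = rank[i] / links[i].length;
                    for (int j : links[i]) {
                        next[j] += share;
                    }
                }
            }

            // Apply damping; dangling rank is spread uniformly over all pages.
            for (int j = 0; j < n; j++) {
                next[j] = (1 - damping) / n + damping * (next[j] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Assumed toy graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}
        int[][] links = { {1, 2}, {2}, {0}, {3 - 1} };
        double[] ranks = pageRank(links, 0.85, 50);
        System.out.println(Arrays.toString(ranks));
    }
}
```

A crawl of real size would not fit in an `int[][]` like this. An efficient version would likely stream the link graph from disk and keep only the rank vectors in memory, but that design is left open here.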

The project will be broken into parts, each with its own deadline and turn-in deliverables. In the first part, students will work alone, but we will form groups of two or three for the subsequent parts.

  1. Part One. Due October 22 at noon. Modify the Java crawler we provide to improve its speed and scaling properties, while maintaining politeness. Measure your improvements.
  2. Part Two. Due October 29 at noon. Form teams and design indexer and query processing modules.
  3. Part Three. Due November 12 at noon. Working indexer and processing of single-word queries.
  4. Part Four. Due November 26 at noon. Multiple word queries, ranking, and web front end.
  5. Part Five. Due December 10 at noon. Enhancements and final report.
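
To make the indexer and query milestones (Parts Three and Four) concrete, the following is a minimal sketch of an in-memory inverted index. It answers single-word queries and treats a multiple-word query as an AND over its terms. The class and method names (`InvertedIndexSketch`, `add`, `lookup`, `lookupAll`), the tokenization rule, and the AND semantics are assumptions made for illustration. The project itself leaves the design to each team, and ranking is not shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // term -> sorted set of ids of the documents containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Tokenize on runs of non-alphanumeric characters (an assumed, simplistic rule).
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    // Index one document under the given id.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Single-word query: ids of all documents containing the term.
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }

    // Multiple-word query: ids of documents containing every term (AND semantics).
    SortedSet<Integer> lookupAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) {
                continue;
            }
            SortedSet<Integer> docs = lookup(term);
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index is not mutated
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Google-style search engine");
        index.add(2, "Polite web crawler");
        index.add(3, "Search engine ranking");
        System.out.println(index.lookup("crawler"));
        System.out.println(index.lookupAll("search engine"));
    }
}
```

A real indexer for a site-wide crawl would need on-disk postings and probably term positions for snippet extraction. This sketch only shows the basic term-to-documents mapping.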

