University of Washington Department of Computer Science & Engineering
 CSE490i Project Description

The basic idea is to work in groups of two or three to build a Google-style search engine that excels at finding pages on the University of Washington family of web sites.

Crucial to this will be the spidering strategy, index structures, snippet summary extraction, and ranking algorithms (specifically those based on hypertext analysis techniques).
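As one illustration of ranking by hypertext analysis, here is a minimal sketch of PageRank (named among the later extensions) computed by power iteration. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the tiny link graph are all assumptions chosen for illustration, not requirements of the project.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Both parameter defaults are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank equally among its outlinks.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page link graph.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 3))
```

A real crawl would need a sparse representation and a convergence test rather than a fixed number of iterations. The sketch only shows how rank flows along links.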

The project will be broken into parts, each with its own deadline and turn-in deliverables.

  1. [Jan 15] Choose a group name, select groupmates, and email this information to Adam (one message per group, please). Groups should have either two or three members.
  2. [Jan 28] Build and test the basic spidering code.
  3. [Feb 12] Create inverted indices and support single-word queries.
  4. [Feb 26] Extensions such as multiple-word queries, Boolean processing, similarity ranking, PageRank, and advanced interfaces.
  5. [Mar 14] Final experiments and writeup. Until announced via email, this is still in draft form and subject to change without notice.
  6. In the last week (likely during the last two classes, Mar 12 and Mar 14), each group will make a short class presentation about its system.
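To make the indexing milestones concrete, here is a minimal sketch of an in-memory inverted index that supports single-word queries and a Boolean AND over several words. The function names (`tokenize`, `build_index`, `query_single`, `query_and`) and the example URLs are hypothetical. A real system would build the index from spidered pages and would likely keep it on disk.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def query_single(index, word):
    """Return the sorted URLs of pages containing the given word."""
    return sorted(index.get(word.lower(), set()))


def query_and(index, words):
    """Return the sorted URLs of pages containing every word (Boolean AND)."""
    postings = [index.get(w.lower(), set()) for w in words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical page contents standing in for spidered documents.
pages = {
    "http://example.edu/a": "Computer science courses",
    "http://example.edu/b": "Science library hours",
    "http://example.edu/c": "Computer lab hours",
}
index = build_index(pages)
print(query_single(index, "science"))
print(query_and(index, ["computer", "hours"]))
```

Storing postings as sets keeps Boolean AND a simple intersection. Similarity ranking would additionally need term frequencies, which this sketch does not record.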

As will become clear, the project is open-ended. There are many ideas to try, and some are harder than others. Furthermore, there are a number of interesting experiments to do to see which approaches work best. Our grading will take into account the difficulty of what each group attempts as well as how well the effort is executed. So don't be afraid to try to implement a new idea if you have one (however, we suggest that you check with us first if you think your ideas are especially bold; our feedback may be helpful). There are thus a number of ways to excel in your project:

  • Trying something hard.
  • Making something simpler work robustly and well.
  • Conducting well-thought-out experiments that show which of two (or more) approaches works best and why.
