CSE490i projects

As of the third part of the project, seven groups have implemented wrapper-based search engines (typically covering some combination of mp3.com, iuma.com, gigabeat.com, and ubl.com). Most of the wrapper-based approaches hardcode the HTML strings that surround the data of interest, such as the band name or track title, and fetch specific pages within each site (e.g., artist pages) rather than crawling the site arbitrarily.
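To illustrate the wrapper idea, here is a minimal sketch in Python. It is not any group's actual code: the HTML markers, the sample page, and the function names (extract_between, wrap_artist_page) are all hypothetical, and a real wrapper would use whatever strings each site's pages happen to contain.

```python
def extract_between(html, start_marker, end_marker):
    """Return the text between the first start_marker and the next
    end_marker, or None if either marker is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end].strip()


# Hypothetical artist page; real sites use different markup.
SAMPLE_PAGE = """
<html><body>
<h1 class="artist">The Example Band</h1>
<td class="track">First Song</td>
</body></html>
"""


def wrap_artist_page(html):
    """Pull the band and track out of one artist page using
    hardcoded (assumed) surrounding HTML strings."""
    return {
        "band": extract_between(html, '<h1 class="artist">', "</h1>"),
        "track": extract_between(html, '<td class="track">', "</td>"),
    }


if __name__ == "__main__":
    print(wrap_artist_page(SAMPLE_PAGE))
```

The approach is simple and fast, but it breaks as soon as a site changes its page layout, since the markers no longer match.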

Four groups implemented a generic crawler: it is seeded with search results from popular search engines, or with a few mp3-themed web sites, and then follows links, looking for links to files with a .mp3 extension. Some of the crawlers read the ID3 header to extract the band and track information from the mp3 file itself; others take advantage of the fact that common rippers encode the band and track information in the filename, and extract it from the filename directly.
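The crawler steps described above can be sketched roughly as follows, again in Python and again as an illustration rather than any group's implementation. The sketch makes several assumptions: filenames follow a "Band - Track.mp3" pattern (with underscores possibly standing in for spaces), the tag is in ID3v1 format (a fixed 128-byte block at the end of the file that begins with "TAG"; ID3v2 tags sit at the start of the file and are not handled), and the names Mp3LinkParser, parse_filename, and parse_id3v1 are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin


class Mp3LinkParser(HTMLParser):
    """Collect links from one page, separating .mp3 files from
    other links that the crawler could follow next."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.mp3_links = []
        self.other_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if url.lower().endswith(".mp3"):
            self.mp3_links.append(url)
        else:
            self.other_links.append(url)


def parse_filename(url):
    """Guess (band, track) from a 'Band - Track.mp3' style filename.
    Returns None when the assumed pattern does not match."""
    name = unquote(url.rsplit("/", 1)[-1])
    stem = name[:-4] if name.lower().endswith(".mp3") else name
    stem = stem.replace("_", " ")
    band, sep, track = stem.partition(" - ")
    if not sep:
        return None
    return band.strip(), track.strip()


def parse_id3v1(data):
    """Read (band, track) from an ID3v1 tag in the last 128 bytes
    of the file contents. Returns None if no such tag is present."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def field(raw):
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    title = field(tag[3:33])    # bytes 3-32: track title
    artist = field(tag[33:63])  # bytes 33-62: artist
    return artist, title
```

A crawler built on these pieces would fetch a page, feed it to Mp3LinkParser, queue other_links for later visits, and run parse_filename (or download the file and run parse_id3v1) on each entry in mp3_links. Filename parsing avoids downloading the file at all, at the cost of failing on files that do not follow the assumed naming pattern.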

Here are the groups, with links to their main search interface:

bpz
Caffeine (documentation)
COKE (documentation)
G2K (documentation)
inTune (documentation)
JKTcrawler
MegaMp3 (documentation)
MONIKER
Socket
Sparrow (documentation)

Tessa Lau | tlau@cs.washington.edu