CSE 490i - Project Part 4
Due: March 10, 2000; 5pm.
What to Do
For the most part, this part of the assignment is up to you (we give some
ideas below). The one thing we ask all groups to do is pay attention to the
user interface of your search system. This means thinking through usability
carefully. We recommend getting some friends from outside the class, asking
them to use the page, and watching (in person) as they do. Keep quiet
and don't tell them anything; just see what they do and whether they run into
problems. We also suggest you attend to spelling errors (perhaps using
approximate lookup in the database; MySQL does support LIKE). You might
also allow users to browse the database by band or genre instead of only
searching. It's up to you, but we'll grade you on the overall look and feel of
the site.
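A minimal sketch of spelling-tolerant lookup, in Python. It first tries a SQL LIKE substring match and, when that finds nothing (say, a misspelled band name), falls back to the standard library's difflib.get_close_matches to suggest band names. It assumes the standard-library sqlite3 module as a stand-in for MySQL (both support LIKE); the songs table, its columns, the sample rows, and the function names are all hypothetical.

```python
import difflib
import sqlite3

# Hypothetical schema and sample rows; sqlite3 stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (band TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [("Radiohead", "Karma Police"), ("Pixies", "Debaser"), ("Pavement", "Cut Your Hair")],
)


def like_search(term):
    """Substring match on band or title; '%' matches any run of characters."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT band, title FROM songs WHERE band LIKE ? OR title LIKE ?",
        (pattern, pattern),
    )
    return rows.fetchall()


def suggest_bands(term, cutoff=0.6):
    """Fuzzy-match the term against known band names (catches misspellings)."""
    bands = [row[0] for row in conn.execute("SELECT DISTINCT band FROM songs")]
    return difflib.get_close_matches(term, bands, n=3, cutoff=cutoff)


def search(term):
    """Return (hits, suggestions); suggestions are only computed on a miss."""
    hits = like_search(term)
    if hits:
        return hits, []
    return [], suggest_bands(term)
```

Passing the pattern as a query parameter rather than pasting user input into the SQL string also protects the search form from SQL injection. A "did you mean" link built from the suggestions is one way to surface this in the UI.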
In addition, we'd like to see some new functionality. You may do whatever
you wish on this front, but here are some ideas:
- If you haven't already, build a crawler-style song collector (rather
than wrapping a few sites). See part 3 for more tips here.
- Add in a collaborative filtering component, perhaps by using a Naive
Bayesian classifier as follows:
- Treat each person as a training example. Each song is an attribute of
that person, whose value is the degree to which the person liked it. Maybe
keep this simple, with liked, neutral, and disliked as the values.
- Write some web code to train the system on each new user's likes and
dislikes. It should let the user register and then present the user with a
list of songs to rate. Getting a good UI is key here, since the utility of
your system's recommendations is a function of how much data you can
collect, which is limited by the user's patience. It's also probably crucial
to let the user pick which songs he/she wishes to rate. It'll be good to
let a user go back and rate more songs so they can interleave rating known
songs and getting recommendations.
- After the system has been trained you can use the naive Bayes
classifier to predict the value for new songs that haven't been seen by
that user (as long as they have been seen by some user). You might wish to
use the Bow library code for this part.
- By iterating over the set of predictable songs (i.e. those that have
been seen by others) you can build something which recommends the best new
song for a user. Again, a good UI is key here: if you do it right, the user
can listen to the song and give your engine feedback about how well it's
doing. With the right UI, it'll be fun and users will listen to quite a
few.
- Your system will do better if you have some way of bootstrapping
things. One idea here would be to create virtual users that each represent
a review site, and incorporate each site's reviews into the
system. Naturally, this probably requires writing a wrapper for each review
site.
- Harder: Try automatically clustering artists for
recommendations. Perhaps start by wrapping genre classifications from an
mp3.com-like site, and then try to duplicate the methods of hubat.com (Dan has a paper explaining how
their mechanisms work, but it's pretty technical).
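The crawler-style song collector from the first idea above can be sketched as a breadth-first crawl that follows links within one site and records every link ending in ".mp3". This is a minimal sketch under stated assumptions: fetch is a hypothetical stand-in for a real HTTP fetcher (for example one built on urllib.request), the function and class names are hypothetical, and a real crawler should also honor robots.txt and rate-limit its requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns the set of .mp3 URLs found.

    fetch(url) returns the page's HTML as a string, or None on failure.
    It is a hypothetical hook so the crawl logic can be tested offline.
    max_pages caps the number of fetches.
    """
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    songs = set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if link.lower().endswith(".mp3"):
                songs.add(link)
            elif urlparse(link).netloc == start_host and link not in seen:
                # Stay on the starting site so the crawl does not wander off.
                seen.add(link)
                queue.append(link)
    return songs
```

Restricting the crawl to the starting host keeps the sketch bounded; a broader collector would instead keep a list of allowed hosts.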
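The naive Bayes collaborative filter described above can be sketched as follows. To predict how a user would rate a song, every other user who rated that song is a training example whose class is their rating of it, and the songs the target user has already rated are the attributes. Recommending then means iterating over songs others have rated but this user hasn't, and ranking them by the predicted probability of "liked". This is a minimal, hand-written sketch: it does not use the Bow library, and the ratings format and the names predict and recommend are hypothetical. Skipping examples that never rated a given song, and using add-one (Laplace) smoothing, are assumptions made to cope with sparse rating data.

```python
import math
from collections import Counter

# Hypothetical rating scale, matching the liked/neutral/disliked suggestion.
VALUES = ("liked", "neutral", "disliked")


def predict(ratings, user, song):
    """Return {value: probability} for how `user` would rate `song`, or None.

    `ratings` (hypothetical format) maps user -> {song: value}. Every other
    user who rated `song` is a training example whose class is that rating;
    the songs `user` has already rated are the attributes.
    """
    mine = ratings.get(user, {})
    examples = [r for u, r in ratings.items() if u != user and song in r]
    if not examples:
        return None  # no one has rated this song, so it is not predictable
    k = len(VALUES)
    class_counts = Counter(r[song] for r in examples)
    log_scores = {}
    for c in VALUES:
        in_class = [r for r in examples if r[song] == c]
        # Laplace (add-one) smoothed prior P(class = c).
        score = math.log((class_counts[c] + 1) / (len(examples) + k))
        for other, value in mine.items():
            if other == song:
                continue
            # Examples that never rated `other` are skipped for this attribute.
            rated = [r[other] for r in in_class if other in r]
            matches = sum(1 for v in rated if v == value)
            # Smoothed likelihood P(attribute `other` = value | class = c).
            score += math.log((matches + 1) / (len(rated) + k))
        log_scores[c] = score
    # Normalise in log space so many attributes do not underflow to zero.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def recommend(ratings, user, n=5):
    """Rank songs rated by other users, but not yet by `user`, by P(liked)."""
    mine = ratings.get(user, {})
    candidates = {s for u, r in ratings.items() if u != user for s in r}
    scored = []
    for song in candidates - set(mine):
        dist = predict(ratings, user, song)
        if dist is not None:
            scored.append((dist["liked"], song))
    # Highest P(liked) first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [song for _, song in scored[:n]]
```

Feedback from the "listen and rate" loop is just another entry in the user's ratings, so the next call to recommend picks it up automatically. Virtual users standing in for review sites (the bootstrapping idea) would likewise be extra keys in the ratings dictionary.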
What to Hand In
Hand in the URL of a top-level web page that lists your team name and
contact information for each member. This page should have pointers to the
following:
- A description of the features you added: what they do, how you built them, and why.
- The source code for your project.
- The URL of your search form (upgraded from the previous project).
- A set of example queries that will show off the new features and content
you've added to the database in this part of the project, as well as the
features added previously.
Pointers to readings
Tessa Lau | tlau@cs.washington.edu