Project2 Reports page
In all respects, project 2 will be similar to project 1, so please refer to
the previous description for overview comments on what is desired. I'll be
much more brief in this description.
As before, projects will be done in groups of two, but I wish everyone to
reorganize into different groups unless you have an exceptional
circumstance and get my permission.
For project 2, I am even more open to groups proposing their own ideas. I
would also like to encourage groups to meet with me early on to brainstorm
about their projects. The suggestions below seem like good ones,
however, and if they don't appeal, I hope they will at least give you ideas.
- Use several machine learning algorithms to learn a SPAM
classifier. In 473 last Spring, I had the students try three
methods for this: decision trees, an ensemble of decision trees, and
a Naive Bayes classifier. See problems 4 and 5 here.
There are several interesting ways to improve upon the basic
assignment. Read the following paper (Jason D. Rennie, Lawrence Shih,
Jaime Teevan, David Karger: "Tackling the Poor Assumptions of Naive
Bayes Text Classifiers." ICML 2003: 616-623 which is available here)
and see if it can lead you to improvements on your classifier. (Note:
Dan hasn't thought this through; the paper is high on his stack of
papers to read, but he hasn't gotten to it yet.)
- The Placelab framework uses
WiFi signatures to estimate a user's location in terms of longitude
and latitude coordinates. This data is timestamped and logged
periodically (e.g. every 2 seconds). Here is some sample data. Can you
use ML and Bayesian techniques to predict higher-level descriptions of
behavior?
You might use timestamp information and a clustering algorithm
such as k-means (AIMA page 845 but see also 725) to generate symbolic
locations. Then perhaps you could learn a Markov model (or dynamic
Bayesian network, or hierarchical MM, or hierarchical DBN, or...) to
predict the user's behavior. You could also try smoothing using a
relational Markov model as described by Sanghai et al. (The
abstraction hierarchy might include terms like restaurants > cafes >
Starbucks. An instance might be Starbucks-on-42-and-the-ave.) This is
a hot area of research. Here are some papers:
- For students who have taken or are interested in computer vision,
write a program which solves Captchas. These are a special type of
Turing test designed to keep software robots from using up resources
of MSN, Yahoo, Gmail and others. (See this
overview news story or (best) the official Captcha site. See also problems for the visually
impaired.)
Alternatively, propose your own captcha. Can you think of one that
isn't visual? Can you generate a large number of tests?
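As a starting point for the SPAM classifier suggestion above, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the 473 assignment's solution and it does not implement the improvements from the Rennie et al. paper. The function names (train_naive_bayes, classify), the whitespace tokenization and the four training messages are all made up for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model from tokenized documents."""
    classes = sorted(set(labels))
    vocab = {word for doc in docs for word in doc}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for c in classes:
        model["log_prior"][c] = math.log(class_counts[c] / len(labels))
        # Add-one smoothing: every vocabulary word gets one extra count.
        total = sum(word_counts[c].values()) + len(vocab)
        model["log_likelihood"][c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return model


def classify(model, doc):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c, log_prior in model["log_prior"].items():
        log_lik = model["log_likelihood"][c]
        # Words never seen in training are skipped (an assumption of this sketch).
        scores[c] = log_prior + sum(log_lik[w] for w in doc if w in model["vocab"])
    return max(scores, key=scores.get)


# Toy, made-up training data; a real project would use an email corpus.
train_docs = [
    "win cash prize now".split(),
    "cheap pills win big".split(),
    "meeting agenda for monday".split(),
    "project report due monday".split(),
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_docs, train_labels)
spam_guess = classify(model, "win a cash prize".split())
ham_guess = classify(model, "report for the meeting".split())
print(spam_guess, ham_guess)
```

Working in log space avoids numeric underflow when messages are long, and the smoothing keeps one unseen word from driving a class probability to zero.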
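For the Placelab suggestion above, the two steps (cluster coordinates into symbolic locations, then learn a Markov model over them) might be sketched as follows. This is only an illustration under several assumptions: the coordinates are invented and are not the Placelab sample data, latitude and longitude are treated as plain planar coordinates, k is fixed by hand, and the centers are initialized with a deterministic farthest-first rule rather than the usual random choice. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def sq_dist(p, q):
    """Squared planar distance (assumption: lat/lon treated as flat x/y)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def kmeans(points, k, iterations=10):
    """Plain k-means with farthest-first initialization (a choice of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


def transition_probs(states):
    """Maximum-likelihood first-order Markov transition probabilities."""
    counts = defaultdict(Counter)
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return {
        a: {b: n / sum(c.values()) for b, n in c.items()}
        for a, c in counts.items()
    }


# Made-up, time-ordered (latitude, longitude) readings around two places.
trace = [
    (47.6530, -122.3060), (47.6531, -122.3061), (47.6532, -122.3059),
    (47.6610, -122.3130), (47.6611, -122.3131), (47.6609, -122.3129),
    (47.6531, -122.3060), (47.6530, -122.3061),
]

centers = kmeans(trace, k=2)
labels = [nearest(p, centers) for p in trace]  # symbolic location per reading
probs = transition_probs(labels)
print(labels)
print(probs)
```

The cluster indices play the role of symbolic locations. A real project would also need to choose k, use the timestamps (for example, to find places where the user lingers), and smooth the transition estimates.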
Deadlines
- Monday (November 15) at midnight: Email Miao with the name of
your team, the names of the teammates, and a preliminary project plan
(which project or direction you are thinking).
- Between 11/15 and 12/3 I wish to meet with each group to discuss
details; see the signup-sheet.
- Monday (December 13) at 9:00am: Final report and code due. Email
code and report (.doc or .pdf) to Miao and give each of us one printout.