|
|
|
|
- Problem Set 1 doc pdf Due 10pm October 11, 2008
Sample Solutions doc pdf
- Problem Set 2 doc pdf Due 8am October 28, 2008
Sample Solutions doc pdf
- Problem Set 3 doc pdf Due 4:40pm November 10, 2008
Sample Solutions doc pdf
Mini-project ideas. I will give a few suggestions below, but if you have
another idea, please feel free to run it by Dan (by Thursday).
- One option would be to write a program to solve the counterfeit
coin problem on the midterm which we discussed in class. See also
the midterm solutions. Just solving that problem as posed on the
midterm is ok. For extra credit, it would accept a variable number
of coins as input, but the assumption that just on coin is
counterfeit (heavy or light) may be built into the program. More
extra credit for increased generalization (but no need to go crazy
here). For experiments you could compare the run time with
different heuristics or with and without a heuristic.
- Build a DPLL and or a WalkSAT satisfiability solver. The book has
plenty of information and page 235 has citations for further
reading. For experiments, you could compare DPLL with WalkSAT or
plot how they each scale as problems grow harder. You might
consider "hard" problems (remember the clause to variable ration
for random problems).
- Build a spam filter using one or more machine learning algorithms.
You can implement naive Bayes, decision trees, or look for the
Weka ML package http://www.cs.waikato.ac.nz/ml/weka/ and try a few
different algorithms. Experiments could compare the effect of
different learning algorithms or use cross validation to measure
the precision and recall of your filter.For more ideas, see
problem 5 in
http://www.cs.washington.edu/education/courses/cse473/04sp/hw/hw3/
Look at assignment 3 below for data:
http://www.cs.washington.edu/education/courses/cse473/04sp/hw/
and also click the 'complete assignment' and start reading at "Assignment 4"
A Web search is likely to turn up better, larger and more comprehensive datasets so I'd probably advise that instead of my very onld one. For example, I found this URL which looks promising: http://mobblog.cs.ucl.ac.uk/2008/01/21/spam-dataset/
And here's a URL for the Fifth Conference on Email and Anti-Spam (CEAS 2008) http://www.ceas.cc/ There's also the 'Web Spam Challenge' which is likely different but might be interesting.
- Write a program which learns Bayes nets. Start by assuming the
structure is fixed (and input) and learn the parameters from data
as in the previous problem set. Then try exploring the space of
structures using local search. In addition to the text, a good
reference is
http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-95-06
Hand in a soft copy of your code along with instructions on how to run
it. A page of description on how it works would also be appreciated and
a page of evaluation (e.g. one or two experiments demonstrating scaling
behavior or the effect of various switches). If you end up using third
party code in your project, be sure to explain clearly what you used and
what you wrote yourself.
Project reports:
|