CSE 546 - Data Mining - Autumn 2003
Instructor: pedrod at cs.washington.edu
Office: Allen 648
Office hours: Wednesdays 2:00-2:50 and by appointment
TA: mattr at cs.washington.edu
Office: Allen 220
Office hours: Mondays 4:30-5:20 and by appointment
Mondays and Wednesdays from 3:00 to 4:20 in MEB 242
Readings
Week 1: Chapter 1 of Hand, Behind-the-scenes data mining
Week 2: Chapter 3 of Mitchell
Week 3: Chapter 10 of Mitchell; review first-order logic
Week 4: Mining high-speed data streams, Mining complex models from arbitrarily large databases in constant time
Week 6: Chapter 6 of Mitchell; review probability and statistics
Week 7: Chapter 4 of Mitchell; review calculus
Week 8: Section 2 of Machine-learning research: Four current directions
Week 9: Chapter 7 of Mitchell, A unified bias-variance decomposition, A tutorial on support vector machines
Week 10: Chapter 9 of Hand
Lecture topics
Week 1: Introduction, inductive learning
Week 2: Decision trees
Week 3: Rule induction
Week 4: Scalability
Week 5: Instance-based learning
Week 6: Bayesian learning
Week 7: Neural networks
Week 8: Model ensembles
Week 9: Learning theory and SVMs
Week 10: Clustering
The topics covered will have a non-null intersection with the following list:
- The data mining process
- Decision trees
- Rule induction
- Instance-based learning
- Bayesian learning
- Neural networks
- Genetic algorithms
- Model ensembles
- Learning theory
- Support vector machines
- Association rules
- Data warehousing and OLAP
- Web mining
Class evaluation will be by means of a project. Projects can be
carried out in groups of two or individually; we encourage working in
groups. In addition to a written report, students will give a short
oral presentation of their work. Projects can be proposed by the
students - for example, applying data mining techniques to your area
of interest - or chosen from this list:
- Mining multi-relational databases
- Extracting knowledge bases from the World Wide Web
- Mining social networks from the Internet
- Incorporating prior knowledge into data mining algorithms
- Learning models of an adversary's behavior
- Mining sequences of program versions to automate debugging
- Merging and refining databases for data mining
- Predicting the evolution of scientific communities
- Simultaneously classifying sets of related entities
- Discovering unobserved relations from observable ones
- Learning across different subpopulations
- Learning given a query distribution
- Modeling the hidden Web through querying
- Modeling Web surfing behavior
- Scaling up learning algorithms via sampling and compression
Schedule of project presentations
- C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 1998.
- T. Dietterich, Machine-learning research: Four current directions, AI Magazine, 1997.
- P. Domingos, A unified bias-variance decomposition for zero-one and squared loss, Seventeenth National Conference on Artificial Intelligence, 2000.
- P. Domingos and G. Hulten, Mining high-speed data streams, Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000.
- G. Hulten and P. Domingos, Mining complex models from arbitrarily large databases in constant time, Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
- G. John, Behind-the-scenes data mining, SIGKDD Explorations, 1999.
VFML is a set of tools developed at UW that you may find useful for your
project. VFML is still in beta; if you're planning to use it, please
contact ghulten at cs.washington.edu.
Pointers to various pieces of data mining software can be found at KDnuggets.
Comments can be sent to the instructor or TAs using this
anonymous feedback form.
Course Mailing List
To subscribe to the course mailing list, visit the mailing list home page.
Alternatively, you can use the email interface to subscribe;
send email to cse546-request@cs with the word "help" in the subject
to receive a list of email command options.
Department of Computer Science & Engineering
University of Washington
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to Pedro Domingos]