- September 27, 2007: Introduction
- October 2, 2007: Text Categorization
- October 4, 2007: Introduction to Information Retrieval
- October 9, 2007: MapReduce/Hadoop
- October 11, 2007: Semantifying Wikipedia
- October 16, 2007: Issues in Machine Learning
- October 18, 2007: No class - instead, group meetings with Dan
- October 23, 2007: Guest Lecture (Christophe Bisciglia, "Google's Data
Center"); Slides on Information Extraction
- October 25, 2007: Information Extraction Continued (Finite-State Models)
- October 30, 2007: IE Continued (HMM learning and CRFs)
- November 1, 2007: KnowItAll and Realm
- Reading: Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A-M.,
Shaked, T., Soderland, S., Weld, D. and Yates, A.,
"Unsupervised Named-Entity Extraction from the Web: An Experimental Study,"
Artificial Intelligence, 165(1):91-134, 2005.
- Reading: Downey, D., Schoenmackers, S. and Etzioni, O.,
"Sparse Information Extraction: Unsupervised Language Models to the Rescue,"
Proceedings of the 45th Annual Meeting of the Association for Computational
Linguistics (ACL 2007).
- November 1, 2007: Bonus: Glenn Kelman's talk on startups
- November 6, 2007: No class - work on projects!
- November 8, 2007: Networks and the Web (PDF)
- November 13, 2007: Crawling and Indices
- Reading: Heydon, A. and Najork, M.,
"Mercator: A Scalable, Extensible Web Crawler,"
Compaq SRC, June 1999.
- Reading: Baeza-Yates, R. and Ribeiro-Neto, B.,
Modern Information Retrieval, Addison-Wesley, 1999.
Covers the vector space model (section 2), precision/recall (3), inverted
files (8), and inverted file compression (7.4.5).
- November 15, 2007: No class - individual group meetings with Dan
- November 20, 2007: Inverted Indices; Indices and Google
- November 22, 2007: No class - Thanksgiving
- November 27, 2007: PageRank and AltaVista Case Study
- November 29, 2007: Log Mining
- December 4, 2007
- December 6, 2007: Project Presentations