University of Washington Department of Computer Science & Engineering
CSE 454 - Advanced Internet Systems - Winter 2013
Tues, Thurs 12:00 - 1:30pm in EEB 045

Instructor: Dan Weld (weld at cs dot washington dot edu)
Office hours: CSE 588 - TBD and by email
TA: Xiao Ling (xiaoling at cs dot washington dot edu)
Office hours: CSE 610 - make an appointment by email

Schedule of Lectures

Date | Topics & Lecture Notes | Readings
January 8 | Introduction, History, Future & Class Project
January 10 | Architecture of a Relation Extractor
January 15 | Supervised Learning & Logistic Regression | Logistic Regression (Mitchell), Section 3
January 17 | Instaread and Features for ML | Papers on features
January 22 | Project Discussion and Crawling the Web | Mercator paper
January 28 | IR Models & Index Construction; Link Analysis & PageRank | Brin & Page: Google; Haveliwala: Efficient PageRank
February 5 | Search Engine Query Processing: AltaVista
February 7 | NYU's 2011 KBP System | NYU 2011 system description
February 19 | Computational Advertising
February 26 | Crowdsourcing (Isaac Nichols & Nathan Macfarlad); see also Dan's take on the material
March 5 | Cryptography & Practical Internet Security (Josh Benaloh)
March 7 | Mining Unstructured Healthcare Data (Deep Dhillon)


Presentations & Reports

Here are some notes on end-of-term in-class Presentations (Thurs 3/14) and Final Written Reports (due on Monday, 3/18 at 9am).

Textbooks & Resources

Read these informative papers, especially in the INFORMATION EXTRACTION subsection.

You also may wish to check out links to code and tools.

There is no required textbook.


Project Warmup: Assignment Descriptions, Data Format (outdated), Data Format, EC2 Tutorial, source code


We're doing something new this year: a collaborative group project involving everyone in the class. The focus is information extraction, widely believed to be the future of Web search. We will divide into small groups (e.g., two people), each working on a component of an integrated system to "read the Web", augmenting a knowledge base (like Freebase) with entity-attribute-value triples by automatically processing newswire and Web text. We'll post more specifics on the project soon, but in the meantime you can read about some related research here in the UW CSE Department, on which we'll be building.
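To make the target output concrete, here is a minimal Python sketch of an entity-attribute-value triple and a toy in-memory knowledge base that accepts only triples it has not already stored. The names `Triple`, `KnowledgeBase`, `augment`, and the attribute label `founded_by` are hypothetical illustrations; they are not Freebase's schema or the project's actual data format, and the "extracted" triples are hard-coded rather than produced by a real extractor.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact: (entity, attribute, value). Hypothetical format."""
    entity: str
    attribute: str
    value: str


class KnowledgeBase:
    """Toy KB mapping entity -> attribute -> set of values (not Freebase's API)."""

    def __init__(self):
        self._facts = defaultdict(lambda: defaultdict(set))

    def add(self, triple: Triple) -> bool:
        """Store a triple; return True only if it was not already known."""
        values = self._facts[triple.entity][triple.attribute]
        if triple.value in values:
            return False
        values.add(triple.value)
        return True

    def values(self, entity: str, attribute: str) -> list:
        """Return the known values for an entity's attribute, sorted."""
        return sorted(self._facts.get(entity, {}).get(attribute, set()))


def augment(kb: KnowledgeBase, extracted: list) -> list:
    """Add extractor output to the KB and return only the genuinely new triples."""
    return [t for t in extracted if kb.add(t)]


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.add(Triple("Microsoft", "founded_by", "Bill Gates"))  # pre-existing fact
    # Pretend these came from processing newswire text.
    extracted = [
        Triple("Microsoft", "founded_by", "Paul Allen"),
        Triple("Microsoft", "founded_by", "Bill Gates"),  # duplicate, ignored
    ]
    print(augment(kb, extracted))
    print(kb.values("Microsoft", "founded_by"))
```

Keeping a set of values per entity-attribute pair lets the KB hold multi-valued attributes (a company can have several founders) while dropping duplicates that an extractor finds in many documents.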


Course Administration and Policies

Your grade will be 85% project and 15% class participation (no midterm or final). The project component of your grade will be 45% scope and execution of what you implement, 15% how well it performed, 10% final presentation, and 20% final written report. Note that we'd much rather see an ambitious project that flops than a simple hack with good performance (though we'd especially like to see cool approaches with stunning results).


Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX