University of Washington Department of Computer Science & Engineering
CSE 454 - Advanced Internet Autumn 2015
Tues, Thurs 12:00 - 1:20pm in JHN 111

Instructor: Dan Weld (weld at cs)
Office hours: CSE 588 - TBD and by email
TA: Colin Lockard (lockardc at cs)
Office hours: CSE 418 - Monday (11-Noon), Friday (10:30-11:30 a.m.), and by appt.

Contact email: cse454-instructors at cs


Just as the Internet evolves quickly from year to year, each offering of CSE 454 is a bit different. This year we will cover information extraction, crowdsourcing, and social networks in addition to Internet search. One constant: the bulk of the coursework will be in the form of (three-person) group projects. Projects will be on relation and event extraction, but each group needs to specify which relations they will try to extract, the corpus of documents to target, and the type of supervision to be used to train their extractor. Class time will be a combination of lectures, group discussion, student-led presentations, guest speakers, and team meetings. Some of the topics we'll cover include:

Textbooks & Resources

There is no required textbook, but I'll ask you to read some of these informative papers. You also may wish to check out links to code and tools.

Soon we'll have a link for pre-project 1 and project ideas. See also the guidelines for initial Specifications, end-of-term in-class Presentations, and Final Written Reports.

We'll also set up a Catalyst mailing list or Piazza page.

Schedule of Lectures

Date | Topics & Lecture Notes | Readings
October 1 | Introduction, History, Future & Class Project | None
October 6 | Machine Learning | Domingos: A Few Useful Things...
October 8 | Information Extraction | NLTK Chapter on IE
October 13 | Pitch Day
October 15 | Web Crawling and Indexing | Mercator: A Scalable, Extensible Web Crawler & Manning et al.'s online book Information Retrieval
October 20 | Distant Supervision | Autonomously Semantifying Wikipedia by Wu & Weld
October 22 | Human Computation & Crowdsourcing
Nov 3 | No class: 1-1 Meetings with Dan
Nov 5 | Team Project Presentations (see "Groups" below for slides)
Nov 12 | Team Project Update Presentations
Nov 17 | Case Study: Never Ending Language Learning | NELL paper
Nov 19 | No class: 1-1 Meetings with Dan
Nov 24 | Search Engine Query Processing & PageRank | original Google paper & efficient implementation of PageRank
Dec 1 | No class: 1-1 Meetings with Dan
Dec 3 | Lessons from A/B/n testing (Ronny Kohavi) | Practical Guide ... HiPPO
Dec 8 | Online Communities & Network Effects & Information Cascades | Easley & Kleinberg "Networks, Crowds & Markets"
Dec 10 | No class: 1-1 Meetings with Dan
Dec 17 | Final Presentations (10:30-12:20)
Dec 18 | Reports due at 11:59pm


Assigned Date | Assignment | Due Date
October 1 | Assignment 1: Text Classification | October 11, 11pm
  • Zipped folder containing necessary files
  • Sample Solution
October 13 | Assignment 2: Relation Extraction | October 22, 11pm
  • Zipped folder containing necessary files
  • Sample Solution
October 13 | Submit Project team names, member list, one-sentence idea summary on Catalyst. | October 22, 11pm
October 22 | Project Proposal | October 29, 11pm
December 4 | Final Presentation | December 17, 10:30 am-12:20 pm
December 4 | One person per team should submit the following three items to their respective Catalyst dropboxes: | December 18, 11:59 pm
  • 1. Final Written Reports
  • 2. A PDF version of your final presentation
  • 3. A zipped copy of your code, including a README.txt explaining how to run it


Course Administration and Policies

Your grade will be 7.5% preproject 1, 7.5% preproject 2, 75% project, and 10% class participation (no midterm or final). The project component of your grade will be 45% scope and execution of what you implement, 15% how well it performed, 10% evaluation, 10% final presentation, and 20% final written report. Note that you can do well even if you choose something too ambitious and it flops, but the best strategy will likely mimic a successful startup: aiming for a minimal viable project with stretch goals.
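
To make the weighting concrete, here is a minimal sketch of how the percentages above combine into one number. The weights come from the policy; the function names, the 0-100 scale for each component, and the sample scores are assumptions made for illustration, not the official grading procedure.

```python
# Weights are taken from the grading policy; everything else is illustrative.
COURSE_WEIGHTS = {
    "preproject_1": 0.075,
    "preproject_2": 0.075,
    "project": 0.75,
    "participation": 0.10,
}

PROJECT_WEIGHTS = {
    "scope_and_execution": 0.45,
    "performance": 0.15,
    "evaluation": 0.10,
    "final_presentation": 0.10,
    "final_report": 0.20,
}


def weighted_score(scores, weights):
    """Return the weighted sum of scores, each assumed to be on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


def course_grade(preproject_1, preproject_2, participation, project_scores):
    """Combine the project sub-scores, then apply the course-level weights."""
    project = weighted_score(project_scores, PROJECT_WEIGHTS)
    return weighted_score(
        {
            "preproject_1": preproject_1,
            "preproject_2": preproject_2,
            "project": project,
            "participation": participation,
        },
        COURSE_WEIGHTS,
    )


# Hypothetical scores: an ambitious project whose measured performance was weak.
ambitious_project = {
    "scope_and_execution": 95,
    "performance": 40,
    "evaluation": 90,
    "final_presentation": 90,
    "final_report": 90,
}
print(course_grade(90, 90, 100, ambitious_project))
```

With these made-up numbers the project component comes to about 84.75 out of 100 despite the low performance score, because performance carries only 15% of the project weight.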

Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX