University of Washington Department of Computer Science & Engineering
 CSE454 Course Overview

    The following outline is a tentative list of the topics we hope to cover. The actual ordering will differ so that crawling and search-engine design come earlier.
  • Introduction
  • Text Processing
    • Classification
      • Naive Bayes Classifier
      • Information Extraction
      • Hidden Markov Models
      • Conditional Random Fields
    • Similarity Measures & Information Retrieval
      • Ranking, TF/IDF, precision / recall, stemming, stop words
      • Latent Semantic Indexing
    • Clustering
      • Expectation Maximization
    • Syntactic Analysis
      • POS Tagging
      • Anaphora
      • Parsing
  • The Web
    • Foundations
      • HTTP, HTML, browser architecture
      • Server basics, cookies, log files, dynamic page generation
      • Web Programming: AJAX, FLEX, Silverlight
    • Fetching Pages, Spidering & Topic Specific Crawling
    • Web Ranking Techniques
      • Hypertext analysis (PageRank, hubs and authorities, anchor text)
      • Spamming: keyword stuffing, doorway/jump pages, cloaking, font tricks
    • Data Structures for Scaling Query Processing
      • Index structures
      • Boolean processing
    • Information Extraction from the Web (KnowItAll)
    • Interface Issues
      • Summarization and snippets
      • Clustering results
      • Collaborative filtering, user modeling, adaptive websites
  • Special Topics
    • Advertising
    • Meta-search, query routing
    • The Semantic Web, Semantic e-mail
    • Cryptography, security, privacy
    • Micropayments, digital cash, server-side wallets, and e-commerce
    • Scaling and clusters

