CSE454 Course Overview

University of Washington Department of Computer Science & Engineering

The following outline is a tentative list of the topics we hope to cover. The actual ordering will differ, because crawling and search-engine design will be moved earlier.

Introduction
Text Processing

Classification

Naive Bayes Classifier
Information Extraction
Hidden Markov Models
Conditional Random Fields

Similarity Measures & Information Retrieval

Ranking, TF/IDF, precision / recall, stemming, stop words
Latent Semantic Indexing

Clustering

Expectation Maximization

Syntactic Analysis

POS Tagging
Anaphora
Parsing

The Web

Foundations

HTTP, HTML, browser architecture
Server basics, cookies, log files, dynamic page generation
Web Programming: AJAX, Flex, Silverlight

Fetching Pages, Spidering & Topic Specific Crawling
Web Ranking Techniques

Hypertext analysis (PageRank, hubs and authorities, anchor text)
Spamming: keyword stuffing, doorway/jump pages, cloaking, font tricks

Data Structures for Scaling Query Processing

Index structures
Boolean processing

Information Extraction from the Web (KnowItAll)
Interface Issues

Summarization and snippets
Clustering results
Collaborative filtering, user modeling, adaptive websites

Special Topics

Advertising
Meta-search, query routing

The Semantic Web, Semantic e-mail

Cryptography, security, privacy
Micropayments, digital cash, server-side wallets, and e-commerce
Scaling and clusters

Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX