The following outline is a tentative list of the topics we hope to cover:
- Introduction
- Text Processing
  - Classification
    - Naive Bayes Classifier
  - Information Extraction
    - Hidden Markov Models
    - Conditional Random Fields
  - Similarity Measures & Information Retrieval
    - Ranking, TF/IDF, precision/recall, stemming, stop words
    - Latent Semantic Indexing
    - Clustering
  - Syntactic Analysis
    - POS Tagging
    - Anaphora
    - Parsing
- Scaling
  - N-tier architecture, networks of workstations
  - MapReduce
- The Web
  - Foundations
    - HTTP, HTML, browser architecture
    - Server basics, cookies, log files, dynamic page generation
    - Web programming: AJAX, Flex, Silverlight
  - Fetching Pages, Spidering & Topic-Specific Crawling
  - Web Ranking Techniques
    - Hypertext analysis (PageRank, hubs and authorities, anchor text)
    - Spamming: keyword stuffing, doorway/jump pages, cloaking, font tricks
  - Data Structures for Scaling Query Processing
    - Index structures
    - Boolean processing
  - Information Extraction from the Web (KnowItAll)
- Interface Issues
  - Summarization and snippets
  - Clustering results
  - Collaborative filtering, user modeling, adaptive websites
- Special Topics
  - Meta-search, query routing
  - The Semantic Web, Semantic e-mail
  - Cryptography, security, privacy, P3P
  - Micropayments, digital cash, server-side wallets, and e-commerce