The following outline is a tentative list of the topics we hope to cover.
The actual ordering will differ so that crawling and search-engine design
come earlier.
Introduction
Text Processing
   
    Classification
      
       Naive Bayes Classifier
    Information Extraction
       Hidden Markov Models
       Conditional Random Fields
    Similarity Measures & Information Retrieval
      
       Ranking, TF/IDF, precision / recall, stemming, stop words
       Latent Semantic Indexing
       Clustering
      
    Syntactic Analysis
      
       POS Tagging
       Anaphora
       Parsing
The Web
   
    Foundations
      
       HTTP, HTML, browser architecture
       Server basics, cookies, log files, dynamic page generation 
       Web Programming: AJAX, FLEX, Silverlight
    Fetching Pages, Spidering & Topic Specific Crawling
    Web Ranking Techniques
      
       Hypertext analysis (PageRank, hubs and authorities, anchor text)
       Spamming: keyword stuffing, doorway/jump pages, cloaking, font tricks
    Data Structures for Scaling Query Processing
       
       Index structures
       Boolean processing
    Information Extraction from the Web (KnowItAll)
    Interface Issues
      
       Summarization and snippets
       Clustering results 
       Collaborative filtering, user modeling, adaptive websites
Special Topics
  
    Advertising
    Meta-search, query routing
    The Semantic Web, Semantic e-mail
    Cryptography, security, privacy
    Micropayments, digital cash, server-side wallets, and e-commerce
    Scaling and clusters
    