Case Study I: Estimating Click Probabilities
Learning task I: Predicting click probabilities
X -> [0,1], where X includes webpage text, query keywords, user features, etc.
- Linear Model
- Online Learning
- Basic regularization (L2)
- Challenge: High dimensional feature space
- Challenge: Changing dimensionality of the feature space
- Advanced approach: Sketching (Bloom filter, Count-Min sketch, hash kernels)
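Putting the pieces above together, here is a minimal sketch of an online, L2-regularized logistic regression that hashes raw string features into a fixed-dimensional weight vector, so the feature space can grow or change without resizing the model. All names, constants, and the toy click stream are illustrative, not from the course materials:

```python
import math

D = 2 ** 18  # fixed hashed dimension; collisions are tolerated

def hash_features(tokens, dim=D):
    # Hashing trick: map arbitrary string features to fixed indices.
    # (Python's built-in hash is salted per process; a production system
    # would use a stable hash, e.g. one built from hashlib.)
    return [hash(t) % dim for t in tokens]

def predict(w, idxs):
    # Logistic model over binary hashed features: sigmoid of the sum of
    # the active weights.
    z = sum(w[i] for i in idxs)
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))

def sgd_update(w, idxs, y, eta=0.1, l2=1e-6):
    # One online step on the L2-regularized log loss for a single example.
    g = predict(w, idxs) - y  # gradient of the log loss w.r.t. the score
    for i in idxs:
        w[i] -= eta * (g + l2 * w[i])

w = [0.0] * D
stream = [(["query:shoes", "site:example.com"], 1),
          (["query:news", "site:example.com"], 0)]
for tokens, clicked in stream * 50:  # replay a tiny click stream
    sgd_update(w, hash_features(tokens), clicked)
```

Each example is seen once and discarded, which is what makes the approach online; the hashed dimension D bounds memory no matter how many distinct raw features appear.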
Learning task II: Personalization
- Multitask learning
- Hashing kernel
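One way to combine multitask learning with the hashing kernel for personalization is to hash a shared copy and a user-specific copy of every feature into the same weight vector, so each user's model can deviate from the global one without a dedicated per-user parameter block. A hypothetical sketch (names illustrative):

```python
def personalized_features(user, tokens, dim=2 ** 18):
    """Emit indices for a global copy and a user-specific copy of each
    feature; both live in one shared hashed weight space, so rare users
    fall back on the global weights while active users specialize."""
    global_idxs = [hash(t) % dim for t in tokens]
    user_idxs = [hash((user, t)) % dim for t in tokens]
    return global_idxs + user_idxs
```

The returned index list plugs into any linear model over hashed features; per-user weights cost no memory beyond the fixed table.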
Lectures:
- 1. Jan 8: Intro. Linear model for estimating click probabilities, logistic regression, gradient descent. [Intro slides] [LR slides] [LR annotated slides]
- 2. Jan 10: Online learning, Perceptron, kernel trick, kernelized Perceptron. [Regularization, Perceptron slides] [Regularization, Perceptron annotated slides]
- 3. Jan 15: Kernel trick continued, stochastic gradient descent (SGD). [Kernelized perceptron, SGD slides] [Kernelized perceptron, SGD annotated slides]
- 4. Jan 17: SGD continued, hashing and sketching. [SGD continued, hashing and sketching slides]
Case Study II: Document Retrieval
Learning task I: Finding similar documents
- K-NN with tf-idf
- Challenge: Large dataset
- Advanced approach: Fast nearest neighbor search (KD-trees)
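The retrieval pipeline above can be sketched end to end: weight terms by tf-idf, compare documents with cosine similarity, and find neighbors by brute force, which is exactly the linear scan that KD-trees (and later LSH) are meant to accelerate. The toy corpus and names are illustrative:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # tf-idf: term count times log inverse document frequency.
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest(query_vec, vecs, k=1):
    # Brute-force k-NN: O(n) per query; KD-trees avoid scanning everything.
    ranked = sorted(range(len(vecs)), key=lambda i: -cosine(query_vec, vecs[i]))
    return ranked[:k]

docs = [["big", "data", "systems"],
        ["machine", "learning", "data"],
        ["cooking", "recipes"]]
vecs = tfidf_vectors(docs)
```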
Learning task II: Clustering documents
- k-means
- Generative mixtures (GMM --> EM)
- Spectral clustering
- Challenge: Document may belong to multiple clusters
- Advanced approach: Mixed membership models (LDA --> sampling methods)
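The simplest of the clustering methods above, k-means (Lloyd's algorithm), alternates two steps: assign each point to its closest center, then move each center to the mean of its assigned points. A minimal sketch with explicit initial centers (in practice initialization, e.g. k-means++, matters a great deal; the toy data is illustrative):

```python
def kmeans(points, init_centers, iters=20):
    # Lloyd's algorithm on tuples of floats; init_centers fixes k.
    centers = list(init_centers)
    k = len(centers)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers, clusters

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Each iteration is a natural fit for Map-Reduce (assignments in the map, means in the reduce), which is how the lectures scale it up.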
Lectures:
- 5. Jan 22: Document retrieval task, k-NN, tf-idf, fast k-NN (exact: KD-trees). [Sketching continued, k-NN, and kd-trees slides]
- 6. Jan 24: Approximate neighbor finding, locality-sensitive hashing (LSH), random projections, hash kernels, multi-task learning for personalization. [kd-trees continued, LSH, hash kernels, mixture models slides] [kd-trees continued, LSH, hash kernels, mixture models annotated slides]
- 7. Jan 29: Clustering: GMM, EM. [Mixture models and EM slides] [Mixture models and EM annotated slides]
- 8. Jan 31: Map-Reduce for parallel programming and for k-means. [Map-Reduce and K-Means slides] [Map-Reduce and K-Means annotated slides]
- 9. Feb 5: LDA, Gibbs sampling for LDA. [MAP EM, LDA, and sampling slides] [MAP EM, LDA, and sampling annotated slides]
- 10. Feb 7: Variational methods and online variational inference for LDA. [Collapsed sampling, variational, and stochastic variational methods for LDA slides] [Collapsed sampling, variational, and stochastic variational methods for LDA annotated slides]
- 11. Feb 12: Spectral clustering. [Spectral clustering slides] [Spectral clustering annotated slides]
Case Study III: fMRI Prediction
Learning task I: Predicting word probability from fMRI
X -> [0,1]. X is the fMRI image and the response is the word probability.
- Linear models (logistic regression)
- Challenge: The dimension of the feature space is much larger than the sample size (p >> n)
- Advanced approach: LASSO
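LASSO adds an L1 penalty to least squares, which drives many coefficients exactly to zero (exactly what is needed when p >> n). One standard solver is cyclic coordinate descent with soft-thresholding; a pure-Python sketch of minimizing (1/2)||y - Xw||^2 + lam*||w||_1, with illustrative toy data:

```python
def soft_threshold(z, lam):
    # Closed-form solution of the one-dimensional LASSO subproblem.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, iters=100):
    # Cyclic coordinate descent for (1/2)||y - Xw||^2 + lam * ||w||_1.
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            r = [y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho, lam) / z if z else 0.0
    return w

X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
y = [2.0, -3.0, -1.0, 0.0]  # generated by w = (2, -3, 0): feature 3 is irrelevant
w = lasso_cd(X, y, lam=0.1)
```

The penalty zeroes out the irrelevant third coefficient exactly while only slightly shrinking the other two; sweeping lam traces the regularization path.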
Learning task II: Zero-shot learning. Predicting the fMRI image for an unseen word.
- Challenge: Models for words never seen (generalization).
- Use features of words (co-occurrence from large-scale text corpus).
- Learning from people (Mechanical Turk).
- Challenge: Capturing correlations between voxels.
- Advanced approach: Graphical LASSO.
Parallel Learning:
- Challenge: Scaling up to large sample size and feature dimensions
- Stochastic gradient descent
- Stochastic coordinate descent (Shotgun)
- Dual averaging methods
Lectures:
- 12. Feb 14: fMRI task, zero-shot learning, ridge regression. [fMRI task, zero-shot learning, and ridge regression slides] [fMRI task, zero-shot learning, and ridge regression annotated slides]
- 13. Feb 19: LASSO, regularization path, sparsistency, zero-shot learning with Mechanical Turk. [LASSO slides] [LASSO annotated slides]
- 14. Feb 21: LARS, fused LASSO, Shotgun, stochastic coordinate descent (SCD), averaging methods. [LARS and fused LASSO slides] [Shotgun, SCD, and parallel learning slides] [LARS and fused LASSO annotated slides]
- 15. Feb 26: Graphical LASSO. [Graphical LASSO slides] [Graphical LASSO annotated slides] [Shotgun, SCD, and parallel learning annotated slides]
Case Study IV: Collaborative Filtering
Learning task I: Predicting the rating a user gives a movie.
- Matrix Factorization
- Challenge: Cold-start problem.
- Advanced approach: Incorporating features in matrix factorization for zero-shot learning.
- Challenge: Scalability.
- Advanced approach: Parallel learning with GraphLab.
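The core model above, matrix factorization trained by SGD, learns a low-rank factor per user and per movie and predicts a rating as their inner product. A minimal pure-Python sketch; the hyperparameters and the tiny rating set are illustrative:

```python
import random

def factorize(ratings, n_users, n_items, k=2, iters=2000, eta=0.03, lam=0.01, seed=0):
    # SGD on squared error over observed (user, item, rating) triples:
    # minimize sum (r - <L[u], R[i]>)^2 + lam * (||L||^2 + ||R||^2).
    rng = random.Random(seed)
    L = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    R = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(iters):
        for u, i, r in ratings:
            err = r - sum(L[u][f] * R[i][f] for f in range(k))
            for f in range(k):
                lu, ri = L[u][f], R[i][f]
                L[u][f] += eta * (err * ri - lam * lu)
                R[i][f] += eta * (err * lu - lam * ri)
    return L, R

ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 5.0)]
L, R = factorize(ratings, n_users=2, n_items=2)
```

Alternating least squares replaces the SGD inner loop with closed-form solves for L and R in turn, and adding user/movie features into the factors is what addresses cold start.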
Lectures:
- 16. Feb 28: Graphical LASSO conclusion, collaborative filtering, matrix factorization, alternating least squares, stochastic gradient descent for matrix factorization.
- 17. Mar 5: Guest lecture: Ralf Herbrich will present an industrial perspective on ML for Big Data, based on his experiences at Amazon, Facebook, and Microsoft. [Ralf Herbrich's slides]
- 18. Mar 7: Alternating least squares, SGD for matrix factorization, nonnegative matrix factorization. [Matrix factorization, ALS, SGD, NMF slides] [Matrix factorization, ALS, SGD, NMF annotated slides]
- 19. Mar 12: Cold-start problem, feature-based collaborative filtering. Graph-parallel problems. [Cold-start slides] [Graph-parallel slides]
- 20. Mar 14: GraphLab, distributed matrix factorization, graph-parallel ML algorithms. [GraphLab slides]