Date | Title | Notes | Reading |
Wed, Sep 26 | Introduction | We'll go over class administrivia and give an overview of the class topics. | |
Mon, Oct 1 | End-to-end case study | The case study introduces the main stages of applying machine learning to systems problems, from gathering data to using the results of ML analyses to affect the system. | Correlating instrumentation data to system states: A building block for automated diagnosis and control, Cohen et al. OSDI 2004. [PDF] |
Wed, Oct 3 | ML Basics: Algorithms | A whirlwind tour of a few families of ML algorithms, including anomaly detectors, classifiers, clustering, and reinforcement learning. The goal of this lecture is to give students enough information to roughly map systems problems into a space of applicable ML techniques. | No reading for today. For detailed future reference, take a look at the book Pattern Recognition and Machine Learning by Christopher Bishop. |
Mon, Oct 8 | Fault detection | Failure detection techniques such as anomaly detection. | Detecting Application-Level Failures in Component-based Internet Services, Emre Kiciman and Armando Fox. IEEE Transactions on Neural Networks, Sep 2005. [PDF] |
Wed, Oct 10 | Fault localization | Fault localization techniques, such as correlation algorithms. | Failure Diagnosis Using Decision Trees, Mike Chen et al. ICAC 2004 [PDF]; optional: Capturing, indexing, clustering and retrieving system history, Cohen et al. SOSP 2005 [PDF] |
Mon, Oct 15 | no class (SOSP) | | |
Wed, Oct 17 | no class (SOSP) | | |
Mon, Oct 22 | System optimization and control | Techniques for on-line control of systems. | Using Probabilistic Reasoning to Automate Software Tuning, David Sullivan et al. Harvard TR 2004. [PDF]; optional: A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation, Gerald Tesauro et al. [PDF] |
Wed, Oct 24 | Helping humans understand system behavior | Using ML techniques to help people recognize patterns of behavior for network protocol inference, topology discovery, etc. | Towards Highly Reliable Enterprise Network Services Via Inference of Multi-level Dependencies, Bahl et al. SIGCOMM 2007. [PDF]; optional: Automatically Extracting Fields from Unknown Network Protocols, Karthik Gopalratnam et al. SysML 2006 [PDF] |
Mon, Oct 29 | Helping humans interpret ML results | Using simple algorithms and visualization to help interpret machine learning analyses. | Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization, Peter Bodik et al. ICAC 2005 [PDF]; optional: Snitch: Interactive Decision Trees for Troubleshooting Misconfigurations, James Mickens et al. SysML 2007 [PDF] |
Wed, Oct 31 | Modeling Resource Requirements | | Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications, Piyush Shivam et al. VLDB 2006. [PDF] |
Mon, Nov 5 | On-line Analyses | Analyzing data on-line and adapting models over time. | Ensembles of models for automated diagnosis of system performance problems, Zhang et al. DSN 2005. [PDF] |
Wed, Nov 7 | Security Problems | Network intrusion, virus detection, spam, etc. | Timing Analysis of Keystrokes and Timing Attacks on SSH, Dawn Song et al. USENIX Security 2001. [PDF] |
Mon, Nov 12 | Holiday | Veterans Day | |
Wed, Nov 14 | Bug Finding | Analyzing source code and runtime behavior to find bugs. | Scalable Statistical Bug Isolation, Ben Liblit et al. PLDI 2005. [PDF] |
Mon, Nov 19 | Advanced topic I: Scaling to large sets of data | Scaling machine learning analyses to large data sets. | Map-Reduce for Machine Learning on Multicore, Cheng-Tao Chu et al. NIPS 2006. [PDF] |
Wed, Nov 21 | Power and thermal management | Modeling power usage and temperature for improving energy efficiency. | (short paper) ConSil: Low-cost Thermal Mapping of Data Centers, Justin Moore et al. SysML 2006 [PDF] |
Mon, Nov 26 | Advanced topic II: Distributed algorithms | Distributed data analysis: analyzing data without paying the cost of centralizing it. | Communication-Efficient Tracking of Distributed Cumulative Triggers, Ling Huang et al. ICDCS 2007 [PDF] |
Wed, Nov 28 | Advanced topic III: Malicious adversaries | Can malicious adversaries manipulate machine learning analyses? Vulnerabilities from training on real-world data, use of robust statistics, etc. | Can machine learning be secure? Marco Barreno et al. ACM Symposium on Information, Computer and Communications Security, March 2006. [PDF] |
Mon, Dec 3 | no class (NIPS conference) | | |
Wed, Dec 5 | no class (NIPS conference) | | |