CSE 599N: Special Topics (Autumn 2007)

Systems Applications of Machine Learning Techniques


Wed, Sep 26 Introduction We'll go over class administrivia and give an overview of the class topics.

Mon, Oct 1 End-to-end case study The case study introduces the main stages of applying machine learning to systems problems, from gathering data to using the results of ML analyses to affect the system. Correlating instrumentation data to system states: A building block for automated diagnosis and control, Cohen et al. OSDI 2004. [PDF]

Wed, Oct 3 ML Basics: Algorithms A whirlwind tour of a few families of ML algorithms, including anomaly detectors, classifiers, clustering, and reinforcement learning. The goal of this lecture is to give students enough information to roughly map systems problems into a space of applicable ML techniques. No reading for today. For detailed future reference, take a look at the book Pattern Recognition and Machine Learning by Christopher Bishop.

Mon, Oct 8 Fault detection Failure detection techniques such as anomaly detection. Detecting Application-Level Failures in Component-based Internet Services, Emre Kiciman and Armando Fox. IEEE Transactions on Neural Networks, Sep 2005. [PDF]

Wed, Oct 10 Fault localization Fault localization techniques, such as correlation algorithms. Failure Diagnosis Using Decision Trees, Mike Chen et al. ICAC 2004. [PDF]
optional: Capturing, indexing, clustering and retrieving system history, Cohen et al. SOSP 2005 [PDF]

Mon, Oct 15 no class (SOSP)

Wed, Oct 17 no class (SOSP)

Mon, Oct 22 System optimization and control Techniques for on-line control of systems. Using Probabilistic Reasoning to Automate Software Tuning, David Sullivan et al. Harvard TR 2004. [PDF]
optional: A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation, Gerald Tesauro et al. [PDF]

Wed, Oct 24 Helping humans understand system behavior Using ML techniques to help people recognize patterns of behavior for network protocol inference, topology discovery, etc. Towards Highly Reliable Enterprise Network Services Via Inference of Multi-level Dependencies, Bahl et al. SIGCOMM 2007. [PDF]
optional: Automatically Extracting Fields from Unknown Network Protocols, Karthik Gopalratnam et al. SysML 2006 [PDF]

Mon, Oct 29 Helping humans interpret ML results Using simple algorithms and visualization to help interpret machine learning analyses. Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization, Peter Bodik et al. ICAC 2005 [PDF]
optional: Snitch: Interactive Decision Trees for Troubleshooting Misconfigurations, James Mickens et al. SysML 2007 [PDF]

Wed, Oct 31 Modeling Resource Requirements Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications. Piyush Shivam et al. VLDB 2006. [PDF]

Mon, Nov 5 On-line Analyses Analyzing data on-line and adapting models over time. Ensemble of models for automated diagnosis of system performance problems, Zhang et al. DSN 2005. [PDF]

Wed, Nov 7 Security Problems Network intrusion, virus detection, spam, etc. Timing Analysis of Keystrokes and Timing Attacks on SSH, Dawn Song et al. USENIX Security 2001. [PDF]

Mon, Nov 12 Holiday Veterans Day

Wed, Nov 14 Bug Finding Analyzing source code and runtime behavior to find bugs. Scalable Statistical Bug Isolation, Ben Liblit et al. PLDI 2005. [PDF]

Mon, Nov 19 Advanced Topic I: Scaling to large sets of data Scaling machine learning analyses to large data sets. Map-Reduce for Machine Learning on Multicore. Cheng-Tao Chu et al. NIPS 2006. [PDF]

Wed, Nov 21 Power and thermal management Modeling power usage and temperature for improving energy efficiency. (short paper) ConSil: Low-cost Thermal Mapping of Data Centers, Justin Moore et al. SysML 2006 [PDF]

Mon, Nov 26 Advanced topic II: Distributed algorithms Distributed data analysis. Analyzing data without paying costs of centralizing data. Communication-Efficient Tracking of Distributed Cumulative Triggers, Ling Huang et al. ICDCS 2007 [PDF]

Wed, Nov 28 Advanced topic III: Malicious adversaries Can malicious adversaries manipulate machine learning analyses? Vulnerabilities from training on real-world data, use of robust statistics, etc. Can machine learning be secure? Marco Barreno et al. ACM Symp. on Information, Computer and Communications Security, March 2006. [PDF]

Mon, Dec 3 no class (NIPS conference)

Wed, Dec 5 no class (NIPS conference)

Other Places to look for Systems and Machine Learning Papers:
  • SysML 2006-2007
  • MINENET 2005-2007
  • Occasionally: ICAC, DSN, SOSP/OSDI, SIGCOMM, VLDB, ...

Contact: sumitb@microsoft.com and emrek@microsoft.com