CSE504: Learning & statistical methods for software engineering & program analysis

Schedule: The class meets Mon & Wed, 10:30-11:50, in room CSE 303, except on January 20 and 27, when it meets in room 503.

Class calendar

This class will explore the application of machine learning and statistical methods to problems in software engineering and program analysis. No background in these topics is necessary; you will learn everything you need to know.


Program analysis solves developers' problems related to debugging, extending, and maintaining their programs. Traditional approaches aim for precision, assuming that the program has an exact specification and can be analyzed using formal techniques. These assumptions are not valid for real-world programs; relaxing these assumptions leads to more effective techniques and more useful tools for programmers.

Software consists of more than just the program text; it also includes tests, configurations, example executions, the context in which the program is run, natural-language documentation for both users and developers, etc. An execution can be analyzed in terms of its code or command line, or in terms of its dynamic behavior. Even the program text is much more than programming-language statements and expressions, because information is conveyed by code comments, variable names, layout, and relationships between parts of the code. Furthermore, a version control system stores a history of changes to all of these artifacts.
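
As a small illustration of the information carried by comments and variable names, here is a minimal sketch in Python. It is not a technique taught in the class. It uses the standard tokenize module to collect the comments and count the identifiers in a piece of source text. The function name extract_comments_and_names and the SAMPLE program are hypothetical, chosen only for this example.

```python
import io
import keyword
import tokenize
from collections import Counter


def extract_comments_and_names(source: str):
    """Return (comments, identifier_counts) for a Python source string.

    Hypothetical helper for illustration only.
    """
    comments = []
    names = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            comments.append(tok.string.lstrip("#").strip())
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Count identifiers, skipping language keywords such as 'def'.
            names[tok.string] += 1
    return comments, names


# A made-up program fragment used as input.
SAMPLE = '''\
# Retry the download a few times before giving up.
def fetch(url, max_retries):
    attempts = 0  # number of tries so far
    return attempts
'''

comments, names = extract_comments_and_names(SAMPLE)
```

The comments and identifier counts recovered this way are the kind of raw material that a statistical or natural-language technique could then process. What to do with them is left open here.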

These rich sources of information can help programmers improve their code, such as by finding functional or performance bugs, fixing bugs, or preventing bugs. The information sources can be processed to create specifications, documentation, test cases, architecture, bug fixes, and other important artifacts. Doing so, however, requires use of sophisticated techniques from machine learning, statistics, and natural language processing to extract information and organize structure from the artifacts.

Structure of the class

The class is organized around discussion and a group project. There are no other homework assignments or exams.

Each class session will be either a presentation of background material or a discussion of a recent result from program analysis and/or machine learning. Each student will present one or two papers to the class.

To solidify and apply their knowledge, students will work in small groups of 3-4 people to produce a research result that combines machine learning and program analysis. Groups that combine students with different backgrounds are particularly encouraged. Synergy with existing projects is permitted but not required: for example, you may choose a problem or technique from your own research, or address a problem that you have encountered. The instructor will also offer suggestions for projects.

Mailing list

To get on the mailing list, either sign up for the class or send email to Michael Ernst asking to be manually added to the mailing list.


Contact Michael Ernst (mernst@cs.washington.edu).

I look forward to seeing you in the class!