CSE P 501 16wi Overview

Personnel

Instructor: Hal Perkins, perkins[at]cs, CSE 548; 206-543-4784. Office Hours: after class plus whenever the door is open (including before class if I'm not cramming for lecture) or by appointment.

Teaching Assistant: Erin Peach, peach[at]cs. Office hours Tue. 5:30-6:20 pm. CSE 218.

If you need to contact the staff by email, please send mail to csep501-staff[at]cs rather than to individual staff members.

Class Meetings

Tuesdays, 6:30-9:20 pm, CSE 305 and Microsoft building 99 room 1915.
There will be one additional class meeting on Thursday, March 3 from 6:30-8:00 pm. for an exam. The exam will be offered simultaneously in the UW and Microsoft classrooms.

Communications

The course web site is www.cs.washington.edu/csep501/.

Registered students are automatically subscribed to a class mailing list that will be used primarily for occasional announcements from the course staff. Messages are sent to your primary UW email address. Please be sure that is forwarded to wherever you read mail if you do not use that account regularly.

We also have an online discussion board. Please use this to stay in touch outside of class and to help each other out. There is a link on the main course web page to read postings, and you can also adjust your profile to have messages mailed to you if you prefer.

Objective

The goal of the course is to understand how a modern compiler is structured and the major algorithms used to translate code from high-level to machine language. One of the best ways to do this is to actually build a working compiler, so there will be a significant project to implement one that translates programs written in a core subset of Java into executable x86-64 assembly language. The compilers themselves will use scanner and parser generator tools and the default implementation language is Java. Variations on the project, particularly the implementation language, are possible; talk to the instructor if you want to try something different.

In most quarters, most of the students in CSE P 501 have not taken a previous compiler course, so we will cover the basics, but we'll move right along so we can get to more advanced material. Talk to the instructor if you are not sure whether this is the course for you.

Topics

  1. Overview of compilers
  2. Scanners and lexical analysis
  3. Parsing (bottom-up LR; top-down LL and recursive descent)
  4. Static semantics, type checking, and symbol tables
  5. Runtime organization and code shape
  6. Code generation - instruction selection and scheduling
  7. Register allocation
  8. Program analysis, optimization, and program transformations.
  9. Recent topics as time permits. Details tbd, but likely including compilers for dynamic languages, JITs, and others depending on class interests.

Prerequisites

The nominal prerequisites for a compiler course are background equivalent to undergraduate courses in data structures and algorithms, formal languages and automata, and machine organization. Specifically:

  • Data structures and algorithms: linked lists, queues, dictionaries, trees, hash tables, graphs, etc. (e.g., CSE 332 or 373)
  • Formal languages and automata: regular expressions, finite automata, context-free grammars and languages (e.g., CSE 311)
  • Machine organization: Assembly-language level programming for some processor, not necessarily x86 (CSE 351 or 410)

We will review what we need of these topics, so it shouldn't be a problem if things are a bit rusty. If you're missing one of the background courses, it is normally possible to work around it, but you will be on your own to fill the gaps. Talk with the instructor if you're not sure whether you have the appropriate background.

Texts

There are four good, fairly recent compiler textbooks available, none of which clearly dominates the others. Lectures and course materials will draw on all of them (among other sources). Most of these books have been used as the official text for the course at one time or another, and any of them would be suitable as a choice. We picked the Cooper/Torczon book as the "recommended" one because it seems to have the best balance between traditional and more recent material, but you are free to use whichever of these books you like. All four books are on reserve in the Engineering Library.

  • Engineering a Compiler, Cooper and Torczon (2nd ed., Morgan-Kaufman, 2012). This book is particularly good on the design tradeoffs needed to build real compilers. Lecture organization and examples will follow this book to some extent. The new edition has some worthwhile changes, but if you have a copy of the first edition it would also be suitable.
  • Compilers: Principles, Techniques, & Tools, Aho, Lam, Sethi, & Ullman (Addison-Wesley, 2nd ed., 2007). The "Dragon Book". This is the updated edition of the classic compiler textbook. The coverage of grammars, parsing and optimization is very strong (and has the best theoretical treatment - well beyond what we will do); but the material on semantics and type checking is mixed up with intermediate code generation, and there is very little about SSA intermediate representations. If you already have a copy of the first edition, that would also be fine for this course, althought it is pretty dated at this point.
  • Modern Compiler Implementation in Java, Appel (Cambridge University Press, 2nd edition, 2002). This book is oriented around a mini-java project that is the basis of our course project, but it has some idiosyncrasies because of the author's background building compilers for functional languages and because of the close link between the presentation of core material and the project details. Probably the most succinct of the books, and with a good selection of more advanced material. Some past students have found this to be a great match to the course; others have found the presentation to be impenetrable.
  • Crafting a Compiler, Fischer, Crytron and LeBlanc (Addison-Wesley, 2010). The new edition of a classic project-oriented compiler text. It is a reasonably good match to the course and project, but with a different writing (and coding) style compared to the Appel book.

Assignments and Grading

The course will be oriented around the project, covering topics in the order needed for it. The project will be due in phases, both to keep you on schedule, and so we can give you feedback at crucial points to be sure that all is well.

We suggest you work on the project in groups of 2. It is possible to work alone, but past experience has been that this can be too much for one individual in a 10-week quarter (although some of the individual projects have been among the most impressive!). It is often helpful to have a partner to talk to about specific details as the project evolves.

In addition to the project, there will be some written homework assignments to cover topics that don't fit into the project (particularly grammars and parsing at the beginning of the course, and maybe some other topics later). There also will be a single exam late in the quarter. Assignments (and exams, of course) should be done individually.

The course grade will be computed roughly as follows:

  • Project 50%
  • Written assignments 20%
  • Exam 25%
  • Other 5%

The bulk of the project grade will be based on the final results at the end of the quarter, but we will take into account the quality and timeliness of the intermediate steps. The project will be graded primarily on how well it works, but reasonable code quality is expected - it'll be hard for you to do well or for us to evaluate your work without some attention to that. A short written report describing the project will be due at the end of the quarter, and there will be brief group meetings with the course staff to talk about the project after everything is done.

We reserve the right to adjust these percentages if it seems appropriate.

Computing Resources

You are free to use any computer, programming environment, and Java implementation that you wish. We do ask that your code be standard Java, i.e., it will compile and run correctly with a current version (8) of Sun's standard Java tools. If you wish to use another implementation language (C# has been used successfully in the past, Haskell, ML, OCAML, and F# are also good choices), please talk with the instructor first.

To execute the generated code you will need access to a programming environment that can compile and run a C bootstrap program and x86-64 assembly language code. The default environment will be gcc and its associated assembler, but if necessary we should also be able to provide instructions for using Visual Studio and the MASM assembler that is included with it. All of these are installed in the CSE instructional labs and are available free for use on your personal machines. (Visual Studio can be obtained through the Microsoft DreamSpark program for CSE students if you don't already have it.)

All groups will be provided with a source code repository on the CSE GitLab server. You must store your project code there and course staff will retrieve projects from there for grading.

Academic Integrity

This should go without saying, but it needs to be written down to avoid any misunderstandings.

Your project assignments should be done only by members of your group. Other assignments should be done by you alone, unless otherwise specified. If cases of academic misconduct are discovered we will pursue appropriate penalties under college and university rules.

However, we also want to be clear on what is legitimate collaboration -- please help each other out in this class in appropriate ways! It is OK to help other students debug their programs, and to discuss general approaches to solving problems. However, it is not OK to copy someone else's code or homework solution and turn it in as your own work - or to provide materials to others so they can do this. This also applies to using code or solutions from previous quarters or other courses.

Exams must of course be done on your own. The exam will allow some use of notes and textbooks, with exact details to be specified.

Please refer to the separate academic integrity guidelines for more details. You are responsible for every word in that document as well as in this one.

Late Assignments and Incompletes

Assignments will be submitted online and are due at the time and date given in the assignment. We will try to be flexible about outside commitments (jobs, family events) that interfere with this. Please talk to the instructor if something happens that prevents you from submitting work on time.

Incompletes cannot be given simply because assignments were not done on time. They are only appropriate when circumstances outside your control prevent you from completing the course on schedule (illness, family emergency, etc.). Again, please talk to the instructor if something happens where this might be a necessary outcome.