CSE P 501 23au Compiler Construction
Information and Syllabus

Personnel

Instructor: Hal Perkins, CSE 548; 206-543-4784. Office Hours: after class plus whenever the door is open (including before class if I'm not cramming for lecture) or by appointment.

Teaching Assistant: Aragorn Crozier. Office hours TBD. We will also have help from the TAs for the related CSE 401 and CSE M 501 compiler courses.

If you need to contact the staff, please use the discussion board (private postings are fine) or send email to csep501-staff[at]cs rather than to individual staff members It should help give you a faster response, and it helps us keep in touch with active issues.

Class Meetings

Tuesdays, 6:30-9:20 pm, CSE2 G10. There will be one additional class meeting on Thursday, November 30 from 6:30-8:00 pm. for an exam. More details on that extra meeting will be provided later in the quarter.

Communications

The course web site is www.cs.washington.edu/csep501/. Look there for information about the course including meeting times, staff, office hours, communications, etc.

We have an online ed discussion board (see message board link at the top of the page) for all registered students. Please use this to stay in touch outside of class and to help each other out. In addition to the link above, you can also adjust your profile to be notified of messages or have them emailed to you. You can also use ed for private messages containing things like assignment details or other information that should not normally be posted publicly. Note that you need to use your UW Google identity to access the discussion board since membership is maintained automatically using UW's enrollment data for the class.

For other questions, including health issues or other personal circumstances, or for communications involving both you and your partner about the compiler project, please send mail to csep501-staff[at]cs. That will help us keep track of things that need further followup from the course staff to better help you.

Objective

The goal of the course is to understand how a modern compiler is structured and the major algorithms used to translate code from high-level to machine language. One of the best ways to do this is to actually build a working compiler, so there will be a significant project to implement a compiler that translates programs written in a core subset of Java into executable x86-64 assembly language. The compilers themselves will use scanner and parser generator tools, and the default implementation language is Java. Variations on the project may be possible; talk to the instructor if you want to try something different.

Normally most of the students in CSE P 501 have not taken a previous compiler course, so we will cover the basics, but we'll move right along so we can get to more advanced material. Talk to the instructor if you are not sure whether this is the right course for you. If we discover that student backgrounds are significantly different this year we'll make appropriate adjustments.

Topics

Prerequisites

The nominal prerequisites for a compiler course are background equivalent to undergraduate courses in data structures and algorithms, formal languages and automata, and machine organization. Specifically:

We will review what we need of these topics, so it shouldn't be a problem if things are a bit rusty, and the textbook has good presentations of the core ideas. If you're missing one of the background courses, it is normally possible to work around it, but you will be on your own to fill the gaps. Talk with the instructor if you're not sure whether you have the appropriate background.

Texts

There are several good textbooks for introductory compiler courses. The book by Cooper & Torczon is the primary text for the course since it seems to be the best overall match, and all of the assigned readings will come from it. It is the only one that you really need for the course. The Appel book is the original source of the minijava project, although some details of our version will be different. The Fischer, Cytron & Leblanc book is very close in structure to what we're doing. The Dragon Book (Aho, et al.) is a classic.

The 2nd edition of the Cooper/Torczon book (at least) is available through Safari Books Online, and UW students have free access to all Safari books through the UW libraries. Go to https://www.oreilly.com/online-learning/, click sign-in at the top right, and sign in with your netid@uw.edu email address to access them.

Primary text: Engineering a Compiler, Cooper & Torczon, 2nd ed. M-K 2012, 3rd ed. 2022. errata and links. The new third edition has useful updates, but for much of what we need, either the second or third edition should be suitable for this quarter. This book is particularly good on the design tradeoffs needed to build real compilers. Lecture organization and examples will follow this book to some extent.
Also useful: Modern Compiler Implementation in Java, Appel, Cambridge, 2nd ed, 2002. This book is oriented around a mini-java project that is the basis of our course project, but it has some idiosyncrasies because of the author's background building compilers for functional languages and because of the close link between the presentation of core material and the project details. Probably the most succinct of the books, and with a good selection of more advanced material. Some past students have found this to be a great match to the course; others have found the presentation to be impenetrable.
Also useful: Compilers: Principles, Techniques, & Tools, Aho, Lam, Sethi & Ullman, A-W, 2nd ed, 2007. The "Dragon Book". This is the updated edition of the classic compiler textbook. The coverage of grammars, parsing and optimization is very strong (and has the best theoretical treatment - well beyond what we will do); but the material on semantics and type checking is mixed up with intermediate code generation, and there is very little about SSA intermediate representation, which is a key data structure in most current compilers.
Also useful: Crafting a Compiler, Fischer, Cytron, and LeBlanc, A-W, 2010. Another project-oriented compiler text.

Assignments and Grading

The course will be oriented around the project, covering topics in the order needed for it. The project will be due in phases, both to keep you on schedule, and so we can give you feedback at crucial points to be sure that all is well.

We suggest you work on the project in groups of two. It is possible to work alone, but past experience has been that this is usually too much for one individual in a 10-week quarter (although some of the individual projects have been among the most impressive!). It is very helpful to have a partner to talk with about specific details as the project evolves. If you do wish to work alone, you must talk to the instructor first to be sure this is feasible.

In addition to the project, there will be some written homework assignments to cover topics that don't fit into the project (particularly grammars and parsing at the beginning of the course, and some other topics later). There also will be a single exam late in the quarter. Assignments (and exams, of course) should be done individually.

The course grade will be computed roughly as follows:

For the project, what ultimately matters is where you wind up at the end of the quarter, but there will be intermediate milestones due every couple of weeks to keep everyone on schedule and to provide feedback at important points. The project will be graded primarily on how well it works, but reasonable code quality is expected - it'll be hard for you to do well or for us to evaluate your work without some attention to that. The final version of the compiler at the end of the quarter will count for roughly half of the overall project grade, with the quality and timeliness of the intermediate deadlines counting for the other half. A short written report describing the project will also be due at the end of the quarter. We may schedule brief group meetings with the course staff to talk about the project after everything is done, but this is not certain yet.

We reserve the right to adjust these percentages if it seems appropriate.

Computing Resources

You are free to use any computer, programming environment, and Java implementation that you wish. We do require that your code be standard Java, i.e., it will compile and run correctly with Java 17, which is the long-term-support version installed in the CSE labs for this quarter. If you have convincing reasons for wanting to use another implementation language, this might be possible. Please talk to the instructor first.

To execute the generated code you will need access to a programming environment that can compile and run a C bootstrap program and x86-64 assembly language code. The default environment is the Linux gcc toolchain installed in the Allen School instructional labs. A virtual machine image is also available to run on personal x86-64 machines. Your generated code must assemble and run there when tested.

All groups will be provided with a source code repository on the CSE GitLab server. You must store your project code there and course staff will retrieve projects from there for grading. You must not store your compiler or other homework solutions on any publicly accessible web site, repository, or other system.

Academic Integrity

This should go without saying, but it needs to be written down to avoid any misunderstandings.

Your project assignments should be done only by members of your group. Other assignments should be done by you alone, unless otherwise specified. If cases of academic misconduct are discovered we will pursue appropriate penalties under college and university rules.

However, we also want to be clear on what is legitimate collaboration -- please help each other out in this class in appropriate ways! It is OK to help other students debug their programs, and to discuss general approaches to solving problems. However, it is not OK to copy someone else's code or homework solution and turn it in as your own work - or to provide materials to others so they can do this. This also applies to using code or solutions from previous quarters of this course or other courses.

Exams must of course be done on your own. The exam will allow some use of notes and textbooks, with exact details to be specified.

Please refer to the separate academic integrity guidelines for more details. You are responsible for every word in that document as well as in this one.

Late Assignments and Incompletes

Assignments will be submitted online and are due at the time and date given in the assignment. We will try to be flexible about outside commitments (jobs, family events) that interfere with this. Please talk to the instructor as soon as possible if something happens that prevents you from submitting work on time or if unusual circumstances create problems collaborating with your partner on the project. We will do our best to help resolve the situation.

Incompletes cannot be given simply because assignments were not done on time. They are only appropriate when circumstances outside your control late in the quarter prevent you from completing the course on schedule (illness, family emergency, etc.) and your work has been satisfactory up to that point. Again, please talk to the instructor if something happens where this might need to be considered.

Accommodations

Please refer to the university policies regarding accommodations and religous accommodations. Those policies apply in this class as everywhere else at UW. Please contact the instructor, DRS, or the course staff as needed so we can help.