CSE 401/M501 24sp Project IV - Code Generation

Due: Thursday, May 23 at 11:59 pm for CSE 401. Codegen for CSE M 501 is due with the complete project including extensions at the end of the quarter. You will "turn in" your project as you did with previous assignments by pushing it to your GitLab repository and providing a suitable tag. See the end of this writeup for details.

Added 5/21: All CSE 401 groups can use 2 late days to finish codegen and clean up any lingering bugs from earlier in the compiler project, even if your group already has used all or most of your project late days earlier in the quarter. The final compiler project with codegen and all corrections for CSE 401 groups must be committed, pushed, and tagged by 11:59 pm on Saturday, May 25. No project submissions will be accepted from CSE 401 groups after this final deadline. (CSE M 501 students have additional time to finish their complete project and will turn in codegen plus their M 501 extensions all at once at the end of the last week of the quarter. See the CSE M 501 project additions writeup and the calendars for details.)

Overview

The purpose of this part of the project is to add code generation to your MiniJava compiler so that it can produce x86-64 assembly code, and add the runtime support needed to execute compiled programs.

This also is the final part of the project. Any problems or bugs discovered or remaining from previous parts of the project sequence must be fixed by the time codegen is done. This final version of the compiler will be checked initially to see how well code generation works (this specific part of the project) and then will receive an overall evaluation of how well the compiler implements MiniJava and satisfies all of the requirements of the various parts of the project, as well as any extra credit added to the basic requirements. CSE M 501 projects will receive a similar evaluation when they are finished, and that evaluation will include the required extra work for CSE M 501 students.

We suggest that you use the simple code generation strategy outlined in class to be sure you get running code by the deadline, although you are free to do something different (i.e., better) if you have time. Whatever strategy you use, remember that simple, correct, and working is better than clever, complex, and not done.

We also strongly suggest thorough testing after you implement each part of the code generator. Debugging of code generators can be difficult, and you will make your life easier if you find bugs early, before your generator is too complex. Using a test-driven development approach has also been effective in the past for groups that have tried it -- i.e., writing tests for particular language features prior to writing the code generation that implements them. Whether or not you do things exactly this way, be sure to test the code generation for each language feature as you add it before going on to the next.

Modify your MiniJava main program so that when it is executed with no options using the command

    java MiniJava filename.java

it will read a MiniJava program from the named input file, parse it and perform semantics checks, and then print on standard output (stdout) a x86-64 gcc-compatible assembly-language translation of the input program. The MiniJava compiler should not attempt to write its output to a .s or other named file -- the generated code should be written to standard output. Of course when you run the compiler you can use shell commands or ant build script options to redirect stdout to a named file, and this may be particularly useful for testing.

If translation is successful, the compiler should terminate with an exit code of 0 (System.exit(0)). If any errors are detected in the input program, including syntax, or static semantics/type-checking errors, the compiler should terminate with an exit code of 1 (System.exit(1)). If errors are detected, the compiler does not need to produce any assembly language code, and, in fact, should probably exit without attempting to do so. (The code generator should be able to assume it is translating a correct MiniJava program and should not need to include error checks or special cases to deal with incorrect input source code or errors in the AST.) Any error messages produced by the compiler should continue to be written to standard error (stderr).

The output program should be a correct translation of the original MiniJava program and the generated code should not produce runtime errors like segfaults, to the extent this is reasonable. In particular, if a MiniJava program attempts to access an array element with an illegal (out of bounds) subscript, execution of the compiled program should be terminated with an appropriate error message. It is up to you whether the message contains the source line number of the error, although it would be useful to include this. You do not need to generate code to check all object references for possible null pointers or deal with other situations where it would be unreasonable to try to detect problems.

The java command shown above will also need a -cp argument or CLASSPATH variable as before to locate the compiled .class files and libraries. See the scanner assignment if you need a refresher on the details.

Your MiniJava compiler should still be able to print out scanner tokens if the -S option is used; the -P and -A options should continue to print the AST; and -T should still cause the compiler to print symbol tables with information gathered during the static semantics phase. There is no requirement for how your compiler should behave if more than one of -A, -P, -S or -T is specified at the same time, or whether your compiler should generate code if one or more of these options are provided. That is up to you.

Implementation Strategy

Code generation incorporates many more-or-less independent tasks. One of the first things to do is figure out what to implement first, what to put off, and how to test your code as you go. The following sections outline one reasonable way to break the job down into smaller parts. We suggest that you tackle the job in roughly this order so you can get a small program compiled and running quickly, and add to the compiler incrementally until you're done. Your experience implementing the first parts of the code generator also should give you insights that will ease implementation of the rest.

Integer Expressions and System.out.println

Get a main program containing System.out.println(17) to run. Then add code generation for basic arithmetic expressions including only integer constants, +, -, *, and parentheses. You will also need to generate the basic function prologue and exit code for the MiniJava main method, which is called from the bootstrap code (boot.c) using standard x86-64 C language calling conventions.

Object Creation and Method Calls

Next, try implementing objects with methods, but without instance variables, method parameters, or local variables. This includes:

Operator new, i.e., create an object with a method table (vtable) pointer, but no fields
Generation of method tables for simple classes that don't extend other classes
Methods with no parameters or local variables.

Once you've gotten this far, you should be able to run programs that create objects and call their methods. These methods can contain System.out.println statements to verify that objects are created and that evaluation and printing of arithmetic expressions works in this context.

Variables, Parameters, & Assignment

Next, try adding:

Integer parameters and local variables in methods, including assigning stack frame locations for variables
Parameters and variables in expressions
Assignment statements involving parameters and local variables

Suggestions: Some of the complexity in dealing with methods is properly handling registers during method calls. It can help to develop and test this incrementally -- first a function with a single, simple argument; then multiple arguments; then arguments that require evaluation of nested method calls. One strategy to deal with nested method calls is to always save any active parameter registers in the new method stack frame at the beginning of the generated code for a method - right after the method prologue that saves the base pointer and allocates the stack frame. That way copies of a method's parameters can always be located at a known offset in the stack frame, regardless of what has happened to the parameter registers. Another useful trick for detecting some problems with register usage is to generate debugging code in each method to clobber all of the argument and transient (not required to be saved) registers by storing easy-to-detect junk values in all of them (something like 0xFEFEFE... or 0xBADBADBAD...) right before the method returns. That will catch situations where the calling code is accidentally depending on register values being preserved across a method call when that is not guaranteed by the function call register conventions.

Control Flow

This includes:

While loops
If statements
Boolean expressions, but only in the context of controlling conditional statements and loops

Classes and Instance Variables

Add the remaining code for classes that don't extend other classes, including calculating object sizes and assigning offsets to instance variables (object fields), and using instance variables in expressions and as the target of assignments. At this point, you should be able to compile and execute substantial programs.

Extended Classes

The main issue here is generating the right object and method table layouts for extended classes, including handling method overriding and additional instance variables in subclasses properly. Once you've done that, dynamic dispatching of method calls should work, and you will have almost all of MiniJava working.

Arrays

We suggest you leave this until late in the project, since you can get most everything else working without arrays.

The Rest

Whatever is left, including items like storable Boolean values, which are not essential to the rest of the project, and any extensions you've added to MiniJava or to your compiler.

C Bootstrap

As discussed in class, the easiest way to run the compiled code is to call it from a trivial C program. That ensures that the stack is properly set up when the compiled code begins execution, and provides a convenient place to put other functions that provide an interface between the compiled code and the outside world.

We have provided a small bootstrap program, boot.c, in the src/runtime directory of the starter code and we suggest you start with this. Feel free to embellish this code as you wish. In particular, you may find that it is sometimes easier to have your compiler generate code that calls a C runtime function to do something instead of generating the full sequence of instructions directly in the assembly code. This can be particularly useful for implementing part of the code to terminate execution if an array subscript is out of bounds. You can add such functions to boot.c. Be sure to update your src/runtime/boot.c file with your changes. We will use the file found there to run your compiled code.

Executing x86-64 Code with gcc

Your compiler should produce output containing x86-64 assembly language code suitable as input to the GNU assembler that is part of the Linux gcc toolchain. You can compile and execute your generated code and the bootstrap program using gcc, and you can use gdb to debug it at the x86-64 instruction level.

There is a sample assembler file demo.s in src/runtime that demonstrates the linkage between boot.c and assembler code. This demo file does not contain a full MiniJava program, and the code produced by your compiler will be different. In particular, demos.s has no objects and uses ordinary x86-64 function calling conventions, not Java methods with this pointers, dynamic dispatch, and vtables for method calls. Still, it should give you a decent idea of how the runtime setup is designed to work. You can use demo.s and boot.c as input to gcc to generate an executable demo program with the command gcc -g -o demo demo.s boot.c. You can also use gcc to generate additional examples of x86-64 assembly code. If foo.c contains C code, gcc -S foo.c will compile it and create a file foo.s with the corresponding x86-64 code. The Compiler Explorer website at godbolt.org is also a great resource for experimenting with x86-64 code (although it currently seems to use Intel/Microsoft assembler syntax instead of AT&T/Linux by default. You can specify the -masm=att compiler option to tell gcc to use the AT&T format instead).

The output produced by your compiler should compile and run on the Allen School 64-bit linux systems. Our baseline system for testing is attu, which is the same setup as the linux workstations in the CSE labs. You can also use a CSE Linux VM to test code on your own computer if your machine has an x86-64 processor (see the CSE Home Virtual Machines page for details.)

You should test your compiler by processing several MiniJava programs. By the time you're done you should be able to compile any of the MiniJava example programs distributed with the starter code. Since a legal MiniJava program is also a legal full Java program, you can compare the behavior of programs compiled by your MiniJava compiler with the results produced when the same program is compiled and executed using javac/java.

You should continue to use your CSE 401 gitlab repository to store the code for this and remaining parts of the compiler project.

What to Hand In

As with previous parts of the project you should include a brief file, called codegen-notes.txt this time, in the Notes/ top-level directory of your project describing any interesting or unusual things about your project, including notes about extensions, clever code generation strategies, or other interesting things in this phase of the compiler. You should give a brief description of how much is working and any major surprises (either good or bad) you encountered along the way. In particular, if this phase of the project required going back and making changes to previously implemented parts, give a brief description of what was done and why it was needed. This file should only discuss work done this phase of the project, including any changes needed to previous code. After finishing the complete compiler project at the end of the quarter you will be asked to prepare a (short) summary report about the entire project. Details about that will be supplied later in a separate assignment.

Also, as in previous parts of the project, your codegen-notes.txt file should also briefly describe how you and your partner managed the work for this part of the project: how the work was organized, who did what, and how much of the work was done by each partner.

As before you will submit this part of the project by pushing code to your GitLab repository. Once you are satisfied that everything is working properly, create a codegen-final tag and push that to the repository. Then we strongly suggest that you create a fresh clone of your repository on attu in some completely different temporary directory, checkout the codegen-final tag, and verify that everything works as expect, including doing running an ant command to build everything after doing an ant clean to remove previously compiled code. If necessary, fix any problems back in your regular working copy of the code, push the changes to the repository, and update the codegen-final tag to refer to the correct commit in the repository before you make a new clone and test again.