CSE 413 Autumn 2007

Assignment 8 - D Code Generation

Due: Electronic turnin of entire compiler, including test output and written report, due Friday, Dec. 7, by 8:00 am. Printouts of your code, tests cases as well as output, and written report due at the beginning of class on Friday, Dec. 7.

You may work with a partner on this assignment. If you have a partner from the previous compiler assignments, you should continue working with that person. 

Overview

For this assignment, add code generation to your compiler. When the finished compiler is executed, it should open a D source program file, compile it, and produce a .s assembler text file containing an x86 assembly language version of the D program. Source lines from the D file, including comments and whitespace, should appear as comments in the assembler code, with each source line near the code generated from it.

Most of the work in this assignment consists of adding new code to the existing parser. While you might find it useful to create some extra functions or data structures, the bulk of the changes will be additions to existing parser functions.  A separate handout contains details about code generation for D programs.

Executing Compiled Code

The easiest way to run the compiled x86 code is to call it from a trivial C program as if it were an ordinary function.  That ensures that the stack is properly set up when the compiled code begins executing, and provides a convenient place to put functions that provide an interface between the compiled code and the external world (get and put).

C Bootstrap Program

The bootstrap program is named druntime.c (right click on the file name to download a copy). You should use this program to run your compiled code.  The file contains code for the get and put functions as well as the main function where program execution will begin. Looking at the contents of the main function in druntime.c (shown below), you can see that all it does is call a function named "d$main", (which we can see from the function prototype returns an int value) and print that result to stdout.

int d$main();   /* prototype for external function */

int main() {
  printf("\nValue returned from D main: %d.\n", d$main());
  return 0;
}

A key point is that the "real" main program is here in druntime.c. When your compiler generates assembly code for the main function from a *D program*, the code for *that* main should have a label with the name _d$main (the underscore is required if you will be running your programs with cygwin), and that name should appear in a .globl declaration in the compiled code to make it visible to the bootstrap program. In essence, when the main function in druntime.c calls d$main, it is really just treating it in the same way it would any other function that might have been defined in another file. It just so happens in our case rather than writing d$main in C and then compiling it, our compiler will have generated it directly into assembly code. In the end gcc will be happy to compile (C files) or assemble (assembler files) whatever files we give it and link them all together into one executable (as seen below).

Executing Assembler Code with gcc

The output of your compiler is an ordinary text file. It can have any name you would like, but it is customary on Unix systems to give assembler source files a name ending with ".s".

Assuming that your compiled code is stored in a file named, e.g., code.s, you can use gcc to assemble code.s, compile druntime.c, and link the result with a single command:

gcc code.s druntime.c -o code.exe

That produces an executable file named code.exe containing the complete program, which can be run normally.

Where to Start

As with previous assignments, it's helpful to figure out what piece of the compiler can be done first, without having to finish everything at once. Here are some suggestions.

First I would suggest looking at the sample compiler output for simple.d (simple.s) linked from the main assignments page. Your output does not need to match this exactly (definitely does not need to match the comments) but it should be fairly similar in terms of instructions if you follow the suggested guidelines for generating code (e.g. leaving values of expressions in eax). As you can see from this output we could obviously improve the code by removing some redundant or extra instructions (e.g. no need to pop parameters off the stack after a call to a zero-parameter function). For our purposes we just want code that works - do not worry about optimizing at this point.

Also, at this point please feel free to modify any part of the code we have given you previously (io.c, symtab.h etc.) Do not feel constrained to use the putline and putstr functions in io.c, feel free to add things to your symbol table, create multiple types of symbol tables, etc..

To get going, try implementing just enough to generate labels for functions and the ret instruction that ends a return statement. That should be enough to generate a text (.s) file containing an empty main program that can be called and return to the bootstrap without crashing.

Once you've done that, try implementing factor() for integer constants. Then return statements with constant values should work. From there you could implement arithmetic operations, which gives you return statements with expressions involving constants. You can test the code by writing small programs each of which has a main method with a different return expression.

The next big step could be to figure out the details of stack frame layouts and the offsets assigned to parameters and local variables. Check your work by compiling some sample programs, print the local symbol tables, and verify that the offsets or your offset calculations are correct. Be sure to update your code for function definitions to allocate the correct amount of space on the stack for local variables. From there you should be able to add assignment statements and expressions (factors) that include integer variables.

There are several possible ways to go from here. Probably the most useful is to get function prologues and epilogues (the rest of return) to work. At this point you should have straight-line code with function calls, including calls to get and put, working properly.

Finally, look at code generation for conditions (rel-exp and bool-exp) and if and while statements. The issues here are getting the labels planted in the right place, and picking the right conditional jumps to the correct labels.

Test Programs

Several D test programs are available from the course web.  Feel free to create additional tests to demonstrate or debug your compiler.  If you create any new test programs, please feel free to share them with others or contribute them to the collection on the course web (send mail to the instructor).

Debugging Assembly Programs

You can use gdb with your generated assembly programs just as with a C program. To do this be sure that you use the -g option when compiling/linking:

gcc -g hello.s druntime.c -o hello.exe.

Here is an on-line reference that may be of use. At the very least it may be useful to load up your program, set a break point (e.g break hello.s:37 to break at line 37 in hello.s) and type: "print $ebp" (note that $ is used instead of %) to see the value in a particular register or "info registers" to see the values in all registers. Pay particular attention to the values of %esp and %ebp as forgetting to allocate enough space for local variables or forgetting to push or "pop" parameters (or doing so in the wrong order) will be a common mistake.

Above and Beyond

If you get everything in the D language working as expected feel free to extend the project further. I will award some small amount of extra credit for going above and beyond what is required. If you attempt extra credit I would *strongly* recommend that you get the basic version of the compiler working first, make a copy of your entire project (perhaps even go ahead and submit it), and then work on any extra credit. *If you make modifications to the language itself, (for example so that the grammar changes) such as by adding new constructs or types you should submit two versions of your compiler - one for the original language and one for your extended version*. Here are a few suggestions:

If you implement any advanced features, types, or error checking in your compiler we ask that you submit: 1) a short writeup describing what you implemented, and 2) test cases and output that demonstrate your additions. In addition, if you If you do not submit these then you will not receive credit for your additions!

Project Turnin

Electronic: Turn in your program electronically using the link on the assignments page. If you worked with a partner only one person should submit. You should submit *all* files required to make your compiler (scanner, parser/code gen code), as well as test cases, output, and your written report. Include:

Paper: Hand in a printout of your code, test output, and written report in class on Dec 7. Only one partner needs to submit printouts.

Additional Notes (added after 12/03/07)

Added 12/03/07:

  1. Here is another file of sample output for this more complex file: div_mod_loop.d.

    Added 12/05/07:

  2. Although not explicitly mentioned, it would be helpful if you would electronically submit a Makefile when you submit all files required to build your compiler. Instructions on what to do to build your compiler would also be appreciated. No need to print out your Makefile.
  3. Turning in a compiler that generates x86 code that will work on cygwin or linux (the difference being whether you need to prepend underscores a few places or not) will be fine. If more changes would need to be made to get your code to work in one of these two environments than simply changing _get/_put/_d$main to get/put/d$main (or vise versa) then we may not be able to run it.
  4. It would also be helpful if you would state whether your code will run on cygwin or linux.
  5. Note that the short report is meant to cover the entire compiler in terms of describing what works and doesn't work as well as who did what, etc. This doesn't have to be long (e.g. It is sufficient to say that you scan, parse, and generate code for the entire language correctly if that is true.) But be sure to point out things you know that do not work and also don't forget to mention any extra credit options you implemented.