Due: Electronic turnin of entire compiler, including test output and written report, due Friday, Dec. 7, by 8:00 am. Printouts of your code, tests cases as well as output, and written report due at the beginning of class on Friday, Dec. 7.
You may work with a partner on this assignment. If you have a partner from the previous compiler assignments, you should continue working with that person.
For this assignment, add code generation to your compiler. When the finished compiler is executed, it should open a D source program file, compile it, and produce a .s assembler text file containing an x86 assembly language version of the D program. Source lines from the D file, including comments and whitespace, should appear as comments in the assembler code, with each source line near the code generated from it.
Most of the work in this assignment consists of adding new code to the existing parser. While you might find it useful to create some extra functions or data structures, the bulk of the changes will be additions to existing parser functions. A separate handout contains details about code generation for D programs.
The easiest way to run the compiled x86 code is to call it from a
trivial C program as if it were an ordinary function. That
ensures that the stack is properly set up when the compiled code
begins executing, and provides a convenient place to put functions
that provide an interface between the compiled code and the external
world (get
and put
).
The bootstrap program is named druntime.c
(right click on the file name to download a copy). You should use
this program to run your compiled code. The file contains code
for the get
and put
functions as well as
the main
function where program execution will begin.
Looking at the contents of the main
function
in druntime.c
(shown below), you can see that all it
does is call a function named "d$main
", (which we can
see from the function prototype returns an int value) and print that
result to stdout
.
int d$main(); /* prototype for external function */ int main() { printf("\nValue returned from D main: %d.\n", d$main()); return 0; }
A key point is that the "real" main program is here
in druntime.c
. When your compiler generates assembly
code for the main
function from a *D program*, the
code for *that* main
should have a label with the
name _d$main
(the underscore is required if you will be
running your programs with cygwin), and that name should appear in
a .globl
declaration in the compiled code to make it
visible to the bootstrap program. In essence, when
the main
function in druntime.c
calls d$main
, it is really just treating it in the same
way it would any other function that might have been defined in
another file. It just so happens in our case rather than
writing d$main
in C and then compiling it, our compiler
will have generated it directly into assembly code. In the
end gcc
will be happy to compile (C files) or assemble
(assembler files) whatever files we give it and link them all
together into one executable (as seen below).
The output of your compiler is an ordinary text file. It can have any name you would like, but it is customary on Unix systems to give assembler source files a name ending with ".s".
Assuming that your compiled code is stored in a file named,
e.g., code.s
, you can use gcc
to
assemble code.s
, compile druntime.c
, and
link the result with a single command:
gcc code.s druntime.c -o code.exe
That produces an executable file named code.exe
containing the complete program, which can be run normally.
As with previous assignments, it's helpful to figure out what piece of the compiler can be done first, without having to finish everything at once. Here are some suggestions.
First I would suggest looking at the sample compiler output for simple.d (simple.s) linked from the main assignments page. Your output does not need to match this exactly (definitely does not need to match the comments) but it should be fairly similar in terms of instructions if you follow the suggested guidelines for generating code (e.g. leaving values of expressions in eax). As you can see from this output we could obviously improve the code by removing some redundant or extra instructions (e.g. no need to pop parameters off the stack after a call to a zero-parameter function). For our purposes we just want code that works - do not worry about optimizing at this point.
Also, at this point please feel free to modify any part of the
code we have given you previously
(io.c
, symtab.h
etc.) Do not feel
constrained to use the putline
and putstr
functions in io.c
, feel free to add things to your
symbol table, create multiple types of symbol tables, etc..
To get going, try implementing just enough to generate labels for
functions and the ret
instruction that ends a return
statement. That should be enough to generate a text
(.s
) file containing an empty main
program that can be called and return to the bootstrap without
crashing.
Once you've done that, try implementing factor()
for
integer constants. Then return statements with constant values
should work. From there you could implement arithmetic operations,
which gives you return statements with expressions involving
constants. You can test the code by writing small programs each of
which has a main
method with a different return
expression.
The next big step could be to figure out the details of stack frame layouts and the offsets assigned to parameters and local variables. Check your work by compiling some sample programs, print the local symbol tables, and verify that the offsets or your offset calculations are correct. Be sure to update your code for function definitions to allocate the correct amount of space on the stack for local variables. From there you should be able to add assignment statements and expressions (factors) that include integer variables.
There are several possible ways to go from here. Probably the
most useful is to get function prologues and epilogues (the rest
of return) to work. At this point you should have straight-line
code with function calls, including calls to get
and put
, working properly.
Finally, look at code generation for conditions
(rel-exp
and bool-exp
) and
if
and while
statements. The issues here
are getting the labels planted in the right place, and picking the
right conditional jumps to the correct labels.
Several D test programs are available from the course web. Feel free to create additional tests to demonstrate or debug your compiler. If you create any new test programs, please feel free to share them with others or contribute them to the collection on the course web (send mail to the instructor).
You can use gdb with your generated assembly programs just as with a C program. To do this be sure that you use the -g option when compiling/linking:
gcc -g hello.s druntime.c -o hello.exe
.
break
hello.s:37
to break at line 37 in hello.s) and type: "print
$ebp"
(note that $ is used instead of %) to see the value in a
particular register or "info registers"
to see the values
in all registers. Pay particular attention to the values
of %esp
and %ebp
as forgetting to allocate
enough space for local variables or forgetting to push or "pop"
parameters (or doing so in the wrong order) will be a common mistake.
If you get everything in the D language working as expected feel free to extend the project further. I will award some small amount of extra credit for going above and beyond what is required. If you attempt extra credit I would *strongly* recommend that you get the basic version of the compiler working first, make a copy of your entire project (perhaps even go ahead and submit it), and then work on any extra credit. *If you make modifications to the language itself, (for example so that the grammar changes) such as by adding new constructs or types you should submit two versions of your compiler - one for the original language and one for your extended version*. Here are a few suggestions:
Electronic: Turn in your program electronically using the link on the assignments page. If you worked with a partner only one person should submit. You should submit *all* files required to make your compiler (scanner, parser/code gen code), as well as test cases, output, and your written report. Include:
Paper: Hand in a printout of your code, test output, and written report in class on Dec 7. Only one partner needs to submit printouts.
Added 12/03/07:
Added 12/05/07: