CSE 413 -- Assignment 8 -- D Code Generation
Due: Friday, June 4, at the beginning of class. You may work with a partner on this
assignment. If you have a partner from the previous D assignments, you should continue
working with that person. You will need to turn in your compiler and some sample
compiled code; details TBA.
Overview
For this assignment, add code generation to your compiler. When the finished compiler
is executed, it should open a D source program file, compile it, and produce a .asm
text file containing an x86 assembly language version of the D program. Source lines from
the D file, including comments and whitespace, should appear as comments in the .asm
code, with each source line near the code generated from it.
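One simple way to keep each source line next to its code is to have the scanner copy every line it reads into the output as a MASM comment (MASM ignores everything after a semicolon on a line). The sketch below is only an illustration: the class SourceEchoWriter and its methods gen and echoSource are hypothetical names, and the PrintWriter stands for whatever stream your compiler opens on the .asm file.

```java
import java.io.PrintWriter;

// Hypothetical output helper for the compiler. In a real compiler the
// PrintWriter would be opened on the .asm file.
class SourceEchoWriter {
    private final PrintWriter out;

    SourceEchoWriter(PrintWriter out) {
        this.out = out;
    }

    // Write one line of generated assembly code.
    void gen(String line) {
        out.println(line);
    }

    // Copy one D source line (comments and blank lines included) into the
    // .asm file as a MASM comment.
    void echoSource(String sourceLine) {
        out.println("; " + sourceLine);
    }

    void flush() {
        out.flush();
    }
}
```

Because the parser usually reads one token ahead, the code for the end of one source line may appear just after the comment for the next line; the requirement only asks that each source line appear near its code.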
Most of the work in this assignment consists of adding new code to the existing parser.
While you may find it useful to create new classes for various data structures or utility
routines, the bulk of the changes will be additions to existing parser methods.
Assembly File Format
The MASM assembler source file generated by the compiler should have the following
format:
.386
.model flat, c
public D_main
extern get:near, put:near
.code
<your generated code goes here>
end
This tells the assembler to assemble 32-bit flat-addressing code using 32-bit
instructions and registers available in 80386 and later processors. It specifies that the
C-language conventions will be used for external names and that get and put
are external symbols defined elsewhere (in the code we give you). It also specifies that
symbol D_main is an externally-visible name defined in the compiled code.
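Since the directives before and after the generated code never change, the compiler can write them as fixed text. A minimal sketch, assuming hypothetical helpers writeHeader and writeTrailer that your compiler calls once before parsing the D program and once after it:

```java
import java.io.PrintWriter;

// Sketch: write the fixed directives around the generated code. Method
// names are hypothetical; the directive text is the format given above.
class AsmFileFrame {
    private static final String[] HEADER = {
        ".386",
        ".model flat, c",
        "public D_main",
        "extern get:near, put:near",
        ".code"
    };

    // Call once, before the first D function is compiled.
    static void writeHeader(PrintWriter out) {
        for (String line : HEADER) {
            out.println(line);
        }
    }

    // Call once, after the last D function; MASM ignores anything after "end".
    static void writeTrailer(PrintWriter out) {
        out.println("end");
    }
}
```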
We have written a very short C main program, Dtest.c, that will execute your
code. This program calls function D_main
(the D main function) and prints the value returned. The code generated for
function main in the D program should be labeled D_main, not main.
This is needed because file Dtest.c contains the actual function main
where execution must begin, so the environment (stack, etc.) is properly initialized
before your compiled code is called.
Code Generation Details and Hints
Use the code generation model that was presented in lecture. Some key points:
- Treat the x86 as a single-accumulator machine. The code generated for an arithmetic
expression, or any component of an arithmetic expression, should leave the resulting value
in register eax.
- Use simple, 32-bit flat-model instructions. The following 32-bit instructions should
probably be all that are needed: mov (memory-register, register-memory, register-register,
and immediate-register); push and pop; add, sub, and imul
(register-register, plus add and sub with an immediate operand to adjust esp);
cmp (register-register); jmp and conditional branches; call and ret.
- The compiler routines that handle conditional expressions need to generate a compare and
branch instruction. One way to organize this is to add some arguments to the parser
functions for bool-exp and rel-exp: a String giving the label that is
the jump target, and a boolean value indicating whether the jump should be taken when the
condition is true or when it is false.
- Use the x86 C-language function calling conventions, except that arguments should be
pushed on the stack from left to right. Use a call instruction to call a
function; after the function returns, add an appropriate amount to esp to pop the
arguments off the stack.
- In a function body, use ebp as a frame pointer. Allocate space for local
variables by subtracting an appropriate amount from esp immediately after setting
ebp. When the function returns, registers ebp, ebx, esi,
and edi should have their original values (as should esp). The integer
result of a function must be returned in register eax. Other registers may be
used freely. If you need an extra register for the second operand of a binary arithmetic
instruction, it will probably be easiest to use ecx, which doesn't need to be
saved or restored during a function call.
- When compiling a function call, the compiler should emit code to evaluate each argument
and leave its value in register eax as the arguments are parsed from left to
right. Each of the arguments should be pushed on the stack after it is evaluated.
- The standard functions get and put are ordinary C functions defined in
Dtest.c. The extern directive makes them available to the compiled
assembly language code. Use the normal C conventions to call them (push arguments as
needed, call get or put, pop any arguments off the stack when the function returns). You
need to manually create entries for get and put in the external function
symbol table when your compiler starts, before any D code is compiled. Once that has been
done, get and put should not require any further special handling
in the compiler.
- The code generated at the beginning of the D program's main function must begin
with the label D_main, not main. This avoids a conflict with
the program that executes your code, which contains the actual function main.
- Assembly language instruction names (mov, add, etc.) have fixed
meanings in MASM and cannot be used as label names. So, if you use D function names to
label the generated code, you will run into problems (and get really obscure error
messages) if you have a D function with the same name as a MASM reserved word. If you
wish, you can simply avoid using MASM reserved words as function names. Or you could use a
"mangled" version of the function name as a label in the assembly code -- for instance,
if the D function is named inc, you might use inc$d as the label in the
code.
- Because of the Windows C language calling conventions, every external name generated by
the compiler or assembler begins with an extra underscore. This is normally transparent,
but it is visible in the debugger, where you will see symbols like _get and _D_main.
- Generate code as you parse the D program. Don't save up multiple lines of assembler
output code.
- Don't worry about generating instructions that reload a just-stored variable, or do
other stupid things. We're trying to get running, correct code.
- You will find it handy to have a function named emit or gen or
something in your compiler that you can call when you want to write a line of code to the
output .asm file. To generate code, say, for an addition operation, your
compiler method for expr() could execute something like this:
term();
gen(" push eax");
term();
gen(" pop ecx");
gen(" add eax,ecx");
Assuming that the code generated by term() loads the value of the term into eax,
the code generated for 1+2 would look roughly like this:
mov eax,1
push eax
mov eax,2
pop ecx
add eax,ecx
The mov instructions would be produced by term() (actually by factor(),
which is called by term()); the other instructions are generated by expr().
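The gen() pattern for addition extends to the rest of the expression grammar. The following is a minimal sketch, not a required design: the hypothetical class ExprCodeGen parses only integer literals, +, -, * and parentheses directly from a string (standing in for your D scanner, with no variables, function calls, or error recovery) and emits code in the single-accumulator style. One detail needs care: after pop ecx the left operand is in ecx and the right operand in eax, so subtraction computes ecx-eax and then copies the result back into eax.

```java
import java.io.PrintWriter;

// Sketch of single-accumulator code generation. The "scanner" is just a
// string holding integer literals, + - * and parentheses; variables,
// function calls and error recovery are deliberately left out.
class ExprCodeGen {
    private final String src;
    private final PrintWriter out;
    private int pos = 0;

    ExprCodeGen(String src, PrintWriter out) {
        this.src = src;
        this.out = out;
    }

    private void gen(String line) {
        out.println(line);
    }

    // Skip blanks and return the next character without consuming it.
    private char peek() {
        while (pos < src.length() && src.charAt(pos) == ' ') {
            pos++;
        }
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // exp ::= term { (+|-) term }; result left in eax.
    void expr() {
        term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            gen("        push eax");        // save left operand
            term();                          // right operand -> eax
            gen("        pop ecx");         // left operand -> ecx
            if (op == '+') {
                gen("        add eax,ecx");
            } else {
                gen("        sub ecx,eax");  // ecx = left - right
                gen("        mov eax,ecx");  // result back in eax
            }
        }
    }

    // term ::= factor { * factor }; result left in eax.
    void term() {
        factor();
        while (peek() == '*') {
            pos++;
            gen("        push eax");
            factor();
            gen("        pop ecx");
            gen("        imul eax,ecx");
        }
    }

    // factor ::= integer | ( exp ); result left in eax.
    void factor() {
        if (peek() == '(') {
            pos++;
            expr();
            if (peek() != ')') {
                throw new IllegalStateException("expected ) at position " + pos);
            }
            pos++;
        } else {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
                pos++;
            }
            if (start == pos) {
                throw new IllegalStateException("expected integer at position " + start);
            }
            gen("        mov eax," + src.substring(start, pos));
        }
    }
}
```

For the input 1+2 this emits the same five instructions shown above. For 7-2*3 the multiplication is generated first, because term() consumes the * before expr() handles the -.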
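For the hint about conditional expressions, here is one possible shape for the label and boolean arguments. This sketch assumes the operands of a rel-exp are evaluated with the same push eax / pop ecx pattern as arithmetic, so the left operand is in ecx and the right operand in eax. The class CondCodeGen, its methods, the L$n label scheme, and the operator spellings ==, !=, <, <=, >, >= are all assumptions to adapt to your own grammar. It uses the signed jumps (jl, jg, and so on), assuming D integers are signed.

```java
import java.io.PrintWriter;
import java.util.Map;

// Sketch of compare-and-branch generation for rel-exp. Assumes the left
// operand is in ecx and the right operand in eax, and that the relational
// operators are spelled ==, != , <, <=, >, >= (adjust to your grammar).
class CondCodeGen {
    // Jump taken when the relation is true (signed comparisons).
    private static final Map<String, String> JUMP_IF_TRUE = Map.of(
            "==", "je", "!=", "jne", "<", "jl", "<=", "jle", ">", "jg", ">=", "jge");
    // Jump taken when the relation is false (the negated relation).
    private static final Map<String, String> JUMP_IF_FALSE = Map.of(
            "==", "jne", "!=", "je", "<", "jge", "<=", "jg", ">", "jle", ">=", "jl");

    private final PrintWriter out;
    private int labelCount = 0;

    CondCodeGen(PrintWriter out) {
        this.out = out;
    }

    void gen(String line) {
        out.println(line);
    }

    // Hypothetical helper returning a fresh label. The '$' keeps these labels
    // from colliding with D names, assuming D identifiers cannot contain '$'.
    String newLabel() {
        labelCount++;
        return "L$" + labelCount;
    }

    // Emit cmp plus a conditional jump to label, taken when the relation's
    // truth value equals jumpIfTrue.
    void condJump(String relop, String label, boolean jumpIfTrue) {
        String jump = (jumpIfTrue ? JUMP_IF_TRUE : JUMP_IF_FALSE).get(relop);
        if (jump == null) {
            throw new IllegalArgumentException("unknown relational operator " + relop);
        }
        gen("        cmp ecx,eax");           // flags from ecx - eax (left - right)
        gen("        " + jump + " " + label);
    }

    // Code shape for "while (left relop right) body". The Runnables stand in
    // for the parser methods that generate code for each part.
    void whileLoop(Runnable left, String relop, Runnable right, Runnable body) {
        String top = newLabel();
        String done = newLabel();
        gen(top + ":");
        left.run();                   // left operand -> eax
        gen("        push eax");
        right.run();                  // right operand -> eax
        gen("        pop ecx");       // left operand -> ecx
        condJump(relop, done, false); // leave the loop when the condition is false
        body.run();
        gen("        jmp " + top);
        gen(done + ":");
    }
}
```

A while loop passes jumpIfTrue = false so the test branches past the body; an if statement without an else can branch to its end label the same way.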
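The calling-convention hints determine where every parameter and local lives relative to ebp. Because arguments are pushed left to right, the last argument is nearest the return address, which is the reverse of the usual C layout. Below is a minimal sketch of that bookkeeping, with hypothetical method names, assuming 4-byte integers and generated code that never uses ebx, esi or edi (so it need not save them).

```java
import java.io.PrintWriter;

// Sketch of stack-frame bookkeeping for this assignment's conventions:
// arguments pushed left to right, ebp as frame pointer, 4-byte ints.
// Method names are hypothetical; ebx, esi and edi are never used here,
// so they are not saved.
class FrameLayout {
    // After "push ebp / mov ebp,esp": [ebp] = saved ebp, [ebp+4] = return
    // address. The LAST argument pushed is at [ebp+8], so parameter i
    // (0-based) of n is 4*(n-1-i) bytes above that.
    static int paramOffset(int i, int n) {
        return 8 + 4 * (n - 1 - i);
    }

    // Local variable j (0-based) sits below the saved ebp.
    static int localOffset(int j) {
        return -4 * (j + 1);
    }

    // Memory operand for an ebp-relative offset, e.g. [ebp+8] or [ebp-4].
    static String operand(int offset) {
        return offset >= 0 ? "[ebp+" + offset + "]" : "[ebp" + offset + "]";
    }

    static void prologue(PrintWriter out, String label, int nLocals) {
        out.println(label + ":");
        out.println("        push ebp");
        out.println("        mov ebp,esp");
        if (nLocals > 0) {
            out.println("        sub esp," + 4 * nLocals); // space for locals
        }
    }

    // Emitted for each return statement; the result is already in eax.
    static void epilogue(PrintWriter out) {
        out.println("        mov esp,ebp"); // discard locals
        out.println("        pop ebp");
        out.println("        ret");
    }

    // Emitted after all nArgs arguments have been evaluated and pushed.
    static void call(PrintWriter out, String label, int nArgs) {
        out.println("        call " + label);
        if (nArgs > 0) {
            out.println("        add esp," + 4 * nArgs); // pop the arguments
        }
    }
}
```

For a function f(a, b), a is at [ebp+12] and b at [ebp+8]; the first two locals are at [ebp-4] and [ebp-8].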
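The hints about get, put, D_main and MASM reserved words can all be handled by one table. This sketch of the external function symbol table uses hypothetical names (FunctionTable, Entry, declare, lookup, labelFor) and assumes get takes no arguments and put takes one. The mangling is not applied to get and put, since their labels must match the extern directive, and main maps to D_main.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the external function symbol table. All names are hypothetical;
// the parameter counts for get and put are assumptions.
class FunctionTable {
    static final class Entry {
        final int paramCount;
        final String label;

        Entry(int paramCount, String label) {
            this.paramCount = paramCount;
            this.label = label;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    FunctionTable() {
        // Pre-load get and put before any D code is compiled. Their labels
        // are not mangled: they must match "extern get:near, put:near".
        entries.put("get", new Entry(0, "get"));
        entries.put("put", new Entry(1, "put"));
    }

    // main becomes D_main; other D functions get a "$d" suffix so that a
    // function named, say, inc does not clash with the MASM instruction inc.
    static String labelFor(String dName) {
        return dName.equals("main") ? "D_main" : dName + "$d";
    }

    void declare(String dName, int paramCount) {
        if (entries.containsKey(dName)) {
            throw new IllegalStateException("function already declared: " + dName);
        }
        entries.put(dName, new Entry(paramCount, labelFor(dName)));
    }

    // Returns null if the function has not been declared.
    Entry lookup(String dName) {
        return entries.get(dName);
    }
}
```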
Executing MASM Code with Visual C++
Your compiler should produce a text file with a name ending in .asm containing
the assembly language version of a D program. To execute this code, you will need to
create a Visual C++ project with a C (not C++) main program that we will supply and the
assembler code generated by your compiler. The resulting program can be run and debugged
using Visual C++.
MASM is a complete programming environment for 16-bit assembly language programs. The
assembler itself (ml.exe) can also assemble 32-bit code, but cannot link or
execute it. For that, we need to use the regular Visual C++ environment. You may find it
easiest to use the assembler from a command prompt window, but it is also possible to
configure Visual C++ to use MASM to assemble the .asm file containing the
translated D program. In any case, you'll need to use the normal Visual C++ 32-bit linker
and debugger to execute the resulting program. Here's how to configure Visual C++ to use
MASM to assemble and run your generated code:
- Create a new project in Visual C++ by selecting File>New; click on the Projects tab;
select Win32 Console Application for the project type and Win32 for the project platform.
Enter a name for your project and select the desired directory, then click OK.
- Add file Dtest.c (which you get from us) and the .asm file generated
by your compiler to the project. Pick Project>Select Files, then use the Insert Files
into Project dialog to add Dtest.c. Do this again to add the .asm file.
(You may have to change the type of files displayed in the dialog to "all files" to see
the .asm file.)
- Configure the project to use MASM to assemble the .asm file. Select
Project>Settings. In the dialog box that appears, be sure that Win32 Debug is displayed
in the Settings: field. Expand the file list if needed, then select your .asm
file -- and only this file. Click on the Custom Build tab. In the first
line of the Build Command(s) field, enter the MASM command to be used to assemble the
file. This includes the full path name for the MASM assembler, the assembly options to be
used, and a macro that specifies the input path. In the CSUGlab, the following command
should work:
c:\masm611\bin\ml.exe /c /Cx /coff /Zi $(InputPath)
(The executable file name ml.exe has a letter l in it, not a digit 1.
The InputPath macro can be entered by clicking on button Files and selecting Input Path in
the menu that appears.) Finally, you need to specify the output file name that MASM should
use for the assembled object code. In the Output File(s) field, enter filename.obj,
where filename is the name of your assembly source file (without the .asm
suffix).
You should now be able to compile, link, and execute your program with the normal
Visual C++ Build commands. Visual C++ will use MASM to assemble the .asm file as
needed. You can even use the symbolic debugger to step through the assembly language code,
set breakpoints in it, etc.