Electronic turnin of programming part due Thursday, March 8, by 10:00 pm.
Turnin receipt, test output, and written report due in class Friday, March 9.
You may work with a partner on this assignment. If you have a partner from the previous compiler assignments, you should continue working with that person.
For this assignment, add code generation to your compiler. When the finished compiler is executed, it should open a D source program file, compile it, and produce a .asm text file containing an x86 assembly language version of the D program. Source lines from the D file, including comments and whitespace, should appear as comments in the .asm code, with each source line near the code generated from it.
Most of the work in this assignment consists of adding new code to the existing parser. While you might find it useful to create a few new classes for various data structures or utility routines, the bulk of the changes will be additions to existing parser methods. A separate handout contains details about code generation for D programs.
The easiest way to run the compiled x86 code is to call it from a trivial C program as if it were an ordinary function. That ensures that the stack is properly set up when the compiled code begins executing, and provides a convenient place to put functions that provide an interface between the compiled code and the external world (get and put). Here are the details of how that works.
Recall that for our purposes, every x86 MASM assembly language file should have this structure.
    .386
    .model flat, c
    public d$main
    extern get:near, put:near
    .code
    <your generated code goes here>
    end
The .386 and .model directives specify that this assembly program uses the flat, 32-bit address space and instruction set introduced with the 80386 many years ago. The key directives for linking and running your program are public and extern.
public specifies labels defined in this assembly file that can be accessed by other source files in the program. In this case, we specify that we want the label of the D main function (d$main) to be callable from the bootstrap program.

extern specifies labels that are used in this file, but are defined elsewhere. The standard I/O functions get and put are listed here. These are contained in the bootstrap program, and by declaring them this way, the assembly code can call them directly: use a call put instruction after pushing its argument, or a call get instruction, which takes no argument. Both functions return their result in eax.

The bootstrap program is named dtest.c. You should use this program to run your compiled code. The bootstrap program is very small; here is a listing.
    #include <stdio.h>

    extern int d$main();   /* main function in compiled code */

    /* Prompt for input, then return next integer from standard input. */
    int get() {
        int k;
        printf("get: ");
        scanf("%d", &k);
        return k;
    }

    /* Write x to standard output with a title and yield value of x */
    int put(int x) {
        printf("put: %d\n", x);
        return x;
    }

    /* Execute D program d$main and print value returned */
    void main() {
        printf("\nValue returned from D main: %d.\n", d$main());
    }
This test program takes advantage of a non-standard addition to C supported by Visual C++: the character $ is used in the function name d$main.
Your compiler should produce a text file with a name ending in .asm containing the assembly language version of a D program. To execute this code, you will need to create a Visual C++ project. Add to the project the C (not C++) main program (dtest.c) and your assembly language file. (If dtest.c were compiled as C++, the compiler would mangle the names d$main, get, and put, and they would no longer match the labels in the .asm file.) The resulting program can be run and debugged using Visual C++.
MASM is a complete programming environment for 16-bit assembly language programs. The assembler itself (ml.exe) can also assemble 32-bit code, but cannot link or execute it. For that, we need to use the regular Visual C++ environment. You may find it easiest to use the assembler from a command prompt window, but it is also possible to configure Visual C++ to use MASM to assemble the .asm file containing the translated D program. In any case, you'll need to use the normal Visual C++ 32-bit linker and debugger to execute the resulting program. Here's how to configure Visual C++ to use MASM to assemble and run your generated code:
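    c:\masm611\bin\ml.exe /c /Cx /coff /Zi $(InputPath)
The options tell ml.exe to assemble only, without linking (/c), preserve the case of public and extern names (/Cx), produce a COFF object file that the Visual C++ linker can use (/coff), and include symbolic debugging information (/Zi).
If MASM has been installed in a different directory, you'll need to change the path name (c:\masm611) to whatever is appropriate. In the MSCC lab, ml.exe is located in c:\program files\98ddk\bin\win98\ml.exe; because that path contains a space, enclose it in double quotation marks in the command. (The executable file name ml.exe has a letter l in it, not a digit 1. The InputPath macro can be entered by clicking the Files button and selecting Input Path in the menu that appears. If it doesn't work on your setup, you might find it useful to substitute the actual file name for $(InputPath).)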
Finally, you need to specify the output file name that MASM should use for the assembled object code. In the Output File(s) field, enter filename.obj, where filename is the name of your assembly source file (without the .asm suffix).
You should now be able to compile, link, and execute your program with the normal Visual C++ Build commands. Visual C++ will use MASM to assemble the .asm file as needed. You can even use the symbolic debugger to step through the assembly language code, set breakpoints in it, etc.
In the past some people have had trouble setting up MASM to work on their machine, but there doesn't seem to be any systematic reason (it's a Windows thing). Try this early using a small hand-written .asm file to be sure you've got the configuration right, so it doesn't become a problem at the last minute.
As with previous assignments, it's helpful to figure out what piece of the compiler can be done first, without having to finish everything at once. Here are some suggestions.
The first thing to do is to work out the details of stack frame layouts and the offsets assigned to parameters and local variables. Then add code to the compiler to process parameter lists and variable declarations. Check your work by compiling some sample programs, printing the local symbol tables, and verifying that the offsets are correct.
There are several possible ways to go from here. Probably the most useful is to get function prologues and return working. That gives you enough to generate a working program with a d$main function that can be called and returns properly. (If you haven't implemented code generation for expressions yet, the value returned by main will be whatever random bits happen to be in eax, but that's ok.)
It's probably useful to tackle factor at this point, followed by assignment. That gives you enough to generate code for a=b; or x=17;. Extend this to handle code generation for arithmetic expressions, including function calls. You've now got enough to execute straight-line programs, complete with input and output and functions -- even recursive ones.
Finally, look at code generation for conditions (rel-exp and bool-exp) and if and while statements. The issues here are getting the labels planted in the right place, and picking the right conditional jumps to the correct labels.
Several D test programs are available from the course web. Feel free to create additional tests to demonstrate or debug your compiler. If you create any new test programs, please share them with others or contribute them to the collection on the course web (send mail to cse413-staff@cs).
Turn in your program electronically using this turnin form.
The electronic turnin should include all of the Java source code for your compiler. It does not need to include test programs or output.
Turn in the following at the beginning of class, Friday, March 9: