CSE 413 -- Assignment 8 -- D Code Generation

Due: Friday, June 4, at the beginning of class. You may work with a partner on this assignment. If you have a partner from the previous D assignments, you should continue working with that person. We will want you to turn in your compiler, and some sample compiled code, details tba.

Overview

For this assignment, add code generation to your compiler. When the finished compiler is executed, it should open a D source program file, compile it, and produce a .asm text file containing an x86 assembly language version of the D program. Source lines from the D file, including comments and whitespace, should appear as comments in the .asm code, with each source line near the code generated from it.
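
One way to echo source lines into the output is sketched below. This is only an illustration: the class and method names are hypothetical, and it assumes the scanner hands each raw source line to the code generator as it is read. MASM treats everything after a semicolon as a comment.

```java
// Hypothetical helper, not part of any supplied starter code.
final class SourceEcho {
    /** Formats one D source line as a MASM comment line. */
    static String asComment(String sourceLine) {
        // Drop any trailing line terminator so the comment stays on one line.
        String text = sourceLine.replaceAll("[\\r\\n]+$", "");
        // A blank source line becomes a bare ';' so whitespace is still visible.
        return text.isEmpty() ? ";" : "; " + text;
    }

    public static void main(String[] args) {
        System.out.println(asComment("x = x + 1;   // bump x"));
    }
}
```

Writing the comment as soon as the scanner reads a new line keeps each source line close to the instructions generated from it.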

Most of the work in this assignment consists of adding new code to the existing parser. While you may find it useful to create new classes for various data structures or utility routines, the bulk of the changes will be additions to existing parser methods.

Assembly File Format

The MASM assembler source file generated by the compiler should have the following format:

  .386
  .model flat, c
  public D_main
  extern get:near, put:near
  .code

<your generated code goes here>

end

This tells the assembler to assemble 32-bit flat-addressing code using 32-bit instructions and registers available in 80386 and later processors. It specifies that the C-language conventions will be used for external names and that get and put are external symbols defined elsewhere (in the code we give you). It also specifies that symbol D_main is an externally-visible name defined in the compiled code.
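
The fixed text around the generated code could be produced by a small helper like the one sketched here. The class and method names are hypothetical. A real compiler would more likely write the header when the output file is opened and the footer when parsing finishes, rather than wrap a finished string.

```java
// Hypothetical helper that produces the fixed parts of the .asm file.
final class AsmFile {
    /** Directives that must precede the generated code. */
    static String header() {
        return String.join("\n",
            "  .386",
            "  .model flat, c",
            "  public D_main",
            "  extern get:near, put:near",
            "  .code",
            "");
    }

    /** The end directive that closes the file. */
    static String footer() {
        return "end\n";
    }

    /** Convenience for small tests: header, blank line, code, blank line, footer. */
    static String wrap(String generatedCode) {
        return header() + "\n" + generatedCode + "\n" + footer();
    }

    public static void main(String[] args) {
        System.out.print(wrap("; generated code would appear here\n"));
    }
}
```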

We have written a very short C main program, Dtest.c (located here), that will execute your code. This program calls function D_main (the D main function) and prints the value returned. The code generated for function main in the D program should be labeled D_main, not main. This is needed because file Dtest.c contains the actual function main where execution must begin, so that the environment (stack, etc.) is properly initialized before your compiled code is called.

Code Generation Details and Hints

Use the code generation model that was presented in lecture. Some key points:

Treat the x86 as a single-accumulator machine. The code generated for an arithmetic expression, or any component of an arithmetic expression, should leave the resulting value in register eax.
Use simple, 32-bit flat-model instructions. The following 32-bit instructions should probably be all that are needed: mov (memory-register, register-memory, and register-register); push and pop; add, sub, and imul (register-register); cmp (register-register); jmp and conditional branches; call and ret.
The compiler routines that handle conditional expressions need to generate a compare instruction followed by a conditional branch. One way to organize this is to add arguments to the parser functions for bool-exp and rel-exp: a String label that is the jump target, and a boolean value that indicates whether the jump should be taken when the condition is true or when it is false.
Use the x86 C-language function calling conventions, except that arguments should be pushed on the stack from left to right. Use a call instruction to call a function; after the function returns, add an appropriate amount to esp to pop the arguments off the stack.
In a function body, use ebp as a frame pointer. Allocate space for local variables by subtracting an appropriate amount from esp immediately after setting ebp. When the function returns, registers ebp, ebx, esi, and edi should have their original values (as should esp). The integer result of a function must be returned in register eax. Other registers may be used freely. If you need an extra register for the second operand of a binary arithmetic instruction, it will probably be easiest to use ecx, which doesn't need to be saved or restored during a function call.
When compiling a function call, the compiler should emit code to evaluate each argument and leave its value in register eax as the arguments are parsed from left to right. Each of the arguments should be pushed on the stack after it is evaluated.
The standard functions get and put are ordinary C functions defined in Dtest.c. The extern directive makes them available to the compiled assembly language code. Use the normal C conventions to call them (push arguments as needed, call get or put, pop any arguments off the stack when the function returns). You need to manually create entries for get and put in the external function symbol table when your compiler starts, before any D code is compiled. Once that has been done, get and put should not require any further special handling in the compiler.
The code generated at the beginning of the D program's main function must begin with the label D_main, not main. This avoids a conflict with the program that executes your code, which contains the actual function main.
Assembly language instruction names (mov, add, etc.) have fixed meanings in MASM and cannot be used as label names. So, if you use D function names to label the generated code, you will run into problems (and get really obscure error messages) if you have a D function with the same name as a MASM reserved word. If you wish, you can simply avoid using MASM reserved words as function names. Or you could use a "mangled" version of the function name as a label in the assembly code -- for instance, if the D function is named inc, you might use inc$d as the label in the code.
Because of the Windows C language calling conventions, every external name generated by the compiler or assembler begins with an extra underscore. This is normally transparent, but it is visible in the debugger, where you will see symbols like _get and _D_main.
Generate code as you parse the D program. Don't save up multiple lines of assembler output code.
Don't worry about generating instructions that reload a just-stored variable, or do other stupid things. We're trying to get running, correct code.
You will find it handy to have a function named emit or gen or something in your compiler that you can call when you want to write a line of code to the output .asm file. To generate code, say, for an addition operation, your compiler method for expr() could execute something like this:
```
   term();
   gen("  push eax");
   term();
   gen("  pop ecx");
   gen("  add eax,ecx");
```
Assuming that the code generated by term() loads the value of the term into eax, the code generated for 1+2 would look roughly like this:
```
   mov eax,1
   push eax
   mov eax,2
   pop ecx
   add eax,ecx
```
The mov instructions would be produced by term() (actually by factor(), which is called by term()); the other instructions are generated by expr().
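
The label-plus-boolean idea suggested above for bool-exp and rel-exp might look roughly like the following sketch. All names are hypothetical, and the operator set is illustrative only; use whatever relational operators the D grammar actually defines. It assumes the left operand has ended up in ecx and the right operand in eax, which is what the push/pop pattern shown above produces.

```java
// Hypothetical helper for compare-and-branch code; names are illustrative.
final class CondJump {
    /**
     * Picks the signed conditional jump for a relational operator.
     * jumpIfTrue says whether to branch when the relation holds or when it fails.
     */
    static String jumpFor(String relOp, boolean jumpIfTrue) {
        switch (relOp) {
            case "==": return jumpIfTrue ? "je"  : "jne";
            case "!=": return jumpIfTrue ? "jne" : "je";
            case "<":  return jumpIfTrue ? "jl"  : "jge";
            case "<=": return jumpIfTrue ? "jle" : "jg";
            case ">":  return jumpIfTrue ? "jg"  : "jle";
            case ">=": return jumpIfTrue ? "jge" : "jl";
            default:
                throw new IllegalArgumentException("unknown operator: " + relOp);
        }
    }

    /** Emits the compare and branch, assuming left is in ecx and right is in eax. */
    static String compareAndBranch(String relOp, String label, boolean jumpIfTrue) {
        return "  cmp ecx,eax\n"
             + "  " + jumpFor(relOp, jumpIfTrue) + " " + label + "\n";
    }

    public static void main(String[] args) {
        // An if statement typically jumps past its body when the condition is false.
        System.out.print(compareAndBranch(">", "else1", false));
    }
}
```

Passing false for the boolean is the common case for if and while, where the branch skips the body; a negated condition can be handled by flipping the boolean instead of emitting extra instructions.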

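The stack-frame and calling-convention hints above can be summarized in a sketch like this one. The class, method names, and the plain `name:` label syntax are assumptions, not requirements. The offsets follow from the layout described in the hints: with arguments pushed left to right, the last argument sits just above the return address at ebp+8, and locals sit below ebp.

```java
// Hypothetical helper for function entry, exit, and call sequences.
final class FunctionCode {
    static final int WORD = 4; // every D value occupies one 32-bit word

    /** Function entry: set up ebp as frame pointer and reserve space for locals. */
    static String prologue(String label, int localCount) {
        StringBuilder sb = new StringBuilder();
        sb.append(label).append(":\n");
        sb.append("  push ebp\n");
        sb.append("  mov ebp,esp\n");
        if (localCount > 0) {
            sb.append("  sub esp,").append(WORD * localCount).append("\n");
        }
        return sb.toString();
    }

    /** Function exit: release locals, restore ebp, return (result already in eax). */
    static String epilogue() {
        return "  mov esp,ebp\n  pop ebp\n  ret\n";
    }

    /** Call sequence emitted after all arguments have been pushed. */
    static String call(String label, int argCount) {
        String code = "  call " + label + "\n";
        if (argCount > 0) {
            code += "  add esp," + (WORD * argCount) + "\n"; // pop the arguments
        }
        return code;
    }

    /** Offset from ebp of parameter index (0 = leftmost) when pushed left to right. */
    static int paramOffset(int index, int argCount) {
        return 8 + WORD * (argCount - 1 - index);
    }

    /** Offset from ebp of local variable index (0 = first declared). */
    static int localOffset(int index) {
        return -WORD * (index + 1);
    }

    public static void main(String[] args) {
        System.out.print(prologue("D_main", 2));
        System.out.print(call("put", 1));
        System.out.print(epilogue());
    }
}
```

For a two-parameter function, paramOffset gives ebp+12 for the first parameter and ebp+8 for the second, so the symbol table can record these offsets when the parameter list is parsed.
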
Executing MASM Code with Visual C++

Your compiler should produce a text file with a name ending in .asm containing the assembly language version of a D program. To execute this code, you will need to create a Visual C++ project with a C (not C++) main program that we will supply and the assembler code generated by your compiler. The resulting program can be run and debugged using Visual C++.

MASM is a complete programming environment for 16-bit assembly language programs. The assembler itself (ml.exe) can also assemble 32-bit code, but cannot link or execute it. For that, we need to use the regular Visual C++ environment. You may find it easiest to use the assembler from a command prompt window, but it is also possible to configure Visual C++ to use MASM to assemble the .asm file containing the translated D program. In any case, you'll need to use the normal Visual C++ 32-bit linker and debugger to execute the resulting program. Here's how to configure Visual C++ to use MASM to assemble and run your generated code:

Create a new project in Visual C++ by selecting File>New; click on the Projects tab; select Win32 Console Application for the project type and Win32 for the project platform. Enter a name for your project and select the desired directory, then click OK.
Add file Dtest.c (which you get from us) and the .asm file generated by your compiler to the project. Pick Project>Select Files, then use the Insert Files into Project dialog to add Dtest.c. Do this again to add the .asm file. (You may have to change the type of files displayed in the dialog to "all files" to see the .asm file.)
Configure the project to use MASM to assemble the .asm file. Select Project>Settings. In the dialog box that appears, be sure that Win32 Debug is displayed in the Settings: field. Expand the file list if needed, then select your .asm file -- and only this file. Click on the Custom Build tab. In the first line of the Build Command(s) field, enter the MASM command to be used to assemble the file. This includes the full path name for the MASM assembler, the assembly options to be used, and a macro that specifies the input path. In the CSUGlab, the following command should work:
```
  c:\masm611\bin\ml.exe /c /Cx /coff /Zi $(InputPath)
```
(The executable file name ml.exe has a letter l in it, not a digit 1. The InputPath macro can be entered by clicking the Files button and selecting Input Path in the menu that appears.) Finally, you need to specify the output file name that MASM should use for the assembled object code. In the Output File(s) field, enter filename.obj, where filename is the name of your assembly source file (without the .asm suffix).

You should now be able to compile, link, and execute your program with the normal Visual C++ Build commands. Visual C++ will use MASM to assemble the .asm file as needed. You can even use the symbolic debugger to step through the assembly language code, set breakpoints in it, etc.