CSE 413 Spring 2000

Assignment 7 -- D Parser & Symbol Tables

Due: Wednesday, May 24 at the beginning of class. If you've been working with a partner on the compiler assignments, you should continue working with the same person.

Overview

For this assignment, add a recursive descent parser and symbol tables to your D compiler. The resulting compiler should parse D programs and print a trace of the parser methods showing the order in which they were executed; build and print a symbol table for each function; and, at the end of the program, print a global symbol table showing the names of all functions. For now, each symbol table only needs to contain identifiers. We will add more information later.

There are also some warmup problems on the D language itself and x86 assembly language.

Written Problems

Do these problems on paper. You do not need to execute anything on a machine.

Here is a simple C program. Write an equivalent program in D, including comments. If this program uses C features that are not included in D, use the closest available D constructs that have the same effect.

#include <stdio.h>

/* = absolute value of n */
int abs(int n) {
  if (n < 0)
    return -n;
  else
    return n;
}

/* = maximum of a and b */
int max(int a, int b) {
  if (a >= b)
    return a;
  else
    return b;
}

/* read two numbers and print the larger of
   the absolute values of those numbers */
int main(void) {
  int x, y, big;

  scanf("%d %d", &x, &y);
  big = max(abs(x),abs(y));
  printf("%d", big);
}

Translate the following D program into x86 assembly language. Any straightforward x86 code will do - in particular, don't worry about producing the same code sequence that a compiler might generate. But be sure to use the standard x86/Win32 C-language function calling conventions. Include the D source code in your answer as comments (either above the corresponding lines of x86 code or to the right).
```
// Read two positive numbers and print their greatest common divisor

// = gcd of x and y provided x, y >= 0
int gcd(int x,int y) {
  while (!(x == y))
    if (x > y)
      x = x-y;
    else
      y = y-x;
  return x;
}

int main() { 
  int a; int b; int ans;
  a = get();
  b = get(); 
  ans = gcd(a,b);
  ans = put(ans);
  return 0;
}
```

Parser

The methods (functions) that make up the recursive descent parser should be placed in a class named Parser. This class should include a separate method to parse each major D construct (expression, statement, function_def, parameters, etc.). Also include in class Parser any additional methods and data that you need.

The constructor for Parser should have a parameter of class Scanner. The parser should get the tokens that make up the input program by calling the nextToken method of the supplied Scanner object as needed. The constructor should also have a parameter of class EchoIO that that the parser can use to print trace output, symbol tables and (if implemented) error messages. This EchoIO object should be the same one the scanner is using to read and echo-print the source program.

In addition to parsing the input program, the parser should do the following:

Create a local symbol table containing names of parameters and local variables of each function as it is parsed. Class Parser should include a method that can be called to control whether this table is printed after each function definition has been processed.
Create a global symbol table containing the names of all functions in the program. The parser should optionally print this table after processing a complete program (set of function definitions). The method that controls printing of local symbol tables should also control whether this table is printed.
Be able to optionally print a message when each parser procedure is entered and when it exits. This trace will help you visualize how the parser works, and can be a useful debugging tool. More details below.
Print the source lines of the program as they are read by the scanner (via EchoIO). Any trace or symbol table output should appear interspersed with the corresponding source code. (Don't worry if the trace output isn't exactly synchronized with the echoed input lines. Some of the trace output from the current input line may not appear until after the next line of the D program is read and echoed.)
(optional) Print meaningful error messages and recover from some syntax errors. Don't spend a lot of effort here, but it is a useful exercise to try to implement some simple error detection.

Symbol Tables

The parser should contain two symbol tables: one for the parameters and local variables in the function currently being compiled, and a second for the (global) function names. These tables are similar, but will eventually contain different information. For now, they only need to contain identifiers.

The local symbol table should contain the names of all identifiers and parameters declared in the current function. This table should be initialized to empty when the compiler reaches the beginning of a function definition. As each parameter or local variable declaration is parsed, the name of that variable or parameter should be added to the local symbol table. When the entire definition of the function has been parsed, this table should be printed, if symbol table output has been requested.

For this assignment, the main use of the local symbol table is to verify that all declarations were processed and that the names of parameters and local variables have been recorded properly. If you are checking for some errors, you could use this information to detect duplicate declarations of an identifier, and use of undeclared identifiers in statements and expressions. If you do this, it's best if your compiler only complains once about each undeclared identifier. (Redundant error messages tend to annoy users and can hide other, useful information.) A good strategy to avoid redundant "undeclared identifier" messages is to add the undeclared identifier to the symbol table after it is first seen and an error message has been printed.

The global symbol table should contain one entry for each function name used in the program. Remember that in D, as in Java, a function does not need to be declared before it can be used. So the first occurrence of a function name may be either in a function call, or in the definition of the function itself. To deal with this, enter a function name in the global symbol table when it is first encountered (either a call or the actual definition).

If you want to check for errors, there are a couple of things that can be detected easily. First, you can include an boolean value with each function entry indicating whether the function definition has been compiled yet. This should be set to false if the function name is first encountered in a function call. When a function definition is compiled, set this to true (if it is already true, then the function name is being declared twice - an error that you could report). When the end of the program is reached, you could scan the global symbol table and report any functions that were used (called) but never defined.

Another useful piece of information about each function is the number of parameters. If you record this when you first encounter the function, you can detect when the function is called with the wrong number of arguments, or if the function definition contains a different number of parameters than were used in an earlier call.

The D language includes two functions, get and put, that provide input and output of integer values. These functions are not defined in D programs, but it should be possible to use them without generating error messages. The simplest way to handle this is to initialize the global symbol table so it contains appropriate entries for get and put before starting to compile a D program.

There are many ways to implement symbol tables: lists, trees, etc. We suggest you take advantage of the available Java library routines and implement the symbol tables using a standard Java container class like Hashtable. You might want to create classes (data structures) for the symbol tables to hide the underlying representation and provide symbol table operations more appropriate for the rest of the compiler.

Trace Output

The trace should be produced by printing a suitable message at the beginning and end of each parser method. You should include a method in the parser to turn the trace on or off. Here's a sketch of how this could work.

   public class Parser {
      private EchoIO eio;       // I/O object for parser output
      private Scanner scan;     // source of input tokens
      private boolean tracing;  // = "print parser trace"

      // constructors, etc. here....

      // start or stop tracing
      public void setTracing(boolean onOff) { tracing = onOff; }

      // print trace messages when entering and leaving named method
      private void traceEnter(String method) { eio.println("entering " + method); }
      private void traceExit (String method) { eio.println("leaving  " + method); }

      ...
      // term ::= factor | term * factor
      private void term() {
         traceEnter("term");
         code to parse a term
         traceExit("term");
      }
   }

There should be a similar mechanism to turn the symbol table printouts on and off.

Test Program

Class Parser should contain a main method that tests the parser. This test program should initialize I/O, the scanner, and the parser, then parse a D program by calling the method for the nonterminal program. The test program might look something like this.

   // parser test program
   public static void main (String [] args) {

      // create I/O object and open files specified by args
      EchoIO eio = new EchoIO(args);

      // create scanner object (source of tokens)
      Scanner scan = new Scanner(eio);

      // create parser object
      Parser parse = new Parser(scan, eio);

      // parse source program and print a trace
      parse.setTracing(true);
      parse.program();
   }

Hints

Some of the rules in the D grammar contain left recursion and similar problems. Your parser routines won't be able to follow these rules literally. Make the necessary adjustments.
Recursion in the grammar is often used to define sequences of various things like the list of function definitions that make up a program, the factors to be multiplied together to calculate the value of a term, etc. The corresponding parser methods should normally use iteration to process these sequences.
It is a good idea to introduce separate parser methods for things that are similar in the grammar, but mean different things in a D program. An example is the rules for parameters and declarations. While these look nearly the same, the processing needed to handle them in the final compiler may well be quite different. The compiler structure will be cleaner if you define different methods for these non-terminals.
Define simple helper methods to avoid redundant code in the parser. A couple of obvious ones: a method that skips past the current input token and advances to the next one; a method that checks whether the current token is what is expected and advances to the next one if it is.
Feel free to do some checking for syntax errors. One principle to keep in mind: it is important that the parser continue to make progress through the source program. Be sure your error handling strategy doesn't leave the parser stuck somewhere in the input, repeatedly examining the same token without advancing.
Finally, start small. Get a few pieces of the parser working first to test your understanding of the problem by processing a small part of the grammar. Then add to this until you have a complete parser.

Test Programs

We will provide some test programs on the course web site. Watch email for an announcement. It would also be great if everyone would contribute additional test programs by either posting them to the class mailing list, or sending them to cse413-staff@cs so we can add them to the course web.

You should demonstrate that your parser works by running the test program (method main in class Parser) on some sample D programs. The output should show the parser trace and symbol tables and the source code lines that generate them. Be sure that your test program(s) use all of the constructs in the grammar to verify that all of the parser methods work properly.

If you have implemented any error checking, include a test program and output that demonstrates these features of your compiler.

Project Turnin

Turn in your program electronically using the links below:

The regular turnin form will let you turn in several Java source files.
The jar turnin form will let you turn in a single jar file, which can contain an arbitrary number of files. See the jar file help page if you do not know how to create a jar file.

Hand in your solutions to the written problems, the online receipt from the electronic turnin, and the output from your test programs in class on May 24.