CSE 413 Winter 2001

Assignment 6 -- D Parser & Symbol Tables

Electronic turnin of programming part due Tuesday, Feb. 27, by 10:00 pm.
Turnin receipt and test output due in class Wednesday, Feb. 28.

Overview

For this assignment, add a recursive descent parser and symbol tables to your D compiler. The resulting compiler should parse D programs and print a trace of the parser methods showing the order in which they were executed; build and print a symbol table for each function; and, at the end of the program, print a global symbol table showing the names of all functions. For now, each symbol table only needs to contain identifiers. We will add more information later.

Parser

The methods (functions) that make up the recursive descent parser should be placed in a class named Parser.  This class should include a separate method to parse each major D construct (expression, statement, different kinds of statements, function_def, parameters, etc.).  Class Parser should also include any other methods or data that you need.

The constructor for Parser should have a Scanner parameter.  The parser should get the tokens that make up the input program by calling the nextToken method of the supplied Scanner object as needed.  The constructor should also have an EchoIO parameter that that can be used to print trace output, symbol tables and (if implemented) error messages.  This EchoIO object should be the same one the scanner is using to read and echo-print the source program, so any output from the parser will appear interspersed with lines from the source program.

In addition to parsing the input program, the parser should do the following:

Symbol Tables

The parser should contain two symbol tables: one for parameters and local variables of the function currently being compiled, and a second, global one, containing function names. These tables are similar, but will eventually contain different information. For now, they only need to contain identifiers.

The local symbol table should contain the names of all identifiers and parameters declared in the current function.  An empty table should be initialized each time the compiler reaches the beginning of a function definition.  As each parameter or local variable declaration is parsed, the name of that variable or parameter should be added to the local symbol table.  When the entire definition of the function has been parsed, this table should be printed if symbol table output has been requested.

For this assignment, the main use of the local symbol table is to verify that all declarations were parsed successfully and that names of parameters and local variables have been recorded properly.  If you are doing some error checking, you could use this information to detect duplicate declarations of identifiers or use of undeclared identifiers in statements and expressions. If you do this, it's best if your compiler only complains once about each undeclared identifier.  (Redundant error messages tend to annoy users and can hide other, useful information.)  A good strategy to avoid redundant "undeclared identifier" messages is to add the undeclared identifier to the symbol table after it is first seen and an error message has been printed.

The global symbol table should contain one entry for each function name used in the program.  Remember that in D, as in Java, a function does not need to be declared before it can be used.  So the first occurrence of a function name might be in a function call expression in some other function, instead of the actual function declaration.  To deal with this, put a function name in the global symbol table when it is first encountered, no matter how.

If you want to check for errors, there are a couple of things that can be detected easily.  First, you can include an boolean value with each function entry indicating whether the function definition has been encountered yet.  This should be set to false if the name is first encountered in a function call.  When a function declaration is parsed, set this variable to true.  If it was already true when the function definition is reached, then the function name is being declared twice - an error that you could report.  When the end of the program is reached, you could scan the global symbol table and report any functions that were used (called) but never defined.

Another useful piece of information about each function is the number of parameters.  If you record this when you first encounter the function, you can detect when the function is called with the wrong number of arguments, or if the function definition contains a different number of parameters than were used in an earlier call.

The D language includes two functions, get and put, that provide input and output of integer values.  These functions are not defined in D programs, but are available and no error messages should be generated if they are used.  The simplest way to handle this is to initialize the global symbol table so it contains appropriate entries for get and put before parsing the program.

There are many ways to implement symbol tables: lists, trees, etc. But the Java library has collection classes that you should use.  In particular, HashMap is a likely candidate.  You might want to create symbol table classes to hide this underlying data structure and provide suitable operations that the parser can use to access it.

Trace Output

The trace should be produced by printing a suitable message at the beginning and end of each parser method. You should include a method in the parser to turn tracing on or off. Here's a sketch of how this could work.

   public class Parser {
      private EchoIO eio;       // I/O object for parser output
      private Scanner scan;     // source of input tokens
      private boolean tracing;  // = "print parser trace"

      // constructors, etc. here....

      // start or stop tracing
      public void setTracing(boolean onOff) { tracing = onOff; }

      // print trace messages when entering and leaving named method
      private void traceEnter(String method) { eio.println("entering " + method); }
      private void traceExit (String method) { eio.println("leaving  " + method); }

      ...
      // term ::= factor | term * factor
      private void term() {
         traceEnter("term");
         code to parse a term
         traceExit("term");
      }
   }

There should be a similar mechanism to turn the symbol table printouts on and off.

Test Program

Class Parser should contain a main method that tests the parser.  This test program should initialize I/O, the scanner, and the parser, then parse a D program by calling the method for the nonterminal program. The test program might look something like this.

   // parser test program
   public static void main (String [] args) {

      // create I/O object and open files specified by args
      EchoIO eio = new EchoIO(args);

      // create scanner object (source of tokens)
      Scanner scan = new Scanner(eio);

      // create parser object
      Parser parse = new Parser(scan, eio);

      // parse source program and print a trace
      parse.setTracing(true);
      parse.program();
   }

Hints

Test Programs

There are some test programs on the course web site in the assignments section.  It would also be great if you want to contribute additional test programs by either posting them to the class mailing list, or sending them to the course staff.

You should demonstrate that your parser works by parsing some sample D programs.  The output should show the parser trace and symbol tables, along with the corresponding lines from the source code.  Be sure that your test program(s) use all of the constructs in the grammar to verify that all of the parser methods work properly.

If you have added any error checking, include output that demonstrates these checks.

Project Turnin

Turn in your program electronically using this turnin form.

Hand in the online receipt along with test output and your written problems in class on February 2.