CSE 413 Autumn 2007 -- Assignment 7

Context Free Grammars, & D Parser

Due: Electronic turnin of the entire compiler (so far) due no later than 8am, Friday, Nov. 30. Printouts of your parser code (only) and tests, and answers to written questions due at the beginning of class on Friday, Nov. 30.

Overview

The main goal of this assignment is to add a recursive descent parser and symbol tables to your D compiler. The resulting compiler should parse D programs and print a trace of the parser functions showing the order in which they were executed; build and print a symbol table for each function; and, at the end of the source program, print a global symbol table showing the names of all functions. For now, each symbol table only needs to contain identifiers. We will add (a little) more information later.

Written problems

Here are a few problems involving context free grammars.  You should do these problems individually, even if you have a partner for the compiler project.

  1. Consider the following grammar:
    S ::= aAS | a
    A ::= SbA | SS | ba
                   
    1. What are the terminals, nonterminals, and start symbol for this grammar?
    2. Draw parse trees for the following sentences:
      1. aabbaa
      2. aabaaabaa
    3. Give top down, leftmost derivations of the sentences in (b).
  2. Give a context free grammar that generates
    1. The language containing strings with an equal number of a's and b's.
    2. The language containing strings of the form wwRcanb2n, where w is a string of 1 or more a's and b's; wR denotes the reversal of w, and the superscript n stands for any positive integer, meaning that the end of the string should contain 1 or more a's followed by twice as many b's as a's.

Parser

The functions that make up the recursive descent parser should be placed in a source file named parser.c, and you should create a header file parser.h with definitions of the function(s) that compile a program and control the parser.  This file should include a separate function to parse each major D construct (expression, statement, different kinds of statements, function_def, parameters, etc.).  You should include any other functions and variables that you need in this file. Everything except functions declared in parser.h should be static so they are not visible outside parser.c. (If this file seems to get too large, you might want to split it into a handful of files, but you definitely don't want to have a huge collection of files each of which contains only one or two functions.)

The parser should get the tokens that make up the input program by calling the scanner's next_token() function as needed.  The parser should use functions from io.[hc] to print the parser trace, symbol table output and, if you implement them, error messages. This is the same set of I/O routines used by the scanner to read the program, and the result is that output produced by the parser will appear interspersed with the lines read from the source program, which helps visualize the relationship between the program and the parser actions. You should turn off any tracing of scanner output (token printouts) except when (or if) you need to do this during debugging.

In addition to parsing the input program, the parser should do the following:

Symbol Tables

The parser should contain two symbol tables: one for parameters and local variables of the function currently being compiled, and a second, global one, containing function names. These tables are similar, but will eventually contain different information. For now, they only need to contain identifiers. You should use the symbol tables you created for assignment 5 to implement this.

The local symbol table should contain the names of all identifiers and parameters declared in the current function.  An empty table should be initialized each time the compiler reaches the beginning of a function definition.  As each parameter or local variable declaration is parsed, the name of that variable or parameter should be added to the local symbol table.  When the entire definition of the function has been parsed, this table should be printed if symbol table output has been requested.

For this assignment, the main use of the local symbol table is to verify that all declarations were parsed successfully and that names of parameters and local variables have been recorded properly.  If you are doing some error checking, you could use this information to detect duplicate declarations of identifiers or use of undeclared identifiers in statements and expressions. If you do this, it's best if your compiler only complains once about each undeclared identifier.  (Redundant error messages tend to annoy users and can hide other, useful information.)  A good strategy to avoid redundant "undeclared identifier" messages is to add the undeclared identifier to the symbol table after it is first seen and an error message has been printed.

The global symbol table should contain one entry for each function name used in the program.  Remember that in D a function does not need to be declared before it can be used.  So the first occurrence of a function name might be in a function call expression in another function.  To deal with this, put a function name in the global symbol table when it is first encountered, no matter how.

If you want to check for errors, there are a couple of things that can be detected easily.  First, you can include a boolean value with each function entry indicating whether the function definition has been encountered yet.  This should be set to false if the name is first encountered in a function call.  When a function declaration is parsed, set this variable to true.  If it was already true when the function definition is reached, then the function name is being declared twice - an error that you could report.  When the end of the program is reached, you could scan the global symbol table and report any functions that were used (called) but never defined.

Another useful piece of information about each function is the number of parameters.  If you record this when you first encounter the function, you can detect when the function is called with the wrong number of arguments, or if the function definition contains a different number of parameters than were used in an earlier call.

The D language includes two functions, get and put, that provide input and output of integer values.  These functions are not defined in D programs, but are available without declaration, and no error messages should be generated if they are used.  The simplest way to handle this is to initialize the global symbol table so it contains appropriate entries for get and put before parsing the program.

Trace Output

The trace should be produced by printing a suitable message at the beginning and end of each parser function. You should also include a function in the parser to turn tracing on or off. Here's a sketch of code that could be included in the parser to make this work

      /* true if producing parser trace output, false if not */
      static int tracing = 1;

      /* start or stop tracing */
      void set_tracing(int on_off) { tracing = on_off; }

      /* print trace messages when entering and leaving the named parser function */
      void enter(char* function) { 
         if (tracing) { putstr("entering "); putline(function); }
      }
      void exit (char* function) {
         if (tracing) { putstr("leaving "); putline(function); }
      }

      ...
      /* parse:  term ::= factor | term * factor   */
      private void term() {
         enter("term");
         code to parse a term
         exit("term");
      }
   }

If you want to get fancy, you could keep track of the nesting depth of the calls (number of parser functions called that are not yet finished) and indent the trace lines to reflect this nesting.

There should be a similar mechanism to turn the symbol table printouts on and off.

Test Program

Your compiler should contain a main function that tests the parser.  This test program should initialize I/O and any other necessary routines, then parse the D program by calling the function for the nonterminal program.

Hints

Test Programs

There are some test programs on the course web site in the assignments section.  It would also be great if you would contribute additional test programs.

You should demonstrate that your parser works by parsing some sample D programs.  The output should show the parser trace and symbol tables, along with the corresponding lines from the source code.  Be sure that your test program(s) use all of the constructs in the grammar to verify that all of the parser functions work properly.

If you have added any error checking, include output that demonstrates these checks. (Addendum: you can save including this output until next week's turn-in of the entire working compiler.)

Project Turnin

Electronic: Turn in your program electronically using the link on the assignments page. If you worked with a partner only one person should submit. You should submit all files required to make your compiler so far (scanner and parser code).

Paper: Hand in a printout of your code, test output and written problems in class on November 30. You should turn in your complete compiler online, but you only need to hand in printed copies of the parser and tests in class (i.e., the new stuff). Note that if you worked with a partner on the compiler each partner must still turn in solutions to the written problems individually.

Additional Notes (added after 11/26/07)

Added 11/26/07:

  1. It is o.k. to include the empty string in the grammars you create in question #2.
  2. 2.a. The set of terminal symbols is a and b.
  3. 2.b. The set of terminal symbols is a, b, and c.
  4. You may modify any of the files we have given you (token, io, etc.) as well as your original symtab.h/.c files.
  5. I will award some small amount of extra credit for going above and beyond what is required on the compiler project. If you attempt extra credit I would *strongly* recommend that you get the basic version of the parser working first, make a copy of your entire project (perhaps even go ahead and submit it), and then work on any extra credit. While it is fine to include some extra credit options with what you submit on Nov 30th when you submit your parser, we won't be evaluating extra credit until your final submission of the entire compiler on Dec. 7th (so you could also wait until next week to work on these). The following things are not required for your compiler and would be considered *extra credit*:
    • Check that a variable is not declared twice in the same function (either as a parameter or a local variable).
    • Check that all variables that are used have been declared.
    • Give a warning (but not an error) for variables that are not used.
    • Check that functions that are used are also defined (other than get and put which we will count as already being defined).
    • Check that functions are called with the correct number of parameters.
    • Check that one and only one main function has been declared.
    There are other things you might check for as well. If you implement any of these or other error checking in your compiler we will ask that when you turn in the finished compiler on Dec 7th that you submit 1) a short writeup describing what you implemented, and 2) test cases and output that demonstrate your additions. Note that you do not need to submit this writeup this week as we will take a single extra credit writeup on Dec 7th when the final version of the compiler is submitted.
  6. Note: Unless you are doing error checking you are not required to put function names into the global symbol table as soon as they are encountered, just on their definition will be fine.

    Added 11/27/07:

  7. Here are sample output files showing both tracing and the printing of symbol tables set "on": hello.s and functions.s. Your output does not have to match this exactly but it gives you an idea of what should be generated.