Due: Wednesday, February 13, at 11:59 pm. You should turn your project in using the assignment drop box (see link on the course home page).
For this part of the project, construct a parser (recognizer) and build abstract syntax trees (ASTs) for MiniJava. You should use the CUP parser generator tool to interface with your JFlex scanner. Be sure that your parser and scanner together can successfully parse legal MiniJava programs.
The semantic actions in the parser should create an Abstract Syntax Tree (AST) representation of the parsed program. For now, just build the AST. Semantic analysis and type checking will be included in the next part of the project.
In addition to building the AST, you should provide an implementation of the Visitor pattern to print a nicely indented representation of the tree on standard output (more below).
Feel
free to experiment with language extensions (additional Java constructs
not included in MiniJava) or syntactic error recovery (the CUP documentation describes how you can do this using the special error
symbol) if you wish,
but be sure to get the basic parser/AST/visitor working first.
You may need to massage the MiniJava grammar to make it LALR(1) so that CUP can use it to produce a parser. Please keep track of the changes you make and turn in a description of them with this part of the project (see below).
Take advantage of precedence and associativity declarations in the
parser specification to keep the overall size of the parser grammar small.
(In particular, exp ::= exp op exp
productions along
with precedence and associativity declarations for various operators will shorten
the specification
considerably
compared to a grammar that encodes that information in separate productions.)
CUP's input syntax is basically the same that used by YACC and Bison, described
in many compiler books. It should be easy enough to pick up the syntactic details
from
the CUP
documentation and
example
code.
Your grammar should not contain any reduce-reduce conflicts, and should have as few shift-reduce conflicts as possible. You should describe any remaining shift-reduce conflicts in your writeup.
Once you have the parsing rules in place and have sorted out any grammar issues, add semantic actions to your parser to create an Abstract Syntax Tree (AST) and add Visitor code to print a nicely indented representation of the AST on standard output.
The AST should represent the abstract structure of the code, not the concrete syntax. Each node in the tree should be an instance of a Java class corresponding to a production in an abstract grammar. The abstract grammar should be as simple as possible, but no simpler. Nodes for nonterminals should contain references to nodes representing non-trivial component parts of the nonterminal, i.e., the important items on the right-hand side of an abstract grammar rule. Trivial things like semicolons, braces, and parentheses should not be represented in the AST, and chain productions (like Expression -> Term -> Factor -> Identifier) should not be included unless they convey useful semantic information. The classes defining the different abstract syntax tree nodes should take advantage of inheritance to build a strongly-typed, self-describing tree. We suggest that you start with the AST classes given on the MiniJava project web page and included in the starter code for the project. You may extend them if you need to.
The Visitor output should be a readable
representation of the abstract tree, not a "de-compiled" version
of the program, and, in particular, the output will not be a syntactically
legal Java program that could then be fed back into a Java compiler. Each node
in the tree should normally begin on a separate line, and children of a node
should be indented below it to show the nesting structure. The output should include source line numbers for major constructs in the tree (certainly for individual statements and for things like class, method, and instance variable declarations, but not necessarily for every minor node in, for example, expressions). The tree printout should
not include syntactic noise from the original program like curly braces ({}
),
semicolons, and other punctuation that is not retained in the AST, although
you should use reasonable punctuation (whitespace, parentheses, commas, etc.)
in your output to make things readable. Although most tree nodes should occupy
a separate line in the output, you can, if you wish, print things like expressions
on fewer lines to reduce the vertical space used, as long as your output clearly
shows the AST structure of the expression.
We have included the MiniJava Visitor
interface and PrettyPrintVisitor.java
file from the MiniJava project web site in the directory AST/Visitor
in the starter
code. Note that this PrettyPrintVisitor
doesn't print the AST in the format
requested for this part of the project - it is closer to the concrete syntax
and doesn't have much indentation logic. But you may find it useful as a starting
point for understanding how the visitor pattern works, and cloning an existing
visitor is often a good way to start developing a new one.
You should test your parser, semantic actions, and printing visitor
by processing several MiniJava programs, both correct ones and ones with syntax
errors. You should create a test program for this phase of the project in
a class TestAST
whose source file is TestAST.java
in
directory src
in
the MiniJava project tree. When executed without arguments, TestAST
should
read a MiniJava source program from System.in
, parse it, build
the AST, then print the AST on System.out
using your AST Visitor
implementation. If any errors
are detected while scanning or parsing the MiniJava input, when TestAST
terminates it should
exit with a non-zero return code (System.exit(1);
or something
similar), otherwise it should exit normally.
The main things we'll be looking at in this phase of the project are the following:
Factorial.java
sample
program from the MiniJava web site.INFO
file describing any changes you made to the
grammar, and a list of shift-reduce conflicts remaining in the grammar (if
any) with an explanation
of why these are not a problem.As before, your code should run on attu
when built with ant
. You should do
an ant clean
, then bundle up your compiler directory in a tar
file
and turn that in. That will ensure that we have all the pieces of your compiler
if we need to check something.
You and your partner should turn in only a single copy of the project using
one of your UW netids, preferably the same one you used for the scanner, although
this is not required. Your INFO
file should include your names and uw netids
so we can correctly identify
everyone
involved
in
the group
and
get
feedback
to you. Multiple
turnins are fine - we'll grade the last one you give us. In particular, if
you plan on adding error handling or other extra features, turn in a copy
of the
working,
basic
assignment first before you add these, then turn in the enhanced parser
later once your additions are working.