Due: Thursday, April 28 at 11:00 pm PST. You will "turn in" your project as you did with the scanner by pushing it to your GitLab repository and providing a suitable tag. See the end of this writeup for details.
For this part of the project, construct a parser (recognizer) and build abstract syntax trees (ASTs) for MiniJava. You should use the CUP parser generator tool to interface with your JFlex scanner. Be sure that your parser and scanner together can successfully parse legal MiniJava programs.
The semantic actions in the parser should create an Abstract Syntax Tree (AST) representation of the parsed program. For now, just build the AST. Semantic analysis and type checking will be included in the next part of the project.
In addition to building the AST, you should provide two Visitors
that print the AST in different ways. One should be an implementation of the
Visitor pattern to print a nicely indented representation of the tree
on standard output. The second should use the PrettyPrintVisitor
supplied with the starter code to print a "decompiled" version of the AST as
a Java source program when requested. These output formats are different
and more details are given below.
Modify your MiniJava main
program so that when it is
executed using the command
java MiniJava -A filename.javait will parse the MiniJava program in the named input file and print an abstract representation of the AST on standard output.
Similarly, if your compiler is executed using the command
java MiniJava -P filename.javait should parse the MiniJava program in the named input file and print on standard output a decompiled version of the AST ("prettyprint") in a format that is a legal Java source program that could be processed by any Java compiler. The
-P
option should use
the PrettyPrintVisitor
class provided in our project
starter code, with whatever small modifications might be needed due
to changes or additions made to the AST classes in your compiler.
The decompiled version of a MiniJava program produced by
this option won't necessarily have the same indenting, whitespace,
or other details as the original program from which the AST was created,
but it will be semantically identical.
To make this a bit more concrete, suppose we have this input
file: Foo.java.
The MiniJava -A
abstract AST printout should
resemble this output,
while MiniJava -P
should prettyprint the file and
produce something
resembling this
output. Your actual output will undoubtedly be different -- these
examples are intended as a concrete example of what we've got in mind,
not something to be strictly matched character-for-character.
In particular, the prettyprint output will
very likely have differences in whitespace and line breaks, could have additional
parentheses so there are no potential ambiguities, and so forth;
and the AST names, indenting, and other details will depend on your
compiler structure and implementation decisions.
But the output needs to include the required information, appropriately formatted.
(For more about parser output, see the Details section
below.) Note that Foo.java is
clearly an illegal Java program, but it is syntactically correct, so
you should be able to parse it and build an AST.
The java
commands shown above will also
need a -cp
argument or CLASSPATH
variable
as before to locate the compiled .class
files and
libraries. See the scanner assignment
if you need a refresher on the details
Your MiniJava
compiler should still be able to print
out scanner tokens if the -S
option is used instead
of -P
or -A
. There is no requirement for
how your compiler should behave if more than one
of -A
, -P
, and -S
are
specified at the same time. That is up to you. You could treat
that as an error, or, maybe more useful, the compiler could print
tokens followed by the AST, or the abstract AST followed by the
prettyprinted one, etc.
As with the scanner, when your compiler terminates it should return an exit or status code of 0 if no errors were detected while compiling (scanning and parsing) the input program. It should return an exit code of 1 if any errors were detected.
If the input program contains syntax errors, it is up to you how you handle the situation. You can simply report the error and quit immediately without producing the AST, or, if you wish, you can try to do better than that.
Feel free to experiment with language extensions (additional Java
constructs not included in MiniJava) or syntactic error recovery if
you wish, but be sure to get the basic parser/AST/visitors working
first. You should document any extensions or error handling in
the Notes/parser-notes.txt
file described in the "What to
Hand In" section.
You may need to massage the MiniJava grammar to make it LALR(1) so that CUP can use it to produce a parser. Please keep track of the changes you make and turn in a description of them with this part of the project (see below). Your grammar should not contain any reduce-reduce conflicts, and should have as few shift-reduce conflicts as possible. You should describe any remaining shift-reduce conflicts in your writeup and explain how they are resolved and why this resolution is appropriate.
Note that the minijava.cup
example file supplied
with the starter code is intended only to show how the tools work together.
The grammar given there is not a proper subset of the MiniJava grammar,
and you should make appropriate changes as you create the full grammar
for your MiniJava compiler.
Take advantage of precedence and associativity declarations in the
parser specification to keep the overall size of the parser
grammar small. In particular, exp ::= exp op exp
productions along with precedence and associativity declarations
for various operators will shorten the specification considerably
compared to a grammar that encodes that information in separate
productions like expr
, term
, factor
,
etc. CUP's input syntax is basically the same as that used
by YACC and Bison, described in many compiler books. It should be
easy enough to pick up the syntactic details from the CUP
documentation and example code.
Hint/warning: Be sure to test precedence and associativity
carefully, and not just in the obvious cases involving operators
like +
and *
. You should be sure that
things like variables, array element references, parenthesized
subexpressions, and method calls in expressions interact correctly
with other operations, and that the resulting ASTs have the right
structure (your AST print visitor can be very helpful here for
visualizing the structure of the tree).
For example, a+b.f()
must have a parse tree that
corresponds to a+(b.f())
, not
(a+b).f()
.
The starter code contains a DemoParser
program that
shows how the parsing tools work together. You can run that
with ant demo-parser
to see what it does (although you
may need to use the original starter code versions of the jflex and
CUP input files for this ant target to work successfully). You will need
to adapt the ideas found there for your own compiler. Feel free to
add additional targets to build.xml
to help build and
test your parser.
You should work out the details of the grammar and fix any conflicts first before adding semantic actions (code) to build the AST. It is easier to debug grammar problems if you work on the grammar productions by themselves before adding additional code.
CUP
has some useful features for debugging.
First, the starter build.xml
ant file already
includes options that cause
CUP
to create a cup.out
file in the
build
directory. This file contains details of the
parser generation, including a description of the LR states and
other information, which can be
very useful for understanding conflicts and other problems with the grammar.
After the parser is generated, a MiniJava compiler parses its
input by calling
the parse()
method.
As described in the DemoParser.java
sample program,
if debug_parse()
is called instead, the parser will
print a detailed trace of all of the shift and reduce actions performed as it
parses the input. That information can be very useful when
trying to figure out parser problems, particularly in combination
with the information in
the cup.out
file produced when the parser was generated,
which describes the parser states in detail.
Once you have the parsing rules in place and have sorted out any
grammar issues, add semantic actions (Java code associated with the grammar
rules in the CUP input file) to your parser to create an
Abstract Syntax Tree (AST), and add Visitor code to print a nicely
indented representation of the AST on standard output. Also add code
to use the supplied PrettyPrintVisitor
class to print the
decompiled version of the AST when requested by the appropriate option
used to run your MiniJava compiler.
The AST should represent the abstract structure of the code, not
the concrete syntax. Each node in the tree should be an
instance of a Java class corresponding to a production in an
abstract grammar. The abstract grammar should be as simple as
possible, but no simpler. Nodes for nonterminals should contain
references to nodes representing non-trivial component parts of the
nonterminal, i.e., the important items on the right-hand side of an
abstract grammar rule. Trivial things like semicolons, braces,
and parentheses should not be represented in the AST, and chain
productions (like Expression -> Term
-> Factor -> Identifier in the classic expression
grammar) should not be
included unless they convey useful semantic information. The classes
defining the different abstract syntax tree nodes should take
advantage of inheritance to build a strongly-typed, self-describing
tree. We suggest that you start with the AST classes given on
the MiniJava
project web page. This code, with some small local changes,
is included in the starter code for our project in the src/AST
directory.
The output of the new AST Visitor should be a readable representation of
the abstract tree, not a "de-compiled"
version of the program, and, in particular, the output will not be a
syntactically legal Java program that could then be fed back into a
Java compiler. Each node in the tree should normally begin on a
separate line, and children of a node should be indented below it to
show the nesting structure. The output should include source line
numbers for major constructs in the tree (certainly for individual
statements and for things like class, method, and instance variable
declarations, but not necessarily for every minor node in, for
example, expressions). The abstract tree printout should not include
syntactic noise from the original program like curly braces
({}
), semicolons, and other punctuation that is not
retained in the AST, although you should use reasonable punctuation
(indentation, whitespace, parentheses, commas, etc.) in your output to make
things readable. Although most tree nodes should occupy a separate
line in the output, you can, if you wish, print things like
expressions on fewer lines to reduce the vertical space used, as
long as your output clearly shows the AST structure of the
expression (perhaps by adding parentheses to show nesting
if line breaks and indenting are not used). The
sample output
for the Foo.java example
given above does this by printing arithmetic expressions and some
individual method calls on a single line. You do not have to do this.
Each of the component parts of an expression, or anything else in the tree,
can be printed on a properly indented line by itself.
We have included the MiniJava Visitor
interface
and PrettyPrintVisitor.java
file from the MiniJava
project web site in the directory src/AST/Visitor
in
the starter code. Note that this PrettyPrintVisitor
doesn't print the AST in the abstract format required above -- it is
closer to the concrete syntax and doesn't have much indentation
logic. But you will want to use it as the basis for your code to pretty-print the AST
(compiler -P
option) and you will find it useful as a
starting point for understanding the visitor pattern and creating
the new AST visitor (compiler -A
option). Also, cloning
an existing visitor is often an effective way to start developing a new
one.
You should test your parser, semantic actions, and printing visitor by processing several MiniJava programs, both correct ones and ones with syntax errors.
You should continue to use your CSE 401 gitlab repository to store the code for this and remaining parts of the compiler project.
The main things we'll be looking at in this phase of the project are the following:
-A
and -P
options).parser-notes.txt
file describing any
changes you made to the grammar, and a list of shift-reduce
conflicts remaining in the grammar (if any) with an explanation of
why these are not a problem. Please include this in
the Notes/
top-level directory of your repository so
it will be easy to find.parser-notes.txt
file.As before you will submit your parser project by pushing code
to your GitLab repository. Once you are satisfied that everything is working
properly,
create a parser-final
tag and push that to the repository.
Then we strongly suggest that you create a fresh clone of your
repository in some completely different temporary directory, checkout
the parser-final
tag, and verify that everything
works as expect. If necessary, fix any problems back in your regular
working copy of the code, push the changes to
the repository, and update the parser-final
tag to
refer to the correct commit in the repository.
Then run your tests again on a fresh clone of the repository to be sure
everything is working properly.
When you are satisfied that the parser-final
tag in the
repository correctly identifies the finished parser+ast project you are done.