
CSE 401 15wi - Project II - Parser+AST
Due: Thursday, February 5 at 11:00 pm. You should turn in your project using the assignment drop box (see link on the course home page). Further instructions are at the bottom of this writeup.
Overview
For this part of the project, construct a parser (recognizer) and build abstract syntax trees (ASTs) for MiniJava. You should use the CUP parser generator tool to interface with your JFlex scanner. Be sure that your parser and scanner together can successfully parse legal MiniJava programs.
The semantic actions in the parser should create an Abstract Syntax Tree (AST) representation of the parsed program. For now, just build the AST. Semantic analysis and type checking will be included in the next part of the project.
In addition to building the AST, you should provide an
implementation of the Visitor pattern to print a nicely indented
representation of the tree on standard output.
You should also use the
PrettyPrintVisitor
supplied with the starter
code to print a "decompiled" version of the AST as a Java source program
when requested. These output formats are different and more details
are given below.
Modify your MiniJava main
program so that when it is
executed using the command
java MiniJava -A filename.javait will parse the MiniJava program in the named input file and print an abstract representation of the AST on standard output. Similarly, if your compiler is executed using the command
java MiniJava -P filename.javait should parse the MiniJava program in the named input file and print on standard output a decompiled version of the AST ("prettyprint") in a format that is a legal Java program that could be processed by any Java compiler. The
-P
option should use the PrettyPrintVisitor
class provided in our project
starter code, with whatever small modifications might be needed due to
changes or additions made to the AST classes in your compiler.
To make this a bit more concrete, suppose we use this input file: Foo.java. The MiniJava -A
abstract AST printout should resemble this output, while MiniJava -P
should prettyprint the file and produce something resembling this output. Your actual output may differ slightly, but it should be similar. Note that Foo.java is clearly an illegal Java program, but it is syntactically legal, so you should be able to parse it.
If the input program contains syntax errors it is up to you how you handle the situation. You can simply report the error and quit without producing the AST, or, if you wish, you can try to do better than that.
The java
commands shown above will also
need a -cp
argument or CLASSPATH
variable
as before to locate the compiled .class
files and
libraries. See the scanner assignment
if you need a refresher on the details
Your MiniJava
compiler should still be able to print out
scanner tokens if the -S
option is used instead
of -P
or -A
. There is no requirement for how your compiler
should behave if more than one of -A
, -P
, and -S
are
specified at the same time. That is up to you. You could treat that
as an error, or, maybe more useful, the compiler could print
tokens followed by the AST, or the abstract AST followed by the prettyprinted one, etc.
Feel free to experiment with language extensions (additional Java
constructs not included in MiniJava) or syntactic error recovery
if you wish, but be sure to get the basic parser/AST/visitors
working first. You should document any extensions or error
handling in the INFO
file described in the
"What to Hand In" section.
Details
You may need to massage the MiniJava grammar to make it LALR(1) so that CUP can use it to produce a parser. Please keep track of the changes you make and turn in a description of them with this part of the project (see below).
Take advantage of precedence and associativity declarations in the
parser specification to keep the overall size of the parser
grammar small. In particular, exp ::= exp op exp
productions along with precedence and associativity declarations
for various operators will shorten the specification considerably
compared to a grammar that encodes that information in separate
productions. CUP's input syntax is basically the same that used
by YACC and Bison, described in many compiler books. It should be
easy enough to pick up the syntactic details from the CUP
documentation and example code.
Your grammar should not contain any reduce-reduce conflicts, and should have as few shift-reduce conflicts as possible. You should describe any remaining shift-reduce conflicts in your writeup.
The starter code contains a TestParser
program that
shows how the parsing tools work together. You can run that with ant test-parser
to see what it does. You will need to
adapt the ideas found there for your own compiler. Feel free to add
additional targets to build.xml
to help build and test your parser.
Once you have the parsing rules in place and have sorted out any
grammar issues, add semantic actions to your parser to create an
Abstract Syntax Tree (AST) and add Visitor code to print a nicely
indented representation of the AST on standard output. Also add code to
use the supplied PrettyPrintVisitor
class to print the decompiled version of the AST when requested.
The AST should represent the abstract structure of the code, not the concrete syntax. Each node in the tree should be an instance of a Java class corresponding to a production in an abstract grammar. The abstract grammar should be as simple as possible, but no simpler. Nodes for nonterminals should contain references to nodes representing non-trivial component parts of the nonterminal, i.e., the important items on the right-hand side of an abstract grammar rule. Trivial things like semicolons, braces, and parentheses should not be represented in the AST, and chain productions (like Expression -> Term -> Factor -> Identifier) should not be included unless they convey useful semantic information. The classes defining the different abstract syntax tree nodes should take advantage of inheritance to build a strongly-typed, self-describing tree. We suggest that you start with the AST classes given on the MiniJava project web page, which are included in the starter code for our project.
The output of the new AST Visitor should be a readable representation of
the abstract tree, not a "de-compiled"
version of the program, and, in particular, the output will not be a
syntactically legal Java program that could then be fed back into a
Java compiler. Each node in the tree should normally begin on a
separate line, and children of a node should be indented below it to
show the nesting structure. The output should include source line
numbers for major constructs in the tree (certainly for individual
statements and for things like class, method, and instance variable
declarations, but not necessarily for every minor node in, for
example, expressions). The tree printout should not include
syntactic noise from the original program like curly braces
({}
), semicolons, and other punctuation that is not
retained in the AST, although you should use reasonable punctuation
(indentation, whitespace, parentheses, commas, etc.) in your output to make
things readable. Although most tree nodes should occupy a separate
line in the output, you can, if you wish, print things like
expressions on fewer lines to reduce the vertical space used, as
long as your output clearly shows the AST structure of the
expression (perhaps by adding parentheses to show nesting
if line breaks and indenting are not used).
We have included the MiniJava Visitor
interface
and PrettyPrintVisitor.java
file from the MiniJava
project web site in the directory src/AST/Visitor
in the
starter code. Note that this PrettyPrintVisitor
doesn't
print the AST in the abstract format required above --
it is closer to the concrete syntax and doesn't have much
indentation logic. But you will want to use it to pretty-print the AST (compiler -P
option) and you will find it useful as a starting point
for understanding the visitor pattern and creating the new AST visitor (compiler -A
option). Also, cloning an
existing visitor is often a good way to start developing a new
one.
You should test your parser, semantic actions, and printing visitor by processing several MiniJava programs, both correct ones and ones with syntax errors.
As before, your compiler should only use language features available in Java 7, which is the environment that will be used to test your code.
You should continue to use your CSE 401 gitlab repository to store the code for this and remaining parts of the compiler project.
What to Hand In
The main things we'll be looking at in this phase of the project are the following:
- The grammar specification, including semantic actions, for your parser.
- Source files containing definitions of the AST node classes and visitors.
- Printed AST representations produced by both print visitors (
-A
and-P
options). - A brief
INFO
file describing any changes you made to the grammar, and a list of shift-reduce conflicts remaining in the grammar (if any) with an explanation of why these are not a problem. - If you include any error handling or extra features in your
parser, or add any language extensions, include a brief
description of these features in your
INFO
writeup.
As before, your code should run on the linux lab machines or attu
when built
with ant
. You should do an ant clean
, then
bundle up your compiler directory in a tar
file and
turn that in. That will ensure that we have all the pieces of your
compiler if we need to check something.
To create the tar file, run the following commands starting in your main project directory (the one that contains build.xml
):
ant clean cd .. tar cvfz parser.tar.gz your_project_directory_nameThen turn in the
parser.tar.gz
file.
You and your partner should turn in only a single copy of the
project using one of your UW netids, preferably the same one you
used for the scanner, although this is not
required. Your INFO
file should include your names and
uw netids so we can correctly identify everyone involved in the
group and get feedback to you. Multiple turnins are fine, as usual - we'll
grade the last one you give us. In particular, if you plan on adding
error handling or other extra features, turn in a copy of the
working, basic assignment first before you add these, then turn in
the enhanced parser later once your additions are working.
To be sure that everything is in working order, we strongly suggest
that before you create the tar file you first run ant clean;
ant
to rebuild your project from scratch, then run any tests
you want, then run the commands given above to create the actual tar
file to be turned in. That will also verify that your code will
compile using Java 7, assuming you run this on a lab machine or other
machine that has Java 7 installed.
Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX
Comments to adminanchor