Assigned: April 19, 2015.
Due: 11pm, May 4, 2015.
The goal of this assignment is to familiarize yourself with dataflow analysis by implementing one using the Polyglot compiler framework.
a = b * c + d; e = b * c + f;the expression
b * c
is redundant on the second line. Applying CSE
to the code above will generate the following:
t = b * c; a = t + d; e = t + f;Note the introduction of the temporary variable
t
for this purpose. In general,
CSE is targeted to remove any redundant computations, not just binary
expressions. However,
this is done at the expense of introducing extra temporary variables,
which might be a problem for architectures that have few number of registers,
causing
register spills
as a result. We will not consider that to be an issue for this assignment.
x = y;
x = op y;
x = y op z;
x = y.f;
x.f = y;
if (x) { ... }
while (true) { ... }
break;
return x;
x, y, z
in the code above are all local variables or constants where allowed.-
and !
, binary operator includes the usual set of
arithmetic and boolean operators.
++, --
operators),
and compound assignments (+=
, etc).super()
call to java.lang.Object
in each class' constructor which you can ignore.e
, record e
in a table.e
in the method, check if e
is
an available expression. If it is, increase the number of uses
of e
by 1.e
, check if e
is used more than once.
If so create a new temporary variable t
to store the
evaluation result.e
where e
is available, replace e
with
t
.x = a + b; if (c) y = a + b; else z = a + b;The expression
a + b
defined on line 1 is available inside both
branches of the condition. Hence lines 3 and 5 contains common subexpressions that
can be eliminated:
t1 = a + b; x = t1; if (c) y = t1; else z = t1;Notice the introduction of temporary variable
t1
.
Your job for this assignment is to implement intraprocedural form of CSE (i.e.,
you don't need to implement passing extra parameters to avoid redundant computation across function
call boundaries). However, you will need to do so by introducing as few
temporaries as possible.
An expression e is available at program point p if every path from the entry node to p evaluates e. Furthermore, after the last such evaluation prior to reaching p, there are no subsequent assignments to any variables contained within e.
To compute available expressions, we set up a forward dataflow analysis that is very similar to reaching definitions. Part of the assignment is to figure out what are the transfer function and meet operation for this analysis.
You can download the starter code for the assignment here. Here is the structure of the code:
lib
: contains binary distributions of Polyglot.compiler
: contains source files of the passes that you will write.build.xml
: ant build script for the compiler.bin
: the script
hw1c in that directory can be used to invoke the compiler. By default,
running hw1c foo.java
generates a new foo.java
after
running all the passes in the compiler. If you don't want to overwrite the input file,
pass -d [directory name]
flag to the script.
(You can pass multiple java files to hw1c
, just like
javac
.)
tests
: contains test files for the compiler.You are free to add any new classes as needed. Some files inside compiler/src that you might want to look at are:
cse.Main
: invokes the compiler, you shouldn't need to change this.cse.Hw1Scheduler
: pass scheduler for the compiler, note the additional
passes that are added to the standard pipeline.cse.ast.cseExtFactory_c
: extensions are means in Polyglot to
associate each type of AST node with extra information. In the implemented example
we store whether an assignment expression is assigning the value 1 or not as a boolean flag.
cse.visit
: each class in this package implements a compiler pass. Each
pass that takes in an AST and outputs another (possibly changed) one. In the
provided example, AssignOneDetector
performs a dataflow that collects
all assign one expressions, and AssignOneConverter
changes the values
of all such assignments. Check
Dataflow
and NodeVisitor APIs for details.
cse.parse
: this directory contains parser-related files that you
don't need to change. In particular, hw1.ppg contains the input grammar. For this
assignment, we just use the standard Java 1.4 grammar, which is a superset of
the input language described above. For those who are interested, the grammar is
specified using the PPG
parser generator.
For extra credit, implement CSE in presence of method calls. In particular,
add o.f(...)
to the input language. Assume that there will only be
calls to other methods that are defined in the same class (i.e., no external library
calls), and any side effects will be limited to the receiver object (i.e.,
o
in o.f(...)
, or any of the method parameters).
Method calls complicate the analysis as their side effects might invalidate certain available expressions, for instance changing the value of a field in the receiver object.
One way to implement available expression analysis is to first assume all calls make changes to their receivers and parameters, and rerun the analysis after knowing the actual effect (if any) of any called methods. Since the default Dataflow class works on each method individually, you will need to implement another pass that first runs dataflow analysis on each method, recording down those methods that need to be re-analyzed as they contain function calls in a worklist, along with methods that are called.
Another possibility is to use the Accrue interprocedural analysis framework that is built on top of Polyglot. See this tutorial for details.
You should turn in an archive file of your hw1
directory.
You don't need to submit any
class files as we will rebuild your compiler using the top level ant script
(make sure it builds before turning it in). You should not be using any external
libraries beyond those that are included in the lib directory.
Contact the staff if you have some libraries that you would like to use.
In addition, the archive should include a short (2 pages max) writeup that describes the following:
Please submit your code and writeup on Catalyst. You may submit as many times as you like; we will use the latest version you submit before the deadline. We will not grade any late submissions, however.
While this is an individual assignment, we encourage you to discuss with others and the staff if you encounter any issues. Feel free to use the discussion board for questions.