Assigned: April 19, 2015.
Due: 11pm, May 4, 2015.

Objective

The goal of this assignment is to familiarize yourself with dataflow analysis by implementing one using the Polyglot compiler framework.

Background

In this assignment you will implement common subexpression elimination (CSE). Common subexpressions represent redundant computation. For instance, in the following code:
a = b * c + d;
e = b * c + f;
the expression b * c is redundant on the second line. Applying CSE to the code above will generate the following:
t = b * c;
a = t + d;
e = t + f;
Note the introduction of the temporary variable t for this purpose. In general, CSE is targeted to remove any redundant computations, not just binary expressions. However, this is done at the expense of introducing extra temporary variables, which might be a problem for architectures that have few number of registers, causing register spills as a result. We will not consider that to be an issue for this assignment.

Polyglot

We will build the analysis using version 2.6.1 of the Polyglot compiler framework. Polyglot is a source-to-source, pass-based optimizing compiler, written in Java. The compiler first parses the input code into an abstract syntax tree (AST), and then forwards the AST to each of the optimization passes in a predefined order. The final AST is then written to the output file. Optionally the output can also be compiled to Java bytecode. You might want to refer to this tutorial and API pages for reference.

Details

Input Language

For this assignment we will assume the input language to be a subset of Java where each method body will consist of one of the following types of statements:
  • simple assignment: x = y;
  • unary expression: x = op y;
  • binary expression: x = y op z;
  • field load: x = y.f;
  • field store: x.f = y;
  • conditional: if (x) { ... }
  • while true loops: while (true) { ... }
  • break statements: break;
  • return statements: return x;
In particular:
  • x, y, z in the code above are all local variables or constants where allowed.
  • unary operator includes - and !, binary operator includes the usual set of arithmetic and boolean operators.
  • binary operators will not be another assignment.
  • there are no increment / decrement (++, -- operators), and compound assignments (+=, etc).
  • there are no method calls in this language. The only exception is the super() call to java.lang.Object in each class' constructor which you can ignore.
  • there is no shared memory in the execution model. And you can assume that there is only one thread executing at any time.

Analysis Overview

CSE is a procedural-based analysis. Here are the steps to implement the analysis in our input language:
  1. Determine the availabile expressions at each code block.
  2. Go through each method. At the first evaluation of each binary expression e, record e in a table.
  3. At the use of e in the method, check if e is an available expression. If it is, increase the number of uses of e by 1.
  4. Go through each method again. At the first evaluation of each binary expression e, check if e is used more than once. If so create a new temporary variable t to store the evaluation result.
  5. At each subsequent use of e where e is available, replace e with t.
Here is an example:
x = a + b;
if (c)
  y = a + b;
else
  z = a + b; 
The expression a + b defined on line 1 is available inside both branches of the condition. Hence lines 3 and 5 contains common subexpressions that can be eliminated:
t1 = a + b;
x = t1;
if (c)
  y = t1;
else
  z = t1;
Notice the introduction of temporary variable t1. Your job for this assignment is to implement intraprocedural form of CSE (i.e., you don't need to implement passing extra parameters to avoid redundant computation across function call boundaries). However, you will need to do so by introducing as few temporaries as possible.

Available Expressions

An expression e is available at program point p if every path from the entry node to p evaluates e. Furthermore, after the last such evaluation prior to reaching p, there are no subsequent assignments to any variables contained within e.

To compute available expressions, we set up a forward dataflow analysis that is very similar to reaching definitions. Part of the assignment is to figure out what are the transfer function and meet operation for this analysis.

Starter Code

You can download the starter code for the assignment here. Here is the structure of the code:

  • lib: contains binary distributions of Polyglot.
  • compiler: contains source files of the passes that you will write.
  • build.xml: ant build script for the compiler.
  • bin: the script hw1c in that directory can be used to invoke the compiler. By default, running hw1c foo.java generates a new foo.java after running all the passes in the compiler. If you don't want to overwrite the input file, pass -d [directory name] flag to the script. (You can pass multiple java files to hw1c, just like javac.)
  • tests: contains test files for the compiler.
To help you get started, the starter code contains an implementation of a trivial dataflow and code transformation pass (one that changes all expressions that assign 1 to assign 42, try running it!).

You are free to add any new classes as needed. Some files inside compiler/src that you might want to look at are:

  • cse.Main: invokes the compiler, you shouldn't need to change this.
  • cse.Hw1Scheduler: pass scheduler for the compiler, note the additional passes that are added to the standard pipeline.
  • cse.ast.cseExtFactory_c: extensions are means in Polyglot to associate each type of AST node with extra information. In the implemented example we store whether an assignment expression is assigning the value 1 or not as a boolean flag.
  • cse.visit: each class in this package implements a compiler pass. Each pass that takes in an AST and outputs another (possibly changed) one. In the provided example, AssignOneDetector performs a dataflow that collects all assign one expressions, and AssignOneConverter changes the values of all such assignments. Check Dataflow and NodeVisitor APIs for details.
  • cse.parse: this directory contains parser-related files that you don't need to change. In particular, hw1.ppg contains the input grammar. For this assignment, we just use the standard Java 1.4 grammar, which is a superset of the input language described above. For those who are interested, the grammar is specified using the PPG parser generator.

Extra Credit

For extra credit, implement CSE in presence of method calls. In particular, add o.f(...) to the input language. Assume that there will only be calls to other methods that are defined in the same class (i.e., no external library calls), and any side effects will be limited to the receiver object (i.e., o in o.f(...), or any of the method parameters).

Method calls complicate the analysis as their side effects might invalidate certain available expressions, for instance changing the value of a field in the receiver object.

One way to implement available expression analysis is to first assume all calls make changes to their receivers and parameters, and rerun the analysis after knowing the actual effect (if any) of any called methods. Since the default Dataflow class works on each method individually, you will need to implement another pass that first runs dataflow analysis on each method, recording down those methods that need to be re-analyzed as they contain function calls in a worklist, along with methods that are called.

Another possibility is to use the Accrue interprocedural analysis framework that is built on top of Polyglot. See this tutorial for details.

What to Turn in

You should turn in an archive file of your hw1 directory. You don't need to submit any class files as we will rebuild your compiler using the top level ant script (make sure it builds before turning it in). You should not be using any external libraries beyond those that are included in the lib directory. Contact the staff if you have some libraries that you would like to use.

In addition, the archive should include a short (2 pages max) writeup that describes the following:

  • how you implemented available expressions analysis. For instance, what (semi-)lattice did you use, what are the transfer functions, etc.
  • how you implemented CSE: what passes did you write, how are they connected to each other, etc.
  • any changes that you made to the APIs, and justifications.
  • how long you spent on the assignment, and whether there was anything you found particularly difficult or confusing.

Please submit your code and writeup on Catalyst. You may submit as many times as you like; we will use the latest version you submit before the deadline. We will not grade any late submissions, however.

While this is an individual assignment, we encourage you to discuss with others and the staff if you encounter any issues. Feel free to use the discussion board for questions.

Submitting Bugs

As this is the first time that this assignment is given, it is likely that there are bugs or various confusing aspects. Please submit (friendly!) bug reports to the staff. When you do so, please include:
  • a description of the bug.
  • a minimal test case that reproduces the bug.
If you are the first person to report a particular bug in the sample code, we will give you a candy bar! (no kidding!)

Grading

50% of your grade will be based on whether or not your code passes the system test suite we will run over it. These tests will be a superset of the tests we have provided. Before handing in your code, you should make sure it produces no errors (passes all of the tests) when you submit. The other 50% of your grade will be based on the quality of your writeup and our subjective evaluation of your code. You will receive an additional 10% if you implement the extra credit correctly.