CSE 501: Implementation of Programming Languages

Course Project Information



We will be using the Soot framework, developed at McGill University.  Soot is a compiler analysis framework takes Java bytecode as input and produces optimized bytecode.  An advantage of Soot is that it converts bytecode to a nice, three-address intermediate form in a control flow graph, which makes optimization much more convenient than using bytecode directly.

We modified the Soot distribution slightly to get it to work here at UW.  It has been tested under Linux and Solaris, but it seems to compile faster under Linux.  Since Soot requires the JDK 1.2, under Solaris you have to use the /uns javac--just include /uns/lib/jdk1.2.2/bin/ in your path before /usr/bin.  To install Soot:

  1. Download the file soot-uw.tar.gz
  2. Run gunzip soot-uw.tar.gz
  3. Run tar -xvf soot-uw.tar

  4. This will create a directory soot-1.2.0
  5. Set your CLASSPATH variable to include the soot/classes and jasmin/classes subdirectories of soot-1.2.0, as well as the current directory and the Java runtime libraries at /cse/courses/cse501/01wi/rt.  For example, if $soot is the path to your soot-1.2.0 directory and you use some kind of C shell, type

  6. setenv CLASSPATH .:$soot/soot/classes:$soot/jasmin/classes:/cse/courses/cse501/01wi/rt
    For Jonathan $soot might be /homes/gws/jonal/501/soot-1.2.0
  7. To test your installation, type java soot.Main.  You should see a list of options; otherwise, check to make sure your CLASSPATH includes all the right directories.  If you want to experiment more, check out some of the online tutorials.

Important!

Soot already performs a number of common compiler analyses and transformations, written on top of Jimple, a nice but simple intermediate representation.  For the project, you will design a more advanced intermediate representation and analysis framework.  Therefore, you shouldn't use any of the analyses that already exist in Soot (or even base your design on them).  Instead, you should write the analyses appropriately for your own representation and framework; ideally, they will work better or faster than the original Soot versions because of the framework improvements you have made.

Some specific guidelines:

Soot links:

General Project Information

See the handout from the first day of class for general information about the project, including the list of optimizations your compiler should handle.

First part of the project (due Feb. 5th)

In the first part of the project, you will create your own intermediate representation and provide transformations from Jimple to this intermediate representation and back. You can reuse the control flow graph representation and the three address code representation from Jimple. Thus, what you will be doing is extending Jimple to create your own IR.

 Your IR should include some sort of factored def-use graph, as well as some representation for control dependences sufficient for reconstructing the CFG. The CFG itself is acceptable as the control dependence graph representation, although something like the CDG part of the PDG (program dependence graph) or use of the VDG (value dependence graph) would be cooler. Here are some examples of factored def-use graphs that you can choose from:

You should turn in a hardcopy source code listings for the classes you add or change (with changes marked with a highlighter pen) and the first part of your project write-up (2-3 pages). This part should explain your IR design, why you designed it the way you did and any issues that came up in your design, and how it fits into the existing Soot framework.

Second part of the project (due Feb. 21st)

In the second part of the project, you will create a dataflow analysis framework that works over an arbitrary directed graph, and you will use it for some simple optimizations. You should define an interface for graphs, an lattice-theoretic interface for information at program points (edges in the graph, or at least srcs of edges in the graph), and an interface for the flow function of nodes of the graph. Using this interface, you will implement an engine that computes the best fixed-point solution to the analysis.

For this part of the project, you will also implement two optimizations using this engine: constant propagation and folding, and dead assignment elimination. Both should be implemented using your factored dataflow graph from part 1 of the project. (You will also need to be able, in the 3rd part, to write analyses that operate over the control flow graph, so the interface to your engine should be general enough to accomodate both kinds of analyses.)

As before, you should turn in a hardcopy source code listings for the classes you add or change from the previous version (with changes marked with a highlighter pen) and the second part of your project write-up (2-3 additional pages). This part should explain your analysis engine design, why you designed it the way you did and any issues that came up in your design, and how your client optimizations are implemented. You should also work up some test case examples, and show how your optimizations work on your test cases.

Third part of the project (implementation due March 5th, write-up due March 9)

Implement all other optimizations that are required for the project: On Monday, March 5, you should turn in a hardcopy source code listings for the classes you added or changed from the previous versions (with changes marked with a highlighter pen). On Friday, March 9, you should turn in your final project write-up (~10 pages). Your write-up should show how your compiler applies its analyses and transformations to illustrative test cases. Reflect on your compiler development experience, discussing the choices you made that were good ones, and the ones that turned out to be bad ones.

You should arrange some time during the final week of the quarter to demonstrate your compiler in action to either the instructor or one of the TAs.


chambers@cs.washington.edu