Students enrolled in CSE M 501 are expected to do additional work beyond the basic requirements for CSE 401. In most cases this will be done by adding additional significant features to the compiler project. This note describes some possibilities.
Project groups should decide what they would like to do, then send a short email to the instructor outlining proposed plans so we can check that the scope is appropriate and reasonable. Other ideas for additional work that are not in this list are possible. Please discuss with the instructor.
At the end of the quarter, each group should prepare a separate report (one per group) discussing what was done for this extra work. This report should explain what was attempted, how the work was divided, and evaluate the results. The evaluation can include code samples, benchmarks or other measurements, and so forth. The report should not be too long -- a few pages can often be enough. The idea is to communicate the work done to the reader concisely but with enough detail to evaluate it, and not so much that the reader struggles to figure out what is important.
We have offered CSE M 501 a few times in the past, but we are still gaining experience with possible project extensions. It is possible that some of these suggestions may be too extensive to be done in a reasonable timeframe and (maybe) some will actually turn out to be too simple. Please stay in touch with the instructor as you go, especially if you discover surprises or need to adjust plans in significant ways. We will be appropriately flexible in grading to take these issues into account.
Note that your compiler must still compile basic MiniJava programs correctly and the code should still pass any tests used during regular grading. The only exception is if your extension adds to MiniJava in a way that produces different results for the basic project tests. The main example of this that we've run into in the past is groups that add method overloading. That will result in programs compiling successfully that would have been rejected by a basic MiniJava compiler. There may be other similar examples; if in doubt, please discuss the situation with the instructor.
The CSE M 501 project additions and an accompanying (relatively short) written report will be due at the end of the quarter as part of the final project and report.
Deadlines: The M501 project extensions and the report about them should be combined with the final (codegen) phase of the basic project and the basic project report, rather than being turned in separately. Deadlines are:
codegen-final
, due by 11 pm,
Saturday, June 4.
Method overloading. Support method overloading following the normal Java rules for ordinary methods and classes. This would include allowing methods with the same name to have different types and/or different numbers of parameters in the same class, and also allowing methods in subclasses to overload, rather than override, superclass methods if their parameter lists differ.
Implementation will mainly involve additions to compiler symbol
tables and the type checking rules. See the Java Language Specification
sec. 8.4.9 on
overloading
and sec. 15.12 on
method
calls for the full rules (not all of which apply to
MiniJava since we don't have interfaces or generic
types, and don't have all of the primitive types
included in full Java). Related reference sources will
also be useful, as will writing and running small test
programs using the standard javac
compiler
and java
VM.
You will also need to do some work when
generating code for methods and vtables. The biggest
complication here is that each overloaded method will
need its own unique assembly language label (name), and
vtables will have to contain properly organized pointers to
overloaded as well as simple methods. You might get
some useful ideas for naming conventions from sections
in the Java Virtual Machine specification. Sec. 4.3.3
describes
method descriptors
and sec. 4.3.2 covers
field
descriptors. Or you might find it easier to just
create method labels like ClassName$foo$1
and
ClassName$foo$2
and so forth.
In any case it will likely be helpful for debugging if you include comments
in your generated assembly language code identifying overloaded
methods precisely so you can be sure overloading is done correctly.
Code profiling/statement counting. Accumulate a count
of the number of times each statement in a program is
executed and print an annotated listing of the program
showing this information. You will need to decide how to
split the work between code generation, runtime
execution (including possible additions to boot.c
),
and post-processing after the program terminates. A
reasonable strategy for doing this is: (i) allocate an
array of 8-byte ints with one entry per counted
statement, (ii) zero out the array in the assembly code
or at the beginning of execution (this is really easy if
you use calloc
to allocate the array on the
heap or if you call a small C function in boot.c
to do the work),
(iii) generate code at the beginning of each
statement to increment the counter in the array associated with that
statement, (iv) at the end of execution dump to a file the
counters and information that allows you to correlate
counters in the array with source line numbers,
and (v) merge the counter information with the
original source file to produce the annotated
listing. Don't worry about edge cases like lines that
contain more than one statement.
Register allocation. Add a register allocator to utilize multiple registers in the generated code. It's not reasonable to try to implement a global graph-coloring register allocator in the time available, but it is possible to do much better in basic blocks than the single-register strategy outlined in class. The textbook outlines a couple of possible strategies in sec. 13.3; the bottom-up local allocator is probably simpler and probably stands a better chance of being successfully implemented for our project. Although the discussion in 13.3 assumes that the program has been transformed into some sort of lower-level IR, it is possible to do bottom-up local allocation on the fly as the compiler generates code. The key observation is to treat every label in the code as the beginning of a basic block, and every jump or call instruction as the end of one (meaning that the next instruction begins a basic block even if it is not labeled). The register allocator needs to initialize its state at the beginning of each sequence of straight-line code (basic block), and adjust the contents of the registers at the end of each block by generating code if needed to leave necessary values in predictable places or push them to memory.
Improved code generation / peephole optimizations. The simple
code generation strategy outlined in class generates many, many extra
push and pop instructions, loads values from memory even if they were
just stored and are still present in a register, stores method
arguments in memory and then reloads them right before each method call, has
potentially extraneous save operations for argument registers at the beginning
of each function body just to be
safe, restricts function parameter lists to have five or fewer arguments
beyond the this
parameter,
materializes Boolean results as 0 or 1 even when all that is
needed is a simpler compare and branch, makes no effort to keep track of
constant values and fold them at compile time, and so forth.
Groups that pick this option should propose at least three specific, non-trivial
improvements to be implemented for this part of the project
when they discuss their plans with the instructor.
If you implement a better register allocator (above), it should be
easy to eliminate even more redundant copy operations between
registers and between registers and memory.
If you have other ideas or interests, please discuss them with the instructor.