CSE M 501 18au Project Extensions

Students enrolled in CSE M 501 are expected to do additional work beyond the basic requirements for CSE 401. In most cases this should be done by adding additional significant features to the compiler project. This note describes some possibilities.

Project groups should decide what they would like to do, then schedule a short (20 min.) meeting with the instructor to discuss plans and be sure they are appropriate. Other ideas for additional work that are not in this list are possible if appropriate. Please discuss.

At the end of the quarter, each group should prepare a separate report (one per group) discussing what was done for this extra work. This report should explain what was attempted, how the work was divided, and evaluate the results. The evaluation can include code samples, benchmarks or other measurements, and so forth. The report should not be too long -- a few pages may be enough. The idea is to communicate the work done to the reader concisely but with enough detail to evaluate it, and not so much that the reader struggles to figure out what is important.

This is only the second quarter that CSE M 501 has been offered and we have limited experience with these project extensions. It is possible that some of them may be too extensive to be done in a reasonable timeframe and (maybe) some will actually turn out to be too simple. Please stay in touch with the instructor as you go if you discover surprises or need to adjust plans in significant ways. We will be appropriately flexible in grading to take these issues into account.

Note that your compiler must still compile basic MiniJava programs correctly and the code should still pass any tests used during regular grading.

The M501 project additions and an accompanying (relatively short) written report will be due at the end of the quarter as part of the final project and report. More details will be supplied later when those parts of the overall project are assigned.

Deadlines: The project extensions and the report should be combined with the final phase of the basic project and the basic project report, rather than being turned in separately. Deadlines are:

Final part of the basic compiler plus code for the M501 project extensions: pushed to your project group's gitlab repo and tagged additions-final, due by 11 pm, Saturday, December 8.
Combined report with basic project information plus description of the CSE M 501 extension implemented due by 11:59 pm, Sunday, December 9.

No late submissionswill be accepted for either the final compiler code or the final report.

Possible project extensions

Method overloading. Support method overloading following the normal Java rules for ordinary methods and classes. This would include allowing methods with the same name to have different numbers and/or different types of parameters in the same class, and also allowing methods in subclasses to overload, rather than override, superclass methods if their parameter lists differ.

Implementation will mainly involve additions to compiler symbol tables and the type checking rules. See the Java Language Specification sec. 8.4.9 on overloading and sec. 15.12 on method calls for the full rules (not all of which apply to MiniJava since we don't have interfaces or generic types, and don't have all of the primitive types included in full Java). Related reference sources will also be useful, as will writing and running small test programs using the standard javac compiler and java VM.

You will also need to do some work at runtime when generating code for methods and vtables. The biggest complication here is that each overloaded method will need its own unique assembly language label (name), and vtables will have to properly organize pointers to overloaded as well as simple methods. You might get some useful ideas for naming conventions from sections in the Java Virtual Machine specification. Sec. 4.3.3 describes method descriptors and sec. 4.3.2 covers field descriptors. Or you might find it easier to just create method labels like ClassName$foo$1, ClassName$foo$2, and so forth. In any case it will likely be helpful for debugging if you include comments in your generated assembly language code identifying overloaded methods precisely so you can be sure overloading is done correctly.
Code profiling/statement counting. Accumulate a count of the number of times each statement in a program is executed and print an annotated listing of the program showing this information. You will need to decide how to split the work between code generation, runtime execution (including possible additions to boot.c), and post-processing after the program terminates. A reasonable strategy for doing this is: (i) allocate an array of 8-byte ints with one entry per counted statement, (ii) zero out the array in the assembly code or at the beginning of execution, (iii) generate code at the beginning of each statement to increment the counter in the array for that statement, (iv) at the end of execution dump the counters and information that allows you to correlate counters in the array with source line numbers, and (v) merge the counter information with the original source file to produce that annotated listing. Don't worry about edge cases like lines that contain more than one statement.
Register allocation. Add a register allocator to utilize multiple registers in the generated code. It's not reasonable to try to implement a global graph-coloring register allocator in the time available, but it is possible to do much better in basic blocks than the single-register strategy outlined in sections. The textbook outlines a couple of possible strategies in sec. 13.3; the bottom-up local allocator is probably simpler and (we think) stands a better chance of being successfully implemented for our project. Although the discussion in 13.3 assumes that the program has been transformed into some sort of lower-level IR, it is possible to do bottom-up local allocation on the fly as the compiler generates code. The key observation is to treat every label in the code as the beginning of a basic block, and every jump or call instruction as the end of one (meaning that the next instruction begins a basic block even if it is not labeled). The register allocator needs to initialize its state at the beginning of each sequence of straight-line code (basic block), and adjust the contents of the registers at the end of each block by generating code if needed to leave necessary values in predictable places or push them to memory.
Improved code generation / peephole optimizations. The simple code generation strategy outlined in sections generates many, many extra push and pop instructions, loads values from memory even if they were just stored and are still present in a register, stores function arguments in memory and then reloads them right before the call, has potentially extraneous save operations for argument registers just to be safe, materializes Boolean results as 0 or 1 even when all that is needed is a simpler compare and branch, makes no effort to keep track of constant values and fold them at compile time, and so forth. Groups that pick this option should propose at least 3 specific, non-trivial improvements to be implemented for this part of the project when they discuss their plans with the instructor.