CSE M 501 23sp Project Extensions

Students enrolled in CSE M 501 are expected to do additional work beyond the basic requirements for CSE 401. In most cases this will be done by adding additional significant features to the compiler project. This note describes some possibilities.

Project groups should decide what they would like to do, then send a short email note to the instructor (please send mail to cse401-staff@cs... to help route it properly) outlining proposed plans so we can check that the scope is appropriate and reasonable. Other ideas for additional work that are not in this list are possible. Please discuss with the instructor.

At the end of the quarter, each group should include in their final report (one per group) a discussion of what was done for this extra work. This report should explain what was attempted, how the work was divided, and evaluate the results. The evaluation can include code samples, benchmarks or other measurements, and so forth. This should not be too long -- a few pages can often be enough. The idea is to communicate the work done to the reader concisely but with enough detail to evaluate it, and not so much that the reader struggles to figure out what is important.

We have offered CSE M 501 several times in the past, but we are always open to new possible extensions, and we are still calibrating how well some of these suggestions work in practice. It is possible that some of these suggestions may be too extensive to be done in a reasonable timeframe and (maybe) some will actually turn out to be too simple. Please stay in touch with the instructor as you go, especially if you discover surprises or need to adjust plans in significant ways. We will be appropriately flexible in grading to take these issues into account.

Note that your compiler must still compile basic MiniJava programs correctly and the code should still pass any tests used during regular grading. The only exception is if your extension adds to MiniJava in a way that produces different results for the basic project tests. The main example of this that we've run into in the past is groups that add method overloading. That will result in programs compiling successfully that would have been rejected by a basic MiniJava compiler. There may be other similar examples; if in doubt, please discuss the situation with the instructor.

The CSE M 501 project additions and an accompanying written report will be due at the end of the quarter as part of the final project and report. More details and specific deadlines will be supplied later when those parts of the overall project are assigned.

Deadlines: The M501 project extensions and the report about them should be combined with the final (codegen) phase of the basic project and the basic project report, rather than being turned in separately. Deadlines are:

Final part of the basic compiler plus code for the M501 project extensions: pushed to your project group's gitlab repo and tagged codegen-final, due by 11 pm, Saturday, June 3.
Combined report with basic project information plus description of your group's CSE M 501 extensions pushed to your repo and due by 11 pm, Sunday, June 4.

No late submissions after these deadlines will be accepted for either the CSE M 501 final compiler code or the final report.

Possible project extensions

Method overloading. Support method overloading following the normal Java rules for ordinary methods and classes. This would include allowing methods with the same name to have different types and/or different numbers of parameters in the same class, and also allowing methods in subclasses to overload, rather than override, superclass methods if their parameter lists differ.

Implementation will mainly involve additions to compiler symbol tables and the type checking rules. See the Java Language Specification sec. 8.4.9 on overloading and sec. 15.12 on method calls for the full rules (not all of which apply to MiniJava since we don't have interfaces or generic types, and don't have all of the primitive types included in full Java). Related reference sources will also be useful, as will writing and running small test programs using the standard javac compiler and java VM.

You will also need to do some work when generating code for methods and vtables. The biggest complication here is that each overloaded method will need its own unique assembly language label (name), and vtables will have to contain properly organized pointers to overloaded as well as simple methods. You might get some useful ideas for naming conventions from sections in the Java Virtual Machine specification. Sec. 4.3.3 describes method descriptors and sec. 4.3.2 covers field descriptors. Or you might find it easier to just create method labels like ClassName$foo$1 and ClassName$foo$2 and so forth. In any case it will likely be helpful for debugging if you include comments in your generated assembly language code identifying overloaded methods precisely so you can be sure overloading is done correctly.
Code profiling/statement counting. Accumulate a count of the number of times each statement in a program is executed and print an annotated listing of the program showing this information. You will need to decide how to split the work between code generation, runtime execution (including possible additions to boot.c), and post-processing after the program terminates. A reasonable strategy for doing this is: (i) allocate an array of 8-byte ints with one entry per counted statement, (ii) zero out the array in the assembly code or at the beginning of execution (this is really easy if you use calloc to allocate the array on the heap or if you call a small C function in boot.c to do the work), (iii) generate code at the beginning of each statement to increment the counter in the array associated with that statement, (iv) at the end of execution dump to a file the counters and information that allows you to correlate counters in the array with source line numbers, and (v) merge the counter information with the original source file to produce the annotated listing. Don't worry about edge cases like lines that contain more than one statement.
Register allocation. Add a register allocator to utilize multiple registers in the generated code. It's not reasonable to try to implement a global graph-coloring register allocator in the time available, but it is possible to do much better in basic blocks than the single-register strategy outlined in class. The textbook outlines a couple of possible strategies in sec. 13.3; the bottom-up local allocator is probably simpler and probably stands a better chance of being successfully implemented for our project. Although the discussion in 13.3 assumes that the program has been transformed into some sort of lower-level IR, it is possible to do bottom-up local allocation on the fly as the compiler generates code. The key observation is to treat every label in the code as the beginning of a basic block, and every jump or call instruction as the end of one (meaning that the next instruction begins a basic block even if it is not labeled). The register allocator needs to initialize its state at the beginning of each sequence of straight-line code (basic block), and adjust the contents of the registers at the end of each block by generating code if needed to leave necessary values in predictable places or push them to memory.
Improved code generation / peephole optimizations. The simple code generation strategy outlined in class generates many, many extra push and pop instructions, loads values from memory even if they were just stored and are still present in a register, stores method arguments in memory and then reloads them right before each method call, has potentially extraneous save operations for argument registers at the beginning of each function body just to be safe, restricts function parameter lists to have five or fewer arguments beyond the this parameter, materializes Boolean results as 0 or 1 even when all that is needed is a simpler compare and branch, makes no effort to keep track of constant values and fold them at compile time, and so forth. Groups that pick this option should propose at least three specific, non-trivial improvements to be implemented for this part of the project when they discuss their plans with the instructor. If you implement a better register allocator (above), it should be easy to eliminate even more redundant copy operations between registers and between registers and memory.
Add support for floating-point arithmetic (doubles). A basic version of this has been assigned in some previous quarters as the final part of the main compiler project (see the 19au version here for a recent example). Groups choosing to use this for the CSE M 501 project extensions this quarter should do at least the required work outlined in that assignment and also must add support for implicit int to double conversions to support mixed-mode arithmetic expressions that include both int and double values, and assignment of int values to variables of type double. That will require defining a new conversion operation in the AST to convert int values to double, splicing the AST of a program being compiled to add conversion operations where needed (probably as part of the type-checking/semantics module in the compiler), and then generate correct code for these conversion operations. You are free to add a more complete set of numeric conversions, including explicit casts between numeric types, but the minimum requirement is to implement doubles as outlined in the 19au assignment plus implicit conversion of int to double values where needed to support mixed-mode arithmetic and assignment.

Other extensions are possible depending on the interests of your group. One that would be very exciting to explore, but needs to be specified carefully to be sure it is feasible to do in a few weeks, is code generation for the RISC V architecture. (The RISC V Reader has a very nice, concise description of this.) We would need to look carefully to be sure there's a reasonable way to run and evaluate code generated for this machine, but it would be a really interesting project. Your instructor has some ideas along these lines, and this has been done successfully by a few groups in the past - please discuss.

If you have other ideas or interests, please discuss them.