Students enrolled in CSE M 501 are expected to do additional work beyond the basic requirements for CSE 401. In most cases this will be done by adding additional significant features to the compiler project. This note describes some possibilities.
Project groups should decide what they would like to do, then
send a short email note to the instructor
(please send mail to cse401-staff@cs...
to help route it properly)
outlining proposed plans so we can
check that the scope is appropriate and reasonable.
Other ideas for additional work that
are not in this list are possible. Please discuss with the instructor.
At the end of the quarter, each group should include in their final report (one per group) a discussion of what was done for this extra work. This report should explain what was attempted, how the work was organized and implemented by members of the group, and evaluate the results. The evaluation can include code samples, benchmarks or other measurements, and so forth. This should not be too long -- a few pages should often be sufficient. The idea is to communicate the work done to the reader concisely but with enough detail to evaluate it, and not so much that the reader struggles to figure out what is important.
We have offered CSE M 501 many times in the past, but we are always open to new possible extensions, and we are still calibrating how well some of these suggestions work in practice. It is possible that some of these suggestions may be too extensive to be done in a reasonable timeframe and (maybe) some will actually turn out to be too simple. Please stay in touch with the instructor as you go, especially if you discover surprises or need to adjust plans in significant ways. We will be appropriately flexible in grading to take these issues into account.
Note that your compiler must still compile basic MiniJava programs correctly and the code should still pass any tests used during regular grading. The only exception is if your extension adds to MiniJava in a way that produces different results for the basic project tests. The main example of this that we've run into in the past is adding method overloading to the compiler. That will result in programs compiling successfully that would have been rejected by a basic MiniJava compiler. There may be other similar examples; if in doubt, please discuss the situation with the instructor.
Deadlines: The M501 project extensions and the report about them should be combined with the final (codegen) phase of the basic project and the basic project report, rather than being turned in separately. Deadlines are:
codegen-final
, due by 11:59 pm,
Saturday, December 7.No late submissions after these deadlines will be accepted for either the CSE M 501 final compiler code or the final report.
Method overloading. Support method overloading following the normal Java rules for ordinary methods and classes. This would include allowing methods with the same name to have different types and/or different numbers of parameters in the same class, and also allowing methods in subclasses to overload, rather than override, superclass methods if their parameter lists differ.
Implementation will mainly involve additions to compiler symbol
tables and the type checking rules. See the Java Language Specification
sec. 8.4.9 on
overloading
and sec. 15.12 on
method
calls for the full rules (not all of which apply to
MiniJava since we don't have interfaces or generic
types, and don't have all of the primitive types
included in full Java). Related reference sources will
also be useful, as will writing and running small test
programs using the standard javac
compiler
and java
VM.
You will also need to do some work when
generating code for methods and vtables. The biggest
complication here is that each overloaded method will
need its own unique assembly language label (name), and
vtables will have to contain properly organized pointers to
overloaded as well as simple methods. You might get
some useful ideas for naming conventions from sections
in the Java Virtual Machine specification. Sec. 4.3.3
describes
method descriptors
and sec. 4.3.2 covers
field
descriptors. Or you might find it easier to just
create method labels like ClassName$foo$1
and
ClassName$foo$2
and so forth.
In any case it will likely be helpful for debugging if you include comments
in your generated assembly language code identifying overloaded
methods precisely so you can be sure overloading is done correctly.
Code profiling/statement counting. Accumulate a count
of the number of times each statement in a program is
executed and print an annotated listing of the program
showing this information. You will need to decide how to
split the work between code generation, runtime
execution (including possible additions to boot.c
),
and post-processing after the program terminates. A
reasonable strategy for doing this is: (i) allocate an
array of 8-byte ints with one entry per counted
statement, (ii) zero out the array in the assembly code
or at the beginning of execution (this is really easy if
you use calloc
to allocate the array on the
heap or if you create and call a small C function in boot.c
to do the work),
(iii) generate code at the beginning of each
statement to increment the counter in the array associated with that
statement, (iv) at the end of execution dump to a file the
counters and information that allows you to correlate
counters in the array with source line numbers,
and (v) merge the counter information with the
original source file to produce the annotated
listing. Don't worry about edge cases like lines that
contain more than one statement.
Register allocation. Add a register allocator to utilize multiple registers in the generated code. It's not reasonable to try to implement a global graph-coloring register allocator in the time available, but it is possible to do much better in basic blocks than the single-register strategy outlined in class. The textbook outlines a couple of possible strategies in sec. 13.3; the bottom-up local allocator is probably simpler and probably stands a better chance of being successfully implemented for our project (in the third edition of the book, this is the single algorithm in sec. 13.3). Although the discussion in 13.3 assumes that the program has been transformed into some sort of lower-level IR, it is possible to do bottom-up local allocation on the fly as the compiler generates code. The key observation is to treat every label in the code as the beginning of a basic block, and every jump or call instruction as the end of one (meaning that the next instruction begins a basic block even if it is not labeled). The register allocator needs to initialize its state at the beginning of each sequence of straight-line code (basic block), and adjust the contents of the registers at the end of each block by generating code if needed to leave necessary values in predictable places or push them to memory.
Improved code generation / peephole optimizations. The simple
code generation strategy outlined in class generates many, many extra
push and pop instructions, loads values from memory even if they were
just stored and are still present in a register, stores method
arguments in memory and then reloads them right before each method call, has
potentially extraneous save operations for argument registers at the beginning
of each function body just to be
safe, restricts function parameter lists to have five or fewer arguments
beyond the this
parameter,
materializes Boolean results as 0 or 1 even when all that is
needed is a simpler compare and branch, makes no effort to keep track of
constant values and fold them at compile time, and so forth.
Groups that pick this option should propose at least three specific, non-trivial
improvements to be implemented for this part of the project
when they discuss their plans with the instructor.
If you implement a better register allocator (above), it should be
easy to eliminate even more redundant copy operations between
registers and between registers and memory.
Add support for floating-point arithmetic (doubles
).
A basic version of this has been assigned in some previous quarters as the
final part of the main compiler project (see the 19au version
here
for a fairly recent example).
Groups choosing to use this for the CSE M 501 project extensions this quarter should
do at least the required work outlined in that assignment and also must add
support for implicit int
to double
conversions
to support mixed-mode arithmetic expressions that include both int
and double
values, and assignment of int
values to variables of type double
.
That will require defining a new conversion operation in the AST to convert
int
values to double
, splicing the AST of a program
being compiled to add conversion operations where needed (probably as part
of the type-checking/semantics module in the compiler), and then generate correct
code for these conversion operations. You are free to add a more complete
set of numeric conversions, including explicit casts between numeric types,
but the minimum requirement is to implement doubles
as
outlined in the 19au assignment plus implicit conversion of int
to double
values where needed to support mixed-mode arithmetic
and assignment.
Other extensions are possible depending on the interests of your group. One that would be very exciting to explore, but needs to be specified carefully to be sure it is feasible to do in a few weeks, is code generation for the RISC V architecture. (The RISC V Reader has a very nice, concise description of this.) We would need to look carefully to be sure there's a reasonable way to run and evaluate code generated for this machine, but it would be a really interesting project. Your instructor has some ideas along these lines, and this has been done successfully by a few groups in the past - please discuss.
If you have other ideas or interests, please discuss them.