Project 7: Compiler, Debugging, & Office Hours

Due 11:59pm Thursday, May 28

Objectives

To apply the concepts of the Scanner, Parser, and Code Generator in a real compiler.
To go through the process of familiarizing yourself with a sizeable existing codebase.
To practice deliberate debugging strategies.
To practice getting the most out of office hours and engaging with the course TAs as a resource.

Part 0: Get the Starter Materials

To complete this assignment, you will need to use git to access the starter materials that have been pushed to your repository by the course staff. First, run git status and see if it reports any changes. If you have any, you'll need to commit them before proceeding (git add . followed by git commit -m "Your message here"). Then, run:

git pull (download any commits on the Gitlab server to your local repository)

After running that command, you have synchronized the local copy of your repo with the copy on the server. Very rarely, you may get a "merge conflict" where git can't automatically figure out how to combine the changes we pushed to the server. If that happens, feel free to email the staff list to resolve the issue.

Part I: Buggy Compiler

In this part of the project, you will finish a buggy and incomplete implementation of a compiler that converts from MicroJack (a rough subset of the Jack programming language) to Hack assembly instructions. This project will complete our journey from basic Nand gates to a high-level language like Java. When your compiler is finished, you will be able to write high-level Jack code, compile it to Hack assembly, and execute that assembly on the computer you built! We hope you will find the experience pretty cool.

Implementing a compiler is a big task. As we saw in lecture, compilers are typically organized into a "front-end" that discovers the structure and meaning of the source code (the Scanner and Parser), a "middle-end" that processes that structure to verify properties or improve execution (the Type Checker and Optimizer), and a "back-end" that spits out equivalent code in the target language (the Code Generator). In this assignment, you are provided with a working Scanner and Parser, and you are given a substantial amount of starter code for the Code Generation step -- you will debug some parts of the Code Generation step and implement others, ultimately completing the compiler. Our goal with this division of work is to make the huge challenge of building a compiler manageable and less overwhelming, while strategically picking parts for you to implement that preserve the learning objectives of this assignment.

Getting Help Understanding Code: There is a significant amount of starter code in this assignment, and it is very normal to find a large piece of code difficult to understand on the first read. If at any point you'd like clarification about what part of the starter code is doing, we strongly encourage you to post on the course discussion board. Since everyone has the same starter code, you can post about specifics and get a second opinion from your peers without fear of posting any solutions. Reading the starter code is an important part of this assignment, but we don't want it to be a barrier to the implementation — so please post as much as you'd like about clarifications, and remember that if you're confused about a piece of code it's almost certain that someone else in the class is too!

Background: The MicroJack Programming Language

See the MicroJack Programming Language specification for a description of the source language that our compiler will read in. MicroJack is almost a subset of the Jack programming language provided by Nand2Tetris, and is very similar to Java.

Background: The MicroJack -> Hack ASM Compiler

In this assignment, you are implementing a MicroJack -> Hack ASM compiler, meaning a program that reads in a file containing MicroJack source code and spits out another file containing equivalent Hack ASM that, when executed, has the behavior specified in the MicroJack source. Note that the compiler is itself a program, and in this case one that is written in Java (and compiled with the Java compiler!).

This compiler has three parts: a Scanner, Parser, and Code Generation phase. The starting point of the program is in compiler/Compiler.java, which manages the execution of the other three components based on user input. The Scanner is implemented in compiler/JackScanner.java, which is given a file and scans through that file to generate tokens. The scanner class is intended to be used like an iterator, so it doesn't immediately scan through the whole file but instead produces each subsequent token on request when advance() is called.

The Parser is implemented in compiler/JackParser.java, which is given a JackScanner object and uses it to iterate through the token stream, detecting the program's structure as it goes and building up an Abstract Syntax Tree corresponding to the program. A substantial part of the compiler starter code is the AST nodes found in compiler/AST/. The following diagram shows the inheritance hierarchy among those AST nodes, where JackProgram is the root node, VarDeclarations represent different types of variable declarations (int vs. int[]), Statements are the pieces of code that "perform an action" and that make up the main body of the program, Expressions are the pieces of code that evaluate to a result, VarAccesses are special expressions that access a variable in memory, and Identifier is a simple wrapper class around a String name that is used throughout the other nodes.

The Code Generation phase is not a single piece of code at all, but is distributed among the AST Nodes -- each AST Node knows how to generate its own Hack ASM when the printASM method is called on it. Thus, when an AST Node has child AST nodes, it can simply recursively call printASM on them in the appropriate place to generate a chunk of assembly code wherever it makes sense among the assembly code being produced. The Code Generation phase of the compiler, therefore, simply consists of calling printASM on the root of the AST (JackProgram) and letting recursion take care of the rest!

In this assignment you will debug and implement printASM methods as listed below. As a reference of how this recursive process works, you can look at the printDebug methods already implemented in each AST Node -- note that the two methods have very different purposes, but both model the recursive calling of child nodes. Each printASM method's task is to generate assembly, which is best done using the convenience instr(), label(), and comment() functions implemented in the base ASTNode.java class -- take a look in there to see what's available to you as you write these printASM methods.

Your Task

Setting Up

We encourage you to use whatever setup you are comfortable with for editing Java code. You are free to edit using any editor (e.g. VSCode, Atom, Sublime Text, Vim, Emacs, etc.) and then compile the Java source code of the compiler manually, using the java and javac shell commands. You can also use an IDE (e.g. IntelliJ (recommended), Eclipse) if you'd prefer.

Command-Line Instructions: First, navigate to the src directory, and from within that directory, run the following command to compile all Java source files:

javac compiler/**.java

Repeat that command whenever you make changes to the compiler source code. Then, when you want to execute the compiler, run:

java compiler/Compiler compile <PATH TO MICROJACK FILE>.jack

The Compiler main class (provided) accepts two command-line arguments. The first is the mode, where "compile" runs the entire compiler and generates corresponding .asm file, but "scan" and "parse" are also available and simply print out the output of those compiler phases for debugging purposes. The second argument is the path of a MicroJack file to run, which must end with the .jack extension.

IDE Instructions: To set things up, import the projects/7 directory into your IDE as a new project. The project structure with src and test should make it easier to get working with an IDE -- we've also provided an IntelliJ configuration file with the starter code to make it possible to import into the IntelliJ editor (recommended). If you'd like to run the compiler from within your IDE, you will need to create a new "runtime configuration" that gives the command-line arguments listed above.

Steps

Because there's a lot of starter code to look at, we've organized your task in this assignment into 8 steps. We strongly recommend you go in this order.

0.	Familiarize yourself with the starter code.	Look through the provided implementation, especially the Scanner, Parser, and AST Nodes in `src/compiler/AST/`. The AST classes are where you'll do the bulk of the work in this assignment. Make sure to review the abstract classes that form the basis of the inheritance hierarchy so you know how the classes are organized.
1.	Implement `printASM` in `src/AST/NumberLiteral.java`	This is a simple warm-up to practice the weird-but-cool concept of writing Java code that generates Hack ASM code. The solution is very short and should simply place a literal number value in the right place when the ASM is executed. Hint: you can find the solution among Thursday's lecture slides.
2.	Debug `printASM` in `src/AST/Plus.java`	This is a more complex (and recursive) expression that needs to evaluate both arguments, then produce a result that is the sum of their two results. The implementation is mostly complete but contains two bugs: you'll want to run the compiler to generate the buggy ASM code and then step through it to understand why it doesn't produce the correct result. Look at `test/1_SimpleAddition.jack` for a good starting point in your debugging process.
3.	Implement `printASM` in `src/AST/Minus.java`	This expression is very much like `Plus`, but it should subtract the right expression's result from the left expression's result.
4.	Implement `printASM` in `src/AST/NotEquals.java`	This expression is very much like `Equals` (which is completed for you), but should produce the inverse effect. First read through the `printASM` method in `Equals`, then demonstrate your understanding by making the necessary changes to invert the result.
5.	Implement `printASM` and `printASMToGetAddress` in `src/AST/ArrayVarAccess.java`	This node is very different from the others -- as a child of the `VarAccess` class, it has two ASM-generating methods. One produces the assembly corresponding to reading from a variable (`printASM`) as normal, and one produces the assembly corresponding to getting the address of a variable (`printASMToGetAddress`), which is only used by the `Assignment` node for the `VarAccess` that appears on the left side of the =. As you implement this node, look at `IntVarAccess` for a similar example. Note that for an array access like `a[1+2]`, the expression inside the [] should be evaluated to get some resulting int, which is then added to the address of the array itself (here, `a`) to get the address of the slot within the array to read from or write to.
6.	Debug `printASM` in `src/AST/If.java`	The If node contains an expression as its condition and a list of statements to execute if that condition evaluates to true (the int 1 in our case). Debug the code to find the two bugs preventing it from matching that behavior.
7.	Implement `printASM` in `src/AST/While.java`	This is the most difficult part of the assignment, where you are asked to implement the ASM generation for a while loop with no starter code. Remember that a while loop should evaluate its condition before the first execution of its inner statements, and can repeat infinitely.

Step 0 should not be underestimated -- your job in the rest of the assignment will be much easier if you understand how the printASM is used, and why the compiler is built that way.

After reading through the code, you can search for "PROJECT 7 TODO" in the repo to find all of the changes you need to make to complete the assignment. We recommend going in order, because they are arranged by difficulty but also to guide you in becoming gradually more comfortable with different features of the compiler.

How to Run Tests

Running a test consists of using the compiler to compile some MicroJack code to Hack assembly language code, and then running that Hack assembly language code using the CPUEmulator to ensure that it operates correctly. Before doing so, make sure you've compiled the latest version of your Java source code, so your compiler itself is up to date.

Then, run the following command from the src directory to operate the compiler on a given MicroJack source file (which will create a Hack assembly file next to it):

java compiler/Compiler compile ../test/1_SimpleAddition.jack

Note that while "compile" is the ultimate goal, you can replace that directive in the first argument slot with "scan" to print out the token stream or "parse" to print out a representation of the AST. These tools may be helpful to you as you complete the assignment.

The tests are ordered by number in the test directory, and their order roughly approximates how difficult they are to get working (with 0 being already correct in the starter code and 1 being the easiest after that). The final test (9) shows a small game written in MicroJack, complete with user input! Your entire program will likely have to be correct for that program to run successfully. Note that these test numbers DO NOT correspond to the step numbers above!

You can load the generated Hack assembly language code using the CPUEmulator, which lets you step through, inspect various regions of memory, and also interact with the program through the keyboard and screen.

Part II: TA 1:1 Meetings

As part of the assignment, you will have a 30-minute 1:1 with one of the TAs to discuss strategies for working through the Buggy Compiler assignment. Being able to describe a bug verbally is a fundamental practice in working through difficult bugs. Check your email for a doodle link that will let you sign up for your 1:1 meeting.

Part III: Project 7 Reflection

Upon completing the Buggy Compiler assignment, take time to refflect on what the experience was like for you. Please address the following questions:

Describe a bug that you identified in your code. How did you engage in the iterative scientific process in debugging your code?
- What was your hypothesis?
- What experiment did you design?
- What happened when you performed the experiment?
- How did the results inform your next steps?
When you reached a point of frustration in debugging your code, what did you do?
- What aspects of your approach helped you work through the problem?
- What aspects of your approach hindered your ability to work through the problem?

Write your answers in the document called project_7_reflection.docx located in the projects/7 directory. Edit the document "in-place" and submit by pushing to Gitlab.

Submitting

To submit your project, make sure you commit and push your changes, then tag them as project-7 and push that tag as well:

git add . (stage all your changes)
git commit -m "Finish project 7" (bundle those changes as a commit; the message is up to you!)
git push (push that commit to the Gitlab server)
git tag project-7 (add the project-7 tag to that last commit)
git push --tags (push the tag to the Gitlab server)

Then, verify that the tag and all your files are uploaded correctly by checking on gitlab.cs.washington.edu.