Project 7: Compilers & Debugging

Due on Tuesday, March 8th, 2022 at 11:59pm PST

Objectives

  • To apply the concepts of the Scanner, Parser, and Code Generator in a real compiler.
  • To go through the process of familiarizing yourself with a sizeable existing codebase.
  • To practice deliberate debugging strategies.

Part 0: Get the Starter Materials

To complete this assignment, you will need to use git to access the starter materials that have been pushed to your repository by the course staff. First, run git status and see if it reports any changes. If you have any, you'll need to commit them before proceeding (git add . followed by git commit -m "Your message here"). Then, run:

After running that command, you have synchronized the local copy of your repo with the copy on the server. Very rarely, you may get a "merge conflict" where git can't automatically figure out how to combine the changes we pushed to the server. If that happens, feel free to email the staff list to resolve the issue.

Part I: Buggy Compiler

In this part of the project, you will finish a buggy and incomplete implementation of a compiler that converts from MicroJack (a rough subset of the Jack programming language) to Hack assembly instructions. This project will complete our journey from basic Nand gates to a high-level language like Java. When your compiler is finished, you will be able to write high-level Jack code, compile it to Hack assembly, and execute that assembly on the computer you built! We hope you will find the experience pretty cool.

Implementing a compiler is a big task. As we saw in lecture, compilers are typically organized into a "front-end" that discovers the structure and meaning of the source code (the Scanner and Parser), a "middle-end" that processes that structure to verify properties or improve execution (the Type Checker and Optimizer), and a "back-end" that spits out equivalent code in the target language (the Code Generator). In this assignment, you are provided with a working Scanner and Parser, and you are given a substantial amount of starter code for the Code Generation step -- you will debug some parts of the Code Generation step and implement others, ultimately completing the compiler. Our goal with this division of work is to make the huge challenge of building a compiler manageable and less overwhelming, while strategically picking parts for you to implement that preserve the learning objectives of this assignment.

Getting Help Understanding Code: There is a significant amount of starter code in this assignment, and it is very normal to find a large piece of code difficult to understand on the first read. If at any point you'd like clarification about what part of the starter code is doing, we strongly encourage you to post on the course discussion board. Since everyone has the same starter code, you can post about specifics and get a second opinion from your peers without fear of posting any solutions. Reading the starter code is an important part of this assignment, but we don't want it to be a barrier to the implementation — so please post as much as you'd like about clarifications, and remember that if you're confused about a piece of code it's almost certain that someone else in the class is too!

Background: The MicroJack Programming Language

See the MicroJack Programming Language specification for a description of the source language that our compiler will read in. MicroJack is almost a subset of the Jack programming language provided by Nand2Tetris, and is very similar to Java.

Background: The MicroJack -> Hack ASM Compiler

In this assignment, you are implementing a MicroJack -> Hack ASM compiler, meaning a program that reads in a file containing MicroJack source code and spits out another file containing equivalent Hack ASM that, when executed, has the behavior specified in the MicroJack source. Note that the compiler is itself a program, and in this case one that is written in Java (and compiled with the Java compiler!).

This compiler has three parts: a Scanner, Parser, and Code Generation phase. The starting point of the program is in compiler/Compiler.java, which manages the execution of the other three components based on user input. The Scanner is implemented in compiler/JackScanner.java, which is given a file and scans through that file to generate tokens. The scanner class is intended to be used like an iterator, so it doesn't immediately scan through the whole file but instead produces each subsequent token on request when advance() is called.

The Parser is implemented in compiler/JackParser.java, which is given a JackScanner object and uses it to iterate through the token stream, detecting the program's structure as it goes and building up an Abstract Syntax Tree corresponding to the program. A substantial part of the compiler starter code is the AST nodes found in compiler/AST/. The following diagram shows the inheritance hierarchy among those AST nodes, where JackProgram is the root node, VarDeclarations represent different types of variable declarations (int vs. int[]), Statements are the pieces of code that "perform an action" and that make up the main body of the program, Expressions are the pieces of code that evaluate to a result, VarAccesses are special expressions that access a variable in memory, and Identifier is a simple wrapper class around a String name that is used throughout the other nodes.

The Code Generation phase is not a single piece of code at all, but is distributed among the AST Nodes -- each AST Node knows how to generate its own Hack ASM when the printASM method is called on it. Thus, when an AST Node has child AST nodes, it can simply recursively call printASM on them in the appropriate place to generate a chunk of assembly code wherever it makes sense among the assembly code being produced. The Code Generation phase of the compiler, therefore, simply consists of calling printASM on the root of the AST (JackProgram) and letting recursion take care of the rest!

In this assignment you will debug and implement printASM methods as listed below. As a reference of how this recursive process works, you can look at the printDebug methods already implemented in each AST Node -- note that the two methods have very different purposes, but both model the recursive calling of child nodes. Each printASM method's task is to generate assembly, which is best done using the convenience instr(), label(), and comment() functions implemented in the base ASTNode.java class -- take a look in there to see what's available to you as you write these printASM methods.

Your Task

Setting Up

We encourage you to use whatever setup you are comfortable with for editing Java code. You are free to edit using any editor (e.g. VSCode, Atom, Sublime Text, Vim, Emacs, etc.) and then compile the Java source code of the compiler manually, using the java and javac shell commands. You can also use an IDE (e.g. IntelliJ (recommended IDE), Eclipse) if you'd prefer.

Command-Line Instructions: First, navigate to the src directory, and from within that directory, run the following command to compile all Java source files:

javac $(find . -name "*.java")

Repeat that command whenever you make changes to the compiler source code. Then, when you want to execute the compiler, run:

java compiler/Compiler compile <PATH TO MICROJACK FILE>.jack

The Compiler main class (provided) accepts two command-line arguments. The first is the mode, where "compile" runs the entire compiler and generates corresponding .asm file, but "scan" and "parse" are also available and simply print out the output of those compiler phases for debugging purposes. The second argument is the path of a MicroJack file to run, which must end with the .jack extension.

IDE Instructions: To set things up, import the projects/7 directory into your IDE as a new project. The project structure with src and test should make it easier to get working with an IDE -- we've also provided an IntelliJ configuration file with the starter code to make it possible to import into the IntelliJ editor (recommended IDE). If you'd like to run the compiler from within your IDE, you will need to create a new "runtime configuration" that gives the command-line arguments listed above.

Steps

Because there's a lot of starter code to look at, we've organized your task in this assignment into 8 steps. We strongly recommend you go in this order.

0.Familiarize yourself with the starter code.Look through the provided implementation, especially the Scanner, Parser, and AST Nodes in src/compiler/AST/. The AST classes are where you'll do the bulk of the work in this assignment. Make sure to review the abstract classes that form the basis of the inheritance hierarchy so you know how the classes are organized.
1.Implement printASM in src/AST/NumberLiteral.javaThis is a simple warm-up to practice the weird-but-cool concept of writing Java code that generates Hack ASM code. The solution is very short and should simply place a literal number value in the right place when the ASM is executed. Hint: you can find the solution among the "Compiler Part II" lecture slides.
2.Debug printASM in src/AST/Plus.javaThis is a more complex (and recursive) expression that needs to evaluate both arguments, then produce a result that is the sum of their two results. The implementation is mostly complete but contains two bugs: you'll want to run the compiler to generate the buggy ASM code and then step through it to understand why it doesn't produce the correct result. Look at test/SimpleAddition.jack for a good starting point in your debugging process.
3.Implement printASM in src/AST/Minus.javaThis expression is very much like Plus, but it should subtract the right expression's result from the left expression's result.
4.Implement printASM in src/AST/NotEquals.javaThis expression is very much like Equals (which is completed for you), but should produce the inverse effect. First read through the printASM method in Equals, then demonstrate your understanding by making the necessary changes to invert the result.
5.Implement printASM and printASMToGetAddress in src/AST/ArrayVarAccess.javaThis node is very different from the others -- as a child of the VarAccess class, it has two ASM-generating methods. One produces the assembly corresponding to reading from a variable (printASM) as normal, and one produces the assembly corresponding to getting the address of a variable (printASMToGetAddress), which is only used by the Assignment node for the VarAccess that appears on the left side of the =. As you implement this node, look at IntVarAccess for a similar example. Note that for an array access like a[1+2], the expression inside the [] should be evaluated to get some resulting int, which is then added to the address of the array itself (here, a) to get the address of the slot within the array to read from or write to.
6.Debug printASM in src/AST/If.javaThe If node contains an expression as its condition and a list of statements to execute if that condition evaluates to true (the int 1 in our case). Debug the code to find the two bugs preventing it from matching that behavior.
7.Implement printASM in src/AST/While.javaThis is the most difficult part of the assignment, where you are asked to implement the ASM generation for a while loop with no starter code. Remember that a while loop should evaluate its condition before the first execution of its inner statements, and can repeat infinitely.

Step 0 should not be underestimated -- your job in the rest of the assignment will be much easier if you understand how the printASM is used, and why the compiler is built that way.

After reading through the code, you can search for "PROJECT 7 TODO" in the repo to find all of the changes you need to make to complete the assignment. We recommend going in order, because they are arranged by difficulty but also to guide you in becoming gradually more comfortable with different features of the compiler.

How to Run Tests

We've provided some sample jack programs in the test/ directory to help you test and debug your compiler implementation. You'll note that if we were designing a test suite for our compiler, ideally we'd write individual unit tests that attempt to test each AST node in isolation from others as much as possible. That would make debugging errors much easier!

Because the focus of this assignment is partly practicing your debugging skills, the test programs we've provided are not nicely isolated; some tests incorporate multiple AST node implementations, and you'll have to spend time debugging to figure out which implementation is the source of your bug. This simulates situations you'll run into in the real world, where your code might rely on multiple helper methods, you might be working on something that isn't easily unit testable, and/or you are working in a code base where there is a lack of good testing infrastructure!

For each test, there are three files: a .jack file containing the high level code and a .tst file and a .cmp file for testing your assembly output from the compiler. At the top of each .jack file there is a comment describing what the program should do. For the programs that use the screen, you can additionally check the "Screen" view to make sure things look correct. The general process for running one of the tests is:

  1. Navigate to the src/ directory (e.g. cd src/)
  2. Compile your java source code by running javac $(find . -name "*.java")
  3. Use your compiler to compile the jack file for the test you want to run using java compiler/Compiler compile ../test/TESTFILE.jack
  4. Load the corresponding .tst file using the CPUEmulator.
  5. Run the .tst file in the CPUEmulator. A lot of the files will take a super long time to run if you have animations turned on, so we recommend turning off animations (set the Animate option at the top of the emulator to "No animation"). You may want to turn animations back on when debugging, particularly the "Program Flow" option to step through the code.

You can also load the generated Hack assembly language code directly using the CPUEmulator, which lets you step through, inspect various regions of memory, and also interact with the program through the keyboard and screen.

To help you understand what each test uses (and to guide your debugging process) we've identified which AST nodes are used in which of the test programs. This is not intended to be an order related to how you should develop/test your programs. A rough overview of what each file uses:

You may find it helpful to modify some of the test programs when debugging. We do not recommend you edit the test files directly; instead we have provided a Sandbox test with empty .jack, .tst, and .asm. Feel free to copy over files from other tests into your Sandbox files and then modify/test/debug with those files.

Submitting

To submit your project, make sure you commit and push your changes, then tag them as project-7 and push that tag as well:

Then, verify that the tag and all your files are uploaded correctly by checking on gitlab.cs.washington.edu.

Project Skills and Passing Requirements

Below are the skills you need to demonstrate in order to pass this project, along with concrete tasks we expect you to complete to demonstrate each skill. Remember that in order to receive a passing grade for the class, you must receive a passing grade on each project. Only completing the bare minimum to pass a project will result in a low grade. See the syllabus for more details and let us know if you have any questions.