MicroJack Language Specification

The computer you've built in this course follows a specification called "Hack" (provided by the Nand2Tetris project), which comes with a corresponding high-level language called "Jack" that strongly resembles Java. Jack supports many of Java's core features, and the most visible difference is syntax: Jack has extra keywords (such as let, which starts every assignment) that make the task of the Parser easier.

To make the compiler project manageable in our timeframe, your compiler will focus on just a few key language constructs from Jack. For this project, we've designed a language called "MicroJack", which is very nearly a subset of Jack. In particular, MicroJack includes the following fundamental language features:

Here is an incomplete list of features common to high-level programming languages that MicroJack does NOT include:

Despite its small size, MicroJack's features are carefully chosen so that you can still write powerful, complex programs: a surprising number of more convenient programming constructs can be expressed using just the features MicroJack provides.

MicroJack "Specification"

In this assignment, we avoid giving a formal definition of MicroJack because (1) it is simple enough to be conveyed via descriptions, (2) formal systems for language specification are beyond the scope of this class, and (3) most of it is already handled for you in the Parser. What follows is an English definition of the MicroJack language, with a few examples. You can look at the provided test programs in project 7 for more examples of valid MicroJack.

MicroJack is weakly typed: it has int and int[] types but performs no type checking of any kind (and will not, for example, stop you from running off the end of an array). Instead of true booleans, logical operations in MicroJack evaluate to the integer 0 for false and the integer 1 for true.
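As a rough illustration, the Python sketch below models this convention. The helper names are hypothetical, and the sketch assumes == and != are among MicroJack's comparison operators, as the examples later in this section suggest: each comparison produces the integer 1 or 0, and a condition counts as true whenever it evaluates to any non-zero integer.

```python
def mj_eq(x: int, y: int) -> int:
    """Hypothetical model of MicroJack's ==: the integer 1 for true, 0 for false."""
    return 1 if x == y else 0


def mj_ne(x: int, y: int) -> int:
    """Hypothetical model of MicroJack's !=: the integer 1 for true, 0 for false."""
    return 1 if x != y else 0


def is_true(value: int) -> bool:
    """A condition succeeds when it evaluates to any non-zero integer."""
    return value != 0


print(mj_eq(3, 3), mj_ne(3, 3))  # 1 0
print(is_true(-7), is_true(0))   # True False
```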

A MicroJack program consists of a "Variable Declarations" section followed by a "Statements" section. Each section may contain any number of declarations or statements respectively, including zero. In the "Variable Declarations" section, each variable declaration statement looks something like:

var int a, b, c[5], d;

Notably, every variable declaration statement must start with "var int" and end with a semicolon, and it declares one or more variable names separated by commas. As the example above shows, an array is created by placing brackets after a variable name with an integer literal inside giving the array's length (the length is used to reserve the array's memory, but it is never checked during an array access). Arrays and ints can be mixed in a single declaration statement. The "Variable Declarations" section can contain many declaration statements like the one above, so it is up to the programmer to decide how to break up their declarations for readability.
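The provided Parser already handles this syntax, so the following is only an illustration: a hypothetical Python sketch (parse_declaration is not part of the course code) that splits one declaration statement into names and array lengths. The identifier rule it uses (letters, digits, and underscores, not starting with a digit) is an assumption, since this document does not define one.

```python
from __future__ import annotations

import re

# Whole statement: "var int", whitespace, the list of names, then ";".
_DECL = re.compile(r"^\s*var\s+int\s+(.+);\s*$")
# One name, optionally followed by an integer-literal length in brackets.
# ASSUMPTION: identifiers are letters, digits, and underscores, not starting with a digit.
_ITEM = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(?:\[\s*(\d+)\s*\])?\s*$")


def parse_declaration(stmt: str) -> list[tuple[str, int | None]]:
    """Split one declaration into (name, length) pairs; length is None for a plain int."""
    decl = _DECL.match(stmt)
    if decl is None:
        raise SyntaxError("a declaration must start with 'var int' and end with ';'")
    pairs = []
    for item in decl.group(1).split(","):
        match = _ITEM.match(item)
        if match is None:
            raise SyntaxError(f"malformed variable in declaration: {item!r}")
        name, length = match.groups()
        pairs.append((name, None if length is None else int(length)))
    return pairs


print(parse_declaration("var int a, b, c[5], d;"))
# [('a', None), ('b', None), ('c', 5), ('d', None)]
```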

After the "Variable Declarations" section comes the "Statements" section; once it begins, variable declaration statements are no longer valid anywhere in the program. Each statement in the "Statements" section must be one of the following.

Assignment:

let a = b + 4;

Assignment statements start with "let" and end with a semicolon. Between them is a variable access (either an int like "a" or an int array like "a[24]"), a single equals sign (referred to by the token BECOMES), and then any expression.

If:

if (a == b) {
  let a = b + 4;
  let b = 2;
}

If statements start with "if", followed by an expression in parentheses and then a block in curly brackets. The curly brackets are required even around a single statement (unlike Java, for instance), and the block can contain any number of statements, including nested if statements. When run, an if statement evaluates its condition and, if the result is a non-zero integer, executes the inner statements in order. Otherwise, it skips the inner statements.
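To make these semantics concrete, here is a small Python model (an interpreter-style sketch, not a compiler) of the example above. It rests on simplifying assumptions: memory is a dictionary from variable names to integers, and statements and expressions are plain Python functions. The names let and run_if are hypothetical.

```python
from typing import Callable, Dict, List

Memory = Dict[str, int]
Expr = Callable[[Memory], int]
Statement = Callable[[Memory], None]


def let(name: str, expr: Expr) -> Statement:
    """Model `let name = expr;` as a function that updates memory."""
    def statement(memory: Memory) -> None:
        memory[name] = expr(memory)
    return statement


def run_if(memory: Memory, condition: Expr, body: List[Statement]) -> None:
    """Model `if (condition) { body }`: run body in order only if condition is non-zero."""
    if condition(memory) != 0:
        for statement in body:
            statement(memory)


# if (a == b) { let a = b + 4; let b = 2; }
mem = {"a": 3, "b": 3}
run_if(
    mem,
    lambda m: 1 if m["a"] == m["b"] else 0,
    [let("a", lambda m: m["b"] + 4), let("b", lambda m: 2)],
)
print(mem)  # {'a': 7, 'b': 2}
```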

While:

while (a != b) {
  let a = a + 1;
  let b = b - 1;
}

While loops are syntactically identical to If statements, except that they start with "while". When run, a while loop first checks its condition; if it evaluates to a non-zero integer, the loop executes the inner statements in order and then repeats the process, checking the condition again. Once the condition evaluates to 0, the loop ends and execution continues with the statement after it.
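The same kind of Python model works for while loops. Again this is a sketch under simplifying assumptions (memory as a name-to-integer dictionary, statements as Python functions), and let and run_while are hypothetical names.

```python
from typing import Callable, Dict, List

Memory = Dict[str, int]
Expr = Callable[[Memory], int]
Statement = Callable[[Memory], None]


def let(name: str, expr: Expr) -> Statement:
    """Model `let name = expr;` as a function that updates memory."""
    def statement(memory: Memory) -> None:
        memory[name] = expr(memory)
    return statement


def run_while(memory: Memory, condition: Expr, body: List[Statement]) -> None:
    """Model `while (condition) { body }`: re-check the condition before every pass."""
    while condition(memory) != 0:
        for statement in body:
            statement(memory)


# while (a != b) { let a = a + 1; let b = b - 1; }
mem = {"a": 0, "b": 6}
run_while(
    mem,
    lambda m: 1 if m["a"] != m["b"] else 0,
    [let("a", lambda m: m["a"] + 1), let("b", lambda m: m["b"] - 1)],
)
print(mem)  # {'a': 3, 'b': 3}
```

With these starting values the loop stops after three passes. If a and b instead started an odd distance apart, their difference would shrink by 2 on each pass and stay odd, so it would never reach 0 and the loop would never terminate.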

There are also several possible expressions in the MicroJack language. An expression can be one of the following, where E stands for a place where another expression goes:

MicroJack supports comments, but only single-line ones starting with // that go until the end of the line.
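A hypothetical Python helper (strip_comment is illustrative, not part of the course code) showing how a lexer might discard comments. Because MicroJack's only types are int and int[], there are no string literals that could contain //, so cutting each line at its first // is enough.

```python
def strip_comment(line: str) -> str:
    """Remove a // comment, which runs from the first // to the end of the line."""
    start = line.find("//")
    return line if start == -1 else line[:start]


print(repr(strip_comment("let a = a + 1; // increment a")))  # 'let a = a + 1; '
print(repr(strip_comment("// a whole-line comment")))        # ''
```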

Note on Variables

In MicroJack, every variable is stored at its own memory location, with locations assigned starting at address 256. That is, if a program has a single int variable a, the name a can be thought of as an alias for memory address 256. If a different program has an int[] variable arr with length 10 and an int variable b declared after it, arr will refer to address 256 (so arr[i] is stored at address 256 + i) and b will refer to address 266, because 10 words had to be reserved for the elements of the array.
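The layout rule can be sketched in Python as a running counter. The sketch assumes variables are placed in declaration order with no gaps, as the arr/b example implies; allocate is a hypothetical name, and its input is a list of (name, length) pairs where length is None for a plain int.

```python
from typing import Dict, List, Optional, Tuple

BASE_ADDRESS = 256  # first memory address used for MicroJack variables


def allocate(declarations: List[Tuple[str, Optional[int]]]) -> Dict[str, int]:
    """Give each variable the next free address; an array reserves `length` words."""
    addresses: Dict[str, int] = {}
    next_free = BASE_ADDRESS
    for name, length in declarations:
        addresses[name] = next_free
        next_free += 1 if length is None else length
    return addresses


print(allocate([("arr", 10), ("b", None)]))  # {'arr': 256, 'b': 266}
print(allocate([("a", None), ("b", None), ("c", 5), ("d", None)]))
# {'a': 256, 'b': 257, 'c': 258, 'd': 263}
```

For the declaration var int a, b, c[5], d; the array c occupies addresses 258 through 262, so d lands at 263.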

Two special, global, static variables are available in every MicroJack program without needing to be declared: screen, an array corresponding to the SCREEN memory map on the Hack computer, and keyboard, an array corresponding to the KBD memory map on the Hack computer (though only a single element in size).
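One way a compiler might resolve an array element to a memory address, sketched in Python under stated assumptions: declared arrays use base addresses from the variable layout described above (passed in as a dictionary), while screen and keyboard are bound to the standard Hack memory-map addresses SCREEN = 16384 and KBD = 24576. The function name, and the choice to let the built-in names take priority, are hypothetical. As with every MicroJack array access, no bounds check is made.

```python
from typing import Dict

SCREEN = 16384  # Hack SCREEN memory map: 8192 words, 32 words per pixel row
KBD = 24576     # Hack KBD memory map: a single word
BUILTINS = {"screen": SCREEN, "keyboard": KBD}


def element_address(symbols: Dict[str, int], name: str, index: int) -> int:
    """Address of name[index]; built-in arrays take priority, and no bounds check is made."""
    base = BUILTINS[name] if name in BUILTINS else symbols[name]
    return base + index


symbols = {"arr": 256, "b": 266}
print(element_address(symbols, "arr", 3))       # 259
print(element_address(symbols, "screen", 32))   # 16416: first word of pixel row 1
print(element_address(symbols, "keyboard", 0))  # 24576
print(element_address(symbols, "arr", 10))      # 266: runs past arr into b
```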