Assembly Languages & Machine Code

You are encouraged to post any questions about the reading on the course discussion board!

On our path to building a computer, we need to take a quick detour from our hardware studies to learn a bit about assembly languages. An assembly language is a human-readable interface to a computer's specification. We will learn about the general characteristics of assembly languages, which will set us up nicely to learn the specification of our own computer.

Machine Code and Assembly Languages

You may have often heard the phrase “it's all 1s and 0s” when referring to what is happening in your computer. This is actually the case with the code you run as well! CPUs process code in steps known as instructions, which are binary sequences of 1s and 0s that tell the CPU what to do.

Assembly language is a human-readable format of those 1s and 0s. The important takeaway here is that each assembly instruction you write translates into roughly one binary instruction that your CPU can execute. In other words, there is essentially a one-to-one mapping between assembly language instructions and binary machine code instructions.

Take the following example assembly instruction, which adds two values together:

add 1, 3

This assembly instruction might correspond to the following binary instruction:

1000000100010011

That may look like a bunch of gibberish, but each part of the binary instruction corresponds to a part of the human-readable assembly instruction. One possible mapping might be (the interpretation will depend on the hardware):

family  operation  input1  input2
10      000001     0001    0011

10 might indicate that this is an arithmetic operation, 000001 might correspond to the operation add, and 0001 and 0011 could be the binary values for 1 and 3, the operands!
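To make the field breakdown concrete, here is a minimal Python sketch that splits the example instruction into the four fields of the mapping above. The field widths (2, 6, 4, and 4 bits) come from that hypothetical mapping, and the names `FIELDS` and `decode` are made up for illustration. Real hardware may lay out its instructions quite differently.

```python
# Hypothetical 16-bit layout taken from the example mapping:
# 2-bit family, 6-bit operation, 4-bit input1, 4-bit input2.
FIELDS = [("family", 2), ("operation", 6), ("input1", 4), ("input2", 4)]

def decode(instruction: str) -> dict:
    """Split a 16-character bit string into its named fields."""
    if len(instruction) != 16 or set(instruction) - {"0", "1"}:
        raise ValueError("expected a 16-character string of 0s and 1s")
    fields = {}
    position = 0
    for name, width in FIELDS:
        fields[name] = instruction[position:position + width]
        position += width
    return fields

decoded = decode("1000000100010011")
print(decoded)
# The operand fields read as unsigned binary numbers: 0001 is 1, 0011 is 3.
print(int(decoded["input1"], 2), int(decoded["input2"], 2))
```

The last line recovers the operands 1 and 3 from the original add 1, 3 instruction.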

It turns out that the earliest programmers programmed only in 1s and 0s. Sounds like a headache! That's why assembly languages were created - they provide a more human-readable, friendlier way of specifying binary machine code instructions.

Producing Machine Code

When writing in high level languages like Java, you may have wondered what happens when you hit “compile” and how your computer understands the code you typed.

Many high-level languages are compiled, sometimes through intermediate steps, into an assembly language. The target assembly language will differ depending on what hardware the program is intended to run on. Compilation is actually a pretty complicated process that we will learn more about later in this course!

So how do we go from an assembly language to machine code? Assembly languages are translated to binary by an assembler. The assembler's job is much easier than a compiler's. Since each assembly language instruction has a corresponding machine code instruction, the process of assembling basically involves looking up the corresponding binary for the different parts of each assembly instruction.
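The lookup-based process described above can be sketched in a few lines of Python. This is only an illustration for the made-up add encoding used earlier in this reading. The tables `FAMILY_BITS` and `OPERATION_BITS` and the function `assemble` are hypothetical names, and a real assembler would take its tables from the hardware specification.

```python
# Hypothetical lookup tables for the made-up encoding used in this reading.
FAMILY_BITS = {"add": "10"}         # "10" = arithmetic family (assumed)
OPERATION_BITS = {"add": "000001"}  # opcode for add (assumed)

def assemble(line: str) -> str:
    """Translate one line such as 'add 1, 3' into a 16-bit string."""
    mnemonic, _, operand_text = line.strip().partition(" ")
    if mnemonic not in OPERATION_BITS:
        raise ValueError(f"unknown operation: {mnemonic}")
    operands = [int(part) for part in operand_text.split(",")]
    if len(operands) != 2 or not all(0 <= value <= 15 for value in operands):
        raise ValueError("expected two operands between 0 and 15")
    # Each operand becomes a 4-bit unsigned binary field.
    operand_bits = "".join(format(value, "04b") for value in operands)
    return FAMILY_BITS[mnemonic] + OPERATION_BITS[mnemonic] + operand_bits

machine_code = assemble("add 1, 3")
print(machine_code)
```

Notice that the function never has to reason about what the program means. It only looks up bits and concatenates them, which is why an assembler's job is so much simpler than a compiler's.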

Instruction Sets

Every hardware system has an instruction set, which details what instructions can be executed by the CPU. Instruction sets vary depending on the type of hardware you are working with, which is often referred to as the architecture. There are many different architectures out there. Examples include Intel's x86 architecture (prominent in many laptops, desktops, and servers), the ARM architecture (prominent in many mobile and “edge” devices), and the RISC-V architecture (a newer open-standard architecture that is gaining steam).

In many ways, an architecture's instruction set can be viewed as a user interface for interacting with the hardware/CPU. That is, it specifies all of the different operations, operands (or inputs), and control logic that the CPU supports. This is why we will learn about our computer's assembly language and instruction set before we build it - learning about the instruction set will tell us what logic we need to provide when implementing our CPU! We will dive a bit deeper into the components of instruction sets below.

Components of Instruction Sets

Machine Operations

Instruction sets usually have a set of operations that can be specified by an instruction. Examples are arithmetic operations (+, -), logical operations (And, Or), and flow control operations (see below for more details). Different hardware architectures and their assembly languages offer different operations, as well as different data types that you can operate on. For example, our Hack architecture won't support operations like multiplication and division, but Intel's x86 architecture does.

Registers

Registers are temporary storage locations that are in the CPU. Because they are located in the CPU (as opposed to memory which is located outside the CPU), registers are very fast to access, but we also don't have nearly as many registers as we do slots in memory. For example, our computer will have only 2 registers, but around 16,000 slots in memory!

Assembly programs try to make use of registers as much as possible because they are so efficient. Since we have so few registers, we won't be able to store as much temporary data in them, but you'll see that our assembly language still heavily relies on their use.

Addressing Modes

Addressing modes are the ways in which you can specify operands/inputs in an assembly language program. Usually assembly languages provide the following options:

* Immediate or constant value: Uses a literal value directly in the program, like the 10 and 15 in add 10, 15.
* Registers: Can access a register by providing its name, e.g. add 10, reg2.
* Direct memory access: Can access a hardcoded address in memory, e.g. add reg2, memory[10].
* Indirect memory access: Can access an address in memory based on a register's contents, e.g. add reg2, memory[reg1].

Our Hack assembly language will provide all of this functionality, though it will look slightly different from the syntax you see above.
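To see how the four addressing modes differ, here is a small Python sketch that resolves an operand written in the generic syntax above to the value it refers to. The register contents, the 16-slot memory, and the function name `read_operand` are all assumptions chosen for illustration, not part of any real instruction set.

```python
import re

registers = {"reg1": 10, "reg2": 7}  # hypothetical register contents
memory = [0] * 16                    # hypothetical 16-slot memory
memory[10] = 42

def read_operand(operand: str) -> int:
    """Return the value an operand refers to, based on its addressing mode."""
    operand = operand.strip()
    if operand.isdigit():                    # immediate, e.g. 10
        return int(operand)
    if operand in registers:                 # register, e.g. reg2
        return registers[operand]
    match = re.fullmatch(r"memory\[(\w+)\]", operand)
    if match is None:
        raise ValueError(f"unrecognized operand: {operand}")
    inner = match.group(1)
    if inner.isdigit():                      # direct, e.g. memory[10]
        return memory[int(inner)]
    return memory[registers[inner]]          # indirect, e.g. memory[reg1]

for text in ["10", "reg2", "memory[10]", "memory[reg1]"]:
    print(text, "->", read_operand(text))
```

With these assumed contents, memory[10] and memory[reg1] both yield 42, because reg1 happens to hold the address 10. The difference is that the indirect form would read a different slot if reg1 changed, while the direct form always reads slot 10.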

Flow Control

Your program and its machine code instructions are stored in memory. After executing an instruction, the default behavior is for the CPU to move on to the next instruction in memory (the one at the next higher address).

But often in our programs we don't want to execute the next instruction - maybe we want to go back to the beginning of a loop, or skip the else branch after executing the if branch.

In order to implement complicated control logic, machine instructions provide a way to “jump” to a specific instruction instead of executing the next one.

Unconditional jumps

Take the following pseudocode:

while (true) {
    reg1++;
}

Notice how every time we get to the end of the loop, we will want to jump back to the top of the loop. In order to provide the ability to always perform a jump, assembly languages provide unconditional jumps. This means the program can specify where to jump, and it will always jump there. For example, this assembly program adds 1 to reg1, then jumps back to the TOP, where it will add 1 to reg1, and then jump again, and so on and so forth:

TOP:
    add 1, reg1
    jmp TOP
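The following Python sketch simulates how a CPU might step through this loop. It is an assumption-laden toy: the tuple-based program format, the `add_reg1` operation name, and the `max_steps` cutoff (needed because the real loop never ends) are all invented for illustration.

```python
def run(program, max_steps):
    """Execute (operation, argument) pairs for at most max_steps steps."""
    # Map each label to the index of the instruction that follows it.
    labels, instructions = {}, []
    for operation, argument in program:
        if operation == "label":
            labels[argument] = len(instructions)
        else:
            instructions.append((operation, argument))

    reg1 = 0
    pc = 0  # program counter: index of the next instruction to execute
    for _ in range(max_steps):
        if pc >= len(instructions):
            break
        operation, argument = instructions[pc]
        if operation == "add_reg1":
            reg1 += argument
            pc += 1                # default: move on to the next instruction
        elif operation == "jmp":
            pc = labels[argument]  # unconditional: always go to the label
        else:
            raise ValueError(f"unknown operation: {operation}")
    return reg1

program = [("label", "TOP"), ("add_reg1", 1), ("jmp", "TOP")]
result = run(program, max_steps=10)
print(result)
```

Ten steps alternate between the add and the jump, so reg1 is incremented five times before the simulation is cut off.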

Conditional jumps

Take the following pseudocode:

if (reg1 < reg2) {
    reg1++;
}
reg2++;

Notice how we want to execute the if branch only when reg1 is less than reg2. In order to provide the ability to perform a jump based on a condition, assembly languages provide conditional jumps. Usually this involves an assembly program comparing two values, and then jumping depending on the result of the comparison. For example, this assembly program compares reg1 to reg2, and skips the if branch if reg1 is greater than or equal to reg2:

    cmp reg1, reg2
    jge SKIP
    add 1, reg1
SKIP:
    add 1, reg2

Notice how the condition in the assembly code is flipped from the condition in the pseudocode. This is because in the pseudocode we are specifying when we want to enter the if branch, but in the assembly code we are specifying when we want to skip the if branch (which is the opposite action and requires the opposite condition). Since jumps provide us an easy way to skip portions of code, often we view logic in terms of when we want to skip.
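A quick way to convince yourself that the flipped condition is equivalent is to model both versions and compare them. In this Python sketch, `run_pseudocode` mirrors the if statement and `run_assembly` mirrors the compare-and-jump sequence. Both function names are made up, and the sketch assumes jge jumps exactly when reg1 is greater than or equal to reg2.

```python
def run_pseudocode(reg1: int, reg2: int) -> tuple:
    """Mirror the pseudocode: enter the if branch when reg1 < reg2."""
    if reg1 < reg2:
        reg1 += 1
    reg2 += 1
    return reg1, reg2

def run_assembly(reg1: int, reg2: int) -> tuple:
    """Mirror the assembly: skip the add when reg1 >= reg2."""
    jump_to_skip = reg1 >= reg2  # cmp reg1, reg2 followed by jge SKIP
    if not jump_to_skip:
        reg1 += 1                # add 1, reg1 (only reached without a jump)
    reg2 += 1                    # SKIP: add 1, reg2 (always executed)
    return reg1, reg2

# Compare the two versions on a small range of inputs.
all_match = all(
    run_pseudocode(a, b) == run_assembly(a, b)
    for a in range(-3, 4)
    for b in range(-3, 4)
)
print(all_match)
```

The two functions agree on every pair tested because "not (reg1 >= reg2)" is the same condition as "reg1 < reg2".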

The Road Ahead

Now that we have looked at assembly languages in general, we will take a deep dive into the Hack assembly language we will be using. Learning the ins and outs of this language will set us up to implement the hardware that it interacts with!