CSE 390B, Spring 2022 **Building Academic Success Through Bottom-Up Computing** Assembler & Compilers Overview

Inside the Assembler, The Software Stack, Compilers Overview, Project 6 Tips

W UNIVERSITY of WASHINGTON

### **Lecture Outline**

#### Inside the Assembler

- Producing Machine Code
- Parsing, Symbols, Encoding
- The Software Stack
  - Roadmap of Hardware and Software Components
- Compilers Overview
  - Roadmap of Hardware and Software Components
- Hack CPU Logic Example: writeM
  - Project 6 CPU Logic Exercise

## **Producing Machine Code**







## The Assembler's Job



#### Look up each value in the corresponding table

| j1 (out < 0) | j2<br>( $out = 0$ ) | j3 (out > 0) | Mnemonic | Effect                 |
|--------------|---------------------|--------------|----------|------------------------|
| 0            | 0                   | 0            | null     | No jump                |
| 0            | 0                   | 1            | JGT      | If $out > 0$ jump      |
| 0            | 1                   | 0            | JEQ      | If $out = 0$ jump      |
| 0            | 1                   | 1            | JGE      | If $out \ge 0$ jump    |
| 1            | 0                   | 0            | JLT      | If <i>out</i> < 0 jump |
| 1            | 0                   | 1            | JNE      | If $out \neq 0$ jump   |
| 1            | 1                   | 0            | JLE      | If $out \le 0$ jump    |
| 1            | 1                   | 1            | JMP      | Jump                   |

| <b>d</b> 1 | d2 | d3 | Mnemonic | Destination (where to store the computed value) |
|------------|----|----|----------|-------------------------------------------------|
| 0          | 0  | 0  | null     | The value is not stored anywhere                |
| 0          | 0  | 1  | м        | Memory[A] (memory register addressed by A)      |
| 0          | 1  | 0  | D        | D register                                      |
| 0          | 1  | 1  | MD       | Memory[A] and D register                        |
| 1          | 0  | 0  | A        | A register                                      |
| 1          | 0  | 1  | AM       | A register and Memory[A]                        |
| 1          | 1  | 0  | AD       | A register and D register                       |
| 1          | 1  | 1  | AMD      | A register, Memory[A], and D register           |

| (when a=0)<br>comp mnemonic | c1 | c2 | c3 | c4 | c5 | C6 | (when a=1)<br>comp mnemonic |
|-----------------------------|----|----|----|----|----|----|-----------------------------|
| 0                           | 1  | 0  | 1  | 0  | 1  | 0  |                             |
| 1                           | 1  | 1  | 1  | 1  | 1  | 1  |                             |
| -1                          | 1  | 1  | 1  | 0  | 1  | 0  |                             |
| D                           | 0  | 0  | 1  | 1  | 0  | 0  |                             |
| A                           | 1  | 1  | 0  | 0  | 0  | 0  | м                           |
| !D                          | 0  | 0  | 1  | 1  | 0  | 1  |                             |
| !A                          | 1  | 1  | 0  | 0  | 0  | 1  | 1M                          |
| -D                          | 0  | 0  | 1  | 1  | 1  | 1  |                             |
| -A                          | 1  | 1  | 0  | 0  | 1  | 1  | -M                          |
| D+1                         | 0  | 1  | 1  | 1  | 1  | 1  |                             |
| A+1                         | 1  | 1  | 0  | 1  | 1  | 1  | M+1                         |
| D-1                         | 0  | 0  | 1  | 1  | 1  | 0  |                             |
| A-1                         | 1  | 1  | 0  | 0  | 1  | 0  | M-1                         |
| D+A                         | 0  | 0  | 0  | 0  | 1  | 0  | D+M                         |
| D-A                         | 0  | 1  | 0  | 0  | 1  | 1  | D-M                         |
| A-D                         | 0  | 0  | 0  | 1  | 1  | 1  | M-D                         |
| D&A                         | 0  | 0  | 0  | 0  | 0  | 0  | D&M                         |
| DA                          | 0  | 1  | 0  | 1  | 0  | 1  | DM                          |

### What Makes the Assembler's Job Difficult?

| Line # | ŧ           | Ac       | Address |                                         |  |
|--------|-------------|----------|---------|-----------------------------------------|--|
| 1      | @ <b>12</b> |          | 0       | 000000000001100                         |  |
| 2      | D=A         |          | 1       | 1110110000010000                        |  |
| 3      | 0i          |          | 2       | 000000000010000                         |  |
| 4      | M=D // init |          | 3       | 1110001100001000                        |  |
| 5      | (LOOP)      | Assemble |         |                                         |  |
| 6      | @R3         |          | 4       | 00000000000011                          |  |
| 7      | MD = M-1    |          | 5       | 1111110010011000                        |  |
| 8      | @LOOP       |          | 6       | 000000000000000000000000000000000000000 |  |
| 9      | D ; JGT     |          | 7       | 1110001100000001                        |  |

## **Difficulties for the Assembler**

#### Three broad concerns:

| Parsing  | Recognizing type of each instruction and label,<br>extracting relevant fields, skipping whitespace &<br>comments                                                             |
|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Symbols  | Mapping from labels to instruction addresses, mapping<br>from code symbols to RAM addresses, creating new<br>symbols, corresponding line numbers to instruction<br>addresses |
| Encoding | Converting relevant fields to binary values, converting symbol values to binary values                                                                                       |

## **Bells and Whistles... Why Bother?**

- Tradeoff: Adding convenience for programmer makes it harder to build the Assembler
  - E.g., removing symbols from Hack would make Assembler much simpler, still possible to write all the same programs
  - But language would be far more annoying to use

## Bells and Whistles... Why Bother?

- Tradeoff: Adding convenience for programmer makes it harder to build the Assembler
  - E.g., removing symbols from Hack would make Assembler much simpler, still possible to write all the same programs
  - But language would be far more annoying to use

#### Don't underestimate the importance of convenience

 Put another way: Adding these extra features makes programmers more <u>productive</u>

# Parsing

- Source code is just a giant string: we need to go character-by-character to understand that string
- Parser presents iterator-like interface:
  - To "advance" one instruction:
    - Move cursor forward, skipping whitespace and comments, until next non-empty line (ending on a newline)
  - To "read" current instruction:
    - Throw away whitespace & comments
    - Determine what type of instruction
    - Pull relevant fields out

- Keep symbol table, mapping symbols (strings) to their values (integers)
  - Initialize with built-in symbols

| SYMBOL | VALUE |
|--------|-------|
| R0     | 0     |
| R1     | 1     |
|        |       |
| R15    | 15    |
| SCREEN | 16384 |
| KBD    | 24576 |

- Keep symbol table, mapping symbols (strings) to their values (integers)
  - Initialize with built-in symbols
- Run through instructions, using this pseudocode:
  - If current line is (LABEL):
     Add LABEL → next line number to
     symbol table
  - If current line is @LABEL:

Lookup LABEL in symbol table, insert value into A instruction

| SYMBOL | VALUE |
|--------|-------|
| R0     | 0     |
| R1     | 1     |
|        |       |
| R15    | 15    |
| SCREEN | 16384 |
| KBD    | 24576 |

Problem: what if a label's use comes before its definition?



Problem: what if a label's use comes before its definition?

#### Solution: Two passes

- Pass 1: Populate symbol table by moving through file and ignoring anything that isn't a (LABEL) line
- Pass 2: Go through file again, ignoring (LABEL) lines, encoding C-instructions, and encoding A-instructions according to symbol table lookup



### **Lecture Outline**

#### Inside the Assembler

- Producing Machine Code
- Parsing, Symbols, Encoding

#### The Software Stack

Roadmap of Hardware and Software Components

#### Compilers Overview

Roadmap of Hardware and Software Components

#### Hack CPU Logic Example: writeM

Project 6 CPU Logic Exercise











### **Lecture Outline**

- Inside the Assembler
  - Producing Machine Code
  - Parsing, Symbols, Encoding
- The Software Stack
  - Roadmap of Hardware and Software Components
- Compilers Overview
  - Roadmap of Hardware and Software Components

#### Hack CPU Logic Example: writeM

Project 6 CPU Logic Exercise







## **The Compiler: Goal**

```
public int fact(int n) {
    if (n == 0) {
        return 1;
    } else {
        return n * fact(n - 1);
    }
}
High-Level Language
```

| (fact)           |                   |
|------------------|-------------------|
| @ <b>R0</b>      |                   |
| M=M+1            |                   |
| @ <b>R1</b>      |                   |
| D=A              |                   |
| <b>@ifbranch</b> |                   |
| D;JEQ            |                   |
|                  | Assembly Language |



# **The Compiler: Goal**

```
public int fact(int n) {
    if (n == 0) {
        return 1;
    } else {
        return n * fact(n - 1);
    }
}
High-Level Language
```

**Theory Definition:** a string, from the set of strings making up a language

| (fact)           |                   |
|------------------|-------------------|
| @ <b>R0</b>      |                   |
| M=M+1            |                   |
| @ <b>R1</b>      |                   |
| D=A              |                   |
| <b>@ifbranch</b> |                   |
| D;JEQ            |                   |
|                  | Assembly Language |



Compiler

### **The Compiler: Goal**

```
public int fact(int n) {
    if (n == 0) {
        return 1;
    } else {
        return n * fact(n - 1);
    }
}
High-Level Language
```

**Theory Definition:** a string, from the set of strings making up a language

**Practical Definition:** a file containing a bunch of characters

| (fact)      |                   |
|-------------|-------------------|
| @ <b>R0</b> |                   |
| M=M+1       |                   |
| @R1         |                   |
| D=A         |                   |
| @ifbranch   |                   |
| D;JEQ       |                   |
|             | Assembly Language |



# **The Compiler: Implementation**



### **Lecture Outline**

- Inside the Assembler
  - Producing Machine Code
  - Parsing, Symbols, Encoding
- The Software Stack
  - Roadmap of Hardware and Software Components
- Compilers Overview
  - Roadmap of Hardware and Software Components
- Hack CPU Logic Example: writeM
  - Project 6 CPU Logic Exercise

Example: Determine when writeM should be set to 1

Step 1: What do we pay attention to?

- writeM is related to whether we write to memory or not
- We need to look up the destination bits specification from Chapter 4

| d1 | d2 | d3 | Mnemonic | <b>Destination</b> (where to store the computed value) |
|----|----|----|----------|--------------------------------------------------------|
| 0  | 0  | 0  | null     | The value is not stored anywhere                       |
| 0  | 0  | 1  | м        | Memory[A] (memory register addressed by A)             |
| 0  | 1  | 0  | D        | D register                                             |
| 0  | 1  | 1  | MD       | Memory[A] and D register                               |
| 1  | 0  | 0  | А        | A register                                             |
| 1  | 0  | 1  | AM       | A register and Memory[A]                               |
| 1  | 1  | 0  | AD       | A register and D register                              |
| 1  | 1  | 1  | AMD      | A register, Memory[A], and D register                  |

Figure 4.4 The *dest* field of the *C*-instruction.

Example: Determine when writeM should be set to 1

Step 2: Determine logic for specification

- Read the "Destination Specification" section of Chapter 4
- Instruction bits:

1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3

| d1 | d2 | <b>d</b> 3 | Mnemonic | Destination (where to store the computed value) |
|----|----|------------|----------|-------------------------------------------------|
| 0  | 0  | 0          | null     | The value is not stored anywhere                |
| 0  | 0  | 1          | м        | Memory[A] (memory register addressed by A)      |
| 0  | 1  | 0          | D        | D register                                      |
| 0  | 1  | 1          | MD       | Memory[A] and D register                        |
| 1  | 0  | 0          | А        | A register                                      |
| 1  | 0  | 1          | AM       | A register and Memory[A]                        |
| 1  | 1  | 0          | AD       | A register and D register                       |
| 1  | 1  | 1          | AMD      | A register, Memory[A], and D register           |

Figure 4.4 The *dest* field of the *C*-instruction.

Example: Determine when writeM should be set to 1

Step 2: Determine logic for specification

- Read the "Destination Specification" section of Chapter 4
- Instruction bits:

1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3

| <b>d1</b> | d2 | d3 | Mnemonic | Destination (where to store the computed value) |
|-----------|----|----|----------|-------------------------------------------------|
| 0         | 0  | 0  | null     | The value is not stored anywhere                |
| 0         | 0  | 1  | М        | Memory[A] (memory register addressed by A)      |
| 0         | 1  | 0  | D        | D register                                      |
| 0         | 1  | 1  | MD       | Memory[A] and D register                        |
| 1         | 0  | 0  | А        | A register                                      |
| 1         | 0  | 1  | AM       | A register and Memory[A]                        |
| 1         | 1  | 0  | AD       | A register and D register                       |
| 1         | 1  | 1  | AMD      | A register, Memory[A], and D register           |

Figure 4.4 The *dest* field of the *C*-instruction.

Example: Determine when writeM should be set to 1

Step 2: Determine logic for specification

- Read the "Destination Specification" section of Chapter 4
- Instruction bits:

1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3

- d3 determines if the output should be written to memory
- Which bit of our instruction is that?
- So writeM = instruction[3]?

| d | 1 | d2 | d3 | Mnemonic | Destination (where to store the computed value) |
|---|---|----|----|----------|-------------------------------------------------|
| ( | 0 | 0  | 0  | null     | The value is not stored anywhere                |
|   | 0 | 0  | 1  | м        | Memory[A] (memory register addressed by A)      |
| ( | 0 | 1  | 0  | D        | D register                                      |

Example: Determine when writeM should be set to 1

What's wrong with writeM = instruction[3]?

- What happens if we have an A-instruction?
- We only write to destinations in the case of a C-instruction
- So, writeM = C-instruction & instruction[3]
- Certain actions only occur on certain instruction types
- You may have to include a check for instruction type in your logic

| d1 | d2 | d3 | Mnemonic | Destination (where to store the computed value) |
|----|----|----|----------|-------------------------------------------------|
| 0  | 0  | 0  | null     | The value is not stored anywhere                |
| 0  | 0  | 1  | м        | Memory[A] (memory register addressed by A)      |
| 0  | 1  | 0  | D        | D register                                      |

# Hack CPU Implementation: Logic Sub Chips

- We provide three sub chips and tests that implement the control logic for the A Register, D Register, and PC
  - LoadAReg contains logic for loading the A Register
  - LoadDReg contains logic for loading the D Register
  - JumpLogic contains logic for determining if the PC should load, jump, or increment
- Implement and test these first, then use them in your CPU implementation
  - Intended to help you narrow the scope of bugs

### Lecture 13 Wrap-up

- Midterm will be graded with feedback by Thursday (5/12)
- Project 6: Mock Exam Problem & Building a Computer due this Thursday (5/12) at 11:59pm PDT
- Thursday's Lecture Reading: <u>Compilers Overview:</u> <u>Scanning and Parsing</u>
- Please submit the <u>mid-quarter feedback form</u> if you haven't already