CSE 410
Computer Systems

Hal Perkins
Spring 2010
Lecture 8 – Machine Language
Reading

- Computer Organization and Design
  - Section 2.5, Representing Instructions
  - Section 2.10, MIPS Addressing
  - MIPS Green Card
Machine Language

- **Machine language**, the binary representation for instructions.
  - We’ll see how MIPS machine language is designed for the common case
    - Fixed-sized (32-bit) instructions
    - Only 3 instruction formats
    - Limited-sized immediate fields
Assembly vs. machine language

• So far we’ve been using assembly language.
  – We assign names to operations (e.g., add) and operands (e.g., $t0).
  – Branches and jumps use labels instead of actual addresses.
  – Assemblers support many pseudo-instructions.
• Programs must eventually be translated into machine language, a binary format that can be stored in memory and decoded by the processor.
• MIPS machine language is designed to be easy to decode.
  – Each MIPS instruction is the same length, 32 bits.
  – There are only three different instruction formats, which are very similar to each other.
• Studying MIPS machine language will also reveal some restrictions in the instruction set architecture, and how they can be overcome.
R-type format

- Register-to-register arithmetic instructions use the R-type format.

<table>
<thead>
<tr>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>rd</th>
<th>shamt</th>
<th>func</th>
</tr>
</thead>
<tbody>
<tr>
<td>6 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>6 bits</td>
</tr>
</tbody>
</table>

- This format includes six different fields.
  - **op** is an operation code or opcode that selects a specific operation.
  - **rs** and **rt** are the first and second source registers.
  - **rd** is the destination register.
  - **shamt** is only used for shift instructions.
  - **func** is used together with **op** to select an arithmetic instruction.

- The green card in the textbook lists opcodes and function codes for all of the MIPS instructions.
### R-type Instruction Example

<table>
<thead>
<tr>
<th></th>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>rd</th>
<th>shamt</th>
<th>func</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>6 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>6 bits</td>
</tr>
</tbody>
</table>

**add $s4, $t1, $t2**

```
000000 01001 01010 10100 00000 100000
```
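As a rough illustration of how the six fields pack into one 32-bit word, the following Python sketch shifts each field into its position. The helper name `encode_r` is made up for this example; the field widths and the register numbers come from the slides.

```python
def encode_r(op, rs, rt, rd, shamt, func):
    """Pack the six R-type fields (6/5/5/5/5/6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func

# add $s4, $t1, $t2: rs = $t1 (9), rt = $t2 (10), rd = $s4 (20), func = 100000
word = encode_r(0, 9, 10, 20, 0, 0b100000)
print(f"{word:032b}")
```

Note that the destination register comes first in the assembly syntax but third among the register fields.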
About the registers

- We have to encode register names as 5-bit numbers from 00000 to 11111.
  - For example, $t8 is register 24, which is represented as 11000.
  - The complete mapping is given on page B-24 in the book.
- The number of registers available affects the instruction length.
  - Each R-type instruction references 3 registers, which requires a total of 15 bits in the instruction word.
  - We can’t add more registers without either making instructions longer than 32 bits, or shortening other fields like op and possibly reducing the number of available operations.

<table>
<thead>
<tr>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>rd</th>
<th>shamt</th>
<th>func</th>
</tr>
</thead>
<tbody>
<tr>
<td>6 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>6 bits</td>
</tr>
</tbody>
</table>
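The name-to-number mapping can be written down as a simple lookup. This is a minimal sketch using the conventional MIPS register names; `REGISTER_NAMES` and `register_bits` are names invented for this example.

```python
# Conventional MIPS register names in numeric order (0 through 31).
REGISTER_NAMES = (
    ["$zero", "$at", "$v0", "$v1"]
    + [f"$a{i}" for i in range(4)]
    + [f"$t{i}" for i in range(8)]
    + [f"$s{i}" for i in range(8)]
    + ["$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"]
)

def register_bits(name):
    """Return the 5-bit field for a register name, as a string of 0s and 1s."""
    return f"{REGISTER_NAMES.index(name):05b}"

print(register_bits("$t8"))  # $t8 is register 24
```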
I-type format

- Load, store, branch, and immediate instructions all use the I-type format.

<table>
<thead>
<tr>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>address</th>
</tr>
</thead>
<tbody>
<tr>
<td>6 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>16 bits</td>
</tr>
</tbody>
</table>

- For uniformity, op, rs and rt are in the same positions as in the R-format.
- The meaning of the register fields depends on the exact instruction.
  - rs is a source register—an address for loads and stores, or an operand for branch and immediate arithmetic instructions.
  - rt is a source register for branches and stores, but a destination register for the other I-type instructions.
- The address is a 16-bit signed two’s-complement value.
  - It can range from -32,768 to +32,767.
  - But that’s not always enough!
### I-type Instruction Examples

<table>
<thead>
<tr>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>address</th>
</tr>
</thead>
<tbody>
<tr>
<td>6 bits</td>
<td>5 bits</td>
<td>5 bits</td>
<td>16 bits</td>
</tr>
</tbody>
</table>

lw $t0, -4($sp)  

```
100011 11101 01000 1111 1111 1111 1100
```

sw $a0, 16($sp)  

```
101011 11101 00100 0000 0000 0001 0000
```

addi $s4, $t1, -1  

```
001000 01001 10100 1111 1111 1111 1111
```
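The three encodings above can be reproduced with a small sketch. The helper `encode_i` is hypothetical; the one detail worth noticing is that a negative immediate is masked to 16 bits so that its two's-complement pattern lands in the address field.

```python
def encode_i(op, rs, rt, imm):
    """Pack I-type fields (6/5/5/16 bits); imm is a signed 16-bit value."""
    if not -32768 <= imm <= 32767:
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

SP, T0, T1, A0, S4 = 29, 8, 9, 4, 20        # register numbers

lw_word = encode_i(0b100011, SP, T0, -4)    # lw   $t0, -4($sp)
sw_word = encode_i(0b101011, SP, A0, 16)    # sw   $a0, 16($sp)
addi_word = encode_i(0b001000, T1, S4, -1)  # addi $s4, $t1, -1

for w in (lw_word, sw_word, addi_word):
    print(f"{w:032b}")
```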
Larger constants

• Larger constants can be loaded into a register 16 bits at a time.
  – The load upper immediate instruction lui loads the highest 16 bits of a register with a constant, and clears the lowest 16 bits to 0s.
  – An immediate logical OR, ori, then sets the lower 16 bits.
• To load the 32-bit value 0000 0000 0011 1101 0000 1001 0000 0000:

```assembly
lui $s0, 0x003D  # $s0 = 003D 0000 (in hex)
ori $s0, $s0, 0x0900  # $s0 = 003D 0900
```

• This illustrates the principle of making the common case fast.
  – Most of the time, 16-bit constants are enough.
  – It’s still possible to load 32-bit constants, but at the cost of two instructions and one temporary register.
• Pseudo-instructions may contain large constants. Assemblers including SPIM will translate such instructions correctly.
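To make the 16-bits-at-a-time idea concrete, here is a small Python model of the lui/ori pair. The function names are invented for this sketch, and it only models the register contents, not the instruction encodings.

```python
def load_constant_pair(value):
    """Split a 32-bit constant into the immediates for lui and ori."""
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return upper, lower

def lui_ori(upper, lower):
    """Model the register contents after lui followed by ori."""
    reg = (upper << 16) & 0xFFFFFFFF  # lui: upper half set, lower half cleared
    reg |= lower                      # ori: zero-extended immediate fills lower half
    return reg

upper, lower = load_constant_pair(0x003D0900)
print(hex(upper), hex(lower), hex(lui_ori(upper, lower)))
```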
Loads and stores

• The limited 16-bit constant can present problems for accesses to global data.

• Suppose we want to load from address 0x10010004, which won’t fit in the 16-bit address field. Solution:

```assembly
lui $at, 0x1001 # 0x1001 0000
lw $t1, 0x0004($at) # Read from Mem[0x1001 0004]
```
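Splitting an address for a lui/lw pair takes slightly more care than the lui/ori case, because the 16-bit offset of lw is sign-extended. The sketch below goes one step beyond the slide: when the low half is 0x8000 or above, the offset is treated as negative and the upper half is bumped by one to compensate. `split_address` is a hypothetical helper.

```python
def split_address(addr):
    """Return (upper, offset) such that (upper << 16) + offset == addr,
    where offset is a signed 16-bit value as lw and sw expect."""
    offset = addr & 0xFFFF
    if offset >= 0x8000:      # would be read as negative after sign extension
        offset -= 0x10000
    upper = ((addr - offset) >> 16) & 0xFFFF
    return upper, offset

print(split_address(0x10010004))  # the address used on this slide
print(split_address(0x1000FFFC))  # low half has its top bit set
```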
Sequential Execution

• Recall that the processor executes instructions as follows if there are no jumps or branches:

    do {
        fetch instruction at Mem[PC];
        PC = PC + 4;    // advance to next instruction
        execute fetched instruction;
    } while (processor not halted);
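The same loop can be mimicked in a few lines of Python. This is only a toy: memory is a dict from byte address to an instruction label, "executing" just records the instruction, and `"halt"` is a made-up stand-in for whatever stops the processor.

```python
def run(memory, pc):
    """Fetch-execute loop; returns the trace of (address, instruction) pairs."""
    trace = []
    while True:
        instruction = memory[pc]  # fetch instruction at Mem[PC]
        pc += 4                   # advance to next instruction
        if instruction == "halt":
            break
        trace.append((pc - 4, instruction))  # "execute": just record it
    return trace

program = {0x00400000: "add", 0x00400004: "sub", 0x00400008: "halt"}
trace = run(program, 0x00400000)
print(trace)
```

Because the PC is advanced before the instruction executes, any instruction that looks at the PC sees the address of the next instruction.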
Branches

- For branch instructions, the constant field is not an address, but an offset in words from the current program counter (PC) to the target address.

```
beq  $at, $0, L
add  $v1, $v0, $0
add  $v1, $v1, $v1
j    Somewhere
L:   add  $v1, $v0, $v0
```

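- The offset is counted from the instruction after the beq, because the PC has already been advanced by 4 when the branch executes. The target L is three instructions past that point, so the address field would contain 3. The whole beq instruction would be stored as: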

```
000100  00001  00000  0000 0000 0000 0011
```

  op    rs    rt    address

- For some reason SPIM is off by one, so the code it produces would contain an address of 4. (But SPIM branches still execute correctly.)
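The offset calculation can be sketched as follows, assuming the beq sits at address 0 so that L, four instructions later, is at address 16. The offset is measured from the instruction after the branch (PC + 4). `branch_offset` is a name chosen for this example.

```python
def branch_offset(branch_addr, target_addr):
    """Word offset stored in a branch, measured from the instruction after it."""
    return (target_addr - (branch_addr + 4)) // 4

offset = branch_offset(0, 16)

# beq $at, $0, L: op = 000100, rs = $at (1), rt = $0 (0)
beq_word = (0b000100 << 26) | (1 << 21) | (0 << 16) | (offset & 0xFFFF)
print(offset, f"{beq_word:032b}")
```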
Larger branch constants

- Empirical studies of real programs show that most branches go to targets less than 32,767 instructions away—branches are mostly used in loops and conditionals, and programmers are taught to make code bodies short.

- If you do need to branch further, you can use a jump with a branch. For example, if “Far” is very far away, then the effect of:

  ```
  beq $s0, $s1, Far
  ...
  ```

can be simulated with the following actual code.

  ```
  bne $s0, $s1, Next
  j Far
  Next: ...
  ```

- Again, the MIPS designers have taken care of the common case first.
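Whether a plain branch is enough comes down to a range check on the word offset. A minimal sketch, assuming the offset is measured from PC + 4 and stored as a signed 16-bit value; `branch_reaches` is a hypothetical helper:

```python
def branch_reaches(branch_addr, target_addr):
    """True if the word offset fits the signed 16-bit field of a branch."""
    offset = (target_addr - (branch_addr + 4)) // 4
    return -32768 <= offset <= 32767

print(branch_reaches(0x00400000, 0x00400010))  # nearby target
print(branch_reaches(0x00400000, 0x00500000))  # about 262,000 instructions away
```

When the check fails, the bne-plus-j sequence above is the fallback.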
J-type format

• Finally, the jump instruction uses the J-type instruction format.

<table>
<thead>
<tr>
<th>op</th>
<th>address</th>
</tr>
</thead>
<tbody>
<tr>
<td>6 bits</td>
<td>26 bits</td>
</tr>
</tbody>
</table>

• The jump instruction contains a word address, not an offset.
  – Remember that each MIPS instruction is one word long, and word addresses must be divisible by four.
  – So instead of saying “jump to address 4000,” it’s enough to just say “jump to instruction 1000.”
  – A 26-bit address field covers $2^{26}$ instructions, so you can jump to any word-aligned address from 0 to $2^{28} - 4$.
    • your programs had better be smaller than 256MB

• For even longer jumps, the jump register, or jr, instruction can be used.

```
jr $ra  # Jump to 32-bit address in register $ra
```
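The word-address idea can be sketched in Python. One detail not spelled out on the slide: since the field supplies only 28 bits of the byte address, MIPS takes the top 4 bits from the address of the instruction after the jump. The helper names are made up for this example.

```python
def jump_field(target_addr):
    """26-bit word address stored in a j instruction."""
    return (target_addr >> 2) & 0x03FFFFFF

def jump_target(jump_addr, field):
    """Byte address reached by a j at jump_addr with the given 26-bit field."""
    return ((jump_addr + 4) & 0xF0000000) | (field << 2)

print(jump_field(4000))                  # "jump to instruction 1000"
print(hex(jump_target(0, jump_field(4000))))
```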
Summary of Machine Language

• Machine language is the binary representation of instructions:
  – The format in which the machine actually executes them
• MIPS machine language is designed to simplify processor implementation
  – Fixed length instructions

<table>
<thead>
<tr>
<th>Format</th>
<th>6 bits</th>
<th>5 bits</th>
<th>5 bits</th>
<th>5 bits</th>
<th>5 bits</th>
<th>6 bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>R</td>
<td>opcode</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>shamt</td>
<td>funct</td>
</tr>
<tr>
<td>I</td>
<td>opcode</td>
<td>rs</td>
<td>rt</td>
<td colspan="3">immediate</td>
</tr>
<tr>
<td>J</td>
<td>opcode</td>
<td colspan="5">target address</td>
</tr>
</tbody>
</table>
Decoding Machine Language

How do we convert 1s and 0s to assembly language and to C (or Java or similar) code?

Machine language → assembly → C?

For each 32 bits:
1. Look at opcode to distinguish between R-Format, J-Format, and I-Format
2. Use instruction format to determine which fields exist
3. Write out MIPS assembly code, converting each field to name, register number/name, or decimal/hex number
4. Logically convert this MIPS code into valid C-like code.
   Always possible? Unique?
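Step 1 can be sketched in a few lines of Python. The rule used here is a simplification: opcode 0 is treated as R-format, opcodes 2 and 3 (j and jal) as J-format, and everything else as I-format. `instruction_format` is a name invented for this example.

```python
def instruction_format(word):
    """Classify a 32-bit MIPS word as 'R', 'J', or 'I' from its opcode."""
    opcode = (word >> 26) & 0x3F  # the opcode is the top 6 bits
    if opcode == 0:
        return "R"
    if opcode in (2, 3):
        return "J"
    return "I"

# The add and lw encodings from the earlier examples.
print(instruction_format(0b000000_01001_01010_10100_00000_100000))
print(instruction_format(0b100011_11101_01000_1111111111111100))
```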
Decoding (1/7)

- Here are six machine language instructions in hexadecimal:
  
  \[ \begin{aligned}
  &\texttt{00001025}_{\text{hex}} \\
  &\texttt{0005402A}_{\text{hex}} \\
  &\texttt{11000003}_{\text{hex}} \\
  &\texttt{00441020}_{\text{hex}} \\
  &\texttt{20A5FFFF}_{\text{hex}} \\
  &\texttt{08100001}_{\text{hex}}
  \end{aligned} \]

- Assume the first instruction is at address 4,194,304\(_{\text{ten}}\) (0x00400000\(_{\text{hex}}\))

- Next step: convert hex to binary
Decoding (2/7)

• The six machine language instructions in binary:
  00000000000000000001000000100101
  00000000000001010100000000101010
  00010001000000000000000000000011
  00000000010001000001000000100000
  00100000101001011111111111111111
  00001000000100000000000000000001

• Next step: identify opcode and format
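The hex-to-binary step is mechanical, so it is easy to check by machine. A minimal sketch; `hex_to_binary` is a hypothetical helper that pads every word to the full 32 bits:

```python
words = ["00001025", "0005402A", "11000003", "00441020", "20A5FFFF", "08100001"]

def hex_to_binary(hex_word):
    """Convert an 8-digit hex string to its 32-character binary string."""
    return f"{int(hex_word, 16):032b}"

binary_words = [hex_to_binary(w) for w in words]
for b in binary_words:
    print(b)
```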
Decoding (3/7)

- Select the opcode (first 6 bits) to determine the format:
  
  000000 00000 00000 00010 00000 100101
  000000 00000 00101 01000 00000 101010
  000100 01000 00000 00000 00000 000011
  000000 00010 00100 00010 00000 100000
  001000 00101 00101 11111 11111 111111
  000010 00000 10000 00000 00000 000001

- Look at opcode: 0 means R-Format, 2 or 3 mean J-Format, otherwise I-Format

- Next step: separation of fields. The six instructions above are, in order, R, R, I, R, I, J format:

<table>
<thead>
<tr>
<th>Format</th>
<th>opcode</th>
<th colspan="5">remaining fields</th>
</tr>
</thead>
<tbody>
<tr>
<td>R</td>
<td>0</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>shamt</td>
<td>funct</td>
</tr>
<tr>
<td>I</td>
<td>1, 4-62</td>
<td>rs</td>
<td>rt</td>
<td colspan="3">immediate</td>
</tr>
<tr>
<td>J</td>
<td>2 or 3</td>
<td colspan="5">target address</td>
</tr>
</tbody>
</table>
Decoding (4/7)

• Fields separated based on format(opcode):

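<table>
<thead>
<tr>
<th>Format</th>
<th>op</th>
<th>rs</th>
<th>rt</th>
<th>rd</th>
<th>shamt</th>
<th>funct</th>
</tr>
</thead>
<tbody>
<tr>
<td>R</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>2</td>
<td>0</td>
<td>37</td>
</tr>
<tr>
<td>R</td>
<td>0</td>
<td>0</td>
<td>5</td>
<td>8</td>
<td>0</td>
<td>42</td>
</tr>
<tr>
<td>I</td>
<td>4</td>
<td>8</td>
<td>0</td>
<td colspan="3">+3</td>
</tr>
<tr>
<td>R</td>
<td>0</td>
<td>2</td>
<td>4</td>
<td>2</td>
<td>0</td>
<td>32</td>
</tr>
<tr>
<td>I</td>
<td>8</td>
<td>5</td>
<td>5</td>
<td colspan="3">-1</td>
</tr>
<tr>
<td>J</td>
<td>2</td>
<td colspan="5">1,048,577</td>
</tr>
</tbody>
</table>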

• Next step: translate (“disassemble”) into MIPS assembly instructions
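Field separation is a matter of shifting and masking. The sketch below returns every field as a decimal integer and sign-extends the 16-bit immediate; `split_fields` and `sign_extend_16` are names chosen for this example, and the format rule is the simplified one from the previous step.

```python
def sign_extend_16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def split_fields(word):
    """Return (format, fields...) with every field as a decimal integer."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if op == 0:
        rd = (word >> 11) & 0x1F
        shamt = (word >> 6) & 0x1F
        funct = word & 0x3F
        return ("R", op, rs, rt, rd, shamt, funct)
    if op in (2, 3):
        return ("J", op, word & 0x03FFFFFF)
    return ("I", op, rs, rt, sign_extend_16(word & 0xFFFF))

for w in (0x00001025, 0x0005402A, 0x11000003, 0x00441020, 0x20A5FFFF, 0x08100001):
    print(split_fields(w))
```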
Decoding (5/7)

• MIPS Assembly (Part 1):
• Address: Assembly instructions:

  0x00400000  or  $2,$0,$0
  0x00400004  slt  $8,$0,$5
  0x00400008  beq  $8,$0,3
  0x0040000c  add  $2,$2,$4
  0x00400010  addi $5,$5,-1
  0x00400014  j  0x100001

• Better solution: translate to more meaningful MIPS instructions (fix the branch/jump and add labels, registers)
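The listing above can be reproduced with a very small disassembler. This sketch knows only the five instructions that occur in this example (the two lookup tables are deliberately incomplete) and prints registers by number, as in Part 1.

```python
R_FUNCTS = {0x20: "add", 0x25: "or", 0x2A: "slt"}  # funct codes used here
I_OPCODES = {0x04: "beq", 0x08: "addi"}            # opcodes used here

def disassemble(word):
    """Disassemble one word; handles only the handful of instructions above."""
    op = (word >> 26) & 0x3F
    rs, rt, rd = (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000                     # sign-extend the 16-bit immediate
    if op == 0:
        return f"{R_FUNCTS[word & 0x3F]} ${rd},${rs},${rt}"
    if op == 2:
        return f"j {(word & 0x03FFFFFF):#x}"
    if I_OPCODES[op] == "beq":
        return f"beq ${rs},${rt},{imm}"    # branches list rs first
    return f"{I_OPCODES[op]} ${rt},${rs},{imm}"  # rt is the destination

address = 0x00400000
for w in (0x00001025, 0x0005402A, 0x11000003, 0x00441020, 0x20A5FFFF, 0x08100001):
    print(f"{address:#010x}  {disassemble(w)}")
    address += 4
```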
Decoding (6/7)

- MIPS Assembly (Part 2):

  or  $v0,$0,$0

  Loop:  slt  $t0,$0,$a1
          beq  $t0,$0,Exit
          add  $v0,$v0,$a0
          addi  $a1,$a1,-1
          j  Loop

  Exit:

- Next step: translate to C/Java code (must be creative!)
Decoding (7/7)

• Possible higher-level code:

```plaintext
// $v0: var1
// $a0: var2
// $a1: var3
var1 = 0;
while (var3 > 0) {
    var1 += var2;
    var3 -= 1;
}
```

```plaintext
or $v0,$0,$0
Loop: slt $t0,$0,$a1
beq $t0,$0,Exit
add $v0,$v0,$a0
addi $a1,$a1,-1
j Loop
Exit:
```
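One way to sanity-check a proposed translation is to mirror the assembly one instruction at a time and compare results. The Python sketch below does that, ignoring 32-bit overflow; `assembly_loop` is a made-up name, and the variables stand for the registers of the same name.

```python
def assembly_loop(a0, a1):
    """Step through the decoded loop, one Python statement per instruction."""
    v0 = 0 | 0                   # or   $v0,$0,$0
    while True:
        t0 = 1 if 0 < a1 else 0  # slt  $t0,$0,$a1
        if t0 == 0:              # beq  $t0,$0,Exit
            break
        v0 = v0 + a0             # add  $v0,$v0,$a0
        a1 = a1 + (-1)           # addi $a1,$a1,-1
    return v0                    # Exit:

print(assembly_loop(3, 4))
```

For a positive count in `$a1` the loop adds `$a0` to `$v0` that many times, so the code computes a product by repeated addition; for a zero or negative count the body never runs.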
