## x86-64 Programming I

#### CSE 351 Summer 2024

**Instructor:** Ellis Haker

#### **Teaching Assistants:**

Naama Amiel Micah Chang Shananda Dokka Nikolas McNamee Jiawei Huang



### Administrivia

- Due today
  - HW5 (11:59pm)
  - Lab 1a (11:59pm)
- Due Friday, 7/5
  - RD8 (1pm)
  - HW6 (11:59pm)
- Quiz 1 released Friday at 11:59pm
- Optional reading posted on Ed
  - Recent article about assembly language design
  - Beyond the scope of this course, but very cool!

#### **Review Questions**

Assume that the register %rdx holds the value **0x 01 02 03 04 05 06 07 08** 

Answer the following questions about the instruction **subq \$1, %rdx**

- 1. Operation type: arithmetic, because it is a subtraction
- 2. Operand types: immediate (\$), register (%)
- 3. Operand width: 8 B, because of the "q" suffix
- 4. (extra) Result stored in %rdx:

%rdx - 1 = 0x 01 02 03 04 05 06 07 07
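The extra question can be checked with a small C model. This is only a sketch: `subq_imm1` is a made-up helper name, and a `uint64_t` stands in for the 8-byte register %rdx.

```c
#include <stdint.h>

/* Models `subq $1, %rdx`: dst = dst - src on a full 8-byte value.
 * The function name is hypothetical, chosen for illustration. */
uint64_t subq_imm1(uint64_t rdx) {
    return rdx - 1;  /* only the lowest byte changes unless a borrow propagates */
}
```

Calling `subq_imm1(0x0102030405060708)` gives `0x0102030405060707`: the subtraction touches only the least significant byte here because no borrow is needed.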

## **Layers of Computing Revisited**

- So far, we've focused on hardware
  - How does the CPU store and read data from memory?
- Shifting focus to languages & libraries
  - How are programs created and executed on the CPU?
  - Take CSE 401 to learn more
- Still needs hardware support!
  - Take CSE 469 to learn more



## **Programming Languages & Libraries: 351 View**

- Topics:
  - x86-64 assembly
  - Procedures
  - Stacks
  - Executables
- How does your source code become something that your computer understands?
- How does the CPU organize and manipulate local data?



#### **Lecture Topics**

- Assembly intro
  - Instruction set philosophies
- x86-64 programming
  - Data types
  - Instructions
  - Registers
  - Memory addressing

#### Definitions

- Instruction Set Architecture (ISA): the parts of a processor design that one needs to understand to write assembly code
  - What is directly visible to software
  - The "contract" between hardware and software
  - 351 focus

- Microarchitecture: hardware implementation of the ISA
  - CSE/EE 469



## What is a Register? (Review)

- Special locations on the CPU that store a small amount of data
  - Accessed very quickly (once per clock cycle)
- Have *names*, not addresses
  - In x86, start with % (e.g., %rsi)
- Registers are at the heart of assembly programming
  - Very useful, but scarce, *especially* in x86

#### Memory vs. Registers (Review)

#### Memory

- Addresses
  - <u>Ex</u>: 0x7FFFD024C3DC
- Big
  - ~16GB
- Slow
  - ~50-100ns
- Dynamic
  - Can expand as needed

#### Registers

- Names
  - <u>Ex</u>: %rdi
- Small
  - 16 8-byte registers = 128B
- Fast
  - <1ns
- Static
  - Fixed number in hardware

## **General ISA Design Decisions**

- Instructions
  - What instructions are available? What do they do?
  - How are they encoded?
- Registers
  - How many are there?
  - How wide are they?
- Memory
  - How do you specify a memory location?

#### **Instruction Set Philosophies (Review)**

- Complex Instruction Set Computing (CISC): lots of elaborate instructions
  - Lots of tools for programmers to use, but hardware must be able to handle all instructions
  - x86-64 is CISC, but only a small subset of instructions encountered with Linux programs
- Reduced Instruction Set Computing (RISC): keep instruction set small and regular
  - Easier to build fast, less power-hungry hardware
  - Let software do the complicated operations by composing simpler ones
  - ARM, RISC-V

## Instruction Set Philosophies (Review) (pt 2)

- Complex Instruction Set Computing (CISC): lots of elaborate instructions
  - Lots of tools for programmers to use, but hardware must be able to handle all instructions
  - x86-64 is CISC, but only a small subset of instructions encountered with Linux programs

#### Example: ADDSUBPS

• "Adds odd-numbered single-precision floating-point values of the first source operand (second operand) with the corresponding single-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered single-precision floating-point values from the second source operand from the corresponding single-precision floating values in the first source operand; stores the result into the even-numbered values of the destination operand."

#### **Mainstream ISAs**

**x86** (Intel)

| Property   | Value                                        |
|------------|----------------------------------------------|
| Designer   | Intel, AMD                                   |
| Bits       | 16-bit, 32-bit, and 64-bit                   |
| Introduced | 1978 (16-bit), 1985 (32-bit), 2003 (64-bit)  |
| Design     | CISC                                         |
| Type       | Register-memory                              |
| Encoding   | Variable (1 to 15 bytes)                     |
| Branching  | Condition code                               |
| Endianness | Little                                       |

PCs, older Macs (x86-64 instruction set)

**ARM**

| Property   | Value                                                                                                                            |
|------------|----------------------------------------------------------------------------------------------------------------------------------|
| Designer   | Arm Holdings                                                                                                                     |
| Bits       | 32-bit, 64-bit                                                                                                                   |
| Introduced | 1985                                                                                                                             |
| Design     | RISC                                                                                                                             |
| Type       | Register-register                                                                                                                |
| Encoding   | AArch64/A64 and AArch32/A32 use 32-bit instructions, T32 (Thumb-2) uses mixed 16- and 32-bit instructions; ARMv7 user-space compatibility |
| Branching  | Condition code, compare and branch                                                                                               |
| Endianness | Bi (little as default)                                                                                                           |

Mobile devices, M1/M2 Macs (ARM instruction set)

**RISC-V**

| Property   | Value                              |
|------------|------------------------------------|
| Designer   | University of California, Berkeley |
| Bits       | 32, 64, 128                        |
| Introduced | 2010                               |
| Design     | RISC                               |
| Type       | Load-store                         |
| Encoding   | Variable                           |
| Endianness | Little                             |

Mostly research (RISC-V instruction set)

## **Current Industry Trends - A RISC-y Shift**

- Historically, there was a lot of debate about RISC vs CISC
  - Intel went the CISC route in the 1980s
    - Would make programming in assembly easier
    - Implementing more things in hardware
- Traditional wisdom says that RISC is better for simple systems, not PCs
- But things are shifting!
  - Apple switched to ARM in 2020
- Why?
  - Efficiency: RISC uses less power
  - Performance: each instruction is faster, easier to parallelize
  - Scalability: suitable for devices of all sizes (desktops, laptops, and phones)



#### **Architecture Sits at the Hardware Interface**



## Writing Assembly Code? In \$CURRENT\_YEAR???

- Chances are, you'll never write a program in assembly, but understanding it is the key to the machine-level execution model
  - Behavior of programs in the presence of bugs
    - When high-level language model breaks down
  - Tuning program performance
    - Understand optimizations done/not done by the compiler
  - Implementing systems software
    - What are the "states" of processes that the OS must manage?
    - Using special units (timers, I/O co-processors, etc.) inside the processor
  - Fighting malicious software
    - Distributed software is in binary form



#### **Lecture Topics**

- Assembly intro
  - Instruction set philosophies
- x86-64 programming
  - Data types
  - Instructions
  - Registers
  - Memory addressing

# x86-64 Integer Registers – 64 bits wide

| 64-bit (8 B) | Low 32 bits (4 B) | 64-bit (8 B) | Low 32 bits (4 B) |
|--------------|-------------------|--------------|-------------------|
| %rax         | %eax              | %r8          | %r8d              |
| %rbx         | %ebx              | %r9          | %r9d              |
| %rcx         | %ecx              | %r10         | %r10d             |
| %rdx         | %edx              | %r11         | %r11d             |
| %rsi         | %esi              | %r12         | %r12d             |
| %rdi         | %edi              | %r13         | %r13d             |
| %rsp         | %esp              | %r14         | %r14d             |
| %rbp         | %ebp              | %r15         | %r15d             |

%rsp is a special register; we'll talk about it later!

## Some History: IA32 Registers – 32 bits wide





#### x86-64 Assembly "Data Types"

- Integral data of 1, 2, 4, or 8 bytes (b, w, l, q), signed and unsigned
- Floating point data, not covered in 351
  - Different registers for those (e.g., %xmm1, %ymm2)
  - Come from extensions to x86 (SSE, AVX, ...)
- No aggregate types such as arrays or structs
  - Just contiguously allocated bytes in memory; you can only get single elements at a time
- Two common syntaxes: you must know which one you're reading!
  - AT&T: <u>used in our course</u>, GNU tools (including gcc), ...
  - Intel: used in Intel documentation, Intel tools, ...

Why "AT&T"? It used to be Bell Labs!

#### **Instruction Sizes and Operands (Review)**

- Size specifiers
  - **b** = 1-byte ("byte")
  - **w** = 2-byte ("word")
  - **l** = 4-byte ("long word")
  - **q** = 8-byte ("quad word")
  - If using registers, must match width
- Operand types
  - Immediate: constant value (\$)
  - Register: 1 of 16 general-purpose registers (%)
  - Memory: consecutive bytes of memory at a computed address ( () )

Why is "word" 2 bytes? Because that was the word size when x86 was new, and it has to be maintained for backwards compatibility.
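To make the suffix-to-width mapping concrete, here is a minimal C sketch. The helper name `suffix_width` is invented for this example and is not part of any assembler or toolchain; the fixed-width types from `<stdint.h>` are used so the sizes do not depend on the platform's `int` or `long`.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the operand width in bytes for an AT&T size suffix,
 * or 0 if the character is not one of b, w, l, q.
 * suffix_width is a hypothetical helper for illustration only. */
size_t suffix_width(char suffix) {
    switch (suffix) {
    case 'b': return sizeof(int8_t);   /* byte:      1 byte  */
    case 'w': return sizeof(int16_t);  /* word:      2 bytes */
    case 'l': return sizeof(int32_t);  /* long word: 4 bytes */
    case 'q': return sizeof(int64_t);  /* quad word: 8 bytes */
    default:  return 0;                /* not a size suffix  */
    }
}
```

For instance, `suffix_width('q')` is 8, which is why `movq` pairs with the 8-byte register names such as %rax, while `movl` pairs with the 4-byte names such as %eax.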


#### **Instruction Types (Review)**

- 1. Transfer data between memory and a register
  - Load from memory -> register
    - %reg = Memory[address]
  - Store from register -> memory
    - Memory[address] = %reg
  - Note: cannot transfer between two memory locations in one instruction!
- 2. Perform arithmetic operation on register or memory data
  - <u>Ex</u>: c = a + b; z = x << y; i = h & g;
- 3. Control flow: what instruction to execute next (future lecture)
  - Unconditional jumps to/from procedures
  - Conditional branches

Remember: Memory is indexed just like an array of bytes!

# Moving Data

- General form: mov\_ src, dst
  - dst can't be an immediate
  - More of a "copy" than a "move"
  - Missing letter (\_) is for the width specifier

<u>Ex</u>: movq %rax, %rbx

- Copies the 8-byte value from register %rax into register %rbx
- Operand Combinations:
  - **Immediate** -> **Register** or **Memory** (copies Immediate value to location)
  - **Register** -> **Register** or **Memory** (copies data in register to location)
  - **Memory** -> **Register** (copies data in memory to register)
    - Can't go from memory -> memory in a single instruction!
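The operand combinations above have rough C analogs, sketched below. Local variables stand in for registers and `*p` stands in for memory; the register names in the comments are illustrative assumptions, since a real compiler is free to choose different registers or optimize the statements away. `mov_analogs` is a made-up function name.

```c
#include <stdint.h>

/* Each statement is a rough C analog of one mov operand combination.
 * Locals play the role of registers; *p plays the role of memory.
 * The assembly in the comments is illustrative, not compiler output. */
int64_t mov_analogs(int64_t *p) {
    int64_t a = 0x4;   /* Immediate -> Register: movq $0x4, %rax    */
    *p = -147;         /* Immediate -> Memory:   movq $-147, (%rdi) */
    int64_t b = a;     /* Register  -> Register: movq %rax, %rdx    */
    *p = b;            /* Register  -> Memory:   movq %rdx, (%rdi)  */
    int64_t c = *p;    /* Memory    -> Register: movq (%rdi), %rcx  */
    return c;
}
```

A C statement such as `*q = *p` has no single-instruction analog: it takes a load into a register followed by a store from that register.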

### **Some Arithmetic Operations**

- Binary (two-argument) operations
  - Beware argument order!
    - src can be immediate, register, or memory
    - dst only register or memory
    - Results always stored in dst
  - Maximum of <u>one</u> memory operand!
  - No distinction between signed and unsigned
    - Only arithmetic vs logical shifts

| Format         | Computation       | Notes        |
|----------------|-------------------|--------------|
| addq src, dst  | dst = dst + src   |              |
| subq src, dst  | dst = dst - src   |              |
| imulq src, dst | dst = dst * src   |              |
| sarq src, dst  | dst = dst >> src  | Arithmetic   |
| shrq src, dst  | dst = dst >> src  | Logical      |
| shlq src, dst  | dst = dst << src  | Same as salq |
| xorq src, dst  | dst = dst ^ src   |              |
| andq src, dst  | dst = dst & src   |              |
| orq src, dst   | dst = dst \| src  |              |

#### **Practice Question**

Which of the following are valid implementations of rcx = rax + rbx?

addq %rax, %rcx   # rcx = rcx + rax
movq %rax, %rcx   # rcx = rax
addq %rbx, %rcx   # rcx = rcx + rbx
movq \$0, %rcx   # rcx = 0
addq %rbx, %rcx   # rcx = rcx + rbx
xorq %rax, %rax   # rax = 0
xorq %rax, %rcx   # rcx = rcx ^ rax
addq %rax, %rcx   # rcx = rcx + rax

### **Arithmetic Example**



| Register | Uses         |
|----------|--------------|
| %rdi     | 1st arg (x)  |
| %rsi     | 2nd arg (y)  |
| %rax     | return value |



#### **Example of Basic Addressing Modes**

```
long add_ptr(long* xp, long* yp)
{
    long t0 = *xp;
    long t1 = *yp;
    return t0 + t1;
}
```

```
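add_ptr:
    movq (%rdi), %rdx   # t0 = *xp (load from the address in %rdi)
    movq (%rsi), %rax   # t1 = *yp (load from the address in %rsi)
    addq %rdx, %rax     # %rax = t1 + t0, the return value
    ret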
```

 Parentheses = memory addressing

 Treat the value in the register as an address

Compiler Explorer: <u>https://godbolt.org/z/zc4Pcq</u>
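To see the function in use, here is a small sketch of a caller. `add_ptr` is copied from the example; `add_ptr_demo` and its values are made up for illustration.

```c
/* Same function as in the example. */
long add_ptr(long *xp, long *yp) {
    long t0 = *xp;   /* read the long that xp points to */
    long t1 = *yp;   /* read the long that yp points to */
    return t0 + t1;
}

/* Hypothetical caller: passes the addresses of two locals,
 * so the arguments are pointers rather than the values themselves. */
long add_ptr_demo(void) {
    long x = 351;
    long y = 7;
    return add_ptr(&x, &y);
}
```

Because the arguments are addresses, the assembly has to use the parenthesized form, `(%rdi)` and `(%rsi)`, to fetch the values before adding them.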

## Understanding add\_ptr()

```
long add_ptr(long* xp, long* yp)
{
    long t0 = *xp;
    long t1 = *yp;
    return t0 + t1;
}
```

```
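add_ptr:
    movq (%rdi), %rdx   # t0 = *xp (load from the address in %rdi)
    movq (%rsi), %rax   # t1 = *yp (load from the address in %rsi)
    addq %rdx, %rax     # %rax = t1 + t0, the return value
    ret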
```

| Register | Variable |
|----------|----------|
| %rdi     | xp       |
| %rsi     | yp       |
| %rdx     | t0       |
| %rax     | return   |

Figure: registers (%rdi, %rsi, %rax, %rdx) drawn next to memory.

#### Memory Addressing Modes

- General format: D(Rb,Ri,S) = Mem[Reg[Rb] + Reg[Ri]\*S + D]
  - Rb = base register (any register)
  - Ri = index register (any register except %rsp)
  - S = scale factor (1, 2, 4, 8)
    - Why these numbers? Same as the data widths
  - D = displacement value (immediate)
- Can leave any of these out: S defaults to 1, all others default to 0
  - D(Rb,Ri) = Mem[Reg[Rb] + Reg[Ri] + D] (S=1)
  - (Rb,Ri,S) = Mem[Reg[Rb] + Reg[Ri]\*S] (D=0)
  - (Rb,Ri) = Mem[Reg[Rb] + Reg[Ri]] (D=0, S=1)
  - (,Ri,S) = Mem[Reg[Ri]\*S] (D=0, Rb=0)
  - etc...

In the example (%rdi) on the previous slide, %rdi was Rb.
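One way to see why the scale factors match the data widths is array indexing. The sketch below computes the address of `base[index]` with explicit integer arithmetic, which has the same base + index\*scale shape as the operand (Rb,Ri,S). `element_address` is a made-up helper name, and the comment about S=8 assumes an 8-byte `long`, as on x86-64 Linux.

```c
#include <stdint.h>

/* Address of base[index], computed as base + index * element size.
 * With an 8-byte long this is the address the operand (Rb,Ri,8) names
 * when Rb holds base and Ri holds index.
 * element_address is a hypothetical helper for illustration. */
uintptr_t element_address(const long *base, uint64_t index) {
    return (uintptr_t)base + index * sizeof(long);   /* Reg[Rb] + Reg[Ri]*S */
}
```

For any valid index, the result matches what C computes for `&base[index]`, so a single memory operand can cover an array access without a separate multiply or add instruction.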

# **Address Computation Examples**

Stopped here; we'll continue on Friday.

8-bit addresses

- %rdx = 0xF000, %rcx = 0x0100
- Reminder: $D(Rb,Ri,S) \rightarrow Mem[Reg[Rb]+Reg[Ri]*S+D]$
  - (ignore the Mem[...] lookup for this exercise; just compute the address)

| Expression      | Address Computation | Address |
|-----------------|---------------------|---------|
| 0x8(%rdx)       |                     |         |
| (%rdx, %rcx)    |                     |         |
| (%rdx, %rcx, 4) |                     |         |
| 0x80(, %rcx, 2) |                     |         |
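The address arithmetic in this exercise can be sketched as a small C function that evaluates Reg[Rb] + Reg[Ri]\*S + D. The name `effective_address` and its parameter order are assumptions made for this example; an omitted base, index, or displacement is passed as 0 and an omitted scale as 1.

```c
#include <assert.h>
#include <stdint.h>

/* Computes Reg[Rb] + Reg[Ri]*S + D for the operand D(Rb,Ri,S).
 * Pass 0 for an omitted base, index, or displacement, and 1 for an
 * omitted scale. effective_address is a hypothetical helper name. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned s) {
    assert(s == 1 || s == 2 || s == 4 || s == 8);  /* only legal scale factors */
    return rb + ri * s + (uint64_t)d;
}
```

With %rdx = 0xF000 and %rcx = 0x0100, the four expressions work out as follows:

- 0x8(%rdx): 0xF000 + 0x8 = 0xF008
- (%rdx, %rcx): 0xF000 + 0x0100 = 0xF100
- (%rdx, %rcx, 4): 0xF000 + 0x0100\*4 = 0xF400
- 0x80(, %rcx, 2): 0x0100\*2 + 0x80 = 0x0280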

## Summary

- x86-64 is a complex (CISC) architecture
  - There are 3 types of instructions
    - Data transfer
    - Arithmetic
    - Control flow
  - There are 3 types of operands
    - Registers (%)
    - Immediates (\$)
    - Memory ( ( ) )
- Registers are small, fast places to store data
  - Limited number, each with their own name