## University of Washington – Computer Science & Engineering

| Autumn 2017 Ir                                                                                                                                                                                                        | nstructor:                         | Justin Hsia    | 2017-12-13    |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|----------------|---------------|
| CSE                                                                                                                                                                                                                   | 35                                 | 1 <b>F</b>     | NAL           |
| Last Nam                                                                                                                                                                                                              | ne:                                | Р              | erfect        |
| First Nan                                                                                                                                                                                                             | ne:                                | I              | Perry         |
| Student ID Numb                                                                                                                                                                                                       | er:                                | 1:             | 234567        |
| Name of person to your Left   Rig                                                                                                                                                                                     | ght <mark>Sa</mark>                | mantha Student | Larry Learner |
| All work is my own. I had no prior knowledge of the ex<br>contents nor will I share the contents with other<br>CSE351 who haven't taken it yet. Violation of these ter<br>could result in a failing grade. (please si | xam<br>rs in<br>rms<br><b>ign)</b> |                |               |

## Do not turn the page until 12:30.

## Instructions

- This exam contains 14 pages, including this cover page. Show scratch work for partial credit, but put your final answers in the boxes and blanks provided.
- The last page is a reference sheet. Please detach it from the rest of the exam.
- The exam is closed book (no laptops, tablets, wearable devices, or calculators). You are allowed two pages (US letter, double-sided) of *handwritten* notes.
- Please silence and put away all cell phones and other mobile or noise-making devices. Remove all hats, headphones, and watches.
- You have 110 minutes to complete this exam.

#### Advice

- Read questions carefully before starting. Skip questions that are taking a long time.
- Read *all* questions first and start where you feel the most confident.
- Relax. You are here to learn.

| Question        | M1 | M2 | M3 | M4 | M5 | F6 | $\mathbf{F7}$ | F8 | F9 | F10 | Total |
|-----------------|----|----|----|----|----|----|---------------|----|----|-----|-------|
| Possible Points | 8  | 2  | 8  | 10 | 8  | 10 | 9             | 10 | 9  | 5   | 79    |

## Question M1: Number Representation [8 pts]

(A) Take the 32-bit numeral **0xC0800000**. Circle the number representation below that has the *negative* value for this numeral. [2 pt]

| Floating P              | oint Sign & Magnitude                    | Two's Complement             | Unsigned |
|-------------------------|------------------------------------------|------------------------------|----------|
| <u>Unsigned</u> :       | Can only represent positive numbers.     |                              |          |
| <u>Floating Point</u> : | $S=1~and~E=10000001_2 \rightarrow Exp=2$ | , so a small negative number | :        |
| <u>Sign &amp; Mag</u> : | Negative number with magnitude 100       | $0\ 0000\ 100_2.$            |          |
| <u>Two's</u> :          | Negative number with magnitude 011       | 11111 1002 (flip bits + 1).  |          |

(B) Let float f hold the value 2<sup>20</sup>. What is the *largest power of 2* that gets rounded off when added to f? Answer in exponential form, not just the exponent. [2 pt]
23 bits in M, so need 24<sup>th</sup> power less than 2<sup>20</sup> to get rounded off.

**Traffic lights** display three basic colors: red (R), yellow (Y), and green (G), so we can use them to encode base 3! We decide to use the encoding  $0 \leftrightarrow R$ ,  $1 \leftrightarrow Y$ ,  $2 \leftrightarrow G$ . For example,  $5 = 1 \times 3^{1} + 2 \times 3^{0}$  would be encoded as **YG**. Assume each traffic light can only display one color at a time.

(C) What is the *unsigned* decimal value of the traffic lights displaying  $\mathbf{RGYY}$ ? [2 pt]

 $0 \times 3^3 + 2 \times 3^2 + 1 \times 3^1 + 1 \times 3^0 = 18 + 3 + 1 = 22.$ 

(D) If we have **9** bits of binary data that we want to store, how many *traffic lights* would it take to store that same data? [2 pt]

9 bits represents 512 things. Powers of 3: 1, 3, 9, 27, 81, 243, <u>729</u>.

# Question M2: Design Question [2 pts]

(A) The machine code for x86-64 instructions are variable length. Name one advantage and one disadvantage of this design decision. [2 pt]

Advantage: Machine code/Code section of memory is more compact (don't need to pad). No limit on number of instructions in ISA.

Disadvantage: Harder to tell/find where to read next instruction. Need more complex hardware to fetch and/or decode instructions.

<u>22</u>

6 traffic lights



## Question M3: Pointers & Memory [8 pts]

For this problem we are using a 64-bit x86-64 machine (little endian). Below is the count\_nz function disassembly, showing where the code is stored in memory.

| 000000000040 | 0536   | <count_nz< th=""><th>&gt;:</th><th></th></count_nz<> | >:      |                                        |
|--------------|--------|------------------------------------------------------|---------|----------------------------------------|
| 400536: 8    | 5 f6   |                                                      | testl   | %esi,%esi                              |
| 400538: 7    | 'e 1b  |                                                      | jle     | 400555 <count_nz+0x1f></count_nz+0x1f> |
| 40053a: 5    | 3      |                                                      | pushq   | %rbx                                   |
| 40053b: 8    | b 1f   |                                                      | movl    | (%rdi),%ebx                            |
| 40053d: 8    | 3 ee   | 01                                                   | subl    | \$0x1,%esi                             |
| 400540: 4    | 8 83   | c7 04                                                | addq    | \$0x4,%rdi                             |
| 400544: e    | e8 ed  | ff ff ff                                             | callq   | 400536 <count_nz></count_nz>           |
| 400549: 8    | 5 db   |                                                      | testl   | %ebx,%ebx                              |
| 40054b: 0    | f 95   | c2                                                   | setne   | %dl                                    |
| some         | e inst | ructions o                                           | omitted | here                                   |

(A) What are the values (in hex) stored in each register shown after the following x86 instructions are executed? Use the appropriate bit widths. <u>Hint</u>: what is the *value* stored in %rsi? [4 pt]

| Register | Value (hex)            |  |  |  |  |  |
|----------|------------------------|--|--|--|--|--|
| %rdi     | 0x 0000 0000 0040 0544 |  |  |  |  |  |
| %rsi     | 0x FFFF FFFF FFFF FFFF |  |  |  |  |  |
| %eax     | 0x <b>0040 0545</b>    |  |  |  |  |  |
| %bx      | 0x 8348                |  |  |  |  |  |

leal 2(%rdi, %rsi), %eax
movw (%rdi,%rsi,4), %bx

leal instruction calculates the address 0x400544 + (-1) + 2 = 0x400545.

movw instruction pulls two bytes starting at memory address  $0x400544+4^{*}(-1) = 0x400540$ , which is 0x48 and 0x83. Remember little-endian!

(B) Complete the C code below to fulfill the behaviors described in the inline comments using pointer arithmetic. Let char\* charP = 0x400544. [4 pt]

```
char v1 = *(charP + __6__); // set v1 = 0xDB
int* v2 = (int*)((__double__*)charP - 2); // set v2 = 0x400534
```

The only 0xDB byte in count\_nz is found at address 0x40054a, 6 bytes beyond charP.

The difference between v2 and charP is 16 bytes. Since by pointer arithmetic we are moving 2 "things" away, charP must be cast to a pointer to a data type of size 8 bytes. Long or any pointer (except void\*) also accepted.

# Question M4: Procedures & The Stack [10 pts]

The function count\_sp counts the number of *spaces* in a char array (this is the recursive version of the mystery function from the Midterm). The function and its *disassembly* are shown below:

```
int count_sp(char* str) {
    if (*str)
        return (*str == ' ') + count_sp(str+1);
        return 0;
}
```

```
000000000400536 <count_sp>:
  400536:
           0f b6 07
                            movzbl (%rdi),%eax
  400539:
           84 c0
                            testb
                                    %al,%al
  40053b:
           74 16
                                    400553 <count_sp+0x1d>
                             je
  40053d:
           53
                            pushq
                                    %rbx
  40053e:
                                    $0x20,%al
           3c 20
                            cmpb
           0f 94 c3
  400540:
                            sete
                                    %bl
           Of b6 db
  400543:
                            movzbl %bl,%ebx
  400546:
           48 83 c7 01
                            addq
                                    $0x1,%rdi
  40054a:
           e8 e7 ff ff ff
                                    400536 <count_sp>
                            callq
  40054f:
           01 d8
                            addl
                                    %ebx,%eax
  400551:
           eb 06
                                    400559 <count_sp+0x23>
                             jmp
           b8 00 00 00 00
  400553:
                            movl
                                    $0x0,%eax
  400558:
           с3
                            retq
  400559:
           5b
                                    %rbx
                            popq
  40055a:
           с3
                            retq
```

(A) The information found in the *right-most* column/portion of the disassembly is first generated as the output of which of the following? Circle one. [1 pt]

Assembler Linker Loader Compiler (B)The *left-most* column of the disassembly was generated by which of the following? [1 pt] Compiler Assembler Linker Loader (C) Why is *\*rbx* being pushed onto the stack? What is *\*rbx* being used for in this function? [2 pt] Why push: Because (1) %rbx is a callee-saved register and (2) count sp chooses to use/change this register. rbx is being used to store the value of str = ' ' (is this char a space?). Usage:

 $\mathbf{5}$ 

(D) What is the return address to count\_sp that gets stored on the stack? Answer in hex. [1 pt]

The address of the instruction *after* callq.

(E) Provide a call to count\_sp that is *guaranteed* to cause a **segmentation fault**. [1 pt]

count\_sp( \_\_\_\_\_NULL\_\_\_\_ );

Zero also accepted, with and without casting.

(F) We call count\_sp(" ! "). Fill in the incomplete snapshot of the stack below (in hex) once this call to count\_sp returns to main. For unknown words, write "garbage". [4 pt]

| <ret addr="" main="" to=""></ret> | 0x7ffffffdb68  |
|-----------------------------------|----------------|
| <original rbx=""></original>      | 0x7ffffffdb60  |
| 0x40054f                          | 0x7fffffffdb58 |
| 0x1                               | 0x7fffffffdb50 |
| 0x40054f                          | 0x7fffffffdb48 |
| 0x0                               | 0x7fffffffdb40 |
| 0x40054f                          | 0x7fffffffdb38 |
| garbage                           | 0x7fffffffdb30 |
| garbage                           | 0x7fffffffdb28 |
| garbage                           | 0x7fffffffdb20 |
|                                   |                |

4 total stack frames of count\_sp created: count\_sp(" ! ")  $\rightarrow$  count\_sp("! ")  $\rightarrow$  count\_sp(" ")  $\rightarrow$  count\_sp(" "). Stack data in these frames alternates between return addresses (full credit given for your answer to part D) and pushed %rbx values (credit given for your answer to part C). The last stack frame hits the base condition and doesn't push %rbx onto the stack. The data below this point is considered garbage.

0x **40054f** 

SID: \_\_\_\_\_

# Question M5: C & Assembly [8 pts]

Answer the questions below about the following x86-64 assembly function, which uses a struct:

| mystery | :     |               |   |      |    |
|---------|-------|---------------|---|------|----|
| .L3:    | testq | %rdi, %rdi    | # | Line | 1  |
|         | je    | .L4           | # | Line | 2  |
|         | cmpw  | %si, 0(%rdi)  | # | Line | 3  |
|         | je    | .L5           | # | Line | 4  |
|         | movq  | 8(%rdi), %rdi | # | Line | 5  |
|         | jmp   | .L3           | # | Line | 6  |
| .L4:    | movl  | \$0, %eax     | # | Line | 7  |
|         | retq  |               | # | Line | 8  |
| .L5:    | movl  | \$1, %eax     | # | Line | 9  |
|         | retq  |               | # | Line | 10 |

- (A) What C variable type would %rsi be in the corresponding C program? [1 pt]
   In Line 3, %si is used in a cmpw instruction.
- (B) %rdi is a pointer to a struct that contains 2 fields. What is the width of the second field? [1 pt]
   Used in a movq instruction in Line 5. Also at offset of 8 bytes, matching
   its alignment requirement.
- (C) Based on Line 5, give an intuitive name for the second field in the struct. [1 pt]

Other variants accepted.

(D) Convert lines 1, 2, 7, and 8 into C code. Use variable names that correspond to the register names (e.g. al for the value in %al). [3 pt]

if ( \_rdi == NULL ) \_\_\_return 0\_\_\_;

(E) Describe at a high level what you think this function accomplishes (not line-by-line). [2 pt]
 Returns 1 if linked list (singly-linked, linear) contains a specified value (in %si).

next or ptr

\_**short**\_\_ rsi

\_**9**\_\_\_ bits

1/8 = 12.5%

#### Question F6: Caching [10 pts]

We have 64 KiB of RAM and a 2-KiB L1 data cache that is 4-way set associative with 32-byte blocks and random replacement, write-back, and write allocate policies.

(A) Calculate the TIO address breakdown: [1.5 pt]

| Tag bits | Index bits | Offset bits |
|----------|------------|-------------|
| 7        | 4          | 5           |

16 address bits.  $\log_2 32 = 5$  offset bits. 2-KiB cache = 64 blocks. 4 lines/set  $\rightarrow$  16 sets.

(B) How many management bits (bits other than the block data) are there in every line in the cache? [1 pt]

Tag bits + Valid bit + Dirty bit (write-back)

(C) The code snippet below accesses an array of doubles. Assume i is stored in a register. Calculate the Miss Rate if the cache starts *cold*. [2.5 pt]

#define ARRAY\_SIZE 256
double data[ARRAY\_SIZE]; // &data = 0x1000 (physical addr)
for (i = 0; i < ARRAY\_SIZE; i += 1)
 data[i] /= 100;</pre>

Access pattern is read then write to data[i]. Stride = 1 double = 8 bytes. 32/8 = 4 strides per block. The offset of &data is 0b00000, so we start at the beginning of a cache block. First access (read) is a compulsory miss and the next 7 (over 4 different addresses) are hits. Since we never revisit indices, this pattern continues for all cache blocks.

(D) For each of the proposed (independent) changes, write IN for "increased", NC for "no change", or
 DE for "decreased" to indicate the effect on the Miss Rate for the code above: [4 pt]

| Use float instead                                                          | _DE_ | Half the cache size | _NC_ |
|----------------------------------------------------------------------------|------|---------------------|------|
| <pre>Split the loop body into:<br/>data[i] /= 10;<br/>data[i] /= 10;</pre> | _DE  | No-write allocate   | _NC_ |
|                                                                            |      |                     |      |

Using floats means more strides/block. We never revisit blocks, so cache size doesn't matter. Since the entire array fits in the cache, running it through a  $2^{nd}$  loop results in all hits. No-write allocate has no effect because all of our misses are on *reads*.

(E) Assume it takes 100 ns to get a block of data from main memory. If our L1 data cache has a hit time of 2 ns and a miss rate of 3%, what is the average memory access time (AMAT)? [1 pt]

 $AMAT = HT + MR \times MP = 2 + 0.03 \times 100 = 5$  \_\_\_\_\_5 ns

## Question F7: Processes [9 pts]

(A) The following function prints out four numbers. In the following blanks, list three possible outcomes: [3 pt]



(B) For the following examples of exception causes, write "N" for intentional or "U" for unintentional from the perspective of the user process. [2 pt]

System call \_\_\_N\_\_\_

Hardware failure \_\_\_\_**U**\_\_\_

Segmentation fault \_\_\_\_U\_\_\_

Mouse clicked \_\_\_\_U\_\_\_

Syscalls are part of code you are executing. The others are external to the process.

(C) Briefly define a **zombie** process. Name a process that can *reap* a zombie process. [2 pt]

Zombie process: A process that has ended/exited but is still consuming system resources.

Reaping process: The parent process or init/systemd (PID 1).

(D) In the following blanks, write "Y" for yes or "N" for no if the following need to be updated when execv is run on a process. [2 pt]

Page table  $\_\__{\underline{Y}\_}$  PTBR  $\_\__{\underline{N}\_}$  Stack  $\_\__{\underline{Y}\_}$  Code  $\_\__{\underline{Y}\_}$ 

The process already has its own page table, so while we will need to invalidate PTEs from the old process image, we don't need to create another page table, so the PTBR can remain the same. We replace/update the old process image's virtual address space, including Stack and Code.

#### SID: \_\_\_\_\_

## Question F8: Virtual Memory [10 pts]

Our system has the following setup:

- 20-bit virtual addresses and 64 KiB of RAM with 256-B pages
- A 4-entry TLB that is fully associative with LRU replacement
- A PTE contains bits for valid (V), dirty (D), read (R), write (W), and execute (X)

(A) Compute the following values: [4 pt]

Page offset width  $\_8 \text{ bits}\_$  # of physical pages  $\_2^8=256$ 

# of virtual pages  $2^{12}$ =4096 TLBI width \_0 bits\_

Page offset is  $\log_2 256 = 8$  bits wide. VA space is  $2^{20}$  bytes, so  $2^{20}/2^8$  virtual pages and  $2^{16}/2^8$  physical pages. TLB is fully associative, so just one set and  $\log_2 1 = 0$  TLBI bits.

(B) Briefly explain why we make physical memory write-back and fully-associative. [2 pt]

Write-back: Avoid writing to disk as much as possible, or only when we absolutely need to.

Fully-associative: Don't waste space in RAM; we want to use it fully utilize the limited space we have.

(C) The TLB is in the state shown when the following code is executed. The code eventually causes a **protection fault**. What are the values of the variables when the fault occurs? [4 pt]

```
long *p = 0x7F080;
for (int i = 0; 1; i++) {
    *p += 1;
    p += 4;
}
```

```
p = ___0x7F200___
```

i = \_\_12\_\_\_

| TLBT  | PPN  | Valid | R | W | Χ |
|-------|------|-------|---|---|---|
| 0x7F0 | 0xC3 | 1     | 1 | 1 | 0 |
| 0x7F2 | 0x3D | 1     | 1 | 0 | 0 |
| 0x004 | 0xF4 | 1     | 1 | 0 | 1 |
| 0x7F1 | 0x42 | 1     | 1 | 1 | 0 |

The loop reads and then writes to the address in pointer p and then strides by 4 longs = 32 bytes = 0x20 addresses. p starts at address 0x7F080, which is at page offset 0x80, the middle of the page. We see in the TLB that this page (TLBT 0x7F0) is valid with read and write privileges, as is the following page (TLBT 0x7F1). However, the following page lacks write privileges, so our first *write* to that page will cause a protection fault. This occurs at the very first byte of the page: 0x7F200, which is reached when i = 12.

#### Question F9: Memory Allocation [9 pts]

(A) In a free list, what is a **footer** used for? Be specific. Why did we not need to use one in allocated blocks in Lab 5? [2 pt]

Footer: The footer is used to get information about the previous neighboring block. The footer is used for traversing the blocks in the heap *backwards*. The footer is used for bidirectional coalescing.

Lab 5: In Lab 5, we used a TAG\_PRECEDING\_USED tag in block headers instead, which was sufficient because we don't coalesce with allocated blocks.

(B) We are designing a dynamic memory allocator for a 64-bit computer with 4-byte boundary tags and alignment size of 4 bytes. Assume a footer is always used. Answer the following questions: [4 pt]

Maximum tags we can fit into the header (ignoring size): <u>2</u> tags

Minimum block size if we implement an *explicit* free list: \_\_\_\_24\_\_\_ bytes

Maximum block size (leave as expression in powers of 2): \_\_\_2<sup>32</sup>-2<sup>2</sup>\_\_ bytes

With 4-byte alignment, lowest 2 bits are guaranteed to be zeros.

Explicit free list has minimum size that includes header, two pointers, and footer. We are told boundary tags (header, footer) are 4 bytes each and pointers are 8 bytes in a 64-bit machine. Max block size is when the size field is all 1's (with two 0's at the bottom for alignment).

(C) Consider the C code shown here. Assume that the malloc call succeeds and foo is stored in memory (not just in a register). Fill in the following blanks with ">" or "<" to compare the values returned by the following expressions just before return 0. [3 pt]

> &foo \_\_>\_ &ZERO &str \_\_>\_ ZERO &main < str

```
#include <stdlib.h>
int ZERO = 0;
char* str = "cse351";
int main(int argc, char *argv[]) {
    int *foo = malloc(8);
    free(foo);
    return 0;
}
```

ZERO and str are global variables, so their *addresses* are in the Static Data section of memory. str's *value* is the address of a string literal, which sits at the bottom portion of Static Data. foo is a local variable, so its *address* is in the Stack, but its *value* is an address in the Heap. main is a label in Code/Instructions.

The virtual address space is arranged such that 0 < Instructions < Static Data < Heap < Stack.

# Question F10: C and Java [5 pts]

For this question, use the following Java object definition and C struct definition. Assume addresses are all 64-bits.

| public cla | <b>ass</b> RentalJ {                           | sti | <b>ruct</b> Re | entalC { | <u>K</u> : |
|------------|------------------------------------------------|-----|----------------|----------|------------|
| String     | addr;                                          |     | char*          | addr;    | 8          |
| short      | rooms;                                         |     | short          | rooms;   | 2          |
| float      | rent;                                          |     | float          | rent;    | 4          |
| int[]      | zip;                                           |     | int            | zip[5];  | 4          |
|            |                                                | };  |                | Kmax =   | 8          |
| public     | <pre>void info() {</pre>                       | -   |                |          |            |
| Syst       | <pre>tem.out.println("Rental at "+addr);</pre> |     |                |          |            |
| }          |                                                |     |                |          |            |
| }          |                                                |     |                |          |            |
| public cla | <b>ass</b> Apt <b>extends</b> RentalJ {        |     |                |          |            |
| int roo    | ommates;                                       |     |                |          |            |
| public     | <pre>int occupants() {</pre>                   |     |                |          |            |
| retu       | <b>urn</b> roommates+1;                        |     |                |          |            |
| }          |                                                |     |                |          |            |
| }          |                                                |     |                |          |            |

(A) How much memory, in bytes, does an instance of struct RentalC use? How many of those bytes are *internal* fragmentation and *external* fragmentation? [3 pt]

| <pre>sizeof(struct RentalC)</pre> | Internal | External |
|-----------------------------------|----------|----------|
| 40 bytes                          | 2 bytes  | 4 bytes  |

Alignment requirements listed above in red, next to the struct fields. A struct RentalC instance will look as shown below:

| addr |     |    | rent |    | zip[0]-[4] |    |    |
|------|-----|----|------|----|------------|----|----|
| 0 8  | 8 : | 10 | 12   | 16 |            | 36 | 40 |

The 2 bytes between rooms and rent count as internal fragmentation. The 4 bytes at the end count as external fragmentation.

(B) How much *longer*, in bytes, are the following for Apt than for RentalJ? Assume the Java instance fields are aligned to 4 bytes. [2 pt]

Instance: 4 by

: 4 bytes

Virtual method table (vtable): 8 bytes

Apt extends RentalJ by adding a field and a method, so the length of that field (4 bytes for an int) is added to the object instance length (no padding needed due to 4-byte alignment) and the length of a reference is added to the vtable.

# This page purposely left blank