CSE 378 Final
Ben Dugan, Winter 2001
3/14/01 Name: _________________________
Instructions:
This test has 10 questions, totaling 90 points.
The test is closed book, closed notes, closed calculator, open mind,
etc. Remember, don't spend all your time on one question. You should
budget roughly one minute per point. Please show your work for
partial credit.
Question | Possible | Score |
Page 2 | 6 |
|
Page 3 | 10 |
|
Page 4 | 10 |
|
Page 5 | 10 |
|
Page 6 | 10 |
|
Page 7 | 8 |
|
Page 8 | 17 |
|
Page 9 | 19 |
|
Total | 90 |
|
Question 1. Data Dependences and Data Hazards (6 points)
Consider the following fragment of a program:
add $1, $2, $3 # 1
lw $6, 8($1) # 2
or $5, $6, $6 # 3
sub $7, $2, $4 # 4
sw $7, 4($1) # 5
add $7, $5, $6 # 6
1. Given a 5 stage pipeline that resolves data hazards through stalling,
how many cycles would the above sequence take to execute? Assume that
the results from an ALU operation are available 3 cycles later, and results
from loads are available 2 cycles later. You should ignore the "startup
cost" of the pipeline. In other words, don't count the cycles before the
pipeline is at full capacity.
2. Given a 5 stage pipeline that resolves data hazards through forwarding,
how many cycles would the above sequence take to execute? Again,
ignore the "startup cost" of the pipeline.
3. Can you reorder the above sequence to further improve the forwarding
pipeline? If yes, show the new sequence (you can just show the
numbers of the instructions). If no, explain your answer.
Question 2. Control Hazards (10 points)
Consider the following fragment of a program:
top: sw $0, 0($1) # 1
addi $1, $1, 4 # 2
addi $2, $2, 1 # 3
bne $2, $3, top # 4
Assume that the loop iterates 10 times and that our pipeline has a
branch delay of 2 cycles. That is, the branch is resolved at the
end of the Execute state (the third stage). The pipeline
uses forwarding to resolve data hazards to the extent possible.
1. Suppose the pipeline resolves branch hazards by always stalling
the pipeline for 2 cycles. How many cycles does it take to execute
the above fragment? Again, ignore the "startup cost" of the pipeline.
2. Suppose the pipeline uses a predict-not-taken scheme, where every
branch is predicted as not taken. Correct predictions cost nothing,
mispredictions cost 2 cycles. How many cycles does it take to execute
the above fragment? Again, ignore the "startup cost" of the pipeline.
3. Suppose the pipeline uses a branch prediction table. The penalty
for a misprediction is 2 cycles. How many cycles does it take to execute
the above fragment? (Assume mispredictions on the first and last iteration
of the loop.)
4. Suppose the pipeline uses delayed branches (with 2 delay slots).
Rewrite the code fragment (you can just show the numbers) to take
advantage of the delay slots. Insert nops if you can't fill all of
the slots. How many cycles does it take to execute the new code?
Question 3. MIPS Assembly Coding (10 points)
Please translate the following C fragments into MIPS assembly code.
Assume the following declarations have already been translated and
that $t0 holds the value of x, $t1 holds the value of y, and $t2 holds
the base address of the array foo. (You don't have to worry about
updating the respective memory locations for variables x and y!)
int x, y, foo[100];
x = foo[17];
foo[x] = y;
y = y + x + 1;
if (x < 0) {
x = -x;
}
Question 4. Machine Organization (10 points)
1. Which of the following best describes the impact of going from a
single-cycle implementation to a multi-cycle implementation?
- decrease cycle time, increase CPI
- decrease cycle time
- increase cycle time, increase CPI
- increase cycle time, decrease CPI
- increase CPI
2. Which of the following best describes the impact of going from
a single-cycle implementation to a pipelined implementation?
- decrease cycle time, increase instruction count, increase transistor count
- increase cycle time, increase CPI, decrease instruction count
- decrease cycle time, increase transistor count, usually increase CPI
- decrease cycle time, increase transistor count
- increase cycle time, decrease transistor count
3. Which of the following best describes the impact of increasing the
number of stages in a pipeline (superpipelining)?
- increase cycle time, increase data dependences, increase branch delays
- decrease cycle time, increase data dependences
- decrease cycle time, decrease CPI, increase data dependences
- decrease cycle time, increase data dependences, increase branch delays
- decrease cycle time, decrease data dependences, increase branch delays
4. Which of the following best describes the impact of building a machine
that can issue multiple instructions simultaneously (superscalar) versus
the 5 stage pipeline we studied in class?
- decrease CPI, decrease instruction count, increase data dependences
- decrease CPI, increase instruction count, increase data dependences
- decrease CPI, increase data dependences
- increase CPI, increase data dependences
- increase instruction count, decrease cycle time
5. Which of the following best describes the impact of improving compiler
technology (for a given ISA and machine organization)?
- decrease instruction count, minimize loads and stores, improved
instruction scheduling
- decrease instruction count, improved cycle time
- decrease instruction count, minimize loads and stores,
- increase instruction count, improved cycle time
- increase instruction count, improved instruction scheduling
Question 5. Caching (10 points)
Suppose a benchmark program executes 10,000,000 instructions. On a
processor with one level of cache, it has a cache miss rate of 10% and
the penalty for a miss is 50 cycles.
1. Suppose 10% of the instructions are stores and 20% are loads. What
is the average number of memory references per instruction?
2. What is the average number of stall cycles per instruction?
3. In a perfect world (no cache misses) assume the program has a
CPI of 2.0. What is the relative performance of the program in the
perfect world versus the real world (with misses)?
4. Suppose we add an L2 cache (a second cache between the L1 cache and
the main memory) that has a miss rate of 1%. If a miss occurs in the
L1 cache, the L2 cache is checked for a possible hit. Misses in the
L1 cache that hit in the L2 cache incur a penalty of only 5 cycles,
while misses in both caches cost 50 cycles. How much faster is the new
organization?
Question 6. Memory Hierarchy (8 points)
1. Which of the following best describes the behavior of a write-back,
write-around cache on a read miss?
- The block is updated only in memory
- The block is updated only in the cache
- The block is fetched from memory into the cache
- Dirty words in the existing block are written back to
memory, and the new block is brought in from memory
- Dirty words in the existing block are written back to
memory
2. Which of the following best describes the behavior of a write-through,
write-allocate cache on a write miss?
- The block is fetched from memory, updated in the cache, and
updated in memory.
- The block is updated only in memory.
- The new block is fetched from memory and updated in the cache.
- The block is updated only in the cache.
- Dirty words in the existing block are written back to memory,
the, the new block is fetched from memory, updated in the cache,
and updated in memory.
3. Which of the following best describes the behavior of the OS
during a TLB miss on a MIPS machine?
- Nothing, the miss is handled in hardware.
- The OS loads up the TLB with the correct page table entry, and
context switches to another program.
- The program has performed an illegal operation, so it is killed.
- The OS loads up the TLB with the correct page table entry, and
restarts the program.
- The OS brings the right page in from disk.
4. Which of the following best describes the behavior of the OS
during a page fault?
- Nothing, the page fault is handled in hardware.
- The OS initiates a disk I/O to bring in the right page, and
context switches to another program.
- The OS loads up the TLB with the correct page table entry, and
context switches to another program.
- The OS brings in the right page from disk.
- The OS initiates a disk I/O to bring in the right page, and
restarts the faulting program.
Question 7. Cache Organization (5 points)
Suppose we have a cache with block (line) size of 8 words (32 bytes) and
that this cache has a total of 1024 (1K) lines.
1. What is the total capacity of the cache (in bytes)?
2. Show how the below 32 bit address is broken into its 3 component
parts (tag, index, discard).
+----------------------+
| |
+----------------------+
Question 8. Virtual Memory & Paging (12 points)
Suppose we have a page-based virtual memory system with 8KB pages,
and that we wish to have a 4GB virtual address space.
1. How many entries are in the page table?
2. If each page table entry is 4 bytes, how large is the page table (in bytes)?
3. Show how the below 32 bit address is broken into its 2 component
parts (virtual page number and offset).
+----------------------+
| |
+----------------------+
4. How can the OS allow process P1 to share a region of memory with
process P2 in a page-based VM system?
5. How can the OS guarantee that process P1 doesn't get to read or write
memory belonging to process P3 in a page-based VM system?
Question 9. TRUE/FALSE (15 points)
- TRUE / FALSE Paging systems usually use a write-through policy in
order to keep the disk coherent with memory.
- TRUE / FALSE TLBs are used to cache data on pages.
- TRUE / FALSE Page faults are generally handled in hardware.
- TRUE / FALSE Higher associativity reduces conflict misses.
- TRUE / FALSE Larger lines (blocks) in a cache may increase
conflict misses.
- TRUE / FALSE Larger caches decrease both the overall miss rate and access time.
- TRUE / FALSE An advantage of smaller disks (in diameter) is that
the seek time decreases.
- TRUE / FALSE By spinning disks faster we can decrease the rotational
latency of a disk I/O.
- TRUE / FALSE The ethernet is an example of an interconnection network
that uses centralized arbitration.
- TRUE / FALSE In the MIPS procedure call convention, leaf procedures
(those that don't call any other procedures) often don't have to save any
registers.
- TRUE / FALSE The VAX implementation is complicated by the many
addressing modes provided by the the ISA.
- TRUE / FALSE In the BEQ instruction, a branch target may not be
further than 2^15 bytes away.
- TRUE / FALSE Compilers for the MIPS will typically place long-lived
values into caller-saved registers ($T0-$T9) in order to minimize
loads and stores to/from the stack.
- TRUE / FALSE The cycle time of a pipelined machine is determined
by the maximum time required to complete a single stage of the pipeline.
- TRUE / FALSE This statement is false.
Question 10. (4 points)
Explain why TLBs are typically fully associative (or have a high degree of
associativity) rather than direct mapped?