Homework #8: Dependences & ILP
Due: Friday, 3/9/2007, 10:30 AM
This assignment should be done individually, not in teams.
Assignment Overview
This assignment asks you to determine legal parallel execution schedules for
a short execution sequence of machine instructions.
To do this, you must first determine the dependences that exist between
instructions. Then you "hand simulate" a parallel execution
that respects those dependences, under the assumption that there are
infinite functional units (i.e., no structural hazards): at each cycle
you issue all instructions that have no dependences on any as yet unexecuted
instructions.
To keep things simple, we assume that each instructions fetches its operands,
executes, and then writes the result register all in a single cycle.
(Writing occurs as the last thing done in the cycle.)
You should do two simulations, one intended to approximate
the properties of the scoreboarding algorithm, and the other Tomasulo's algorithm.
In particular, in the first simulation
an instruction cannot issue iff it is the later instruction involved
in a RAW, WAR, or WAW dependence.
In the second, an instruction cannot issue iff it is the later instruction involved
in a RAW instruction.
We assume you'll solve this problem by hand - i.e., no lab, no programming.
Turnin will be online, though, as explained below.
The Instruction Sequence
Here is the code you need to parallelize. It is assembler, but with two annotations:
- Comments that relate the assembler back to a C program it was derived from.
Strictly stpeaking, the comments and the C program are completely irrelevant to this
assignment - you're parallelizing the execution sequence described by the assembler;
the C code is just to show that this isn't completely contrived.
- Each instruction has a label. We'll use those labels when describing the parallel
schedule, as described in the next section.
Here is the instruction stream to be parallelized:
//------------------------------------
// Assembly code, ignoring stack operations
//------------------------------------
// lp = 0x20000000
01: lui $t4, $0, 0x2000
// readptr = buff;
02: ori $s0, 0, 0x00004060
// tmp1 = *readptr;
03: lw $t5, 0($s0)
// *readptr++;
04: add $s0, $s0, 4
// tmp2 = *readptr;
05: lw $t6, 0($s0)
// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
06: sub $t7, $t5, $t6
07: mult $t7, $t4
08: mfhi $t7
09: add $t7, $t7, $t6
10: sw $t7, 0($s0)
// tmp1 = tmp2;
11: addi $t5, $t6, 0
// *readptr++;
12: add $s0, $s0, 4
// tmp2 = *readptr;
13: lw $t6, 0($s0)
// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
14: sub $t7, $t5, $t6
15: mult $t7, $t4
16: mfhi $t7
17: add $t7, $t7, $t6
18: sw $t7, 0($s0)
// tmp1 = tmp2;
19: addi $t5, $t6, 0
// *readptr++;
20: add $s0, $s0, 4
// tmp2 = *readptr;
21: lw $t6, 0($s0)
// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
22: sub $t7, $t5, $t6
23: mult $t7, $t4
24: mfhi $t7
25: add $t7, $t7, $t6
26: sw $t7, 0($s0)
// return (*readptr * gain) >> 32;
27: mult $t7, $a0
28: mfhi $v0
Although irrelevant to this assignment, that execution sequence
is based on the following C code implementing a lowpass filter:
//------------------------------------
// C code for a 3-pole fixed point lowpass filter
//------------------------------------
int buff[4]; // sample, lpbuff1, lpbuff2, lpbuff3
int do_lowpass(int gain){
int* readptr;
int tmp1, tmp2;
int lp = 0x20000000 // sets filter frequency
readptr = buff;
// Retrieve values for filter 1 calculation
tmp1 = *readptr;
*readptr++;
tmp2 = *readptr;
// Filter 1 calculation
*readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
// Retrieve values for filter 2 calculation
tmp1 = tmp2;
*readptr++;
tmp2 = *readptr;
// Filter 2 calculation
*readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
// Retrieve values for filter 3 calculation
tmp1 = tmp2;
*readptr++;
tmp2 = *readptr;
// Filter 3 calculation
*readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
// Modify result by output gain and return;
return (*readptr * gain)>>32;
}
Turnin
Use turnin to hand in two files, alldep.txt and flowdep.txt , the former describing your schedule when all of RAW, WAR, and WAW dependences are used,
and the latter when only RAW dependences prevent instruction issue.
Each file should be encoded with a single line per instruction. Each line contains the label
of the instruction followed by the cycle number during which the instruction executes,
assuming the first cycle is cycle 1 (i.e., not cycle 0).
For instance:
01: 1
02: 2
03: 3
04: 1
05: 2
would mean that instructions 1 and 4 executed during the first cycle, 2 and 5 during the second,
and 3 during the third.
|