CSE 378: Machine Organization, Winter 2007

University of Washington Department of Computer Science & Engineering

CSE 378, Winter 2007
Machine Organization and Assembly Language Programming

CSE Home

About Us

Contact Info

Homework #8: Dependences & ILP

Due: Friday, 3/9/2007, 10:30 AM

This assignment should be done individually, not in teams.

Assignment Overview

This assignment asks you to determine legal parallel execution schedules for a short execution sequence of machine instructions. To do this, you must first determine the dependences that exist between instructions. Then you "hand simulate" a parallel execution that respects those dependences, under the assumption that there are infinite functional units (i.e., no structural hazards): at each cycle you issue all instructions that have no dependences on any as yet unexecuted instructions.
To keep things simple, we assume that each instructions fetches its operands, executes, and then writes the result register all in a single cycle. (Writing occurs as the last thing done in the cycle.)
You should do two simulations, one intended to approximate the properties of the scoreboarding algorithm, and the other Tomasulo's algorithm. In particular, in the first simulation an instruction cannot issue iff it is the later instruction involved in a RAW, WAR, or WAW dependence. In the second, an instruction cannot issue iff it is the later instruction involved in a RAW instruction.
We assume you'll solve this problem by hand - i.e., no lab, no programming. Turnin will be online, though, as explained below.

The Instruction Sequence
Here is the code you need to parallelize. It is assembler, but with two annotations:

Comments that relate the assembler back to a C program it was derived from. Strictly stpeaking, the comments and the C program are completely irrelevant to this assignment - you're parallelizing the execution sequence described by the assembler; the C code is just to show that this isn't completely contrived.

Each instruction has a label. We'll use those labels when describing the parallel schedule, as described in the next section.

Here is the instruction stream to be parallelized:
//------------------------------------
// Assembly code, ignoring stack operations
//------------------------------------

// lp = 0x20000000 
01: lui 	$t4, $0, 0x2000

// readptr = buff;
02: ori 	$s0, 0, 0x00004060

// tmp1 = *readptr;
03: lw 	$t5, 0($s0)

// *readptr++;
04: add 	$s0, $s0, 4

// tmp2 = *readptr;
05: lw 	$t6, 0($s0)

// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
06: sub 	$t7, $t5, $t6
07: mult 	$t7, $t4
08: mfhi	$t7
09: add 	$t7, $t7, $t6
10: sw 	$t7, 0($s0)


// tmp1 = tmp2;
11: addi 	$t5, $t6, 0


// *readptr++;
12: add 	$s0, $s0, 4

// tmp2 = *readptr;
13: lw 	$t6, 0($s0)

// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
14: sub 	$t7, $t5, $t6
15: mult 	$t7, $t4
16: mfhi 	$t7
17: add 	$t7, $t7, $t6
18: sw 	$t7, 0($s0)


// tmp1 = tmp2;
19: addi 	$t5, $t6, 0

// *readptr++;
20: add 	$s0, $s0, 4

// tmp2 = *readptr;
21: lw 	$t6, 0($s0)

// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
22: sub 	$t7, $t5, $t6
23: mult 	$t7, $t4
24: mfhi 	$t7
25: add 	$t7, $t7, $t6
26: sw 	$t7, 0($s0)

// return (*readptr * gain) >> 32;
27: mult 	$t7, $a0
28: mfhi 	$v0
Although irrelevant to this assignment, that execution sequence is based on the following C code implementing a lowpass filter:
//------------------------------------
// C code for a 3-pole fixed point lowpass filter
//------------------------------------
int buff[4]; // sample, lpbuff1, lpbuff2, lpbuff3
int do_lowpass(int gain){

  int* readptr;
  int tmp1, tmp2;
  int lp = 0x20000000 // sets filter frequency

  readptr = buff;

  // Retrieve values for filter 1 calculation
  tmp1 = *readptr;
  *readptr++;
  tmp2 = *readptr;

  // Filter 1 calculation
  *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);

  // Retrieve values for filter 2 calculation
  tmp1 = tmp2;
  *readptr++;
  tmp2 = *readptr;

  // Filter 2 calculation
  *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);


  // Retrieve values for filter 3 calculation
  tmp1 = tmp2;
  *readptr++;
  tmp2 = *readptr;

  // Filter 3 calculation
  *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);

  // Modify result by output gain and return;
  return (*readptr * gain)>>32;
}
Turnin
Use turnin to hand in two files, alldep.txt and flowdep.txt, the former describing your schedule when all of RAW, WAR, and WAW dependences are used, and the latter when only RAW dependences prevent instruction issue.
Each file should be encoded with a single line per instruction. Each line contains the label of the instruction followed by the cycle number during which the instruction executes, assuming the first cycle is cycle 1 (i.e., not cycle 0).
For instance:
01: 1
02: 2
03: 3
04: 1
05: 2
would mean that instructions 1 and 4 executed during the first cycle, 2 and 5 during the second, and 3 during the third.
This work is licensed under a Creative Commons Attribution-Share Alike 2.5 License. Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan]


This work is licensed under a Creative Commons Attribution-Share Alike 2.5 License.	Department of Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350 (206) 543-1695 voice, (206) 543-2969 FAX [comments to zahorjan]