CSE logo University of Washington Department of Computer Science & Engineering
 CSE 378, Winter 2007
 Machine Organization and Assembly Language Programming
  CSE Home  About Us    Search    Contact Info 

 Academic (Mis)Conduct
 Course wiki
 Mailing list
 Lectures and readings
 Lab hours
 Your Grades
Anonymous Feedback
 Submit Feedback
 Read Feedback
Printable view

Homework #8: Dependences & ILP

Due: Friday, 3/9/2007, 10:30 AM

This assignment should be done individually, not in teams.

Assignment Overview

This assignment asks you to determine legal parallel execution schedules for a short execution sequence of machine instructions. To do this, you must first determine the dependences that exist between instructions. Then you "hand simulate" a parallel execution that respects those dependences, under the assumption that there are infinite functional units (i.e., no structural hazards): at each cycle you issue all instructions that have no dependences on any as yet unexecuted instructions.

To keep things simple, we assume that each instructions fetches its operands, executes, and then writes the result register all in a single cycle. (Writing occurs as the last thing done in the cycle.)

You should do two simulations, one intended to approximate the properties of the scoreboarding algorithm, and the other Tomasulo's algorithm. In particular, in the first simulation an instruction cannot issue iff it is the later instruction involved in a RAW, WAR, or WAW dependence. In the second, an instruction cannot issue iff it is the later instruction involved in a RAW instruction.

We assume you'll solve this problem by hand - i.e., no lab, no programming. Turnin will be online, though, as explained below.

The Instruction Sequence

Here is the code you need to parallelize. It is assembler, but with two annotations:
  • Comments that relate the assembler back to a C program it was derived from. Strictly stpeaking, the comments and the C program are completely irrelevant to this assignment - you're parallelizing the execution sequence described by the assembler; the C code is just to show that this isn't completely contrived.

  • Each instruction has a label. We'll use those labels when describing the parallel schedule, as described in the next section.

Here is the instruction stream to be parallelized:

// Assembly code, ignoring stack operations

// lp = 0x20000000 
01: lui 	$t4, $0, 0x2000

// readptr = buff;
02: ori 	$s0, 0, 0x00004060

// tmp1 = *readptr;
03: lw 	$t5, 0($s0)

// *readptr++;
04: add 	$s0, $s0, 4

// tmp2 = *readptr;
05: lw 	$t6, 0($s0)

// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
06: sub 	$t7, $t5, $t6
07: mult 	$t7, $t4
08: mfhi	$t7
09: add 	$t7, $t7, $t6
10: sw 	$t7, 0($s0)

// tmp1 = tmp2;
11: addi 	$t5, $t6, 0

// *readptr++;
12: add 	$s0, $s0, 4

// tmp2 = *readptr;
13: lw 	$t6, 0($s0)

// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
14: sub 	$t7, $t5, $t6
15: mult 	$t7, $t4
16: mfhi 	$t7
17: add 	$t7, $t7, $t6
18: sw 	$t7, 0($s0)

// tmp1 = tmp2;
19: addi 	$t5, $t6, 0

// *readptr++;
20: add 	$s0, $s0, 4

// tmp2 = *readptr;
21: lw 	$t6, 0($s0)

// *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);
22: sub 	$t7, $t5, $t6
23: mult 	$t7, $t4
24: mfhi 	$t7
25: add 	$t7, $t7, $t6
26: sw 	$t7, 0($s0)

// return (*readptr * gain) >> 32;
27: mult 	$t7, $a0
28: mfhi 	$v0

Although irrelevant to this assignment, that execution sequence is based on the following C code implementing a lowpass filter:

// C code for a 3-pole fixed point lowpass filter
int buff[4]; // sample, lpbuff1, lpbuff2, lpbuff3
int do_lowpass(int gain){

  int* readptr;
  int tmp1, tmp2;
  int lp = 0x20000000 // sets filter frequency

  readptr = buff;

  // Retrieve values for filter 1 calculation
  tmp1 = *readptr;
  tmp2 = *readptr;

  // Filter 1 calculation
  *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);

  // Retrieve values for filter 2 calculation
  tmp1 = tmp2;
  tmp2 = *readptr;

  // Filter 2 calculation
  *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);

  // Retrieve values for filter 3 calculation
  tmp1 = tmp2;
  tmp2 = *readptr;

  // Filter 3 calculation
  *readptr = tmp2 + ((lp * (tmp1 - tmp2)) >> 32);

  // Modify result by output gain and return;
  return (*readptr * gain)>>32;


Use turnin to hand in two files, alldep.txt and flowdep.txt, the former describing your schedule when all of RAW, WAR, and WAW dependences are used, and the latter when only RAW dependences prevent instruction issue.

Each file should be encoded with a single line per instruction. Each line contains the label of the instruction followed by the cycle number during which the instruction executes, assuming the first cycle is cycle 1 (i.e., not cycle 0).

For instance:

01: 1
02: 2
03: 3
04: 1
05: 2
would mean that instructions 1 and 4 executed during the first cycle, 2 and 5 during the second, and 3 during the third.

Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 2.5 License.
Department of Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan]