CSE 378 Practice problem solutions for exam 1  (v1.01)

--------------------------------------------------------------------
1.  Gate problem

First off, we have the NAND gate:

NAND truth table

 A  B  |  Out
-------------
 0  0  |   1
 0  1  |   1
 1  0  |   1
 1  1  |   0

       ___
A ____|   \
B ____|    O-----  Out
      |___/


Now we make the INV gate:

INV truth table

In | Out
--------
 0 |  1
 1 |  0

       |\
In ----| O---- Out
       |/


Using NAND gate:
Just take the wire and put it into both NAND inputs.

      In   ___
      ____|   \
In --|____|    O--- Out
      In  |___/

(Truth table from above, but only options are 0 0 and 1 1)

In In | Out
-----------
 0  0 |  1
 1  1 |  0


Now we make the AND gate:

AND truth table

 A  B  |  Out
-------------
 0  0  |  0
 0  1  |  0
 1  0  |  0
 1  1  |  1

       ___
A ____|   \
B ____|    |----- Out
      |___/

Using NAND gate and INV (since we made it out of NANDs):

       ___       
A ____|   \  C   |\
B ____|    O-----| O---- Out
      |___/      |/


Notice, this means we will just invert the NAND truth table:

 A  B  |  C  |  Out
----------------
 0  0  |  1  |  0
 0  1  |  1  |  0
 1  0  |  1  |  0
 1  1  |  0  |  1


Now we make the OR gate:

OR truth table

 A  B  |  Out
-------------
 0  0  |  0
 0  1  |  1
 1  0  |  1
 1  1  |  1

      _____
A ----\    \
       |    >---- Out
B ----/____/

The way I see this one, we have a truth table that looks like an
upside down NAND.  How do we turn that truth table into this one?
Well, if we invert both the inputs, suddenly, 0 0 becomes 1 1, 0 1
becomes 1 0, 1 0 becomes 0 1 and 1 1 becomes 0 0!  So put an inverter
on each input of the NAND gate


      |\     C
A ----| O----|
      |/     |
             |    ___
             |___|   \
             ____|    O----- Out
             |   |___/
      |\     |
B ----| O----|
      |/     D


 A  B  |  C  D  |  Out
----------------------
 0  0  |  1  1  |  0
 0  1  |  1  0  |  1
 1  0  |  0  1  |  1
 1  1  |  0  0  |  1


Finally, the XOR gate:

This one's a toughy...

XOR Truth Table

 A  B  |  Out
-------------
 0  0  |  0
 0  1  |  1
 1  0  |  1
 1  1  |  0

       _____
A ----\\    \
       ||    >---- Out
B ----//____/

What I see is that the output of the NAND and the output of the OR are
the same in the middle and different on the edges.  That is, for 0 1
or 1 0 as inputs, both NAND and OR have a 1 in the output.  On the
other hand, they differ when the inputs are the same.  For 0 0, OR has
a 0 and NAND has a 1.  For 1 1, OR has a 1 and NAND has a 0.

Using this, you should realize that ANDing the outputs of the OR and
NAND gates will give you a 1 in the appropriate places.

      _____
A ----\    \  C
       |    >----|       ___
B ----/____/     |------|   \  
                     |--|    |--- Out
       ___       |---|  |___/
A ____|   \   D  |
B ____|    O-----|
      |___/


 A  B  |  C  D  |  Out
----------------------
 0  0  |  0  1  |  0
 0  1  |  1  1  |  1
 1  0  |  1  1  |  1
 1  1  |  1  0  |  0

--------------------------------------------------------------------

2. bgezal program

You'd need to:

My solution...yours may vary:

1) Control will output a bgezal signal.
2) You will invert the top bit of the register that is read from the
register file.
3) You will and the control signal and the inverted bit from part 2.
Call this anded result BAL.
4) You will or the BAL with the current input of the mux selecting
what the next PC is on a branch and use the output of this mux as the
new bit to select the mux.
5) You will need to add a hardcoded 31 for the register file.
6) You will add a mux selecting between this 31 or the previous input
for the write register #.  The selector for this mux will be the BAL
bit from part 3.
7) You will need to put a mux before the write data input of the
register file selecting between the PC+4 (from the PC specific ALU)
and the former input.  The selector bit will again be BAL from part 3.

I believe that is all

--------------------------------------------------------------------

3. printf function in MIPS.

Since the problem wasn't stated thoroughly, I will assume that a %
followed by anything other than a 'd' or 's' is an error which is
specified by returning a "1" in $v0.  Otherwise, it will return a "0"
in $v0.  (I think printf may actually return the number of characters
printed, but that would make this problem slightly more involved.)


	 .data
buffer:	 .space 2

	 .text

# Note, the convention here is to have the stack pointing to the last
# used word.

printf:
	 addi	     $sp, $sp, -12	  # This is *not* saving the
	 sw	     $a1, 0($sp)	  # but rather putting them up
	 sw	     $a2, 4($sp)	  # against the other arguments
	 sw	     $a3, 8($sp)	  # for easier manipulation
					  # in the program.

# $t1 will specify which argument we are on (times 4)
	 li	     $t1, 0

# $t3 will specify where we are in the string.
         move        $t3, $a0

# We will loop until we hit '\0'
loop:	 lb	     $t0, 0($t3)	  # load the first character
	 beqz	     $t0, end		  # quit if we've hit '\0'
	 bne	     $t0, '%', normal	  # process conversion item

	 lb	     $t0, 1($t3)	  # load the conversion char

	 bne	     $t0, 'd', string

#### decimal conversion ####
decimal:
	 add	     $t2, $t1, $sp	  # calculate argument location
	 addi	     $t1, $t1, 4	  # adjust argument offset
	 lw	     $a0, 0($t2)	  # load number
	 
	 li	     $v0, 1		  # Print it as an integer
	 syscall			  # since it was the %d
					  # conversion.
         addi        $t3, $t3, 2	  # Prepare for next round...
	 b	     loop

#### string conversion ####
string:
	 bne	     $t0, 's', error      # It'd better be an s if we
					  # got this far...
         add         $t2, $t1, $sp	  # calculate argument location
	 addi	     $t1, $t1, 4	  # adjust argument offset
	 lw	     $a0, 0($t2)	  # load string address

	 li	     $v0, 4		  # Print it as a string
	 syscall			  # since it was the %s
					  # conversion.
         addi	     $t3, $t3, 2	  # Prepare for next round...
	 b	     loop

#### normal conversion ####
normal:
	 la	     $a0, buffer	  # Since there is no "print_char"
	 sb	     $t0, 0($a0)	  # Must print a one character
	 sb	     $0, 1($a0)		  # string.  Store '\0' at end.

	 li	     $v0, 4		  # Print one char string...
	 syscall

	 addi	     $t3, $t3, 1	  # Prepare for next round...
	 b	     loop

end:	 li	     $v0, 0
	 jr	     $ra

error:	 li	     $v0, 1
	 jr	     $ra

--------------------------------------------------------------------

4. Adding R-type lw and sw

For lw, it's a trivial change.  Instead of selecting the immediate
input to the ALU, select the register input.

For sw, you need to build a new register file with three read inputs
and outputs, do as with lw regarding the ALU inputs, and select the rd
output as the input to memory.

--------------------------------------------------------------------

5.  MIPS only tells you how many instructions per second.  It says
    nothing about how large the programs are.  You could have a MIPS
    rating of 100 times another machines rating, but the other machine
    might have 1/1000 the code size and run faster...

    Clock rate ignores how many instructions are executing in
    parallel (or more simply, how frequently instructions finish
    executing...once ever cycle?  every 5 cycles?).  It also ignores
    code size.  A fast clock rate machine might have a simple
    instruction set resulting in many more instructions.

    MFLOPS tells you how many floating point ops there are per second.
    This is irrelevant if you aren't running an integer program.  And
    again, what if a program has more floating point instructions on
    this machine than on another with fewer MFLOPs?

--------------------------------------------------------------------

6.  The easy way to do this is with a weighted average:

Machine 1:
.4 * 2 + .25 * 3 + .25 * 3 + .1 * 5 = .8 + .75 + .75 + .5 = 2.8

Machine 2:
.4 * 2 + .25 * 2 + .25 * 3 + .1 * 4 = .8 + .5 + .75 + .4 = 2.45

The hard way is this:

We have N instructions executing for a program...
Class A instructions take up .40N instructions at 2 CPI or .8N cycles.
Class B instructions take up .25N instructions at 3 CPI or .75N
cycles.
Class C instructions take up .25N instructions at 3 CPI or .75N
cycles.
Class D instructions take up .10N instructions at 5 CPI or .5N cycles.

This means a total of 2.8N cycles over N instructions = 2.8CPI

Do the same for part B.

--------------------------------------------------------------------

7.  Accumulator.
    Advantage - Uses very little hardware.
    Disadvantage - Only one register, requiring numerous memory
    accesses and instructions.

    Stack.
    Advantage - Concise instructions, keeping code small (good if
    you don't have a lot of memory).
    Disadvantage - Lots of memory accesses.

    General Purpose Register.
    Advantage -	Fast operations.
    Disadvantage - Expensive in hardware

--------------------------------------------------------------------

8.  This problem was not specified completely - there should also have
    been some mention of how to return, where to store result.  I
    decided to just not worry about returning and store the result in
    the accumulator.

    New instructions:

    bgtz Address      # branch to address.
    mult Address      # Acc = Acc * Memory[Address]


    .data
inc:	.word 1
dec:	.word -1
tally:	.word 1
curr:	.word 1

    .text
fact:
	bgtz	loop	# If we are > 0, do the factorial.
	load	inc	# Otherwise, deal with base case.  Load a 1
	bgtz	done	# and branch (1 > 0, so it's taken)
loop:
	store	curr	# Store the argument (already in acc) in memory.
	mult	tally	# multiply by our current tally
	store	tally	# store the new tally
	load	curr	# Load our argument again
	add	dec	# decrement the argument
	bgtz	loop	# branch if we're not at 0 yet...
done:
	load	tally

This answer is a little bit of a cheat...the whole reason you have an
accumulator architecture is to minimize hardware, so it's unlikely
there'd ever be a mult, but the point of this problem was thinking
about programming in a different architecture, not torture.

--------------------------------------------------------------------

9. The basic premise of this problem is that you are actually writing
   the code space.  In particular, the first instruction of the loop
   is changing on every cycle.  The second register in the bne
   instruction keeps changing (though the value within is 0 each time)
   until it is finally $s0 (the first register after $t7).  Then the
   branch condition is true and it branches to next.  The value in $t1
   should be 0x00050000.

--------------------------------------------------------------------

10.

pascal:
        addi $sp, $sp, -16      #alloc space on stack, store $ra + 3 s regs
        sw $ra, 12($sp)
        sw $s0, 8($sp)
        sw $s1, 4($sp)
        sw $s2, 0($sp)
        beq $a1, 1, basecase    #check for the two base cases
        beq $a0, $a1, basecase
        addi $a0, $a0, -1       #a0 = row - 1
        add $s0, $a0, $0        #copy row - 1 to $s0
        addi $s1, $a1, -1       #copy col - 1 to $s1
        jal pascal              #call pascal (args: $a0 = row - 1, $a1 = col)
        add $s2, $v0, $0        #save result of 1st call to $s2
        add $a0, $s0, $0        #load args for second call
        add $a1, $s1, $0        #
        jal pascal              #second call (args: row-1, col-1)
        add $v0, $v0, $s2       #cumpute sum
        j restore               #restore $s regs, $ra, and go back
basecase:
        addi $v0, $0, 1         #base case -- ret value is 1
restore:
        lw $ra, 12($sp)		#restore $ra, s registers
        lw $s0, 8($sp)
        lw $s1, 4($sp)
        lw $s2, 0($sp)
        addi $sp, $sp, 16 	#restore $sp
        jr $ra			#go back


--------------------------------------------------------------------


11.

(a)  0x20C89000 = 0010 0000 1100 1000 1001 0000 0000 0000

(b)  2^29 + 2^23 + 2^22 + 2^19 + 2^15 + 2^12

(c)  opcode = 001000 base2 = 8, so this is addi
     rs = 00110 base2 = 6
     rt = 01000 base2 = 8
     immed = 1001 0000 0000 0000

	note that this is a negative number = -2^15 + 2^12 = -28762

	instruction:  addi $8, $6, -28762

--------------------------------------------------------------------

12.

(a) 1100 0000 1111 1111 1110 1110 = 0xC0FFEE (hex)

(b) 110 000 001 111 111 111 101 110 = 60177756 (oct)

--------------------------------------------------------------------

13.

(a) unsigned 11100 = 2^4 + 2^3 + 2^2 = 16 + 8 + 4 = 28

(b) 2's comp 11100 = -2^4 + 2^3 + 2^2 = -16 + 8 + 4 = -4

or invert all the bits:  00011
add 1:			 00100
convert to dec:          = 4
put a "-" in front	 = -4


--------------------------------------------------------------------

14.

srl  $t0, $s0, 9	#shift s0 right by 9, so that bits 17-11
			#are now bits 8-2
andi $t0, $t0, 0x1FC  	#zero out everything but bits 8-2 in t0
andi $s1, $s1, 0xFFFFFE03  #zero out bits 8-2 of s1
or   $s1, $t0, $s1	#or with $t1 .. done

--------------------------------------------------------------------

15.

garply:
	addi 	$sp, $sp, -12  #we're storing three things on the stack
	sw   	$ra, 8($sp)    #need to store $ra since we're calling another procedure
	sw   	$s0, 4($sp)    #also need to copy our args (a0 and a1) to s registers
	sw	$s1, 0($sp)    #before procedure call, hence need two s registers

	#save args:  a0 to s0, a1 to s1
	move	$s0, $a0
	move 	$s1, $a1

	#put a>>6 into a0 and call subt
	srl	$a0, $a0, 6
	jal	subt

	#back from subt, subt's return value (c) is in v0
	lw	$t0, 0($s1)	#load *b
	add	$t0, $t0, $s0	#add a
	sw	$t0, 0($s1)	#store the result back

	#if a or c < 0 we're done, as c is already in v0
	bltz	$s0, restore
	bltz	$v0, restore

	#else put c | a into v0 .. c is in v0, a is in s0
	or	$v0, $v0, $s0
restore:
	#restore $s regs, $ra, $sp and go back
	lw      $ra, 8($sp)
        lw      $s0, 4($sp)
        lw      $s1, 0($sp)
	addi	$sp, $sp, 12
	jr 	$ra

--------------------------------------------------------------------

16.

Jal.  Do the same thing with PC as for jump.

To save return address need to add two muxes:

        one for RW (write register# in the register file), that
selects between RD field of the instruction (bits 5,4) and register
#3 (=$ra in our design).  3 should be selected for JAL, RD for everything
else.

        second mux goes before the Data input to the register file.  This
one chooses between the ALU/MEM result and the post-incremented PC.
Post-incremented PC should be selected for JAL, ALU/MEM result for
all other instructions.

        also make sure the write enable for the register file is set to 1

--------------------------------------------------------------------

17.

lw:

We want both ALU arguments to come from the register file.  So just change
the ALUSrc control signal to let the register file value through.

sw:

Here we have to add a third read port to the register file.  As with lw,
want both inputs to the ALU to come from the register file now, so change
the ALUSrc control signal to let the register file value through.

--------------------------------------------------------------------