CSE 378 Practice problem solutions for exam 1 (v1.01) -------------------------------------------------------------------- 1. Gate problem First off, we have the NAND gate: NAND truth table A B | Out ------------- 0 0 | 1 0 1 | 1 1 0 | 1 1 1 | 0 ___ A ____| \ B ____| O----- Out |___/ Now we make the INV gate: INV truth table In | Out -------- 0 | 1 1 | 0 |\ In ----| O---- Out |/ Using NAND gate: Just take the wire and put it into both NAND inputs. In ___ ____| \ In --|____| O--- Out In |___/ (Truth table from above, but only options are 0 0 and 1 1) In In | Out ----------- 0 0 | 1 1 1 | 0 Now we make the AND gate: AND truth table A B | Out ------------- 0 0 | 0 0 1 | 0 1 0 | 0 1 1 | 1 ___ A ____| \ B ____| |----- Out |___/ Using NAND gate and INV (since we made it out of NANDs): ___ A ____| \ C |\ B ____| O-----| O---- Out |___/ |/ Notice, this means we will just invert the NAND truth table: A B | C | Out ---------------- 0 0 | 1 | 0 0 1 | 1 | 0 1 0 | 1 | 0 1 1 | 0 | 1 Now we make the OR gate: OR truth table A B | Out ------------- 0 0 | 0 0 1 | 1 1 0 | 1 1 1 | 1 _____ A ----\ \ | >---- Out B ----/____/ The way I see this one, we have a truth table that looks like an upside down NAND. How do we turn that truth table into this one? Well, if we invert both the inputs, suddenly, 0 0 becomes 1 1, 0 1 becomes 1 0, 1 0 becomes 0 1 and 1 1 becomes 0 0! So put an inverter on each input of the NAND gate |\ C A ----| O----| |/ | | ___ |___| \ ____| O----- Out | |___/ |\ | B ----| O----| |/ D A B | C D | Out ---------------------- 0 0 | 1 1 | 0 0 1 | 1 0 | 1 1 0 | 0 1 | 1 1 1 | 0 0 | 1 Finally, the XOR gate: This one's a toughy... XOR Truth Table A B | Out ------------- 0 0 | 0 0 1 | 1 1 0 | 1 1 1 | 0 _____ A ----\\ \ || >---- Out B ----//____/ What I see is that the output of the NAND and the output of the OR are the same in the middle and different on the edges. That is, for 0 1 or 1 0 as inputs, both NAND and OR have a 1 in the output. On the other hand, they differ when the inputs are the same. For 0 0, OR has a 0 and NAND has a 1. For 1 1, OR has a 1 and NAND has a 0. Using this, you should realize that ANDing the outputs of the OR and NAND gates will give you a 1 in the appropriate places. _____ A ----\ \ C | >----| ___ B ----/____/ |------| \ |--| |--- Out ___ |---| |___/ A ____| \ D | B ____| O-----| |___/ A B | C D | Out ---------------------- 0 0 | 0 1 | 0 0 1 | 1 1 | 1 1 0 | 1 1 | 1 1 1 | 1 0 | 0 -------------------------------------------------------------------- 2. bgezal program You'd need to: My solution...yours may vary: 1) Control will output a bgezal signal. 2) You will invert the top bit of the register that is read from the register file. 3) You will and the control signal and the inverted bit from part 2. Call this anded result BAL. 4) You will or the BAL with the current input of the mux selecting what the next PC is on a branch and use the output of this mux as the new bit to select the mux. 5) You will need to add a hardcoded 31 for the register file. 6) You will add a mux selecting between this 31 or the previous input for the write register #. The selector for this mux will be the BAL bit from part 3. 7) You will need to put a mux before the write data input of the register file selecting between the PC+4 (from the PC specific ALU) and the former input. The selector bit will again be BAL from part 3. I believe that is all -------------------------------------------------------------------- 3. printf function in MIPS. Since the problem wasn't stated thoroughly, I will assume that a % followed by anything other than a 'd' or 's' is an error which is specified by returning a "1" in $v0. Otherwise, it will return a "0" in $v0. (I think printf may actually return the number of characters printed, but that would make this problem slightly more involved.) .data buffer: .space 2 .text # Note, the convention here is to have the stack pointing to the last # used word. printf: addi $sp, $sp, -12 # This is *not* saving the sw $a1, 0($sp) # but rather putting them up sw $a2, 4($sp) # against the other arguments sw $a3, 8($sp) # for easier manipulation # in the program. # $t1 will specify which argument we are on (times 4) li $t1, 0 # $t3 will specify where we are in the string. move $t3, $a0 # We will loop until we hit '\0' loop: lb $t0, 0($t3) # load the first character beqz $t0, end # quit if we've hit '\0' bne $t0, '%', normal # process conversion item lb $t0, 1($t3) # load the conversion char bne $t0, 'd', string #### decimal conversion #### decimal: add $t2, $t1, $sp # calculate argument location addi $t1, $t1, 4 # adjust argument offset lw $a0, 0($t2) # load number li $v0, 1 # Print it as an integer syscall # since it was the %d # conversion. addi $t3, $t3, 2 # Prepare for next round... b loop #### string conversion #### string: bne $t0, 's', error # It'd better be an s if we # got this far... add $t2, $t1, $sp # calculate argument location addi $t1, $t1, 4 # adjust argument offset lw $a0, 0($t2) # load string address li $v0, 4 # Print it as a string syscall # since it was the %s # conversion. addi $t3, $t3, 2 # Prepare for next round... b loop #### normal conversion #### normal: la $a0, buffer # Since there is no "print_char" sb $t0, 0($a0) # Must print a one character sb $0, 1($a0) # string. Store '\0' at end. li $v0, 4 # Print one char string... syscall addi $t3, $t3, 1 # Prepare for next round... b loop end: li $v0, 0 jr $ra error: li $v0, 1 jr $ra -------------------------------------------------------------------- 4. Adding R-type lw and sw For lw, it's a trivial change. Instead of selecting the immediate input to the ALU, select the register input. For sw, you need to build a new register file with three read inputs and outputs, do as with lw regarding the ALU inputs, and select the rd output as the input to memory. -------------------------------------------------------------------- 5. MIPS only tells you how many instructions per second. It says nothing about how large the programs are. You could have a MIPS rating of 100 times another machines rating, but the other machine might have 1/1000 the code size and run faster... Clock rate ignores how many instructions are executing in parallel (or more simply, how frequently instructions finish executing...once ever cycle? every 5 cycles?). It also ignores code size. A fast clock rate machine might have a simple instruction set resulting in many more instructions. MFLOPS tells you how many floating point ops there are per second. This is irrelevant if you aren't running an integer program. And again, what if a program has more floating point instructions on this machine than on another with fewer MFLOPs? -------------------------------------------------------------------- 6. The easy way to do this is with a weighted average: Machine 1: .4 * 2 + .25 * 3 + .25 * 3 + .1 * 5 = .8 + .75 + .75 + .5 = 2.8 Machine 2: .4 * 2 + .25 * 2 + .25 * 3 + .1 * 4 = .8 + .5 + .75 + .4 = 2.45 The hard way is this: We have N instructions executing for a program... Class A instructions take up .40N instructions at 2 CPI or .8N cycles. Class B instructions take up .25N instructions at 3 CPI or .75N cycles. Class C instructions take up .25N instructions at 3 CPI or .75N cycles. Class D instructions take up .10N instructions at 5 CPI or .5N cycles. This means a total of 2.8N cycles over N instructions = 2.8CPI Do the same for part B. -------------------------------------------------------------------- 7. Accumulator. Advantage - Uses very little hardware. Disadvantage - Only one register, requiring numerous memory accesses and instructions. Stack. Advantage - Concise instructions, keeping code small (good if you don't have a lot of memory). Disadvantage - Lots of memory accesses. General Purpose Register. Advantage - Fast operations. Disadvantage - Expensive in hardware -------------------------------------------------------------------- 8. This problem was not specified completely - there should also have been some mention of how to return, where to store result. I decided to just not worry about returning and store the result in the accumulator. New instructions: bgtz Address # branch to address. mult Address # Acc = Acc * Memory[Address] .data inc: .word 1 dec: .word -1 tally: .word 1 curr: .word 1 .text fact: bgtz loop # If we are > 0, do the factorial. load inc # Otherwise, deal with base case. Load a 1 bgtz done # and branch (1 > 0, so it's taken) loop: store curr # Store the argument (already in acc) in memory. mult tally # multiply by our current tally store tally # store the new tally load curr # Load our argument again add dec # decrement the argument bgtz loop # branch if we're not at 0 yet... done: load tally This answer is a little bit of a cheat...the whole reason you have an accumulator architecture is to minimize hardware, so it's unlikely there'd ever be a mult, but the point of this problem was thinking about programming in a different architecture, not torture. -------------------------------------------------------------------- 9. The basic premise of this problem is that you are actually writing the code space. In particular, the first instruction of the loop is changing on every cycle. The second register in the bne instruction keeps changing (though the value within is 0 each time) until it is finally $s0 (the first register after $t7). Then the branch condition is true and it branches to next. The value in $t1 should be 0x00050000. -------------------------------------------------------------------- 10. pascal: addi $sp, $sp, -16 #alloc space on stack, store $ra + 3 s regs sw $ra, 12($sp) sw $s0, 8($sp) sw $s1, 4($sp) sw $s2, 0($sp) beq $a1, 1, basecase #check for the two base cases beq $a0, $a1, basecase addi $a0, $a0, -1 #a0 = row - 1 add $s0, $a0, $0 #copy row - 1 to $s0 addi $s1, $a1, -1 #copy col - 1 to $s1 jal pascal #call pascal (args: $a0 = row - 1, $a1 = col) add $s2, $v0, $0 #save result of 1st call to $s2 add $a0, $s0, $0 #load args for second call add $a1, $s1, $0 # jal pascal #second call (args: row-1, col-1) add $v0, $v0, $s2 #cumpute sum j restore #restore $s regs, $ra, and go back basecase: addi $v0, $0, 1 #base case -- ret value is 1 restore: lw $ra, 12($sp) #restore $ra, s registers lw $s0, 8($sp) lw $s1, 4($sp) lw $s2, 0($sp) addi $sp, $sp, 16 #restore $sp jr $ra #go back -------------------------------------------------------------------- 11. (a) 0x20C89000 = 0010 0000 1100 1000 1001 0000 0000 0000 (b) 2^29 + 2^23 + 2^22 + 2^19 + 2^15 + 2^12 (c) opcode = 001000 base2 = 8, so this is addi rs = 00110 base2 = 6 rt = 01000 base2 = 8 immed = 1001 0000 0000 0000 note that this is a negative number = -2^15 + 2^12 = -28762 instruction: addi $8, $6, -28762 -------------------------------------------------------------------- 12. (a) 1100 0000 1111 1111 1110 1110 = 0xC0FFEE (hex) (b) 110 000 001 111 111 111 101 110 = 60177756 (oct) -------------------------------------------------------------------- 13. (a) unsigned 11100 = 2^4 + 2^3 + 2^2 = 16 + 8 + 4 = 28 (b) 2's comp 11100 = -2^4 + 2^3 + 2^2 = -16 + 8 + 4 = -4 or invert all the bits: 00011 add 1: 00100 convert to dec: = 4 put a "-" in front = -4 -------------------------------------------------------------------- 14. srl $t0, $s0, 9 #shift s0 right by 9, so that bits 17-11 #are now bits 8-2 andi $t0, $t0, 0x1FC #zero out everything but bits 8-2 in t0 andi $s1, $s1, 0xFFFFFE03 #zero out bits 8-2 of s1 or $s1, $t0, $s1 #or with $t1 .. done -------------------------------------------------------------------- 15. garply: addi $sp, $sp, -12 #we're storing three things on the stack sw $ra, 8($sp) #need to store $ra since we're calling another procedure sw $s0, 4($sp) #also need to copy our args (a0 and a1) to s registers sw $s1, 0($sp) #before procedure call, hence need two s registers #save args: a0 to s0, a1 to s1 move $s0, $a0 move $s1, $a1 #put a>>6 into a0 and call subt srl $a0, $a0, 6 jal subt #back from subt, subt's return value (c) is in v0 lw $t0, 0($s1) #load *b add $t0, $t0, $s0 #add a sw $t0, 0($s1) #store the result back #if a or c < 0 we're done, as c is already in v0 bltz $s0, restore bltz $v0, restore #else put c | a into v0 .. c is in v0, a is in s0 or $v0, $v0, $s0 restore: #restore $s regs, $ra, $sp and go back lw $ra, 8($sp) lw $s0, 4($sp) lw $s1, 0($sp) addi $sp, $sp, 12 jr $ra -------------------------------------------------------------------- 16. Jal. Do the same thing with PC as for jump. To save return address need to add two muxes: one for RW (write register# in the register file), that selects between RD field of the instruction (bits 5,4) and register #3 (=$ra in our design). 3 should be selected for JAL, RD for everything else. second mux goes before the Data input to the register file. This one chooses between the ALU/MEM result and the post-incremented PC. Post-incremented PC should be selected for JAL, ALU/MEM result for all other instructions. also make sure the write enable for the register file is set to 1 -------------------------------------------------------------------- 17. lw: We want both ALU arguments to come from the register file. So just change the ALUSrc control signal to let the register file value through. sw: Here we have to add a third read port to the register file. As with lw, want both inputs to the ALU to come from the register file now, so change the ALUSrc control signal to let the register file value through. --------------------------------------------------------------------