Homework 6 solutions

The Table

Key: Xi, Yi, Zi are the full-adder inputs of bit slice i; Si and Zi+1 are its sum and carry outputs; C0 is the carry into bit 0; Ci+1 is the carry passed to the next slice; N, C, Z, V are the condition codes; x = don't care.

Opcode  Xi   Yi   Zi  C0  Fi    Ci+1  N   C     Z            V
ADD     Ai   Bi   Ci  0   Si    Zi+1  F7  C8    nor(F0..F7)  C8xorC7
INC     Ai   0    Ci  1   Si    Zi+1  F7  C8    nor(F0..F7)  C8xorC7
DEC     Ai   1    Ci  0   Si    Zi+1  F7  C8    nor(F0..F7)  C8xorC7
SUB     Ai   ~Bi  Ci  1   Si    Zi+1  F7  C8    nor(F0..F7)  C8xorC7
CMP     Ai   ~Bi  Ci  1   Si    Zi+1  F7  C8    nor(F0..F7)  C8xorC7
PASS    Ai   0    0   x   Si    x     F7  C8/0  nor(F0..F7)  C8xorC7/0
NEG     ~Ai  0    Ci  1   Si    Zi+1  F7  x     nor(F0..F7)  C8xorC7
XOR     Ai   Bi   0   x   Si    x     F7  0     nor(F0..F7)  0
XNOR    Ai   ~Bi  0   x   Si    x     F7  0     nor(F0..F7)  0
NOT     ~Ai  0    0   x   Si    x     F7  0     nor(F0..F7)  0
AND     Ai   Bi   0   x   Zi+1  x     F7  0     nor(F0..F7)  0
OR      Ai   Bi   1   x   Zi+1  x     F7  0     nor(F0..F7)  0
SHL     x    x    x   x   Ai-1  x     F7  A7    nor(F0..F7)  0
SHR     x    x    x   x   Ai+1  x     F7  0     nor(F0..F7)  0
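To sanity-check the table, the following Python sketch models the 8-bit datapath slice by slice, driving each full adder with the Xi, Yi, Zi and C0 settings of the selected row and choosing Fi as the table says. It is illustrative only: the names `TABLE` and `alu` are hypothetical, don't-care entries are arbitrarily filled with 0, and the shifts are assumed to shift in a 0 (the table does not say what enters the vacated bit).

```python
W = 8  # word width in bits

# Per-opcode settings copied from the table: Xi source, Yi source, Zi source,
# C0, Fi source, and whether C and V are enabled (arithmetic).
# "C" as a Zi source means the ripple carry Ci; don't cares (x) are filled with 0.
TABLE = {
    "ADD":  ("A",  "B",  "C", 0, "S",    True),
    "INC":  ("A",  0,    "C", 1, "S",    True),
    "DEC":  ("A",  1,    "C", 0, "S",    True),
    "SUB":  ("A",  "~B", "C", 1, "S",    True),
    "CMP":  ("A",  "~B", "C", 1, "S",    True),
    "NEG":  ("~A", 0,    "C", 1, "S",    True),
    "PASS": ("A",  0,    0,   0, "S",    False),
    "XOR":  ("A",  "B",  0,   0, "S",    False),
    "XNOR": ("A",  "~B", 0,   0, "S",    False),
    "NOT":  ("~A", 0,    0,   0, "S",    False),
    "AND":  ("A",  "B",  0,   0, "Zout", False),
    "OR":   ("A",  "B",  1,   0, "Zout", False),
    "SHL":  ("A",  0,    0,   0, "SHL",  False),
    "SHR":  ("A",  0,    0,   0, "SHR",  False),
}


def bit(v, i):
    return (v >> i) & 1


def alu(op, a, b):
    """Return (F, N, C, Z, V) for opcode `op` and 8-bit operands a, b."""
    xsrc, ysrc, zsrc, c0, fsrc, arith = TABLE[op]
    carries = [c0]                      # carries[i] is Ci, the carry into slice i
    f = 0
    for i in range(W):
        ai, bi = bit(a, i), bit(b, i)
        x = 1 - ai if xsrc == "~A" else ai
        y = {"B": bi, "~B": 1 - bi, 0: 0, 1: 1}[ysrc]
        z = carries[i] if zsrc == "C" else zsrc
        s = x ^ y ^ z                           # full-adder sum Si
        zout = (x & y) | (z & y) | (z & x)      # full-adder carry Zi+1
        if fsrc == "S":
            fi = s
        elif fsrc == "Zout":
            fi = zout
        elif fsrc == "SHL":
            fi = bit(a, i - 1) if i > 0 else 0       # Fi = Ai-1 (0 shifted in: assumption)
        else:
            fi = bit(a, i + 1) if i < W - 1 else 0   # Fi = Ai+1 (0 shifted in: assumption)
        f |= fi << i
        carries.append(zout)            # Ci+1 = Zi+1
    c7, c8 = carries[7], carries[8]
    n = bit(f, W - 1)                   # N = F7
    z_flag = int(f == 0)                # Z = nor(F0..F7)
    c = (c8 if arith else 0) | (bit(a, W - 1) if op == "SHL" else 0)  # C = s7 C8 + s8 A7
    v = (c8 ^ c7) if arith else 0       # V = s7 (C8 xor C7)
    return f, n, c, z_flag, v


print(alu("ADD", 127, 1))   # (128, 1, 0, 0, 1)
print(alu("SUB", 7, 5))     # (2, 0, 1, 0, 0)
print(alu("SHL", 0x81, 0))  # (2, 0, 1, 0, 0)
```

In the first call, V is set because the carry into bit 7 (C7) is 1 while C8 is 0; in the last, the old A7 ends up in C.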

Design Notes

• Implement AND by setting Zi to 0 and selecting the result from Zi+1 (the carry out of the full adder). Since Zi+1 = XiYi + ZiYi + ZiXi, setting Zi = 0 gives Zi+1 = XiYi.
• Implement OR by setting Zi to 1 and selecting the result from Zi+1. Setting Zi = 1 gives Zi+1 = XiYi + Yi + Xi = Xi + Yi.
• Note that PASS behaves more like a logical operation than an arithmetic one. Treating it as logical makes the control lines a little easier to optimize. C and V for PASS are zero either way, so those cases need no special handling in the control lines.
• I implemented SHL and SHR by bypassing the full adder (FA) completely, using a 4:1 multiplexer on the output. This way, most of the control lines are don't cares for these two instructions.
• Whenever Zi is forced to 1 or 0, the carry chain is not used, so Ci+1, Ci, and C0 are don't cares.
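The AND and OR identities above can be checked exhaustively. In this small Python sketch (the helper name `carry_out` is hypothetical), the full-adder carry equation is evaluated for every input pair with Zi held at 0 and at 1.

```python
from itertools import product


def carry_out(x, y, z):
    """Full-adder carry Zi+1 = XiYi + ZiYi + ZiXi."""
    return (x & y) | (z & y) | (z & x)


# Zi = 0 reduces the carry to AND; Zi = 1 reduces it to OR.
for x, y in product((0, 1), repeat=2):
    assert carry_out(x, y, 0) == (x & y)
    assert carry_out(x, y, 1) == (x | y)
```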

Control Line Definitions:

• s0 controls Xi:    Xi = s0 xor Ai   (conditionally invert Ai)
• s1s2 control Yi:  Yi = s1's2'Bi + s1's2Bi' + s1s2'(0) + s1s2(1), i.e. a 4:1 mux on Bi, Bi', 0, 1 selected by s1s2. This simplifies to Yi = s1'(s2 xor Bi) + s1s2.
• s3s4 control Zi:  Zi = s3's4'Ci + s3's4(0) + s3s4'(1) + s3s4(1), i.e. a 4:1 mux on Ci, 0, 1, 1 selected by s3s4. This simplifies to Zi = s3's4'Ci + s3.
• s5s6 control Fi:  Fi = 4:1 mux on Si, Zi+1, Ai-1, Ai+1 selected by s5s6 (00: Si, 01: Zi+1, 10: Ai-1, 11: Ai+1).
• s7: Disables V and C in the case of logic operations:  V = s7(C8xorC7),  C* = s7(C8)
• s8: Enables C in the case of SHL:  C = C* + s8A7
• s9: Determines value of C0, C0 = s9
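The two mux simplifications can be confirmed by brute force over all eight input combinations. A minimal Python sketch, with hypothetical helper names, comparing each 4:1 mux definition to its simplified form:

```python
from itertools import product


def y_mux(s1, s2, bi):
    """Yi mux as defined: s1s2 = 00 -> Bi, 01 -> Bi', 10 -> 0, 11 -> 1."""
    return [bi, 1 - bi, 0, 1][(s1 << 1) | s2]


def y_simplified(s1, s2, bi):
    return ((1 - s1) & (s2 ^ bi)) | (s1 & s2)       # s1'(s2 xor Bi) + s1s2


def z_mux(s3, s4, ci):
    """Zi mux as defined: s3s4 = 00 -> Ci, 01 -> 0, 10 -> 1, 11 -> 1."""
    return [ci, 0, 1, 1][(s3 << 1) | s4]


def z_simplified(s3, s4, ci):
    return ((1 - s3) & (1 - s4) & ci) | s3          # s3's4'Ci + s3


for a, b, c in product((0, 1), repeat=3):
    assert y_mux(a, b, c) == y_simplified(a, b, c)
    assert z_mux(a, b, c) == z_simplified(a, b, c)
```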

Gate-Level Implementation of an ALU Bit Slice (schematic not reproduced here): total gates = 13 + 4 (4:1 output mux) + 1 (inverter for Bi) = 18 gates.

Remainder of System:

Condition Codes

•     V: 2 gates (C8xorC7)s7
•     C: 3 gates (C8s7)+(A7s8)
•     Z: 1 gate (8-input NOR)
•     N: 0 gates (F7)
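A sketch of the condition-code logic, plus an exhaustive check that V = C8 xor C7 flags exactly the additions whose signed result leaves the range -128..127. The helper names (`add8`, `condition_codes`, `to_signed`) are hypothetical; the adder uses the same majority carry as the bit slice.

```python
def add8(a, b, c0=0):
    """Ripple-carry add of two 8-bit values; returns (F, C7, C8)."""
    carry, c7, f = c0, 0, 0
    for i in range(8):
        x, y = (a >> i) & 1, (b >> i) & 1
        if i == 7:
            c7 = carry                          # carry into the MSB slice
        f |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & y) | (carry & x)
    return f, c7, carry                         # carry out of bit 7 is C8


def condition_codes(f, a7, c7, c8, s7, s8):
    """N, Z, C, V as listed above; s7 enables C and V, s8 ORs A7 into C for SHL."""
    n = (f >> 7) & 1                            # N = F7
    z = int((f & 0xFF) == 0)                    # Z = nor(F0..F7)
    c = (c8 & s7) | (a7 & s8)                   # C = C8 s7 + A7 s8
    v = (c8 ^ c7) & s7                          # V = (C8 xor C7) s7
    return n, z, c, v


def to_signed(v):
    return v - 256 if v & 0x80 else v


# C8 xor C7 is 1 exactly when the true signed sum does not fit in 8 bits.
for a in range(256):
    for b in range(256):
        for c0 in (0, 1):
            f, c7, c8 = add8(a, b, c0)
            exact = to_signed(a) + to_signed(b) + c0
            assert (c8 ^ c7) == int(not -128 <= exact <= 127)
```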

Control Logic: Inspecting the table above, each control line is asserted for the following instructions:

• s0 = [NEG] + [NOT]
• s1 = [INC] + [DEC] + [PASS] + [NEG] + [NOT]
• s2 = [DEC] + [SUB] + [CMP] + [XNOR]
• s3 = [OR]
• s4 = [ARITHMETIC]'
• s5 = [SH(L/R)]
• s6 = [AND] + [OR] + [SHR]
• s7 = [ARITHMETIC]
• s8 = [SHL]
• s9 = ([ADD] + [DEC])'
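The list above can be checked against the table mechanically: for each instruction, look up which lines are asserted, push them through the mux definitions, and compare with the table's Xi, Yi, Zi, C0 and Fi columns, skipping don't cares. A Python sketch with hypothetical names (`ASSERTED`, `selections`, `matches_table`):

```python
ARITH = {"ADD", "INC", "DEC", "SUB", "CMP", "NEG"}
ALL = ARITH | {"PASS", "XOR", "XNOR", "NOT", "AND", "OR", "SHL", "SHR"}

# Instructions for which each control line is asserted.
ASSERTED = {
    "s0": {"NEG", "NOT"},
    "s1": {"INC", "DEC", "PASS", "NEG", "NOT"},
    "s2": {"DEC", "SUB", "CMP", "XNOR"},
    "s3": {"OR"},
    "s4": ALL - ARITH,
    "s5": {"SHL", "SHR"},
    "s6": {"AND", "OR", "SHR"},
    "s7": ARITH,
    "s8": {"SHL"},
    "s9": ALL - {"ADD", "DEC"},
}

# Mux selections from the control-line definitions.
X_SEL = {0: "A", 1: "~A"}                                                  # s0
Y_SEL = {(0, 0): "B", (0, 1): "~B", (1, 0): "0", (1, 1): "1"}              # s1 s2
Z_SEL = {(0, 0): "C", (0, 1): "0", (1, 0): "1", (1, 1): "1"}               # s3 s4
F_SEL = {(0, 0): "S", (0, 1): "Zout", (1, 0): "A(i-1)", (1, 1): "A(i+1)"}  # s5 s6

# Xi, Yi, Zi, C0, Fi columns of the table; None marks a don't care (x).
TABLE = {
    "ADD":  ("A",  "B",  "C",  0,    "S"),
    "INC":  ("A",  "0",  "C",  1,    "S"),
    "DEC":  ("A",  "1",  "C",  0,    "S"),
    "SUB":  ("A",  "~B", "C",  1,    "S"),
    "CMP":  ("A",  "~B", "C",  1,    "S"),
    "PASS": ("A",  "0",  "0",  None, "S"),
    "NEG":  ("~A", "0",  "C",  1,    "S"),
    "XOR":  ("A",  "B",  "0",  None, "S"),
    "XNOR": ("A",  "~B", "0",  None, "S"),
    "NOT":  ("~A", "0",  "0",  None, "S"),
    "AND":  ("A",  "B",  "0",  None, "Zout"),
    "OR":   ("A",  "B",  "1",  None, "Zout"),
    "SHL":  (None, None, None, None, "A(i-1)"),
    "SHR":  (None, None, None, None, "A(i+1)"),
}


def control_lines(op):
    return {line: int(op in ops) for line, ops in ASSERTED.items()}


def selections(op):
    s = control_lines(op)
    return (X_SEL[s["s0"]], Y_SEL[s["s1"], s["s2"]], Z_SEL[s["s3"], s["s4"]],
            s["s9"], F_SEL[s["s5"], s["s6"]])


def matches_table(op):
    return all(want is None or got == want
               for got, want in zip(selections(op), TABLE[op]))


assert all(matches_table(op) for op in ALL)
```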

Optimizing the Control Logic: Determine the opcode encoding by placing the instructions in a K-map while trying to keep the groups above together. For example, NEG and NOT are adjacent to make s0 simple, ADD and DEC are adjacent to make s9 simple, and the s1 and s2 groups are kept together as well as possible without violating the separation between logic and arithmetic functions. This placement is probably not optimal, but it is not bad.

Opcode K-map (columns P3P2, rows P1P0; x = unused opcode):

              P3P2
 P1P0     00     01     11     10
  00      OR     PASS   INC    x
  01      AND    NOT    NEG    x
  11      SHR    XNOR   DEC    ADD
  10      SHL    XOR    CMP    SUB
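Reading the K-map with columns P3P2 and rows P1P0 gives the 4-bit opcode of each instruction. This Python sketch (names hypothetical) records that encoding and checks the placement goals: P3 = 1 exactly for the arithmetic instructions, and NEG/NOT and ADD/DEC sit in adjacent cells.

```python
# Opcode P3P2P1P0 for each instruction, read off the K-map.
OPCODES = {
    "OR": 0b0000, "PASS": 0b0100, "INC": 0b1100,
    "AND": 0b0001, "NOT": 0b0101, "NEG": 0b1101,
    "SHR": 0b0011, "XNOR": 0b0111, "DEC": 0b1111, "ADD": 0b1011,
    "SHL": 0b0010, "XOR": 0b0110, "CMP": 0b1110, "SUB": 0b1010,
}
ARITH = {"ADD", "INC", "DEC", "SUB", "CMP", "NEG"}


def adjacent(a, b):
    """K-map neighbours differ in exactly one opcode bit."""
    return bin(OPCODES[a] ^ OPCODES[b]).count("1") == 1


# P3 = 1 for every arithmetic instruction and for no logic instruction.
assert all((code >> 3 == 1) == (op in ARITH) for op, code in OPCODES.items())
assert adjacent("NEG", "NOT")   # keeps s0 to a single product term
assert adjacent("ADD", "DEC")   # keeps s9 to a single product term

unused = sorted(set(range(16)) - set(OPCODES.values()))
print([format(u, "04b") for u in unused])   # ['1000', '1001']
```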

Letting ARITHMETIC = P3, we organize the K-map so that all arithmetic functions lie in the P3 = 1 region. The K-map then yields the following logic functions. All but s1 and s2 can be implemented with at most one gate.

• s0 = P2P1'P0
• s1 = P2P1' + P3P2P0
• s2 = P2P1P0 + P3P1P0'
• s3 = P2'P1'P0'
• s4 = P3'
• s5 = P3'P2'P1
• s6 = P3'P2's8'
• s7 = P3
• s8 = P3'P2'P1P0'
• s9 = (P3P1P0)'
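As a cross-check, the sketch below evaluates the decoder equations, written as in the Verilog model (in particular s2 = P2P1P0 + P3P1P0' and s5 = P3'P2'P1), for every used opcode and compares each line with the control-logic list. The opcode table is the K-map reading; `decode` and `EXPECTED` are hypothetical names.

```python
OPCODES = {
    "OR": 0b0000, "PASS": 0b0100, "INC": 0b1100,
    "AND": 0b0001, "NOT": 0b0101, "NEG": 0b1101,
    "SHR": 0b0011, "XNOR": 0b0111, "DEC": 0b1111, "ADD": 0b1011,
    "SHL": 0b0010, "XOR": 0b0110, "CMP": 0b1110, "SUB": 0b1010,
}
ARITH = {"ADD", "INC", "DEC", "SUB", "CMP", "NEG"}

# Instructions for which each line must be asserted (control-logic list).
EXPECTED = {
    "s0": {"NEG", "NOT"},
    "s1": {"INC", "DEC", "PASS", "NEG", "NOT"},
    "s2": {"DEC", "SUB", "CMP", "XNOR"},
    "s3": {"OR"},
    "s4": set(OPCODES) - ARITH,
    "s5": {"SHL", "SHR"},
    "s6": {"AND", "OR", "SHR"},
    "s7": ARITH,
    "s8": {"SHL"},
    "s9": set(OPCODES) - {"ADD", "DEC"},
}


def decode(code):
    """Evaluate the decoder equations for a 4-bit opcode P3P2P1P0."""
    P3, P2, P1, P0 = (code >> 3) & 1, (code >> 2) & 1, (code >> 1) & 1, code & 1
    nP3, nP2, nP1, nP0 = 1 - P3, 1 - P2, 1 - P1, 1 - P0   # the four input inversions
    s8 = nP3 & nP2 & P1 & nP0
    return {
        "s0": P2 & nP1 & P0,
        "s1": (P2 & nP1) | (P3 & P2 & P0),
        "s2": (P2 & P1 & P0) | (P3 & P1 & nP0),
        "s3": nP2 & nP1 & nP0,
        "s4": nP3,
        "s5": nP3 & nP2 & P1,
        "s6": nP3 & nP2 & (1 - s8),
        "s7": P3,
        "s8": s8,
        "s9": 1 - (P3 & P1 & P0),
    }


for op, code in OPCODES.items():
    lines = decode(code)
    for name, ops in EXPECTED.items():
        assert lines[name] == int(op in ops), (op, name)
```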

Decoder Gate Count =  12 + (4 inversions) = 16

Total System = (BitSlice*8) + (CC) + (Decoder) + (2 control line inversions) = 144 + 6 + 16  + 2 = 168 gates
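The gate budget can be tallied explicitly. Two figures are inferences rather than statements from the text: the 4:1 output mux is counted as 4 gates (the difference 18 - 13 - 1), and the decoder's 12 gates are split as one gate per single-product line, three (two ANDs and an OR) for each of s1 and s2, and none for s4 (the P3 inverter) or s7 (a wire). The constant names are hypothetical.

```python
BITSLICE = 13 + 4 + 1                       # slice logic + 4:1 output mux + inverter for Bi
CONDITION_CODES = {"V": 2, "C": 3, "Z": 1, "N": 0}

# Assumed per-line split of the decoder's 12 gates.
DECODER_GATES = {"s0": 1, "s1": 3, "s2": 3, "s3": 1, "s4": 0,
                 "s5": 1, "s6": 1, "s7": 0, "s8": 1, "s9": 1}
DECODER = sum(DECODER_GATES.values()) + 4   # plus the P3..P0 inverters
CONTROL_INVERSIONS = 2

total = 8 * BITSLICE + sum(CONDITION_CODES.values()) + DECODER + CONTROL_INVERSIONS
print(BITSLICE, DECODER, total)   # 18 16 168
```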

The critical delay is as follows:

• For bit 0, the critical path is from P3 to s2 to Yi to Ci+1: 11 gates
• For bits 1-6, the critical path is from Ci to Ci+1: 6 gates
• For bit 7, the critical path is from C7 to (C or F7): 6 gates

The total delay = 11 + (6*6) + 6 = 53 gate delays
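The delay sum can be written as a small function of word width. This is a sketch under the assumption that the per-bit figures above (11 gate delays for bit 0, 6 for each middle bit, 6 for the last bit) stay the same at other widths; `ripple_delay` is a hypothetical name.

```python
def ripple_delay(width, first=11, middle=6, last=6):
    """Critical path in gate delays: bit 0 (P3 -> s2 -> Yi -> C1), then one
    carry stage per middle bit, then the last bit driving C or F7."""
    return first + middle * (width - 2) + last


print(ripple_delay(8))   # 53
```

For the 8-bit design, 36 of the 53 delays come from the six middle carry stages.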

Here is the Verilog Model for the Controller:

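```verilog
// Decoder: maps the 4-bit opcode P3P2P1P0 (encoding from the K-map above)
// to the ALU control lines s0..s9.
module Decoder(P3, P2, P1, P0, s0, s1, s2, s3, s4, s5, s6, s7, s8, s9);

  input  P3, P2, P1, P0;
  output s0, s1, s2, s3, s4, s5, s6, s7, s8, s9;

  assign s0 =  P2 & ~P1 & P0;                   // Xi = ~Ai           (NEG, NOT)
  assign s1 = (P2 & ~P1) | (P3 & P2 & P0);      // Yi = 0 or 1        (INC, DEC, PASS, NEG, NOT)
  assign s2 = (P2 & P1 & P0) | (P3 & P1 & ~P0); // Yi = ~Bi or 1      (DEC, SUB, CMP, XNOR)
  assign s3 = ~P2 & ~P1 & ~P0;                  // Zi = 1             (OR)
  assign s4 = ~P3;                              // Zi = 0             (logic operations)
  assign s5 = ~P3 & ~P2 & P1;                   // Fi = Ai-1 or Ai+1  (SHL, SHR)
  assign s6 = ~P3 & ~P2 & ~s8;                  // Fi = Zi+1 or Ai+1  (AND, OR, SHR)
  assign s7 =  P3;                              // enable C and V     (arithmetic)
  assign s8 = ~P3 & ~P2 & P1 & ~P0;             // C = A7             (SHL)
  assign s9 = ~(P3 & P1 & P0);                  // C0 = 1             (all but ADD, DEC)

endmodule
```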

(Figures not reproduced: ALU Schematic; Test Vectors; Top Level Schematic.)