## Design of Digital Circuits and Systems ASM with Datapath II

### Instructor: Vikram lyer

### **Teaching Assistants:**

Ariel Kao

Selim Saridede

Derek Thorp

Josh Wentzien Jared Yoder

Adapted from material by Justin Hisa

### **Relevant Course Information**

- Homework 3 due Friday (4/25)
- Homework 4 released Thursday
- Quiz 2 (ROM, RAM, Reg files) this Thu at 11:50 am
  - Based heavily on Homework 2
  - Memory sizing, addressing, initialization, and implementation (*i.e.*, circuit diagram)
- Lab 3 reports due next Friday (5/2)
  - Ideally finish by early next week so you can start Lab 4, which will be released this Thursday

### **ASMD Design Procedure**

- From problem description or algorithm pseudocode:
  - **1)** Identify necessary datapath components and operations
  - 2) Identify states and signals that cause state transitions (external inputs and status signals), based on the necessary sequencing of operations
  - 3) Name the control signals that are generated by the controller that cause the indicated operations in the datapath unit
  - 4) Form an ASM chart for your controller, using states, decision boxes, and signals determined above
  - 5) Add the datapath RTL operations associated with each control signal

Input data

Datapath

unit

Output

data

Control

Status signals

Input signals (external)

Control unit

(FSM)

### **Design Example**

- System specification:
- ddupath = Flip-flops E and F
- determine 4-bit binary counter  $\underline{A} = 0bA_3A_2A_1A_0$



- Active-low reset signal <u>reset\_b</u> puts us in state <u>S\_idle</u>, where we remain while signal Start = 0
- Start = 1 initiates the system's operation by clearing A and

control

status

signal

contro

signals

- C clr\_A-F F. At each subsequent clock pulse, the counter is incremented by 1 until the operations stop.
- Bits  $A_2$  and  $A_3$  determine the sequence of operations:
  - If  $A_2 = 0$ , set *E* to 0 and the count continues
  - If  $A_2 = 1$ , set *E* to 1; additionally, if  $A_3 = 0$ , the count continues, otherwise, wait one clock pulse to set F to 1 and stop counting (i.e., Lset\_F back to *S\_idle*)

### **Design Example #1 (ASMD Chart)**

Synchronous or <u>asynchronous</u> reset?



for sychronous reset, add decision box on reset-b out of every state box:



#### Design Example #1 (SV, Controller) status signals (in) external inputs (in) module controller (set\_E, clr\_E, set\_F, clr\_A\_F, incr\_A, A2, A3, Start, clk, reset b); // port definitions input logic Start, clk, reset\_b, A2, A3; output logic set\_E, clr\_E, set\_F, clr\_A\_F, incr\_A; // define state names and variables enum logic [1:0] {S\_idle, S\_1, S\_2 = 3} ps, ns; // next state logic // controller logic w/synchronou always\_comb always\_ff @(posedge clk) case (ps) if (~reset\_b) S\_idle: ns = Start ? S\_1 : S\_idle; ps <= S\_idle;</pre> $S_1:$ ns = (A2 & A3) ? $S_2 : S_1;$ else S 2: ns = S idle; ps <= ns;</pre> endcase // output assignments assign set\_E = (ps == S\_1) & A2; assign clr\_E = (ps == S\_1) & ~A2; assign set\_F = (ps == S\_2); assign clr\_A\_F = (ps == S\_idle) & Start; assign incr\_A = (ps == S\_1); endmodule // controller

control signals (in)

status signals (aut)

external inputs (in) external outputs (out)

### Design Example #1 (SV, Datapath)

```
module datapath (A, E, F, clk, set_E, clr_E, set_F, clr_A_F,
               incr_A);
   // port definitions
   output logic [3:0] A;
   output logic E, F;
   input logic clk, set_E, clr_E, set_F, clr_A_F, incr_A;
   // datapath logic
   always_ff @(posedge clk) begin
      if (clr_E) E <= 1'b0;
      else if (set_E) E <= 1'b1;</pre>
      if (clr_A_F)
         begin
            A <= 4'b0;
            F <= 1'b0;
         end
      else if (set_F) F <= 1'b1;</pre>
      else if (incr_A) A <= A + 4'h1;</pre>
   end // always_ff
endmodule // datapath
```

### Design Example #1 (SV, Top-Level Design)

```
module top_level (A, E, F, clk, Start, reset_b);
   // port definitions
   output logic [3:0] A;
   output logic E, F;
   input logic clk, Start, reset_b;
   // internal signals (control signals and status signals that oren't outputs)
   logic set_E, clr_E, set_F, clr_A_F, incr_A;
   // instantiate controller and datapath
   controller c_unit (.set_E, .clr_E, .set_F,
                        .clr_A_F, .incr_A, .A2(A[2]),
                        .A3(A[3]), .Start, .clk,
                        .reset_b);
   datapath d_unit (.*);
endmodule // top_level
```

+done tick

### **Design Example #2: Fibonacci**

◆ Design a sequential Fibonacci number circuit with the following properties: start ready

fib

clk

- i is the desired sequence number
- f is the computed Fibonacci number:

$$fib(i) = \begin{cases} 0, & i = 0\\ 1, & i = 1\\ fib(i-1) + fib(i-2), & i > 1 \end{cases}$$

- ready means the circuit is idle and ready for new input
- start signals the beginning of a new computation
- done\_tick is asserted for 1 cycle when the computation is complete

### **Design Example #2 (Pseudocode)**

- Pseudocode analysis:
  - Variables are part of datapath; assignments become RTL operations
  - Chunks of related actions should be triggered by control signals
  - Decision points become status signals

### **Design Example #2 (Control-Datapath)**

### **Design Example #2 (ASMD Chart)**

### **Design Example #2 (SV)**

```
fib_control:
  // port definitions
   // define state names and variables
  // controller logic w/synchronous reset
  // next state logic
   // output assignments
fib_datapath:
  // port definitions
  // datapath logic
fib:
  // port definitions
  // define status and control signals
  // instantiate control and datapath
```

### **Other Hardware Algorithms**

- Sequential binary multiplier or divider
- Arithmetic mean
- Lab 4: Bit counting
- Lab 4: Binary search
- Lab 5: Bresenham's line

# Technology

## Break

### **Hardware Acceleration**

- ASMD as a design process can be used to implement software algorithms
- Custom hardware can accelerate operation:
  - Hardware can better exploit parallelism
  - Hardware can implement more specialized operations
  - Hardware can reduce "processor overhead" (*e.g.*, instruction fetch, decoding)
- "Hardware accelerators" are frequently used to complement processors to speed up common, computationally-intensive tasks
  - *e.g.*, encryption, machine vision, cryptocurrency mining

### **Binary Multiplication**

Multiplication of unsigned numbers

|                |               |                       |                       |                |                       |                       | Multip                 | blicand                | M (1                    | 4)                    | 1111          |
|----------------|---------------|-----------------------|-----------------------|----------------|-----------------------|-----------------------|------------------------|------------------------|-------------------------|-----------------------|---------------|
|                |               |                       |                       |                |                       |                       | Multip                 | olier Q                | (1                      | 1)                    | <b>x</b> 101  |
| Multiplicand M | <b>A</b> (14) |                       | 11                    | 10             |                       |                       | Partial                | produc                 | ct 0                    |                       | 111<br>+ 1110 |
| Multiplier Q   | (11)          |                       | 11                    | 10             |                       |                       | Partial                | produc                 | et 1                    |                       | 10101         |
|                |               | 0                     | 1110<br>0000<br>1110  |                | Partial product 2     |                       |                        | 14                     | 01010                   |                       |               |
| Product P      | (154)         | 10011010              |                       |                |                       | Product P (154)       |                        |                        | 4)                      | 1001101               |               |
| (a) Mul        | tiplicatior   | n by <mark>h</mark> a | nd                    |                |                       |                       | (                      | b) Usi                 | ing mu                  | ultiple               | adders        |
|                |               |                       |                       |                |                       |                       | <i>m</i> <sub>3</sub>  | <i>m</i> <sub>2</sub>  | $m_1$                   | <i>m</i> <sub>0</sub> |               |
|                |               |                       |                       |                |                       | x                     | $q_3$                  | $q_2$                  | $q_1$                   | $q_0$                 |               |
|                | Partial pro   | duct 0                |                       |                | +                     | $m_3q_1$              | $m_3 q_0$<br>$m_2 q_1$ | $m_2 q_0$<br>$m_1 q_1$ | $\frac{m_1q_0}{m_0q_1}$ | $m_0 q_0$             | -             |
|                | Partial pro   | duct 1                |                       | +              | $PP1_5$<br>$m_3q_2$   | $PP1_4$<br>$m_2q_2$   | $\frac{PP1_3}{m_1q_2}$ | $PP1_2$<br>$m_0q_2$    | PP11                    |                       |               |
|                | Partial pro   | duct 2                | 1                     | $PP2_6$        | PP25                  | PP2 <sub>4</sub>      | PP2 <sub>3</sub>       | PP2 <sub>2</sub>       |                         |                       |               |
|                | Product P     |                       | <i>p</i> <sub>7</sub> | P <sub>6</sub> | <i>p</i> <sub>5</sub> | <i>P</i> <sub>4</sub> | <i>p</i> <sub>3</sub>  | <i>P</i> <sub>2</sub>  | $p_1$                   | P <sub>0</sub>        |               |
|                |               |                       |                       |                |                       |                       |                        |                        |                         |                       |               |

c) Hardware implementation

### **Parallel Binary Multiplier**

Parallel multipliers require a lot of hardware



### **Sequential Binary Multiplier**

- Design a sequential multiplier that uses only one adder and a shift register
  - Assume one clock cycle to shift and one clock cycle to add
  - More efficient in hardware, less efficient in time
- Considerations:
  - *n*-bit multiplicand and multiplier yield a product at most how wide?
  - What are the ports for an *n*-bit adder?
  - How many shift-and-adds do we do and how do we know when to stop?

### **Sequential Binary Multiplier**

- Design a sequential multiplier that uses only one adder and a shift register
  - Assume one clock cycle to shift and one clock cycle to add
  - More efficient in hardware, less efficient in time
- Implementation Notes:
  - If current bit of multiplier is 0, then skip the adding step
  - Instead of shifting multiplicand to the left, we will shift the partial sum (and the multiplier) to the right
  - We will re-use the multiplier register for the lower half of the product
    - Treat carry, partial sum, and multiplier as one shift register {C, A, Q}

### **Sequential Binary Multiplier Operation**

A few steps of: 11010111 00010111 Х



| Operation (completed)  | С | А               | Q        | Р    |
|------------------------|---|-----------------|----------|------|
| Initialize computation | Θ | 000000000       | 00010111 | 1000 |
|                        |   |                 |          |      |
|                        |   | <br> <br> <br>  |          |      |
|                        |   | 1<br> <br> <br> |          |      |
|                        |   | <br> <br> <br>  |          |      |
|                        |   |                 |          |      |
|                        |   |                 |          |      |

### **Binary Multiplier Specification**

- Datapath
  - (2n+1)-bit shift register with bits split into 1-bit C, n-bit A, and n-bit Q
  - Multiplicand stored in register B, multiplier stored in Q
  - An *n*-bit *parallel adder* adds the contents of *B* to *A* and outputs to {*C*, *A*}
  - A  $\lceil \log_2(n+1) \rceil$ -bit counter **P**
- Control
  - Inputs *Start* and *Reset*, outputs *Ready* and *Done*
  - Status signals:
  - Control signals:

### **Binary Multiplier Block Diagram**



### **Binary Multiplier (ASMD Chart)**

### **Binary Multiplier Implementation**

- Controller Logic
  - Load\_regs = Shift\_regs = Add\_regs = Decr\_P = Ready = Done =

### **Binary Multiplier (SV, Datapath)**

```
module datapath #(parameter WIDTH=8)
                (product, Q, P, multiplicand, multiplier, clk,
                 Load regs, Shift regs, Add regs, Decr P);
  // port definitions
   output logic [2*WIDTH-1:0] product;
   output logic [WIDTH-1:0] Q, P; // note: unnecessary bits for P
   input logic [WIDTH-1:0] multiplicand, multiplier;
   input logic clk, Load_regs, Shift_regs, Add_regs, Decr_P;
  // internal logic
  logic C;
   logic [WIDTH-1:0] A, B;
  // datapath logic
```

### **Binary Multiplier (SV, Datapath)**

```
module datapath #(parameter WIDTH=8)
                  (product, Q, P, multiplicand, multiplier, clk,
                   Load regs, Shift regs, Add regs, Decr P);
   // port definitions
   . . .
   // internal logic
   . . .
   // datapath logic
   always_ff @(posedge clk) begin
      if (Load_regs) begin
          A \leq 0; C \leq 0; P \leq WIDTH;
          B <= multiplicand;</pre>
          Q <= multiplier;
      end
      if (Decr_P) P <= P - 1;</pre>
      if (Add_regs) {C, A} <= A + B;
if (Shift_regs) {C, A, Q} <= {C, A, Q} >> 1;
   end // always_ff
   assign product = {A, Q};
endmodule
```